Tuesday, November 18, 2014

Data Capture Market Of 2015 - Navigating Competitive Landscape - Part 2

This is the second part of the article designed to help you see the real strengths and weaknesses of complex Data Capture solutions of 2015. In this part we will start looking at specific features of Data Capture tools.

SCANNING

Your Document Capture solution may or may not include scanning. If it does, the first thing to consider is volume: scanning 50 documents per week versus 100,000 documents per week makes a world of difference.

VOLUME CONSIDERATIONS

When looking at your intake volumes, consider two additional factors: periods of peak or irregular workload, and the anticipated growth rate of incoming documents.
•    Is your organization affected by seasonal, external or periodic surges of incoming documents that need to be ingested quickly?
•    Are your intake volumes likely to grow in the future, and at what rate? Are you planning to roll out your Document Capture solution to other departments?
When volumes are low (and stay low), almost any device that delivers sufficient image quality may do, and you may be able to capitalize on devices you already have, such as MFPs and desktop scanners. (Be sure to read my cautions later in this paper before picking your scanning devices.)
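To turn the peak and growth questions above into a number you can put in front of a vendor, a back-of-the-envelope calculation helps. The Python sketch below is a hypothetical planning aid: the peak factor, growth rate, pages per document, rated speed (ppm) and the efficiency factor (which discounts rated speed for batch preparation, jams and rescans) are all assumptions you would replace with your own figures.

```python
import math

def peak_daily_docs(avg_weekly_docs, peak_factor, annual_growth, years, working_days=5):
    """Daily document volume to plan for at peak, after `years` of growth.

    All inputs are hypothetical planning figures, not vendor benchmarks.
    """
    grown_weekly = avg_weekly_docs * (1 + annual_growth) ** years
    return math.ceil(grown_weekly / working_days * peak_factor)

def scanners_needed(daily_docs, pages_per_doc, rated_ppm, hours_per_day, efficiency):
    """Number of scanners required, discounting rated speed by an assumed efficiency."""
    pages_per_day = daily_docs * pages_per_doc
    effective_pages_per_scanner = rated_ppm * 60 * hours_per_day * efficiency
    return math.ceil(pages_per_day / effective_pages_per_scanner)

# Example: 100,000 documents/week today, peaks at 2x average, 25% annual growth for 2 years.
docs = peak_daily_docs(100_000, peak_factor=2, annual_growth=0.25, years=2)
print(docs)  # 62500
print(scanners_needed(docs, pages_per_doc=2, rated_ppm=100, hours_per_day=8, efficiency=0.5))  # 6
```

With 50 documents a week, the same assumptions come out at a single device, which is why existing MFPs are often enough at low volumes.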
Here are a few more important considerations that will help you better understand your scanning requirements:

TYPES OF DOCUMENTS

How many types or classes of documents are you planning to handle? A few well-defined document types are easier to classify, process and extract data from. A 'mixed bag' of different document types in one scanning batch takes you into a whole different ball game.
How complex are the documents and how are they received? In some cases, you only need a couple of values from a clean, high-resolution document. In others, you need more complex algorithms to get data from complicated, variable or low-quality documents such as faxes. Is your information structured or unstructured?

TYPES OF PAPER

Different types of paper may require different types of scanners. Bank checks need to be fed into the scanner in a consistent manner, staples and paper clips must almost always be removed, and heavier or thinner paper may require different handling. How does the scanner react when there are holes in the document or paper clips left in, or when a document is wrinkled, folded or torn?

OFF-SITE CONSIDERATIONS

Can you allow the documents to be taken off-site? In some scenarios, such as highly unstructured content with a lot of information to extract, handling handwriting or extracting information out of drawings and charts, a lot of manual labour may be required and outsourcing may be an attractive option to consider.

IMAGE ENHANCEMENTS

When you use a smartphone or another device that is not specifically designed to work as a scanner, the quality of the images you produce becomes critically important, and you'll need to consider using image enhancements.
Image enhancements such as de-skew, de-speckle and shadow removal come either as a separate feature (or product) or as part of the scanning solution. For instance, Kofax offers its famous Virtual ReScan tool, while Oracle Capture includes image enhancements as part of the main product.
While image enhancements will be desirable most of the time, be very careful when using them with legal documents: a missing comma (removed by de-speckle) could potentially have serious consequences.
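To see why de-speckle can be risky, it helps to know roughly how such a filter works: it removes small, isolated clusters of dark pixels on the assumption that they are noise. The toy Python sketch below is not any vendor's actual algorithm, and the `min_size` threshold is an assumption; it removes 4-connected clusters below a size threshold from a binary image (1 = black, 0 = white) and shows that a comma-sized mark is treated exactly like a speck of dust.

```python
from collections import deque

def despeckle(image, min_size):
    """Remove 4-connected clusters of black pixels (1s) smaller than min_size.

    A toy stand-in for a scanner's de-speckle filter; returns a new image.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill to collect one connected cluster of black pixels.
            cluster = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Erase clusters too small to be treated as "real" content.
            if len(cluster) < min_size:
                for y, x in cluster:
                    out[y][x] = 0
    return out

page = [
    [1, 1, 1, 1, 1, 0, 0],  # part of a letter stroke: 5 connected pixels
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],  # a comma-sized mark: 2 connected pixels
    [0, 0, 0, 0, 0, 1, 0],
]
cleaned = despeckle(page, min_size=3)
print(cleaned[2][5], cleaned[3][5])  # 0 0
```

The filter has no notion of meaning: anything smaller than the threshold disappears, punctuation included, which is why the threshold and the decision to apply de-speckle at all deserve scrutiny for legal documents.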

PROTECTION AGAINST JAMS AND DOUBLE FEEDS

When scan volumes are high, an additional level of intelligence in the scanner may become a necessity. Does the scanner provide effective paper separation to reduce the chance of double-feeds when documents are poorly prepared? Does it have technology to proactively detect possible paper jams and avoid damaging the original documents?
You may also need to look at detection of paper clips and staples. Paper clips can cause jams and multifeeds and can damage electronic components in scanners, so does the scanner have a way to detect paper clips left in by accident? Staples can scratch the glass and cause streaking on images, so does the scanner have technology to stop scanning immediately to avoid damage?

OTHER FACTORS

Other factors may also come into play with scanning. For instance, high humidity causes sheets of paper to stick together, so the same stack on the same scanner might produce more double-feeds in one location than in another.
If rust is left on the paper after staples or paper clips are removed, it may well scratch the glass on some scanners and cause ongoing problems, so you may opt for higher-end scanners, such as Microform scanners from Germany.

OTHER INTAKE MECHANISMS

In many cases the majority of documents arrive as email attachments or faxes, or are imported from a shared drive or a content management system. If this is your case, do not print and re-scan them just to take advantage of your software's scanning features: many products support direct import of documents, offering significant savings on printing and scanning costs.

DATA EXTRACTION

Extraction is the next step after the images have been captured via the Scanning or Import feature. The critical question here is: "what information do you wish to extract?"
For example, do you need to extract the entire page, or just particular values such as the Invoice Number? This matters because extracting all data from an image is very different from extracting only the relevant data.
Assume you are capturing a legal contract. Most likely you don't need all of its legal terms in metadata fields, and the only fields you'd like to extract are the date and the company name. The complete images of all pages of the contract will most likely be stored in your Content Management system, and you may also generate a searchable PDF and use your full-text search engine for additional search convenience. These additional ways to access your scanned information may relax your extraction and classification requirements.
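The difference between extracting everything and extracting only what you need is easy to see in code. The Python sketch below assumes you already have OCR text for a page and pulls out just two index fields with regular expressions; the field names and patterns are hypothetical and would need tuning for real documents. The full text can still go into a searchable PDF or full-text index, so extraction only has to be reliable for the fields you actually index.

```python
import re

# Hypothetical patterns for the only two fields we index; everything else
# stays in the full text (searchable PDF / full-text index).
FIELD_PATTERNS = {
    "invoice_number": r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*(\d+)",
    "invoice_date": r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})",
}

def extract_fields(ocr_text):
    """Return the targeted fields, with None for any field not found."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        fields[name] = match.group(1) if match else None
    return fields

text = "ACME Corp\nInvoice No: 0012345\nDate: 2014-11-18\nTotal: 150.00"
print(extract_fields(text))  # {'invoice_number': '0012345', 'invoice_date': '2014-11-18'}
```

A field that comes back as None is a natural trigger for manual review, while the rest of the page never needs to be extracted field by field.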
The following section will highlight additional aspects that may be relevant when examining the extraction tools:

TEXT OR GRAPHICAL INFORMATION

Another important consideration is that most extraction tools are best at dealing with text-based information. If a significant portion of your content is graphical (maps, drawings, graphs) and needs to be automatically classified and have data extracted, text-based OCR systems tend not to work very well, so this becomes an important question to ask your sales rep during the selection process.

VOLUMES

The same considerations regarding target intake volumes, volume patterns and expected growth that we looked at in the Scanning section above also apply to Data Extraction. They will drive your choice between manual, manual-assisted or fully automatic extraction tools, and they will also influence your hardware and infrastructure decisions.

CLASSIFICATION

Each type of document may have different extraction requirements. You may need to extract different fields, apply different validation rules and store documents differently.
Do you have an easy way to separate these types of documents before they enter the scanning or import phase, or will you rely on your software to classify the documents for you and handle each type differently?
Also, can you embed bar codes in your documents to make classification easy for your software, or is this option unavailable?
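If bar codes are an option, classification can become a simple lookup. The Python sketch below is hypothetical: the bar code prefixes, document classes and field lists are illustrative only, and it assumes your capture software has already decoded the bar code into a string. Documents without a recognized bar code fall through to an `unclassified` class for automatic or manual classification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class DocumentClass:
    name: str
    fields: Tuple[str, ...]  # fields to extract for this class

# Hypothetical prefixes your organization might print on forms or cover sheets.
CLASSES_BY_PREFIX = {
    "INV": DocumentClass("invoice", ("invoice_number", "vendor", "total")),
    "CON": DocumentClass("contract", ("date", "company_name")),
}
UNCLASSIFIED = DocumentClass("unclassified", ())

def classify(barcode: Optional[str]) -> DocumentClass:
    """Map a decoded bar code such as 'INV-000123' to a document class."""
    if not barcode:
        return UNCLASSIFIED
    prefix = barcode.split("-", 1)[0].upper()
    return CLASSES_BY_PREFIX.get(prefix, UNCLASSIFIED)

print(classify("INV-000123").fields)  # ('invoice_number', 'vendor', 'total')
print(classify(None).name)            # unclassified
```

Each class carries its own field list, so downstream extraction and validation rules can differ per document type without any content-based recognition.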

CUSTOM BUSINESS LOGIC

Custom logic lets you apply field-specific validation rules, perform database lookups to validate against a list of pre-defined values, treat values as numeric-only or alpha-only, and apply additional business logic that improves your accuracy and reduces (or eliminates) the manual component of data extraction. Many tools, like Oracle Capture and Oracle Forms Recognition, offer elaborate APIs that let you perform all of these functions, but other tools may not offer this facility or may not be flexible enough for what you're trying to accomplish.
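To make this concrete, here is a minimal Python sketch of the kinds of rules described above: a numeric-only field, an alpha-only field checked against a list of pre-defined values, and an amount format. It is plain Python, not the Oracle Capture or Oracle Forms Recognition API; the field names, formats and vendor list are hypothetical.

```python
import re

# Hypothetical reference list; in a real deployment this would typically be a database lookup.
KNOWN_VENDORS = {"ACME CORP", "GLOBEX"}

def validate_invoice(fields):
    """Return a list of validation errors; an empty list means no manual review is needed."""
    errors = []
    # Numeric-only field with an allowed length range.
    if not re.fullmatch(r"\d{6,10}", fields.get("invoice_number", "")):
        errors.append("invoice_number must be 6-10 digits")
    # Alpha-only field, then a lookup against the list of pre-defined values.
    vendor = fields.get("vendor", "")
    if not re.fullmatch(r"[A-Za-z ]+", vendor):
        errors.append("vendor must contain letters only")
    elif vendor.upper() not in KNOWN_VENDORS:
        errors.append("vendor not in vendor list")
    # Amount with at most two decimal places.
    if not re.fullmatch(r"\d+(\.\d{1,2})?", fields.get("total", "")):
        errors.append("total must be a numeric amount")
    return errors

print(validate_invoice({"invoice_number": "0012345", "vendor": "Acme Corp", "total": "150.00"}))  # []
print(validate_invoice({"invoice_number": "12A45", "vendor": "Initech", "total": "abc"}))
# ['invoice_number must be 6-10 digits', 'vendor not in vendor list', 'total must be a numeric amount']
```

Records that pass every rule could skip manual verification, which is where the reduction in manual effort comes from.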

IMAGE MANAGEMENT AND CONTENT REPOSITORY

This is the third major aspect of any Document Capture solution, and it answers the question: "Once the information is captured, classified and its metadata values extracted, what happens to it next?" It covers version control, indexing, management and distribution, cleansing, publishing, search and controlled retention.
Selecting a Document Management System is outside the scope of this whitepaper, but I'll draw your attention to a few capture-specific considerations that often get overlooked in the process:

IMAGE ANNOTATIONS

Many general-purpose CMSs do not provide watermarking, annotation and other image augmentation, but Oracle WebCenter Imaging does. You'll be able to apply watermarks and annotations and link them to specific workflow steps. With WebCenter Imaging your original image remains available, which may be important for legal and compliance reasons; that may not be the case with other image manipulation tools, which apply watermarks to the actual image.
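The distinction between annotating an image and altering it can be sketched as a data model: annotations are stored as a separate layer next to an original that is never modified, and each annotation can be tied to a workflow step. This is a hypothetical Python illustration of the design idea only, not the Oracle WebCenter Imaging data model or API; the class names and workflow step names are made up.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Annotation:
    kind: str           # e.g. "watermark" or "note"
    text: str
    workflow_step: str  # hypothetical workflow step the annotation belongs to

@dataclass
class StoredImage:
    original: bytes  # captured image bytes, never rewritten
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, annotation: Annotation) -> None:
        # Annotations are kept alongside the image instead of being burned into it.
        self.annotations.append(annotation)

    def annotations_for(self, workflow_step: str) -> List[Annotation]:
        """Overlays a viewer would draw on top of the original at this step."""
        return [a for a in self.annotations if a.workflow_step == workflow_step]

scan = StoredImage(original=b"placeholder image bytes")
scan.annotate(Annotation("watermark", "CONFIDENTIAL", "legal_review"))
print(len(scan.annotations_for("legal_review")))  # 1
print(scan.original == b"placeholder image bytes")  # True
```

Because the overlay is rendered at view time, the stored original stays byte-for-byte identical to what was captured.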

RECORD AND RETENTION MANAGEMENT

Retention management lets you flag outdated content for deletion or archival, synchronize physical and electronic versions of content, and keep track of the physical files, folders and boxes floating around your offices. It will also help you share content more effectively across the organization and ensure it is retained for the required period, regardless of its source or the system that stores it.
You may be required to ensure regulatory compliance (SOX, SEC, industry regulations) by keeping content for required periods of time and consistently following processing rules, such as review, archival and destruction, when this period is over.
Systematically disposing of certain types of documents once you’re no longer required to keep them may prove extremely beneficial in case of potential litigation. You may also choose to freeze important content related to legal action you’re about to take, so it doesn’t get “accidentally” destroyed by a disgruntled employee.
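The core of a retention rule is simple date arithmetic plus an override for legal holds. The Python sketch below assumes a hypothetical retention schedule (the periods shown are placeholders, not regulatory guidance) and a trigger date such as the date a contract expires or an invoice is paid; real records management systems add review steps, archival and audit trails on top of this.

```python
from datetime import date

# Hypothetical retention schedule in years; actual periods come from your
# regulations, policies and legal counsel.
RETENTION_YEARS = {"invoice": 7, "contract": 10}

def disposition_date(record_type: str, trigger: date) -> date:
    """Date on which the retention period for a record ends."""
    target_year = trigger.year + RETENTION_YEARS[record_type]
    try:
        return trigger.replace(year=target_year)
    except ValueError:
        # Trigger falls on Feb 29 and the target year is not a leap year.
        return trigger.replace(year=target_year, day=28)

def eligible_for_disposal(record_type: str, trigger: date, today: date,
                          on_legal_hold: bool) -> bool:
    """A legal hold (freeze) always blocks disposal, regardless of the schedule."""
    if on_legal_hold:
        return False
    return today >= disposition_date(record_type, trigger)

paid = date(2014, 11, 18)
print(disposition_date("invoice", paid))  # 2021-11-18
print(eligible_for_disposal("invoice", paid, date(2022, 1, 1), on_legal_hold=True))  # False
```

Checking the hold before the schedule is the important design choice: frozen content stays put even after its retention period has expired.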

VOLUME AND LOAD PATTERNS

Volume and load patterns still play a role in your content management, just as they do in your capture process.

OTHER REPOSITORY CONSIDERATIONS

Other repository considerations, such as security, conversion and workflow, are also important when selecting a new content management system.

LICENSING CONSIDERATIONS

And here's the big one. While every situation and every deal is different, the following considerations will apply in most cases:

PER-DOCUMENT CHARGES

Some vendors (Kofax being a prime example) charge different fees based on the volume of documents you scan and process. If you're dealing with large volumes or expect your intake volume to increase, such solutions may quickly become prohibitively expensive, and you may want to opt for a solution that offers server-based licensing without per-user or per-click components, such as Oracle WebCenter Suite.
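A quick way to see how per-document pricing scales is to project both models forward. The Python sketch below uses hypothetical numbers (a per-page price, a flat annual server license cost and a growth rate) and is not based on any vendor's actual price list. It returns the first year in which per-page charges would exceed the flat license; `Decimal` keeps the currency arithmetic exact.

```python
from decimal import Decimal
from typing import Optional

def crossover_year(pages_year1: int, annual_growth: Decimal, price_per_page: Decimal,
                   flat_annual_cost: Decimal, horizon_years: int = 10) -> Optional[int]:
    """First year in which per-page charges exceed a flat annual license, or None."""
    pages = Decimal(pages_year1)
    for year in range(1, horizon_years + 1):
        if pages * price_per_page > flat_annual_cost:
            return year
        pages *= 1 + annual_growth  # volume grows by the same rate every year
    return None

# Hypothetical figures: 1M pages in year one, 50% annual growth,
# $0.01 per page versus a $30,000/year flat server license.
print(crossover_year(1_000_000, Decimal("0.5"), Decimal("0.01"), Decimal("30000")))  # 4
```

With these made-up numbers the per-page model is cheaper for the first three years and more expensive from year four onward; plugging in real quotes and your own growth estimate is what matters.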

RESELLER PRICING

In many cases Value Added Resellers enjoy special pricing, and buying from them may prove more cost-effective than dealing directly with the vendor.

That's it for today. In the next part of this article, we will complete our review of the features and considerations of Data Capture tools of 2015.
