Tuesday, November 25, 2014

Data Capture Market Of 2015 - Navigating Competitive Landscape - Part 3

In this article, we complete our review of the competitive Data Capture Market of 2015 and our in-depth exploration of the features and considerations involved in selecting the most effective solution for your needs.


This section draws your attention to two additional considerations that come into play when you switch to a new Document Capture solution or enhance an existing one.


When you have an existing solution in place that has proven inadequate, be sure to 'do your homework' and find out whether the problem is a lack of features, an issue with delivery, the vendor or the people involved, or something else entirely.
That can help you narrow down your list of potential vendors and quickly eliminate those likely to have the same limitations. You may also opt for a different partner or integrator while staying with the same vendor, which may be far more cost-effective than ripping out the solution and starting from scratch. Many systems look good in a demo and get implemented, but you have to be confident that the vendor will be there for you and able to help when issues come up.


And now I've saved the best for last. To paraphrase The Godfather:
"One BA with a pen will outsmart 20 developers with the latest computers."
A capture solution can quickly get expensive, so it is vital that you consider and implement only the requirements that will really make a difference for your business users. Apply the 80/20 rule: 80 percent of the benefits will come from just 20 percent of the features. Yes, it's also critical to pick a solution that will continue to support you as your organization grows and adds features and volume, but it's best not to invest in 100 percent of the features and modules on day one. Oracle WebCenter is a case in point: you can start with a relatively small footprint and expand the solution as needed.


And now that we have a system for evaluating vendors, it's time to take a look at Oracle's product offerings: Oracle WebCenter Imaging, Oracle WebCenter Capture and Oracle Forms Recognition.


Oracle Capture excels at image enhancement, flexible import tools, manually assisted indexing and generation of searchable PDF/A. Documents can then be stored in Oracle WebCenter Content or WebCenter Imaging for additional image manipulation and image-centric workflows.


Oracle Capture offers a rich set of Image Enhancements (See screenshot below) that can be applied during scanning or to imported documents:


Oracle Capture can extract bar code information, convert documents and check them in fully automatically, without any user intervention. When additional processing is required, it makes it easy to map specific areas of each document type to the metadata fields where you'd like the information to end up (See screenshot below).

Oracle Capture also offers a rapid customization facility where you can implement additional validation and database lookups and automate all aspects of automatic or manually assisted indexing.
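To make the idea of mapping page areas to metadata fields concrete, here is a minimal Python sketch, assuming an OCR step has already produced words with bounding boxes. The `Box`, `Word`, `TRANSMITTAL_ZONES` and `extract_zones` names and the sample values are hypothetical; this is not Oracle Capture's configuration format or API, only an illustration of assigning text that falls inside a page region to a field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Rectangle in page coordinates: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Box") -> bool:
        # True when `other` lies entirely inside this box.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)


@dataclass(frozen=True)
class Word:
    """One OCR word and its bounding box (assumed output of an OCR step)."""
    text: str
    box: Box


# Hypothetical zone template for one document type: metadata field -> page region.
TRANSMITTAL_ZONES = {
    "transmittal_no": Box(400, 20, 600, 60),
    "project_code": Box(20, 20, 200, 60),
}


def extract_zones(words, zones):
    """For each field, join the words that fall inside its zone, in reading order."""
    result = {}
    for field, zone in zones.items():
        inside = [w for w in words if zone.contains(w.box)]
        inside.sort(key=lambda w: (w.box.y0, w.box.x0))
        result[field] = " ".join(w.text for w in inside)
    return result


# Made-up OCR output for the top of one page.
words = [
    Word("TR-2014-0117", Box(410, 30, 520, 45)),
    Word("PRJ", Box(30, 30, 60, 45)),
    Word("88", Box(65, 30, 85, 45)),
    Word("Dear", Box(30, 120, 70, 135)),
]
print(extract_zones(words, TRANSMITTAL_ZONES))
# {'transmittal_no': 'TR-2014-0117', 'project_code': 'PRJ 88'}
```

Words outside every zone (like "Dear" in the body text) are simply ignored, which is why a well-defined template per document type matters.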


Below are two real-life examples that illustrate the types of scenarios where Oracle Capture is at its best:


One of the projects we recently completed was automating workflow and document control for a nuclear power station operations management provider. As part of doing business, they exchange a large volume of technical specifications and formal transmittals with their government clients and sub-suppliers. Prior to implementation, information was duplicated and hard to organize and retrieve, and they had no structured workflow process.
Before using Capture, the average time spent per document was benchmarked at 2 min 17 sec on a 15-field check-in form. With Capture extracting pre-filled values that users only verify and correct, the time spent per document dropped to just over 33 seconds!
That's over two hours of added productive time (out of an estimated four hours) per Document Control person per day, a 200% increase in productivity!
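The arithmetic behind these numbers can be checked with a few lines of Python. This sketch assumes that the "estimated four hours" means four hours per day that a Document Control person spends on check-in at the old rate; the `check_in_savings` function is our own illustration, not part of any product.

```python
def check_in_savings(before_s, after_s, hours_on_check_in):
    """Return (documents per day, hours freed per day, fraction of time saved per document)."""
    docs_per_day = hours_on_check_in * 3600 / before_s       # throughput at the old rate
    hours_after = docs_per_day * after_s / 3600              # same workload at the new rate
    return docs_per_day, hours_on_check_in - hours_after, 1 - after_s / before_s


# 2 min 17 s before, ~33 s after; 4 hours/day on check-in is our assumption.
docs, freed, saved = check_in_savings(before_s=2 * 60 + 17, after_s=33, hours_on_check_in=4)
print(f"{docs:.0f} documents/day, {freed:.1f} hours freed, {saved:.0%} less time per document")
# 105 documents/day, 3.0 hours freed, 76% less time per document
```

Under that assumption, the freed time comes out at roughly three hours per day, which is consistent with the "over two hours" figure.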


Another recent project I'd like to mention is one where we automated document storage, acquisition and workflow for a mid-sized insurance company. They receive a significant volume of claim-supporting correspondence, mostly scanned documents arriving by email.
Over the years, they had tried more than once to extract meaningful metadata fields such as claim number and policy ID at check-in, with varying degrees of success.
They had also relied on an outdated custom component that automatically checked in incoming email attachments. The component was buggy and unreliable.

Implementing Oracle Capture allowed them to finally retire that component and begin extracting important metadata values from incoming documents during the check-in process. Time spent per document dropped from an average of 1.5 minutes to just under 22 seconds!
The unsupported and buggy custom component was replaced by Oracle Capture Import Server.


Oracle Forms Recognition (OFR) intelligently locates and extracts unstructured data from a broad range of documents, including purchase orders, remittances, freight bills and healthcare claims, reducing costly, labor-intensive manual processing and errors. Its scalable forms recognition system lets you easily grow document processing volumes across applications.
OFR can integrate with Capture and provides intelligent classification, data extraction and matching. OFR's strongest points are rule-based forms recognition and integration with other Oracle products such as E-Business Suite (EBS).


OFR can be configured to recognize a large number of document types and extract multiple information fields. For instance, when processing invoices, it can extract the header information as well as the line items, and check whether the line items add up to the total specified on the invoice. If a business rule exception occurs, users can easily correct the situation in the Verifier application. The screenshot below shows how an approval stamp on the invoice obscured the last line item, which triggered an exception.

The user can then add a line to the list of extracted line items, which clears the exception.
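The line-item check behind this kind of exception can be sketched in Python as follows. The `total_exception` function and the invoice amounts are hypothetical, and OFR's own rule engine and Verifier are configured differently; the sketch only shows the principle of comparing the sum of extracted line items with the header total, using `Decimal` to avoid floating-point rounding on currency.

```python
from decimal import Decimal


def total_exception(line_items, invoice_total):
    """Return None if the line items add up to the header total, else an exception message."""
    line_sum = sum((Decimal(amount) for amount in line_items), Decimal("0"))
    total = Decimal(invoice_total)
    if line_sum == total:
        return None
    return f"Line items sum to {line_sum}, header total is {total} (difference {total - line_sum})"


extracted = ["120.00", "45.50", "310.25"]   # last line hidden by the stamp, so not extracted
print(total_exception(extracted, "600.75"))
# Line items sum to 475.75, header total is 600.75 (difference 125.00)

extracted.append("125.00")                   # the user keys in the missing line
print(total_exception(extracted, "600.75"))  # None: the exception is cleared
```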


Once the information is extracted, it can be sent to Oracle E-Business Suite, while the document itself is stored in WebCenter Imaging. (See screenshot below)

The document is then readily available from within EBS through its Attachments functionality, and can be viewed and annotated without leaving EBS.


The key strength of Oracle's capture tools lies not only in their industry-leading features, like Image Enhancements in Capture and Automatic Classification and Intelligent Extraction in OFR, but also in their seamless integration with other Oracle products, such as EBS and PeopleSoft.

From capture, extraction and indexing to processing, management and quick, easy access, these products are designed to work together and are fully backed by Oracle Support.


While marketing data and high-level observations are readily available from various sources across the Internet, businesses often fall victim to hype and complexity and end up making costly mistakes when selecting and integrating the components of their Document Capture system.
In addition to (yet another) brief market review, this whitepaper provides an in-depth look at the key 'moving parts' hidden behind every document capture solution, along with a complete, easy-to-follow checklist that helps organizations quickly turn vague, intangible requirements into specific, actionable steps: steps that make the selection of vendors, tools and components easy and fail-proof.

Tuesday, November 18, 2014

Data Capture Market Of 2015 - Navigating Competitive Landscape - Part 2

This is the second part of an article designed to help you see the real strengths and weaknesses of complex Data Capture solutions of 2015. We will start looking at specific features of Data Capture tools.


Your Document Capture solution may or may not include scanning. If it does, the first thing to consider is volume: 50 scanned documents per week vs. 100,000 per week makes a world of difference.


When looking at your intake volumes, consider two additional factors: periods of peak or irregular workload, and the anticipated growth rate of incoming documents.
•    Is your organization affected by seasonal, external or periodic surges of incoming documents that need to be ingested quickly?
•    Are your intake volumes likely to grow in the future, and at what rate? Are you planning to roll out your Document Capture solution to other departments?
When volumes are low (and stay low), almost any device that delivers sufficient image quality may do, and you may be able to capitalize on devices you already have, such as MFPs and desktop scanners. (Be sure to look at my cautions later in the paper when picking your scanning devices.)
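To see how peaks and growth change the capacity you need to plan for, here is a back-of-the-envelope Python sketch. The function name and all inputs (peak factor, growth rate, planning horizon, five working days per week) are hypothetical assumptions you would replace with your own figures.

```python
import math


def required_daily_capacity(weekly_pages, peak_factor, annual_growth, years, working_days=5):
    """Pages per working day to plan for at the end of the planning horizon."""
    grown_weekly = weekly_pages * (1 + annual_growth) ** years   # compound growth
    return math.ceil(grown_weekly * peak_factor / working_days)  # size for the surge, not the average


# Hypothetical inputs: a small office vs. a high-volume operation.
print(required_daily_capacity(50, peak_factor=1.0, annual_growth=0.0, years=0))
print(required_daily_capacity(100_000, peak_factor=2.0, annual_growth=0.12, years=3))
```

With these assumptions, the high-volume case needs capacity for roughly 56,000 pages per day, far more than the 20,000 a flat weekly average would suggest.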
Here are a few more important considerations that will help you better understand your scanning requirements:


How many types or classes of documents are you planning to handle? A few well-defined document types make it easier to classify documents, extract data from them and process them. A 'mixed bag' of different document types in one scanning batch takes you into a whole different ball game.
How complex are the documents and how are they received? In some cases, you only need a couple of values from a clean, high-resolution document. In others, you need more complex algorithms to get data from complicated, variable or low-quality documents such as faxes. Is your information structured or unstructured?


Different types of paper may require different types of scanners. Bank checks need to be fed into the scanner in a consistent manner, staples and paper clips almost always must be removed, and heavier or thinner paper may require different handling. How does the scanner react when there are holes in the document or leftover paper clips, or when a document is wrinkled, folded or torn?


Can you allow the documents to be taken off-site? In some scenarios, such as highly unstructured content with a lot of information to extract, handling handwriting or extracting information out of drawings and charts, a lot of manual labour may be required and outsourcing may be an attractive option to consider.


When you use a smartphone or another device that is not specifically designed to work as a scanner, the quality of the image you can produce becomes critically important, and you'll need to consider image enhancements.
Image Enhancements such as de-skew, de-speckle, shadow removal and others come either as a separate feature (or product) or as part of the scanning solution. For instance, Kofax offers its well-known Virtual ReScan tool, while Oracle Capture includes Image Enhancements as part of the main product.
While using Image Enhancements will be desirable most of the time, be very careful when applying them to legal documents, as a missing comma (removed by de-speckle) could potentially have serious consequences.
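Why can de-speckle remove a comma? Many de-speckle filters drop small isolated groups of dark pixels. The Python sketch below uses a simple connected-component filter on a tiny binary image (1 = black pixel); it is an illustration under that assumption, not the algorithm used by Virtual ReScan or Oracle Capture, and the `despeckle` name and `min_size` threshold are hypothetical.

```python
def despeckle(img, min_size):
    """Remove 4-connected groups of black pixels (1s) smaller than min_size pixels."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]                 # leave the input image untouched
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill from (r, c) to collect one connected component.
            stack, component = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and img[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(component) < min_size:         # treated as a speck and erased
                for y, x in component:
                    out[y][x] = 0
    return out


page = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],   # the two pixels in column 4 stand in for a comma
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
]
cleaned = despeckle(page, min_size=3)
print(cleaned[2][4], cleaned[3][4])   # 0 0: the "comma" is gone, the "E" survives
```

The filter has no notion of meaning: any mark below the threshold is treated as noise, which is exactly why it needs care on legal documents.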


When scan volumes are high, an additional level of intelligence in the scanner may become a necessity. Does the scanner provide effective paper separation to reduce the chance of double-feeds when documents are poorly prepared? Does it have technology to proactively detect possible paper jams and avoid damaging original documents?
Additionally, you may need to look at detection of paper clips and staples. Paper clips can cause jams and multifeeds and damage electronic components in scanners. Does the scanner have a method to detect stray paper clips? Staples can scratch the glass, causing streaks on images, so does the scanner have technology to stop scanning immediately to avoid damage?


Other factors may also come into play with scanning. For instance, high humidity causes sheets of paper to stick together, so the same stack on the same scanner might experience more double-feeds in one location than in another.
If rust is left on the paper after staples or paper clips are removed, it may scratch the glass on some scanners and cause ongoing problems, so you may opt for a higher-end scanner, such as Microform scanners from Germany.


Often, the majority of documents come in as email attachments or by fax, or are imported from a shared drive or a content management system. If this is your case, do not print and re-scan them just to take advantage of your software's scanning features: many products support direct import of documents, offering significant savings on printing and scanning costs.
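As an illustration of direct import, the Python sketch below pulls importable attachments out of a raw email using only the standard library `email` package. The `IMPORTABLE` content-type filter and the `attachments_for_import` function are hypothetical; a real import server would add mailbox polling, error handling and check-in to your repository.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical filter: content types we are willing to check in directly.
IMPORTABLE = {"application/pdf", "image/tiff", "image/jpeg", "image/png"}


def attachments_for_import(raw_email: bytes):
    """Yield (filename, bytes) for attachments that can be checked in without re-scanning."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    for part in msg.iter_attachments():
        if part.get_content_type() in IMPORTABLE:
            yield part.get_filename(), part.get_payload(decode=True)


# Build a sample message: a text body, a PDF to import and a GIF to skip.
demo = EmailMessage()
demo["Subject"] = "Claim supporting documents"
demo.set_content("Please see attached.")
demo.add_attachment(b"%PDF-1.4 ...", maintype="application", subtype="pdf", filename="claim.pdf")
demo.add_attachment(b"GIF89a", maintype="image", subtype="gif", filename="logo.gif")

for name, data in attachments_for_import(demo.as_bytes()):
    print(name, len(data))   # claim.pdf 12
```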


Extraction is the next step after the images are captured via the Scanning or Import feature. The critical question here is: "What information do you wish to extract?"
For example, do you need the entire page, or just a particular piece of information such as the Invoice Number? This matters because there is a big difference between extracting all the data from an image and extracting only the relevant data.
Suppose you are capturing a legal contract. Most likely you don't need all of its legal terms in metadata fields, and the only fields you'd like to extract are the date and the company name. The complete images of all pages of the contract will most likely be stored in your Content Management system, and you may also generate a searchable PDF and use your full-text search engine for additional search convenience. These additional ways to access your scanned information may relax your extraction and classification requirements.
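Targeted extraction can be as simple as a handful of patterns applied to OCR text. The Python sketch below assumes the OCR text is already available and that the contract states its date and first party in a predictable phrase; the patterns, field names and sample text are hypothetical. A field that is not found comes back as `None`, which in practice would route the document to manual indexing.

```python
import re

# Hypothetical patterns for the only two fields we actually need.
FIELD_PATTERNS = {
    "date": re.compile(r"\bdated\s+(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "company": re.compile(r"\bbetween\s+(.+?)\s+\(", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Return the targeted fields; None means 'not found, send to manual indexing'."""
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        result[field] = match.group(1) if match else None
    return result


text = ("This Services Agreement is dated 2014-11-18 and is made between "
        "Acme Widgets Ltd. (the Supplier) and the Customer. 1. Definitions ...")
print(extract_fields(text))
# {'date': '2014-11-18', 'company': 'Acme Widgets Ltd.'}
```

Everything else in the contract stays in the stored image and the searchable PDF, which is what allows the extraction rules to stay this small.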
The following sections highlight additional aspects that may be relevant when examining extraction tools:


Another important consideration is that most extraction tools out there are best at dealing with text-based information. If a significant portion of your content is graphical, like maps, drawings and graphs, that needs to be automatically classified and have data extracted, text-based OCR systems tend not to work very well, so this becomes an important question to ask your sales rep during the selection process.


The same considerations regarding target intake volumes, volume patterns and expected growth that we looked at in the Scanning section above also apply to Data Extraction. They will drive your choice of manual, manually assisted or fully automatic extraction tools, and they will also influence your hardware and infrastructure decisions.


Each document type may have different extraction requirements: you may need to extract different fields, apply different validation rules and store documents differently.
Do you have an easy way to separate these document types before they enter the scanning or import phase, or will you rely on your software to classify the documents for you and handle them differently?
Also, will you be able to embed bar codes in your documents to make it easy for your software to classify them, or is this option not available?
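One common pattern is to let a bar code decide the document type when it is present and fall back to content-based classification otherwise. The Python sketch below shows only that routing logic; the bar code prefixes, document types, extraction profiles and function names are all hypothetical.

```python
# Hypothetical bar code prefixes printed on cover sheets, mapped to document types.
BARCODE_PREFIXES = {"INV": "invoice", "CLM": "claim", "CTR": "contract"}

# Hypothetical per-type profiles: which fields to extract and where to store the result.
PROFILES = {
    "invoice": {"fields": ["invoice_no", "total"], "store": "AP"},
    "claim": {"fields": ["claim_no", "policy_id"], "store": "Claims"},
    "contract": {"fields": ["date", "company"], "store": "Legal"},
}


def classify(barcode, content_guess=None):
    """Prefer the bar code; fall back to a content-based guess; otherwise manual review."""
    if barcode:
        doc_type = BARCODE_PREFIXES.get(barcode.split("-", 1)[0])
        if doc_type:
            return doc_type
    return content_guess or "manual_review"


def route(barcode, content_guess=None):
    """Return the document type and its extraction profile (None for manual review)."""
    doc_type = classify(barcode, content_guess)
    return doc_type, PROFILES.get(doc_type)


print(route("CLM-000123"))      # bar code wins
print(route(None, "invoice"))   # no bar code: use the classifier's guess
print(route("XYZ-1"))           # unknown prefix, no guess: manual review
```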


Custom logic lets you apply field-specific validation rules, perform database lookups to validate against a list of predefined values, treat values as numeric-only or alpha-only, and apply additional business logic that improves accuracy and reduces (or eliminates) the manual component of data extraction. Many tools, like Oracle Capture and Oracle Forms Recognition, offer elaborate APIs that allow you to perform all of these functions, but other tools may not offer this facility or may not be flexible enough for what you're trying to accomplish.
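Here is a minimal Python sketch of such custom logic, using an in-memory SQLite table as a stand-in for the lookup database. The field names, table, rules and function names are hypothetical and do not reflect the Oracle Capture or OFR APIs; the point is that each rule either passes silently or produces an error that sends the document to manual correction.

```python
import sqlite3


def build_lookup():
    """Stand-in for a line-of-business database (hypothetical table and values)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE policies (policy_id TEXT PRIMARY KEY)")
    db.executemany("INSERT INTO policies VALUES (?)", [("P-1001",), ("P-1002",)])
    return db


def validate(fields, db):
    """Return a list of validation errors; an empty list means the document can auto-commit."""
    errors = []
    if not fields.get("claim_no", "").isdigit():                    # numeric-only field
        errors.append("claim_no must be numeric")
    if not fields.get("adjuster", "").replace(" ", "").isalpha():  # alpha-only field
        errors.append("adjuster must be alphabetic")
    row = db.execute("SELECT 1 FROM policies WHERE policy_id = ?",
                     (fields.get("policy_id"),)).fetchone()         # database lookup
    if row is None:
        errors.append("policy_id not found in policy table")
    return errors


db = build_lookup()
print(validate({"claim_no": "48213", "adjuster": "Jane Doe", "policy_id": "P-1001"}, db))
print(validate({"claim_no": "48Z13", "adjuster": "Jane Doe", "policy_id": "P-9999"}, db))
```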


This is the third major aspect of any Document Capture solution, and it answers the question: "Once the information is captured and classified and the metadata values are extracted, what happens to it then?" This covers version control, indexing, management and distribution, cleansing, publishing, search and controlled retention.
The process of selecting a Document Management System is outside the scope of this whitepaper, but I'll draw your attention to a few additional capture-specific considerations that often get overlooked:


Watermarking, annotation and other image augmentation are not provided by many general-purpose CMSs, but they are provided by Oracle WebCenter Imaging. You'll be able to apply watermarks and annotations and also link them to specific workflow steps. With WebCenter Imaging, your original image remains available, which may be important for legal and compliance reasons. That may not be the case with other image manipulation tools, which actually burn the watermark into the image itself.
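The difference between layered and burned-in annotations can be modelled in a few lines of Python. This is a conceptual sketch, not WebCenter Imaging's data model: the original bytes are never modified, annotations are stored separately and tagged with a workflow step, and a hash of the original shows it is unchanged after annotating. The `ImagingDocument` class and its methods are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ImagingDocument:
    """Conceptual model: original image bytes plus a separate annotation layer."""
    original: bytes
    annotations: list = field(default_factory=list)

    def annotate(self, kind, text, workflow_step):
        # Annotations are stored alongside the image; the original bytes are never touched.
        self.annotations.append({"kind": kind, "text": text, "step": workflow_step})

    def fingerprint(self):
        return hashlib.sha256(self.original).hexdigest()

    def overlays_for_step(self, step):
        """Labels a viewer would overlay when the document is opened at this workflow step."""
        return [a["text"] for a in self.annotations if a["step"] == step]


doc = ImagingDocument(original=b"II*\x00 placeholder tiff bytes")
before = doc.fingerprint()
doc.annotate("watermark", "APPROVED", "approval")
doc.annotate("note", "Check clause 4", "legal_review")
print(doc.fingerprint() == before, doc.overlays_for_step("approval"))   # True ['APPROVED']
```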


Retention Management lets you flag outdated content for deletion or archival, synchronize physical and electronic versions of content and keep track of the physical files, folders and boxes floating around your offices. It also helps you share content more effectively across the organization and ensures content is retained for the period it should be, regardless of the source or the system that stores it.
You may be required to ensure regulatory compliance (SOX, SEC, industry regulations) by keeping content for required periods of time and consistently following processing rules, such as review, archival and destruction, once that period is over.
Systematically disposing of certain types of documents when you're no longer required to keep them may prove extremely beneficial in case of potential litigation. You may also choose to freeze important content related to legal action you're about to take, so that it doesn't get "accidentally" destroyed by a disgruntled employee.
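The core of a retention rule, including a legal hold that overrides it, can be sketched in Python as below. The retention periods, document types and function names are hypothetical placeholders, not regulatory guidance; the sketch only flags documents as eligible for destruction and leaves the actual review and disposal to your process.

```python
from datetime import date

# Hypothetical retention periods in years per document type.
RETENTION_YEARS = {"invoice": 7, "claim": 10, "correspondence": 2}


def add_years(d, years):
    """Same calendar date `years` later; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposition(doc_type, filed_on, today, legal_hold=False):
    """Return 'hold', 'retain' or 'eligible for destruction'."""
    if legal_hold:
        return "hold"                      # frozen regardless of age
    expires = add_years(filed_on, RETENTION_YEARS[doc_type])
    return "eligible for destruction" if today >= expires else "retain"


today = date(2014, 11, 18)
print(disposition("correspondence", date(2011, 3, 1), today))
print(disposition("invoice", date(2011, 3, 1), today))
print(disposition("correspondence", date(2011, 3, 1), today, legal_hold=True))
```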


Volume and load patterns still play a role in your content management, just as they do in your capture process.


Other repository considerations, like security, conversion and workflow, are all important when selecting a new content management system.


And here's the big one: pricing. While every situation and every deal is different, the following considerations will be important and will apply in most cases.


Some vendors (Kofax being a prime example) charge different fees based on the volume of documents you scan and process. If you're dealing with large volumes or expect your intake volume to increase, such solutions may quickly become prohibitively expensive, and you may want to opt for a solution that offers server-based licensing without per-user or per-click components, such as Oracle WebCenter Suite.
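A quick break-even calculation shows why per-click pricing matters at scale. The prices in this Python sketch are purely hypothetical placeholders, not quotes from Kofax, Oracle or anyone else, and per-user fees are ignored for simplicity.

```python
from decimal import Decimal

# Hypothetical prices, for illustration only; real quotes vary widely by vendor and deal.
PRICE_PER_PAGE = Decimal("0.01")          # per-click component
FLAT_SERVER_LICENSE = Decimal("50000")    # annual server-based license, no per-click fee


def cheaper_option(pages_per_year):
    """Compare annual per-click cost with the flat server license."""
    per_click = pages_per_year * PRICE_PER_PAGE
    return "per-click" if per_click < FLAT_SERVER_LICENSE else "server-based"


break_even = FLAT_SERVER_LICENSE / PRICE_PER_PAGE   # pages/year where both cost the same
print(f"Break-even: {break_even:,.0f} pages/year")
for pages in (1_000_000, 5_000_000, 20_000_000):
    print(f"{pages:>12,} pages/year -> {cheaper_option(pages)}")
```

Above the break-even volume, every additional page makes per-click pricing more expensive, which is the growth risk described above.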


In many cases, Value Added Resellers enjoy special pricing, and buying from them may prove more cost-effective than dealing directly with the vendor.

That's it for today. In the next part of this article, we will complete our review of the features and considerations of Data Capture tools of 2015.

Friday, November 7, 2014

Data Capture Market Of 2015 - Navigating Competitive Landscape - Part 1

These articles will make it easy for you to navigate the competitive world of vendors, products and prices in the Data Capture Market of 2015.


The Data Capture market continues to expand. According to MRI, the Automatic Data Capture segment will grow by 12.29% in the period from 2012 to 2016, with one of the key factors contributing to this growth being the increasing need for accuracy in data computation.
Kofax, EMC Captiva, IBM Datacap and KnowledgeLake continue to lead the market in image processing, and Microsoft, Hyland Software, Open Text and Lexmark's Perceptive Software were named overall ECM Market Leaders by Gartner as of September 2014.
Below is a copy of Gartner's Magic Quadrant for Enterprise Content Management:


If you're considering a new or replacement data capture solution for yourself or your client, you'll face a dilemma: go with one of these market leaders, dig deeper and look for the 'perfect match', or simply try to get something without paying a whole lot. The first option may save you time at the decision-making stage and seem like a good choice; as the saying goes, "nobody was fired for hiring IBM".
Some clients always try to go with the cheapest solution, and many end up in the proverbial "Penny Wise..." trap, where money saved on licensing and support is quickly drained by the cost of custom development. Yes, open source data capture tools like Ephesoft do exist and may serve as a sound alternative, but you need to fully understand the strengths and weaknesses of these tools and make an informed decision.
This whitepaper will quickly take you to the next level of understanding so you can reap the benefits of a tailored, 'perfect match' solution. This may come in especially handy given that, as Gerald Baker of the Encyclopedia of Document Management said,
"We find that suppliers tend to shy away from simple descriptions of what their software can and cannot do - words like "fastest", "rich in features", "world's leading" and "multi-function" add little to clarify exactly what the product offers."


Document Capture has many aspects, and each of them has its own leaders, effective and not-so-effective approaches, and client-specific considerations.
Let's start by looking at the overall Content Lifecycle: (See diagram below)

Picture Credit: "From Unstructured to Strategic: Enterprise Content Management with Oracle WebCenter"

The first three stages represent the key activities of Document Capture, and they are the key areas of concern:
  • Scanning,
  • Extraction and
  • Image Management.
Taking a deeper look at each of these key components will help you better understand your specific needs and see which solution (or combination) may offer a better fit.

In the following articles, I'll walk you through the specific areas of concern, making it easy for you to see the strengths and weaknesses of each data capture solution.