Wednesday, March 28, 2012

Oracle Document Capture - the fastest introduction I know about

A few folks have been asking me to do a demo of Oracle Capture and show what it can do and how to use it - so I figured I might as well share it with everybody. This will be fast, but it’s not one of those BS introductions that gives you marketing stuff and colored block diagrams that just make you feel dizzy and totally messed up. Nope. This one will get you comfy enough so that if you want to go ahead and start using Oracle Capture right away - you will just go ahead... and it will actually work.
So here it comes:
Let’s start by quickly looking at why so many Content Management implementations fail... and also...

What sales people are getting ... right?

You’ve probably heard that sales pitch about controlling the scary bloat of an ever-expanding mass of unstructured content – paper, email, file shares – and how important it is to help your business people really access the information they need, when they need it – from anywhere. And you know what – that part is right. I agree with the sales guys. Unstructured content really is hard and expensive to manage. I’ve personally seen clients losing hundreds of thousands of dollars having micro-chips manufactured to an outdated specification, ‘cause the production team looked in the wrong folder in an unwieldy SharePoint repository that had gone out of control.
So there must be a simple solution, right?

Fast relief for the information bloat problem... or not...

Here’s where many good intentions fall flat on the cold, hard concrete of reality... the tool alone will not solve your business problem! No matter what tool you pick.
People install Oracle UCM, but unless they capture the metadata – the system is not much better than a plain old file system. Yes, they realize that and they define the metadata they would like to capture, so their check-in screens end up looking something like the screenshot on the previous page.
Problem solved, right?
Uh... not exactly.

You see, when a contributor sees a form like that, they get terrified... or annoyed... or both. And they try to get away with as little input as possible. So they skip as many fields as they can, and when the system pops up a “value required” thing at them – they enter garbage.
So is it really practical to collect more than just a handful of metadata fields on check in?
Probably not, if you do it by hand... But who said you ought to do it by hand?

Introducing Oracle Capture

Let’s say we’re checking in a monthly invoice and have five fields we’d like to capture. How would you like to – instead of putting them on the check in form – let your contributor look at the actual document, side-by-side with a check in form... and have OCR take its best guess at pulling that data and populating the form – before the contributor even gets to look at it?
Well, this is exactly what Oracle Capture is designed to do – pretty much out of the box.
Take a look at the screenshot below:

Not only does the contributor get to see the document and the metadata fields side-by-side, each field also has a corresponding OCR zone defined – and Capture actually extracts the data out of this zone and puts it into the field.
When she moves to the next field, that field’s zone is highlighted in blue and she gets to see where the value in the field came from.
If the document was skewed or poorly scanned and the zone misses the value, the contributor can simply drag and drop the value (See screenshot below)

Just select the area containing the text that you want to see in the field with your right mouse button and see the OCR extract the text out of the area and place it in the field for you! How many fields do you think you can capture now?
You can also combine multiple documents into one and you’re free to remove junk pages as you see fit. Imagine what it would take to delete a page from a PDF document during a normal Content Server check in. It’s simply not practical, so these junk pages end up in the repository... wasting time when searching for information... and folks print them every time and waste trees.
All solved in Oracle Capture at a click of a button (See screenshot below)

I hope you’re getting a feel for what it can do. So now it would be a perfect time for some...

Real life stories

I’m going to show you a couple of real life examples, because I want to quickly prove to you that this tool actually works.

Slashed half of the load of overworked Document Control people

One of the projects we recently completed was automating workflow and document control for a nuclear power station operation management provider. As part of doing business, they exchange a ton of technical specifications and formal transmittals with their government clients and sub-suppliers. Prior to implementation, information was duplicated, not easy to organize and retrieve and they didn’t have a structured workflow process.
Before using Capture, the average time spent per document was benchmarked at 2 min 17 sec – on a 15 field check in form. With Capture used for extracting, verifying and correcting pre-filled values – time spent per document dropped to just over 33 seconds!
That’s over two hours of added productive time (out of an estimated 4) per Document Control person per day – or a killer 200% increase in productivity!
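If you want to run the same arithmetic on your own benchmarks, here is a minimal Python sketch. It only uses the two timings quoted above, and it assumes (my reading, not something the benchmark spells out) that the "estimated 4" means roughly four hours a day spent on check-in. The function and variable names are made up for the illustration.

```python
def time_saved(before_s: float, after_s: float) -> dict:
    """Compare per-document handling time before and after a change."""
    reduction = 1 - after_s / before_s   # share of the time no longer spent
    speedup = before_s / after_s         # documents handled in the same time
    return {"reduction": reduction, "speedup": speedup}


before = 2 * 60 + 17   # 2 min 17 sec, from the benchmark above
after = 33             # "just over 33 seconds", rounded down here

result = time_saved(before, after)

# Assumption: about four hours a day go into check-in work.
hours_on_checkin = 4
hours_freed = hours_on_checkin * result["reduction"]

print(f"time per document cut by {result['reduction']:.0%}")
print(f"throughput multiplier: {result['speedup']:.1f}x")
print(f"hours freed per day: {hours_freed:.1f}")
```

Swap in your own before and after timings; the two ratios tell you different things, so it helps to be clear which one you quote.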

One more (75%) productivity increase here

Another recent project of ours I’d like to mention is the one where we’ve automated document storage, acquisition and workflow for a mid-sized insurance company. They have a significant volume of incoming claim-supporting correspondence – mostly scanned documents coming in by email.
Over the years, they’ve tried more than once to extract meaningful metadata fields such as claim number and policy id – on check in – with varying degrees of success.
They’ve also relied on a custom, outdated component that automatically checked in incoming email attachments. The component was buggy and unreliable.

Implementing Oracle Capture allowed them to finally retire that component and begin to extract important metadata values out of incoming documents – during the check in process. Time spent per document dropped from an average of 1.5 minutes to just under 22 seconds!
And what replaced the unsupported and buggy custom component was Capture Import Server.
I hope you’re getting a feel for the kind of massive, nearly instant savings I’m talking about!
And now let me tell you a bit more about...

How it works

If you look “under the hood” – at the core of Oracle Capture you’ll find a tool, called Zone Editor, that lets you define the zones where OCR should be looking for metadata values (See screenshot below)

You can create multiple document profiles, each with its own set of fields and its own form layout.
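To make the zone idea concrete, here is a rough Python sketch of what zonal extraction boils down to. This is not Capture’s code or its API: the `Zone` and `Word` classes, the coordinates and the field names are all invented for the illustration, and the OCR output is hard-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangle on the page tied to one metadata field (hypothetical)."""
    field: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One OCR'd word with the centre point of its bounding box."""
    text: str
    x: int
    y: int


def extract_fields(zones, words):
    """Fill each field with the words whose centre falls inside its zone."""
    fields = {}
    for zone in zones:
        hits = [w for w in words if zone.contains(w.x, w.y)]
        hits.sort(key=lambda w: (w.y, w.x))   # rough reading order
        fields[zone.field] = " ".join(w.text for w in hits)
    return fields


# A made-up "invoice" profile with two zones.
invoice_profile = [
    Zone("invoice_number", 400, 40, 600, 70),
    Zone("total", 400, 700, 600, 740),
]

# Made-up OCR output for one page.
ocr_words = [
    Word("INV-1042", 480, 55),
    Word("Total:", 350, 720),      # label sits just left of the zone
    Word("$1,250.00", 500, 720),
    Word("Thank", 100, 900),
]

fields = extract_fields(invoice_profile, ocr_words)
print(fields)
```

A second profile would simply be another list of zones, which is why a skewed scan hurts: the words drift out of the rectangles.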
Now how would you get documents into Capture?
There’re several ways. The most obvious one is to use a scanner. Capture is happy to take in the scanned docs and you can also apply a bunch of image enhancements (See screenshot below)

But if your documents are coming in as MS Word and PDF files on a shared drive or as email attachments - you don’t have to print and re-scan them. Simply use the Import Server that will gladly import anything that can be virtually printed. You can also tell it to watch a folder, and drag and drop documents there – and see them appear in Oracle Capture.
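If you are wondering what "watching a folder" amounts to, here is a small Python sketch of the general polling pattern. It says nothing about how Import Server is implemented; the function name and the sample file are assumptions for the demo, and a temporary directory stands in for the shared drive.

```python
import os
import tempfile


def poll_new_files(folder, seen):
    """Return files that appeared since the last poll and remember them."""
    current = {
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    }
    new = sorted(current - seen)
    seen.update(new)
    return new


with tempfile.TemporaryDirectory() as inbox:
    seen = set()
    first = poll_new_files(inbox, seen)    # nothing dropped yet

    # Simulate a user dropping a document into the watched folder.
    with open(os.path.join(inbox, "claim_letter.pdf"), "wb") as f:
        f.write(b"%PDF-1.4 placeholder")

    second = poll_new_files(inbox, seen)   # picks up the new file
    third = poll_new_files(inbox, seen)    # already seen, so nothing new

print(first, second, third)
```

In a real job you would call the poll on a timer and hand each new file to whatever does the import.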
Ok, but what happens when the user clicks the “Commit” button on her Capture toolbar? There’re many options, but the one I want to focus on is having the document committed to Oracle UCM (See screenshot below)

It’s easy enough to configure, and you can have a searchable PDF/A committed to the Content Server... in addition to being able to map your Capture metadata fields to Content Server.
And now here’s a feature so profound, I had to put it into its own section:

Rapid customization facility

If you need to apply business rules on Content Server check-in - you can use profiles and rules, but they are quite limited. Custom Java components take a long time to write, plus you risk having to re-write them after every version upgrade...
Good news! Capture offers you a quick, powerful and easy to use alternative – Basic Macros. Just like in MS Word, Excel and Outlook, you can verify and alter data in the fields, handle events like page load (and Capture’s own commit and OCR-related events), populate dropdown values and display custom dialogs.
Another thing you can do is go back to the database (a Content Server one if you like) and bring back a bunch of metadata values based on just one field. If you do that – all you need to grab off the scanned image is just one field! Claim number brings back the Broker Name, Adjuster, Policy Number and so on.
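The lookup idea is easier to see in code. The sketch below is Python with an in-memory SQLite database, not a Capture Basic macro, and the `claims` table, its columns and the sample row are all hypothetical. It only shows the pattern: one key field in, the rest of the metadata out.

```python
import sqlite3


def lookup_claim(conn, claim_number):
    """Return the remaining metadata for a claim number, or None if unknown."""
    row = conn.execute(
        "SELECT broker_name, adjuster, policy_number "
        "FROM claims WHERE claim_number = ?",
        (claim_number,),   # parameterized, never string-concatenated
    ).fetchone()
    if row is None:
        return None
    return {"broker_name": row[0], "adjuster": row[1], "policy_number": row[2]}


# Hypothetical schema and data, just for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims ("
    "claim_number TEXT PRIMARY KEY, broker_name TEXT, "
    "adjuster TEXT, policy_number TEXT)"
)
conn.execute(
    "INSERT INTO claims VALUES (?, ?, ?, ?)",
    ("CL-1001", "Example Brokers", "A. Adjuster", "POL-7788"),
)

found = lookup_claim(conn, "CL-1001")
missing = lookup_claim(conn, "CL-9999")
print(found, missing)
```

The `None` case matters: when the OCR misreads the key, you want the form to flag the field rather than fill in someone else’s policy.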

And since we’ve started talking about writing macros in Oracle Capture – there’s something else I have to mention...

Don’t say I haven’t told you!

The first time we used Capture a few years ago – we quickly flipped through developer manuals and it only took us 15 minutes to install the tool itself, so we went for a beer to celebrate! What an easy tool! Phew! We felt like we’d almost completed the project. Same time next week... Bitter disappointment! Nothing works! Can’t even create an object! Assigning values to properties... no result! Enumerating collections... collection is empty!
EMPTY? But I’m looking at the damn batch! It has 10 pages! How can the pages collection be empty?
Pounding on keyboards... Screaming... A few days of fun!
But you know what, it all started to work after we’d learned a few tricks. Macros do seem really simple at first, and they are, but they do take a little time and experimentation to get used to and to get working. So here’re the few tips I wish we had in front of us back then.
So if you are planning on writing macros in Oracle Capture (or you've tried and are facing these kinds of issues) - the best way to get you over the hump is to take advantage of my free consultation offer. Check it out at - it's totally free and you are under no obligation whatsoever. We just get on the phone and I answer your questions... Check it out.
And now that we’ve looked at these and the core features of Oracle Capture, let’s see what it takes to really crush it on results at the corporate level. So here’re the...

3 keys to achieving maximum benefits with Oracle Capture

This section will give you some important pointers on bridging the gap between a really helpful tool and massive enterprise-wide business results. I’m giving you 3 critical keys to a successful corporate rollout of Oracle Capture. Ignore them at your own risk.

Key #1 - It won’t work in isolation

If one of the departments starts using Capture and correctly populates a bunch of metadata fields... and other departments continue to grudgingly punch in required values by hand – the good and the bad entries will all end up in the same repository and you won’t be able to tell the good from the bad.
If you want to see some real results – you may be looking at an enterprise-wide rollout and, possibly, some data cleanup work down the road.

Key #2 - The more you use – the more you gain

Yes, it does sound like common sense, but it’s true. The more of your content you have in electronic form, available for searching – the greater the power you give your business people. The more of the relevant metadata fields you define – the easier it is to search, manage and orchestrate your processes.
Once all of your content is in – you can also control its retention. Delete or archive what’s not being used. Eliminate multiple versions of the same document....
This is why companies undertake back file conversions, scan and eliminate paper copies, get rid of their mail rooms and off-site storage. If it takes a day or two to obtain a paper copy – what’s the benefit of having that copy in the first place? Sometimes, spending even 5 minutes searching for it poses a risk of making a wrong business decision or missing out on an important opportunity.

Key #3 - One size... won’t fit all

Here’s the last one, but by no means the least. To really get what Capture has to offer - the true benefits, like the smooth, tailored user experience that takes you to the next level of productivity – you’ve got to customize the product... Add business logic... clean up your OCR data... make your metadata fields “smart” – to handle partial matches, pop up custom dialogs, link the fields, capitalize on database lookups... An extra day of development may well bring you hours’ worth of saved contributor effort – day in and day out.
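Here is one way a "smart" field with partial matching could look, sketched in Python with the standard `difflib` module. It is an illustration of the idea, not something Capture ships with; the `smart_match` helper, the 0.8 cutoff and the broker names are all invented for the example.

```python
import difflib


def smart_match(raw_value, known_values, cutoff=0.8):
    """Map a noisy OCR value to the closest known value, or None."""
    # Collapse stray whitespace and ignore case before comparing.
    cleaned = " ".join(raw_value.split()).upper()
    candidates = {value.upper(): value for value in known_values}
    matches = difflib.get_close_matches(
        cleaned, list(candidates), n=1, cutoff=cutoff
    )
    return candidates[matches[0]] if matches else None


# Hypothetical list of valid values, e.g. loaded from a database.
brokers = ["Northwind Insurance Brokers", "Contoso Risk Partners"]

hit = smart_match("N0rthwind  Insurance Brokers", brokers)   # OCR read O as 0
miss = smart_match("Unknown Agency", brokers)

print(hit, miss)
```

The cutoff is the knob to tune: too low and the field "corrects" values it should have flagged, too high and every smudge lands back on the contributor.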

Beyond the turbo-charged check-in

That’s great, but what if you don’t want someone to sit there and verify every document? What if you’re looking at a back file conversion of thousands of documents – and you want to have it run unattended?
You really have two options – use bar codes and Oracle Capture Recognition Server or step up to Oracle Forms Recognition.
If you can afford to use database lookup and macro code to infer the values of your metadata fields from one or two key fields, the Recognition Server that comes with Oracle Capture will be an ideal solution. Simply add a bar code to your document – to represent the key value, such as your policy or file number - and have a database lookup pull out the rest. That gives you fully automated forms recognition. No human intervention required.
If you need to process a variety of forms and you want the system to decide what form it is – and pull out the correct metadata values – you’re looking at a totally different animal and you’ve got to use a different tool. This tool is called Oracle Forms Recognition or OFR.
While Capture is based on a set of user-selected document profiles, each with fixed OCR zones, OFR comes with intelligent full-form OCR and automatic matching to a document profile. That said, it’s more expensive and more work to set up and “train”. So once again, one size does not fit all. Pick your tools to match your project.


You’ve just learnt about a killer tool that can quickly put controls around a growing mass of your corporation’s unstructured content and take your business people to the next level of productivity. You’ve seen how Oracle Document Capture slashes manual input with its simple, intuitive and flexible interface... and you’ve seen the actual numbers that come with the massive time savings and productivity gains this tool produces in real life settings.
I’ve shown you what tools you have to customize Oracle Capture to fit to your own requirements and I’ve also given you some insider tips and things you need to be aware of when writing Basic code in Oracle Capture.
You’ve also seen the three keys to achieving the maximum benefits with Oracle Capture and what options you have when going beyond a “turbo-charged check in screen” solution.
Now once again, if you are a WebCenter Content (Oracle UCM) client or a System Integrator, you can get me on the phone and get your answers... FREE. This is part of my community work so no sales talk and no strings attached.
That's all I got for today. Stay tuned for more...


  1. Hi Dmitri,

    I have a question, I am doing a proposal for my client and in that they have one requirement as:
    - Can we capture/scan a portion of a page, not the complete page.

    Thank you.


  2. Did you try playing with the filters in Capture? Like the Remove Border one?

  3. Hi Dmitri,

    Very useful article.

    btw, how do I get dDocName or dID after the BatchPostCommit event?

    Thank You.

  5. Can't think of anything offhand. Please check the forums. I imagine there's a really obvious and simple way of doing it. Maybe ask Oracle... As a fallback, you could always run a query or a Content Server service from your VB code and get it that way.

  6. Hi Dmitri,

    Very nice article. Thank you.

    Is it possible to make a copy of the page when it is being deleted using macros? I see that there is a page delete event, but not sure how to write the macro to make a copy of that page before deleting it.