Tuesday, June 24, 2014

Archiving in WebCenter Content - A Deeper Look - Part 2

Welcome back! Now that we have "taken a step back" and re-examined our understanding and our strategies, let's look at the tools available to us in WebCenter Content.


Archiver

Just as the name suggests, this is the tool designed to 'archive' your content. But what does that mean? In the specific case of Archiver, it allows you to:
  • Remove large amounts of content from the repository and compress it for backup storage or manual deletion
  • Transfer content to another Content Server instance in a manual or automated fashion
  • Perform mass updates of metadata values, which can come in very handy for moving content into another 'bucket' in your repository or for tagging it
You start by defining a query to select the content you're going to move (see screenshot below).

You can then define simple transformations, such as field and value mappings, and set up manual or automatic replication to other Content Server instances (see screenshot below).

Expired Content

Expired Content provides an effective mechanism for automatically taking content out of the repository on a specific date. You simply specify a value for Expiration Date during check-in or a metadata update, and when that date arrives, your content will no longer come up in a search (see screenshot below).

You can then find expired content using the Expired Content page (see screenshot below).

Please keep in mind that the Content Expiry mechanism is a rather radical solution: after the Expiration Date, these items will not show up in searches, and links to an item's web location will no longer work.

Retention Queries in Framework Folders

The new Framework Folders component in Content Server replaces the older Contribution Folders, also known as Folders_g. It introduces a number of additional features, such as removal of the 1000 items per folder limitation, better performance and the new Retention Folders feature (see screenshot below).

As the name suggests, it lets you define a query and apply a simple retention policy to its results: how many revisions of these content items you'd like to keep and for how long (see screenshot below).

One thing to keep in mind: if you have the Records component enabled and a content item is assigned to a Retention Category in Records, that retention will override the period you specified in the Retention Query, and the longer retention period will be used.
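The "longer period wins" rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not how Content Server implements it; the function name and the use of `timedelta` values are my own assumptions.

```python
from datetime import timedelta


def effective_retention(query_period, category_period=None):
    """Return the retention period that ends up applying to an item.

    query_period    -- period set in the Retention Query (hypothetical input)
    category_period -- period from a Records Retention Category, or None
                       if the item is not assigned to one
    """
    if category_period is None:
        # No Retention Category assigned: the query's period applies as is.
        return query_period
    # Both are present: the longer of the two periods is used.
    return max(query_period, category_period)


if __name__ == "__main__":
    one_year = timedelta(days=365)
    two_years = timedelta(days=730)
    print(effective_retention(one_year, two_years).days)
    print(effective_retention(two_years, one_year).days)
    print(effective_retention(one_year).days)
```

In other words, a Retention Query can never shorten the life of an item that Records wants to keep for longer.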

WebCenter Records

WebCenter Records is the ultimate retention management tool, with tons of flexible features for defining complex rules, time periods and disposition actions that can match virtually every organization's records policies. It will also let you keep track of your physical assets, perform complex audits and much more, too much to describe here.

Records Adapters

WebCenter Records also offers Adapters, software components that allow you to extend your retention policies to content that is not physically stored in WebCenter Records. You can apply retention to content stored on your file system, Documentum, FileNet and MS SharePoint. The Adapters obtain all policies from the Records server, which manages the retention, and apply them to content stored in the corresponding system. Each Adapter also sends information back to the Records server so it can maintain a complete and up-to-date catalogue of the enterprise's important content. The screenshot below shows the results of a Federated Search across the Content Server and linked repositories.

This lets you apply your retention policies to content more consistently with less administrative effort and less disruption for users – without the need to migrate content out of your existing systems across the enterprise.

For instance, you can freeze items in SharePoint and apply retention by search criteria and item locations.

Common Problem Areas

Now that we have looked at all aspects and various approaches to archiving, let's take a look at common problem areas that often lead organizations to settle for less than optimal retention policies and archival strategies.

Your Physical Assets

Many organizations are still forced to manage large amounts of paper documents and other physical assets. These assets are not as easily accessible and manageable as their electronic counterparts, which often results in artefacts being stored longer than needed, lost items and other similar issues. It is often beneficial to review and eliminate physical assets that have reached the end of their useful life, or to consider scanning them and retaining them as electronic copies.

Collaboration Tools

Collaboration tools like SharePoint and wikis almost always result in a large inflow of user-generated content that is not very well structured and ages quickly. Most organizations don't have effective retention policies in place to manage this type of content, that is, to eliminate outdated items and keep content relevant and easy to find. WebCenter Records' SharePoint Adapter may come in handy here, as it allows you to define and apply retention policies to your SharePoint repositories.

'Smart' content expiry

As I mentioned earlier, using the Expired Content feature with web content, such as your public sites and intranets, may result in broken links as expiring items are taken out of the repository. You can plan for this by running an 'Expiring Content' query ahead of time and updating the links that would otherwise break, or you could opt for other archiving mechanisms, such as metadata updates.

Routinely running a broken links report is another good practice to implement.
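To make the "look ahead" idea concrete, here is a minimal Python sketch that picks out the items due to expire within a given window, so their links can be reviewed first. It works on a plain list of dictionaries rather than calling Content Server; the record layout is an assumption for illustration, with `dDocName` standing in for the Content ID and `dOutDate` for the Expiration Date.

```python
from datetime import datetime, timedelta


def expiring_soon(items, now, days_ahead=30):
    """Return items whose expiration date falls within the next days_ahead days.

    items -- list of dicts with hypothetical keys 'dDocName' and 'dOutDate'
             ('dOutDate' is a datetime, or None if the item never expires)
    now   -- the reference date to measure the window from
    """
    cutoff = now + timedelta(days=days_ahead)
    return [
        item for item in items
        # Skip items with no expiration date and items already expired.
        if item.get("dOutDate") is not None and now <= item["dOutDate"] <= cutoff
    ]


if __name__ == "__main__":
    today = datetime(2014, 6, 24)
    sample = [
        {"dDocName": "PRESS_RELEASE_01", "dOutDate": datetime(2014, 7, 1)},
        {"dDocName": "POLICY_02", "dOutDate": None},
        {"dDocName": "NEWS_03", "dOutDate": datetime(2015, 1, 1)},
    ]
    for item in expiring_soon(sample, today):
        print(item["dDocName"], item["dOutDate"].date())
```

The list this produces is your to-do list: every page linking to one of these items needs a new target before the Expiration Date arrives.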

Performance Issues

Many times when your system slows down, the first complaint heard from the sysadmin is "we have too much content". Sometimes it's true, but in most cases the system slows down for reasons other than approaching its maximum capacity.

Assessing Performance

Here are a couple of numbers for you to use as a guideline. Your system by no means has to match them exactly, as every configuration is different; these are just ballpark figures to help you see where you're at.

Read-only requests, like page loads, should pump out about 20 requests per second per GHz of CPU, times the number of CPUs. A dual 2 GHz box should pump out about 80 pages per second.

For raw requests, such as content information or calls to GET_FILE or Check In (for small files, say, around 200K), you should target 4 requests per second per GHz of CPU, times the number of CPUs in the system.

Once again, these are just guidelines, and the numbers depend greatly on file size and many other factors.
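If you'd like to plug in your own hardware, the rule of thumb boils down to a single multiplication. The small Python sketch below is just that arithmetic; the function and constant names are my own, and I'm assuming the figure of 4 for raw requests is also per second.

```python
# Ballpark rates from the text, in requests per second per GHz per CPU.
READ_ONLY_RATE = 20  # page loads and other read-only requests
RAW_RATE = 4         # content info, GET_FILE, check-in of small files (assumed per second)


def expected_throughput(rate_per_ghz, clock_ghz, cpu_count):
    """Estimate requests per second: rate x clock speed in GHz x number of CPUs."""
    return rate_per_ghz * clock_ghz * cpu_count


if __name__ == "__main__":
    # Dual 2 GHz box, as in the example above.
    print("read-only:", expected_throughput(READ_ONLY_RATE, 2.0, 2))
    print("raw:", expected_throughput(RAW_RATE, 2.0, 2))
```

For the dual 2 GHz box this gives 80 read-only requests and 16 raw requests per second. If your measured numbers are far below your estimate, that's a hint to look at the configuration before blaming the amount of content.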

Oracle WebCenter Content is an I/O limited application, which means that overall performance is mostly restricted by the speed of your hard disks, not the software itself or the number of items you have in your repository.


In this short series we revisited the various practical meanings of the word 'archiving' as it applies to content management in general and to specific features of Oracle WebCenter. We looked at the common problem areas for you to keep an eye on and the set of tools that WebCenter Content offers for implementing your content retention and archiving policies.
