Tuesday, June 24, 2014

Archiving In WebCenter Content - A Deeper Look - Part 2

Welcome back! Now that we have taken a step back and re-examined our understanding and our strategies, let's look at the tools available to us in WebCenter Content.


As the name suggests, Archiver is the tool designed to 'archive' your content. But what does that mean? In the specific case of Archiver, it allows you to:
  • Remove large amounts of content from the repository and compress it for backup storage or manual deletion
  • Transfer content to another Content Server instance in a manual or automated fashion
  • Perform mass updates of metadata values, which can come in very handy for moving content into another 'bucket' in your repository or for tagging it.
You start by defining a query to select the content you're going to move (See screenshot below)

You can then define simple transformations, such as field and value mapping, and set up manual or automatic replication to other Content Server instances (See screenshot below)
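To make the idea of field and value mapping concrete, here is a minimal Python sketch of what such a transformation does to one item's metadata. It is an illustration only, not Archiver's own implementation or API: the function name, the field names (xDepartment, xBusinessUnit, xStatus) and the choice to apply value maps before field maps are all assumptions made for the example.

```python
from typing import Dict


def apply_mappings(
    metadata: Dict[str, str],
    value_maps: Dict[str, Dict[str, str]],
    field_map: Dict[str, str],
) -> Dict[str, str]:
    """Return a copy of `metadata` with values and field names remapped.

    Assumption for this sketch: value maps are keyed by the ORIGINAL
    field name and are applied before the field is renamed.
    """
    result: Dict[str, str] = {}
    for field, value in metadata.items():
        # Replace the value if a mapping exists for this field and value.
        new_value = value_maps.get(field, {}).get(value, value)
        # Rename the field if a mapping exists, otherwise keep its name.
        new_field = field_map.get(field, field)
        result[new_field] = new_value
    return result


# Hypothetical item metadata and mappings.
item = {"xDepartment": "Sales", "xStatus": "Active"}
mapped = apply_mappings(
    item,
    value_maps={"xDepartment": {"Sales": "Archive-Sales"}},
    field_map={"xDepartment": "xBusinessUnit"},
)
print(mapped)
```

Here the Sales value is rewritten and the field is renamed, while xStatus passes through untouched. That is the kind of 'move to another bucket' update described above.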

Expired Content

Expired Content provides an effective mechanism for automatically taking content out of the repository on a specific date. You simply specify a value for Expiration Date during Check In or a metadata update, and when that date arrives, your content will no longer come up in a search. (See screenshot below)

You can then find expired content using the Expired Content page (See screenshot below)

Please keep in mind that the Content Expiry mechanism is a rather radical solution: once the Expiration Date has passed, these items will not show up in searches and links to the item's web location will no longer work.

Retention Queries in Framework Folders

The new Content Server Framework Folders component replaces the older Contribution Folders, also known as Folders_g. It introduces a number of additional features, such as removal of the limit of 1000 items per folder, better performance and the new Retention Folders feature. (See screenshot below)

As the name suggests, it lets you define a query and apply a simple retention policy to its results: how many revisions of these content items you'd like to keep and for how long (See screenshot below)

One thing to keep in mind: if you have the Records component enabled and a content item is assigned to a Retention Category in Records, that retention will override the period you specified in a Retention Query, and the longer retention period will be used.
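One way to read the rule above is that the effective retention period is the longer of the two whenever a Records Retention Category applies. The short Python sketch below encodes that reading. The function and parameter names are hypothetical, and the product's actual behaviour should be confirmed against the Records documentation.

```python
from typing import Optional


def effective_retention_days(
    query_days: int, records_category_days: Optional[int]
) -> int:
    """Return the retention period (in days) that would apply to an item.

    query_days            -- period set on the Retention Query folder
    records_category_days -- period from the item's Retention Category in
                             Records, or None if no category is assigned
    Assumption: when both apply, the longer period wins.
    """
    if records_category_days is None:
        return query_days
    return max(query_days, records_category_days)


print(effective_retention_days(90, None))   # only the query applies
print(effective_retention_days(90, 365))    # Records category is longer
```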

WebCenter Records

WebCenter Records is the ultimate retention management tool, with a wealth of flexible features for defining complex rules, time periods and disposition actions that can match virtually any organization's Records Policies. It will also let you keep track of your physical assets, perform complex audits and much more, too much to describe here.

Records Adapters

WebCenter Records also offers Adapters, software components that allow you to extend your retention policies to content that is not physically stored in WebCenter Records. You can apply retention to content stored on your File System, Documentum, FileNet and MS SharePoint. The Adapters obtain all policies from the Records server, which manages the retention, and apply them to content stored in the corresponding system. Each Adapter also sends information back to the Records server so it can maintain a complete and up-to-date catalogue of the enterprise's important Content. The screenshot below shows results of a Federated Search across the Content Server and linked repositories: (See screenshot below)

This lets you apply your retention policies to content more consistently, with less administrative effort and less disruption for users, and without the need to migrate content out of your existing systems across the enterprise.

For instance, you can freeze items in SharePoint and apply retention by search criteria and item locations.

Common Problem Areas

Now that we have looked at the various approaches to archiving, let's take a look at common problem areas that often lead organizations to adopt less than optimal retention policies and archival strategies.

Your Physical Assets

Many organizations are still forced to manage large amounts of paper documents and other physical assets. These assets are not as easily accessible and manageable as their electronic counterparts, and that often results in artefacts being stored longer than needed, lost items and other similar issues. It is often beneficial to review and eliminate physical assets that have reached the end of their useful life, or to consider scanning them and retaining them as electronic copies.

Collaboration Tools

Collaboration Tools like SharePoint and wikis almost always result in a large inflow of user-generated content that is not very well structured and ages quickly. Most organizations don't have effective retention policies in place to manage this type of content, to eliminate outdated items and to keep content relevant and easy to find. WebCenter Records' SharePoint Adapter may come in handy here, as it allows you to define and apply retention policies to your SharePoint repositories.

'Smart' content expiry

As I mentioned earlier, using the Expired Content feature with web content, such as your public sites and intranets, may result in broken links as expiring items are taken out of the repository. You can plan for this by running an 'Expiring Content' query ahead of time and updating the links that would otherwise get broken, or you could opt for other mechanisms of archiving, such as metadata updates.

In addition, routinely running a broken links report is another good practice to implement.
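The 'look ahead for expiring content' idea described above can be sketched in a few lines of Python. This is not the Content Server's own query: the ContentItem record, its field names and the sample data are hypothetical, and in practice the list of items would come from a repository search or report.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Optional


@dataclass
class ContentItem:
    content_id: str
    expiration_date: Optional[date]  # None means the item never expires


def expiring_within(
    items: Iterable[ContentItem], today: date, days_ahead: int
) -> List[ContentItem]:
    """Return items that will expire between `today` and `today + days_ahead`.

    Items that have already expired or that have no expiration date are
    left out, since there is nothing left to plan for.
    """
    cutoff = today + timedelta(days=days_ahead)
    return [
        item
        for item in items
        if item.expiration_date is not None
        and today <= item.expiration_date <= cutoff
    ]


# Hypothetical sample data.
today = date(2014, 6, 24)
items = [
    ContentItem("PRESS-001", date(2014, 7, 1)),   # expires next week
    ContentItem("SPEC-002", date(2014, 9, 1)),    # expires later
    ContentItem("POLICY-003", None),              # never expires
    ContentItem("PROMO-004", date(2014, 6, 1)),   # already expired
]
due_soon = expiring_within(items, today, days_ahead=30)
print([item.content_id for item in due_soon])
```

The resulting list is the set of items whose inbound links are worth reviewing before the expiry date arrives.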

Performance Issues

Often, when your system slows down, the first complaint heard from the sysadmin is "we have too much content". Sometimes it's true, but in most cases the system slows down for two reasons other than approaching its maximum capacity.

Assessing Performance

Here are a couple of numbers for you to use as a guideline. Your system by no means has to match them exactly, as every configuration is different, so treat these as ballpark figures that show you where you're at.

Read-only requests, such as page loads, should run at about 20 requests per second per GHz of CPU, times the number of CPUs. A dual-CPU 2 GHz box should therefore serve about 80 pages per second (20 × 2 GHz × 2 CPUs).

For raw requests, such as content information or calls to GET_FILE or Check In (for small files, say, around 200K), you should target 4 requests per second per GHz of CPU, times the number of CPUs in the system.

Once again, these are just guidelines, and the numbers depend greatly on file size and many other factors.
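The rule of thumb above is simple arithmetic, and a small Python helper makes it easy to plug in your own hardware. The function and constant names are made up for this sketch, and the result is only as good as the ballpark figures it is built on.

```python
# Ballpark figures quoted above (requests per second per GHz per CPU).
READ_ONLY_PER_GHZ = 20   # page loads and other read-only requests
RAW_PER_GHZ = 4          # content info, GET_FILE, Check In of small files


def expected_throughput(
    ghz_per_cpu: float, cpu_count: int, requests_per_ghz: float
) -> float:
    """Estimate requests per second as rate x clock speed x CPU count."""
    return requests_per_ghz * ghz_per_cpu * cpu_count


# The dual-CPU 2 GHz example from the text.
print(expected_throughput(2.0, 2, READ_ONLY_PER_GHZ))  # 80.0
print(expected_throughput(2.0, 2, RAW_PER_GHZ))        # 16.0
```

If your measured numbers fall far below these estimates, that is a hint to look for a bottleneck before blaming the amount of content.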

Oracle WebCenter Content is an I/O limited application, which means that overall performance is mostly restricted by the speed of your hard disks, not the software itself or the number of items you have in your repository.


In this short series we revisited various practical meanings of the word 'Archiving' as it applies to Content Management in general and to specific features of Oracle WebCenter. We looked at the common problem areas for you to keep an eye on and at the set of tools that WebCenter Content offers for implementing your content retention and archiving policies.

Tuesday, June 3, 2014

Archiving In WebCenter Content - A Deeper Look

This article develops an understanding of the practical meaning of the word 'Archiving' as it applies to Content Management in general, to practical business applications and to the ROI of your specific WebCenter implementation. Participants will review different aspects of archiving - obsolete, hidden, cut off, deleted, information architecture and physical design of archiving systems - as they consider a compromise between the cost of storage and the speed of retrieval.


WebCenter Content offers a great set of tools we can use for archiving. It gives us Retention Management, Content Expiry, Replication and many other features and tools, but are we using them efficiently? And what does Archiving mean after all?

Back in September, Deane Barker of Blend Interactive brought up a controversial topic in the Content Management Professionals Group. He asked:

"'Archiving' is a popular word in content management, but what does it mean? Is there an accepted definition?"

Most of us remember the days when 'Integrated Document Archive and Retrieval Systems', or IDARS, was a large market segment in its own right. Tape archives and off-site storage were the first things that came to mind. But what about the 'News Archive' section of our corporate intranet? Should it even be called an 'Archive'? And what about an Email Archive? And what should we do with historic account statements for our customers?

All valid questions, and ones I suggest we answer before looking at the wealth of tools that are available to us as part of our Oracle WebCenter Content license.

So may I suggest that you pause for a quick second and ask yourself this important question - the question that Deane asked the crowd of just over 27,000 information management professionals back in September:

What does 'Archiving' really mean?

Which one of these options comes close to your organization's definition of 'Archiving'?

  1. Does it mean that you simply move your content to an 'Archive' section, just as so many web sites now do with their 'News Archive' section? The content is still visible to the public, but simply moved to another section?
  2. Does it mean that we restrict permissions to hide the content from the public and mark it as 'archived', but still leave it in its current location so that it stays accessible to content contributors?
  3. Or does it mean that we move the content to another location, still viewable by contributors but not publicly accessible and not in its original location?
  4. Or maybe it means that you move it to some backup medium and completely remove it from your WebCenter instance?
  5. Or does it simply mean "deletion of outdated content"?

Pondering this will help you get the most value - not just out of the tools that come with WebCenter, but infinitely more importantly - out of the business content that your WebCenter Instance is tasked with managing in the first place! So pause now and please do your thinking before continuing to read.

Archiving as means of addressing system limitations

Back in the late 1990s we had little control over how CMS systems displayed content, so we were forced to 'archive' stuff to prevent it from showing up on the site. Those days are long gone, but even now we still face this type of limitation.

Here's one scenario that still forces people to 'archive' content stored in Contribution Folders: the notorious limit of 1000 content items and/or child folders per contribution folder. If your system uses Contribution Folders (as opposed to the newer Framework Folders), Folders_g actually throws a csCollectionContentMaxed exception when you try to add more than 1000 items to a folder.

Now, few people are aware of this, but you can actually solve this by raising the limit through configuration variables. So that prompts another question:

If there's no foreseeable limit on the number of items you can easily store in repository and you can simply change permissions to remove outdated content from public view - do you still need to 'archive' anything?

With the cost of storage going down year after year and ever faster searching tools, the pressure to take content out of the repository just to improve performance has largely gone away. We are seeing this trend all around us too. Gmail has increased its free mailbox storage space from 1 GB to 15 GB in less than 10 years, so you are hardly ever forced to delete emails. 'Soft Delete', or 'Archive' as Gmail calls it, is what users now use instead of delete.

'Soft Delete' features

'Soft Delete' effectively removes content from the active view in the system, but it can always be accessed by search.

And that creates another problem! The problem that comes from the proliferation of irrelevant content.

Relevant vs. Irrelevant content

The ability to find or restore anything is great, but having multiple (including outdated) versions of important technical specs may get in the way of effective communication. The famously out-of-control SharePoint repositories are the first example that comes to mind.

You don't have to remove content, but that doesn't mean that you don't need to systematically identify content that is now less relevant. If the content is less relevant it could be tagged or moved to an archive location. If it isn't relevant at all, it should be deleted or sent down the pipe of your disposition process.

A discontinued product that is no longer being sold doesn't need to show on the web site. Archiving keeps the information available to content managers in case any legal or compliance issues arise. And potential litigation is another great argument for getting rid of content that you are no longer obligated to keep.

Defining your governance

Now, every organization is different, and so is the decision-making process when it comes to defining the life cycle of your content. Whether it is a single person or a cross-functional content steering committee that makes your governance decisions, your Information Management Team needs to have clear directions and a well-defined life cycle for each type of content.

Applying Retention Policies

You need to define what types of content your system is managing and what 'archive' and 'delete' mean for each type - and how each is carried out. Your corporate retention policy is key, and your legal colleagues should be part of the discussion.

With that in mind, we should be ready to look at the tools available in WebCenter Content that make it easy for you to apply your content life cycle decisions.

That's it for now. In the next article we will look at the set of tools available in WebCenter Content and how to best use them to implement our archiving strategies.