Extracting key data elements (metadata) from contracts is not an easy chore. Done manually, you would need to open every contract (a time-consuming task in itself), search the full document for many different types of language, and copy and paste what you find into a tool like Excel.

What is Metadata?

Metadata is the important data contained in contracts or their underlying documents. It describes that content as a set of fields and values, so that an unstructured contract yields structured data that can be stored and used for important records.
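To make "fields and values" concrete, here is a minimal Python sketch of one contract's metadata as a record. The class `ContractMetadata`, its methods, and the field names in the usage lines (`party_a`, `effective_date`, `governing_law`) are hypothetical illustrations, not a standard schema. The point is simply that every contract flattens into a row with the same columns, ready for a spreadsheet or database.

```python
from dataclasses import dataclass, field


@dataclass
class ContractMetadata:
    """Hypothetical record: one contract's metadata as field/value pairs."""
    contract_id: str
    fields: dict[str, str] = field(default_factory=dict)

    def add(self, name: str, value: str) -> None:
        # Store (or overwrite) the value extracted for one metadata field
        self.fields[name] = value

    def as_row(self, columns: list[str]) -> list[str]:
        # Fields not found in this contract become "", so every contract
        # produces a row of the same width (e.g. one spreadsheet row)
        return [self.contract_id] + [self.fields.get(c, "") for c in columns]


# Usage with illustrative field names (not a prescribed attribute list)
record = ContractMetadata("C-0001")
record.add("party_a", "Acme Corp")
record.add("effective_date", "2021-01-01")
print(record.as_row(["party_a", "effective_date", "governing_law"]))
# ['C-0001', 'Acme Corp', '2021-01-01', '']
```

Keeping the column list outside the record lets each contract type define its own set of attributes while still producing uniform rows.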

Different documents contain different information that can be tracked or stored for reporting and review, and the importance of each element depends on the type of contract.

Below are the most common metadata attributes that are generally extracted, along with other contract-specific attributes:

Metadata attributes

Time studies have shown that extracting any given attribute takes about 2 minutes, largely because a contract seldom stands alone: most often there is the agreement itself plus associated companion documents to check. If you extract 15 metadata elements, that is half an hour per contract, so 10,000 contracts take 5,000 person-hours. Then factor in quality control and checks, document organization, OCR, and the other processes that need to be in place before and after extraction. In total, such an effort for 10,000 documents can take about 8,000 person-hours.

Taking it further: at 7 productive hours per person per day and 19 working days per month (excluding vacations and holidays), the effort for 10,000 documents comes to about 1 year for 5 people! Ouch!
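The arithmetic behind these estimates is easy to rerun with your own numbers. This sketch plugs in the figures from the text; note that half an hour per contract at 2 minutes per attribute corresponds to 15 attributes. The function names are illustrative, and the 8,000-hour total (which includes QC, OCR, and document organization) is taken from the text rather than derived.

```python
def extraction_hours(contracts: int, attributes: int, minutes_per_attribute: float) -> float:
    """Raw extraction time in person-hours, before QC and other overhead."""
    return contracts * attributes * minutes_per_attribute / 60


def person_years(total_hours: float, hours_per_day: float = 7, days_per_month: float = 19) -> float:
    """Convert person-hours into person-years of productive time (12 months/year)."""
    return total_hours / (hours_per_day * days_per_month * 12)


raw = extraction_hours(10_000, 15, 2)  # 15 attributes x 2 min = 30 min per contract
print(raw)                             # 5000.0

total = 8_000                          # total from the text, including QC and other processes
print(round(person_years(total), 2))   # 5.01, i.e. roughly 5 people for 1 year
```

At 7 hours x 19 days x 12 months, one person supplies about 1,596 productive hours a year, which is why 8,000 person-hours works out to about five person-years.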

There are three ways to extract information from contracts.

  1. Fully manual process: the time frame is outlined above. Cons: ask 5 people to abstract the same contract and, given the complexity of legal language, you will get 5 different answers! Manual abstraction is highly prone to human error.
  2. Fully automated: even with extraction software, a large amount of quality control is still needed, because no software can decipher all the nuances of legal language. Not to mention the installation, configuration, training, and maintenance the software requires. And who will do the QC? Most companies that go this route find that the extraction quality is lower than they had hoped and that a large amount of manual work remains.
  3. Hybrid: companies use software-powered extraction and take ownership of human quality control to deliver complete, high-quality data to their clients. It’s what we call a “technology-enabled service”. A single vendor delivering guaranteed results has the advantage of “one throat to choke”.
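As a rough illustration of the hybrid model, the sketch below merges software-proposed values with a human reviewer's confirmed or corrected values and reports the share of fields the reviewer had to change. The function `apply_review` and the sample fields are hypothetical; real extraction tools and QC workflows will differ, and this is not a description of any particular vendor's process.

```python
def apply_review(extracted: dict[str, str], reviewed: dict[str, str]) -> tuple[dict[str, str], float]:
    """Merge software-extracted values with a reviewer's verdicts (hypothetical helper).

    extracted: field -> value proposed by the extraction software
    reviewed:  field -> value confirmed or corrected by a human reviewer
    Returns the final record and the fraction of fields the reviewer changed.
    """
    final: dict[str, str] = {}
    corrected = 0
    for name in sorted(extracted.keys() | reviewed.keys()):
        machine = extracted.get(name)
        # A field the reviewer did not touch keeps the software's value
        human = reviewed.get(name, machine)
        if human != machine:
            corrected += 1  # includes fields the software missed entirely
        final[name] = human
    rate = corrected / len(final) if final else 0.0
    return final, rate


software = {"party_a": "Acme Corp", "term_months": "12", "governing_law": "NY"}
human = {"party_a": "Acme Corp", "term_months": "24", "governing_law": "NY"}
final, rate = apply_review(software, human)
print(final["term_months"], round(rate, 2))  # 24 0.33
```

Tracking a correction rate like this over many contracts is one way to see how much manual work the software is actually saving.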

Consider the effort required to do the extraction before you embark on the journey.