What is abstraction/extraction and why do I need it?

Extraction is the process of pulling out relevant data points from contractual documents.  Once these are extracted and labeled accordingly, they can be put into a structured database (even Excel) for further reporting and analysis.

For example, you can extract Entity name, Counterparty, Expiration date, Autorenewal (Y/N), Workers Comp insurance liability limit (currency), etc.

Once the data is in a structured database, you can run reports on it. For example: “Give me the names of the counterparties that have a Workers Comp liability limit of less than $1,000,000.”
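As a rough illustration of the report described above, here is a minimal Python sketch. The record layout, field names and sample counterparties are hypothetical and stand in for whatever structured database or spreadsheet holds your extracted data.

```python
from dataclasses import dataclass


@dataclass
class ContractRecord:
    # Hypothetical field names mirroring the example attributes above.
    entity_name: str
    counterparty: str
    expiration_date: str       # kept as an ISO date string for simplicity
    auto_renewal: bool
    workers_comp_limit: float  # liability limit in US dollars


def counterparties_below_limit(records, threshold):
    """Return the sorted, unique counterparties whose limit is under threshold."""
    return sorted({r.counterparty for r in records if r.workers_comp_limit < threshold})


# Made-up sample data, for illustration only.
records = [
    ContractRecord("Acme Corp", "Northwind Supplies", "2025-06-30", True, 500_000),
    ContractRecord("Acme Corp", "Globex Logistics", "2024-12-31", False, 2_000_000),
    ContractRecord("Acme Corp", "Initech Services", "2026-01-15", True, 750_000),
]

low_coverage = counterparties_below_limit(records, 1_000_000)
print(low_coverage)
```

The same question could be asked with a filter in Excel or a query in a database; the point is that it only becomes a one-line query once the values have been extracted into fields.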

Why do I need extraction?

There are multiple reasons for extraction.  At a 60,000-foot level, you need to know easily what is in your contracts.  Having the relevant information at your fingertips through reports in a structured database gets you all the information without you having to open each and every document to determine which meets the profile of your query.  Some of the reasons for extraction are:

  • Information – Much knowledge is stored in the old contracts. This is lost if it is not in a new CLM.
  • Reporting – You can run triggers on contracts coming up for renewal, supplier contracts with penalty clauses, etc.
  • Adoption – A CLM system gets adopted when all the current and older contracts are in the new system along with the relevant data. (It is important to note that a CLM is not simply a document repository.)
  • Revenue recovery – Police (enforce) penalty clauses, etc.
  • Compliance – track items that would be affected by changes in regulation

What is the difference between extraction and abstraction?

Although the dictionary meanings differ, we use these two terms synonymously. Per the dictionary, extraction means the exact phrase is picked from the contract/document and copied, whereas abstraction is the essential value distilled from that extracted phrase.

For example, for the expiration date attribute, if the contract says “This contract expires on 3/4/19”, the extraction value for this would be “This contract expires on 3/4/19” whereas the abstraction value will be “3/4/19”, removing the irrelevant part from the phrase.
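The expiration date example can be sketched in a few lines of Python. This is only an illustration: the regular expression assumes the date is written as numbers separated by slashes with a two-digit year, and the function name is hypothetical.

```python
import re

# Assumption: dates look like 3/4/19 (one or two digits, slash-separated,
# two-digit year). Real contracts use many more formats than this.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2}\b")


def abstract_expiration(extracted_phrase):
    """Reduce an extracted phrase to the date it contains, or None if absent."""
    match = DATE_PATTERN.search(extracted_phrase)
    return match.group(0) if match else None


extraction_value = "This contract expires on 3/4/19"
abstraction_value = abstract_expiration(extraction_value)

print(extraction_value)
print(abstraction_value)
```

Here `extraction_value` is the phrase copied from the contract and `abstraction_value` is the part of it that you would actually report on.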

What is the difference between structured, semi-structured and unstructured data?

Data can be defined as a set of qualitative and quantitative values, that can be further processed to derive meaningful information. The quality of the information derived from a set of data depends on the accuracy and completeness of the data. It can further be divided into three types:

  1. Structured data – Data that has consistent formats and can be easily organized into a database, like facts and figures entered into a pre-defined format or well-drafted reports.
  2. Semi-structured data – Information that doesn’t reside in a pre-defined data format but does have some organizational properties that make it easier to analyse, for example Excel reports that do not conform to a formal structure.
  3. Unstructured data – Information that does not have a pre-defined data format and/or is not organized using a common layout. Examples of unstructured data include Word documents, PDFs, video files, presentations, emails, etc.

What are the different classifications of data?

Broadly data can be classified into five categories:

  • Text – It includes script characters and expressions (words and sentences), like text written in different languages
  • Numeric – Numeric values consist of numbers, decimal, percentage values, etc.
  • Currency – This data type consists of monetary values like $ (US Dollar), € (Euro), ¥ (Yen), etc.
  • Date & Time – It includes dates and times written in different formats, for example AM/PM for times and 20th June 1990, 20/06/1990 (dd/mm/yyyy) or 06/20/1990 (mm/dd/yyyy) for dates.
  • Boolean Expressions – it indicates true/false, yes/no or on/off values.
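To show why the data type matters once values are loaded into a database, the sketch below converts raw text into typed values. The helper names are hypothetical, the symbol-to-code mapping is an assumption added for illustration, and the date format must be supplied by the caller because 06/20/1990 and 20/06/1990 cannot be told apart reliably from the text alone.

```python
from datetime import date, datetime
from decimal import Decimal

# Assumption: only the three symbols mentioned above are handled.
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "¥": "JPY"}


def parse_boolean(raw):
    """Map yes/no, true/false and on/off style text to a Python bool."""
    value = raw.strip().lower()
    if value in {"true", "yes", "y", "on"}:
        return True
    if value in {"false", "no", "n", "off"}:
        return False
    raise ValueError(f"not a boolean value: {raw!r}")


def parse_date(raw, fmt):
    """Parse a date using an explicit format such as %d/%m/%Y or %m/%d/%Y."""
    return datetime.strptime(raw.strip(), fmt).date()


def parse_currency(raw):
    """Split text like $1,000,000 into a currency code and a Decimal amount."""
    text = raw.strip()
    code = CURRENCY_SYMBOLS[text[0]]  # raises KeyError for unknown symbols
    amount = Decimal(text[1:].replace(",", ""))
    return code, amount


same_day_uk = parse_date("20/06/1990", "%d/%m/%Y")
same_day_us = parse_date("06/20/1990", "%m/%d/%Y")
print(same_day_uk == same_day_us)
```

Both date strings resolve to the same calendar day only because the correct format was stated for each; storing the typed value rather than the raw text removes that ambiguity from later reports.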

What is legacy data and why is it important to track it?

Legacy data is data from your legacy or your executed contracts. These are contracts that have been executed by both parties. They may be current and still relevant or may have expired. It is important to track legacy data for the following reasons:

  • Information – Much knowledge is stored in the old contracts which may be applicable to future business decisions. This is lost if it is not in a CLM.
  • Reporting – You can run triggers on contracts coming up for renewal, supplier contracts with penalty clauses, etc.
  • Adoption – A CLM system gets adopted when all the current and older contracts are in the new system.
  • Revenue recovery – Police (enforce) penalty clauses, etc.
  • Compliance – track items that would be affected by changes in regulation

To know more, read our whitepaper on 7 Reasons to Load Legacy Contracts.

What is a Contract Lifecycle Management (CLM) system?

Contract lifecycle management (CLM) is a system for managing contracts within an organization. It provides a single, unified repository for tracking contracts, and the reports it generates help businesses make better-informed decisions.

Most contract lifecycle management systems provide contract authoring, contract creation and approval, workflow management, digital signatures, and other related services.

Contracts are a written set of guidelines and commitments to undertake a task or deliver a product or service, so failure to comply with these commitments can attract legal or regulatory penalties. Managing a contract effectively is as necessary as entering into it.

What are some of the benefits of a CLM?

Key benefits of a CLM are:

  • Helps in contract authorization and creation
  • Helps manage the contract management workflow
  • It is a single and unified repository
  • Provides insight into key provisions at fingertips
  • Helps in compliance monitoring and follow-up
  • Helps in identifying and mitigating risks
  • Can generate useful reports for better informed decisions
  • Helps in management reporting
  • Access to all information in one place from multiple concurrent locations

Why should I migrate legacy contracts into a CLM?

Migrating contracts into a CLM gives you a single, unified repository, which helps you identify and mitigate risks. Contracts carry an enormous amount of information, and knowing what has been committed in the past can help you make better future business decisions. Migrating legacy contracts into a CLM leads to a structured database that can help identify hidden or forgotten information, such as additional revenue that can be recognized or obligations that could otherwise cause heavy penalties to the organization. It is also an effective way to track the impact of any compliance or regulatory changes.

Migration can further be segregated into two categories:

  • Document migration – It is the process of uploading the scanned or OCRed copies of your contracts onto the CLM
  • Meta-data extraction and migration – Here, key data points (also known as meta-data) are extracted from the contracts and uploaded onto the CLM for review and reporting.

What is the full process for loading legacy contracts into a CLM?

It is a four-step process:

Step 1 – Scan all the paper documents into computer-based files

Step 2 – Convert the scanned images into text (OCR)

Step 3 – Extract data points from these contracts

Step 4 – Ingest documents and data points into the CLM

To learn more about the process, download Brightleaf’s whitepaper on Legacy Contract Management.

How do I know which attributes to extract and upload into a CLM?

Contracts, being legal documents, carry a lot of information, but not every bit needs to be tracked and monitored. It is important to understand what should be extracted and what should not, as the metadata attribute extraction requirements can vary from industry to industry. You can follow the guidelines below to identify which attributes to extract:

  • Divide the contractual documents into different types (NDAs, Procurement agreements, SLAs, etc.)
  • Work with each of the departments that touch the contract types to determine which elements are important to them and need to be extracted.
  • Look at extraction from a reporting standpoint – what reports do you want to run on the extracted data.
  • Determine how frequently each data point needs to be reported. For example, if the only reason you want to extract the jurisdiction state would be if something goes wrong with the contract, you may not want to extract it since it may only be required for 1% of the contracts through a year.

A cautionary note – if you think full clauses need to be extracted, first determine “why?”. You can report on the full clause, but it may be more beneficial to break down the clause into different data points: For example, you may think to extract the Indemnification clause.  However, it may be better to break that down to Indemnification for breach of confidential information for receiving party (Y/N) etc.

What are some of the standard Attributes?

SL No.  Attribute Name                          Attribute Type      Data Type
1       Name/Title of Agreement                 Standard Attribute  Text
2       Type of Agreement                       Standard Attribute  Text
3       Contract Number                         Standard Attribute  Alpha Numeric
4       Effective Date                          Standard Attribute  Date
5       Expiration Date                         Standard Attribute  Date
6       Renewal Option                          Standard Attribute  Yes/No
7       Initial Term                            Standard Attribute  Numeric (Days or months)
8       Governing Law                           Standard Attribute  Text
10      Delivery Term                           Business Specific   Text
11      Freight on Board Charges (incurred by)  Business Specific   Text
12      Consumer Price Index Adjustment Date    Business Specific   Date
13      Most Favoured Customer                  Business Specific   Text
14      Currency Conversion Rate                Business Specific   Alpha Numeric
15      HIPAA Compliant                         Business Specific   Yes/No

Why is Six Sigma-level abstraction quality important?

Contracts carry crucial information with legal and regulatory bindings, so any missing or incorrect information can trigger penalties and other legal consequences. Failing to capture a payment date can attract additional interest on the payment or even a termination, and an incorrect renewal date can result in a loss of revenue. Software-only solutions, though really fast, can promise quality of only 75-85%, so human intervention becomes necessary to reach the maximum level of quality. Read Brightleaf’s whitepaper to understand human intervention in contract review and abstraction.
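To put those percentages side by side, the short calculation below converts accuracy into defects per million opportunities (DPMO). Six Sigma is conventionally quoted as 3.4 defects per million opportunities. The portfolio size used here (10,000 contracts with 20 attributes each) is a made-up figure chosen only to make the arithmetic concrete.

```python
SIX_SIGMA_DPMO = 3.4  # conventional Six Sigma benchmark


def dpmo_from_accuracy(accuracy_percent):
    """Convert an accuracy percentage into defects per million opportunities."""
    return (100 - accuracy_percent) * 10_000


def expected_errors(data_points, dpmo):
    """Expected number of wrong data points at a given defect rate."""
    return data_points * dpmo / 1_000_000


# Hypothetical portfolio: 10,000 contracts x 20 extracted attributes each.
data_points = 10_000 * 20

scenarios = [
    ("Software only, 75% accurate", dpmo_from_accuracy(75)),
    ("Software only, 85% accurate", dpmo_from_accuracy(85)),
    ("Six Sigma", SIX_SIGMA_DPMO),
]
for label, dpmo in scenarios:
    print(f"{label}: {expected_errors(data_points, dpmo):,.2f} expected errors")
```

On this hypothetical portfolio, 85% accuracy leaves 30,000 data points wrong and 75% leaves 50,000, whereas the Six Sigma rate corresponds to less than one expected error.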

When someone says “Review of extracted results”, what does this mean?

Typically, software does the first level of extraction of data points from the contracts. Then a team checks the results. “Review” has connotations of spot-checking. That will NOT lead to accurate extraction results.

Brightleaf’s stringent process, which is embedded in the software, requires the lawyers who check the output to verify EACH-AND-EVERY data element against the original document. This is the ONLY way to get highly accurate results.

So ask the vendor: “Do you check every element, or does ‘Review’ mean spot-checking? Can you provide an audit of every element checked by the lawyer, with time and date stamps?”

What are different ways to abstract data?

  1. Manual abstraction – Interpretation and abstraction of important information done by humans. It requires planning, oversight, and lots of error checking.
  2. Automated abstraction – Abstraction of data purely done by software based on some pre-defined rules and algorithms. Fast but prone to errors.
  3. Hybrid abstraction – Best of both, a combination of manual and automated abstraction. Here the data is abstracted using software making the process fast, then vetted by experts for completeness and accuracy.

To understand the difference, download our whitepaper on Legacy Contract Management.
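One way to picture the hybrid approach is as a record per data point that holds the software's value plus the reviewer's verification and a timestamp. The sketch below is hypothetical and is not a description of any vendor's actual system; the class, field and reviewer names are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DataPoint:
    # Hypothetical structure for one extracted attribute of one contract.
    contract_id: str
    attribute: str
    software_value: str
    verified_value: Optional[str] = None
    verified_by: Optional[str] = None
    verified_at: Optional[datetime] = None


def verify(point, reviewer, correct_value):
    """Record the reviewer's value and when it was checked (audit trail)."""
    point.verified_value = correct_value
    point.verified_by = reviewer
    point.verified_at = datetime.now(timezone.utc)
    return point


def unverified(points):
    """Data points that no reviewer has checked yet."""
    return [p for p in points if p.verified_at is None]


points = [
    DataPoint("C-001", "Expiration Date", "3/4/19"),
    DataPoint("C-001", "Renewal Option", "Yes"),
]
verify(points[0], reviewer="reviewer-1", correct_value="3/4/19")
print(len(unverified(points)))
```

A hybrid process is complete only when the unverified list is empty; spot-checking, by contrast, would leave most records without a reviewer or timestamp.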

How do I visualize this in a CLM?

Data is visualized through the reports generated by a CLM system. Your CLM provider can supply a sample report, which will help you identify what additional data you might need, or whether something you have already had extracted is of little use and can be dropped from the final extraction.

You can follow the below process to get the sample reports:

  • Once you have determined the data elements that need to be extracted and uploaded, ask your CLM vendor to provide sample extractions for each contract type.
  • Ensure each data point is extracted the way you would interpret it
  • Ensure that the test data set is uploaded to the test instance and that the reports generated on the extracted data points give you the benefit you expect
  • Run the extracted data through the CLM in small batches. This way you are able to see what is being uploaded/recorded properly and what may need to be changed.

Do not do a “1 and done” upload, as this may result in unforeseen errors with no way of fixing them.
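The small-batch approach can be sketched as follows. The `upload_batch` and `batch_is_valid` callables are placeholders for whatever your CLM's import mechanism and your own checks look like; they are assumptions, not a real CLM API.

```python
def batches(items, size):
    """Yield consecutive slices of items, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_in_batches(records, batch_size, upload_batch, batch_is_valid):
    """Upload batch by batch, stopping at the first batch that fails checks.

    Returns the number of records uploaded, so you know where to resume
    after fixing the problem.
    """
    uploaded = 0
    for batch in batches(records, batch_size):
        if not batch_is_valid(batch):
            break
        upload_batch(batch)
        uploaded += len(batch)
    return uploaded


# Illustration with made-up records: the fifth record has no expiration date.
records = [{"id": i, "expiration_date": "2025-01-01"} for i in range(1, 7)]
records[4]["expiration_date"] = ""

received = []  # stands in for the CLM test instance
count = upload_in_batches(
    records,
    batch_size=2,
    upload_batch=received.extend,
    batch_is_valid=lambda batch: all(r["expiration_date"] for r in batch),
)
print(count)
```

Because the problem surfaces in the third batch, only that batch needs to be corrected and re-run, rather than the whole upload.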

Can I extract and upload pricing tables from my sell or buy side contracts?

  • Yes, but this requires a highly thoughtful process
  • Tables are not always consistent across agreements.

For example, on the buy side there is no consistency in the columns: SKUs, quantity, unit pricing, pricing per year, escalations per year.

  • This makes “normalization” of the data, i.e. defining fields that go across all contracts, an intensive task.
  • Always look at it from a reporting standpoint. How would you want the data points shown and how do you want to report on them?
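Normalization of pricing tables can be pictured as mapping each contract's column headers onto one agreed set of fields. The alias table below is a hypothetical example; in practice it has to be built up from the headers that actually appear in your agreements.

```python
# Hypothetical mapping from headers seen in contracts to canonical field names.
COLUMN_ALIASES = {
    "sku": "sku",
    "item no.": "sku",
    "part number": "sku",
    "qty": "quantity",
    "quantity": "quantity",
    "unit price": "unit_price",
    "price per unit": "unit_price",
}


def normalize_row(row):
    """Split a pricing-table row into canonical fields and unmapped leftovers."""
    normalized = {}
    unmapped = {}
    for header, value in row.items():
        field = COLUMN_ALIASES.get(header.strip().lower())
        if field is None:
            unmapped[header] = value  # needs a human decision, not a guess
        else:
            normalized[field] = value
    return normalized, unmapped


row_from_contract_a = {"Item No.": "A-100", "Qty": "50", "Unit Price": "$12.00"}
row_from_contract_b = {"Part Number": "A-100", "Quantity": "75", "Escalation Yr 2": "3%"}

print(normalize_row(row_from_contract_a))
print(normalize_row(row_from_contract_b))
```

Columns that do not map to an agreed field are returned separately so they can be reviewed, which is where most of the effort in this task goes.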

Can I license software that would do all the work?

Yes, but…

  • The software can do a “bulk of the heavy lifting”
  • But no software will be perfect! (no matter what they guarantee, there will always be an error in the software)

There are also hidden additional costs for which you must account: software training, additional modules to handle Third Party Paper, configurations, lawyers or interns to quality control the resulting outputs, etc.

What should I look for in a vendor who will do the extraction?

  • Do they have software?
  • Do they have their own software?
  • Do they have their own people?
  • Are their people lawyers, not interns?
  • Do they q/c each and every extracted element, or only spot-check?
  • Can they guarantee Six Sigma levels of accuracy, a must when you are dealing with critical contractual data?
  • If they provide more than extraction services, how do they choose the extraction team?  Is it whosoever is on the bench?

Why is ISO 27001 information security certification so important?

  • Your contracts are the most confidential part of your business
  • Some vendors use a network of part-time workers to process your contracts.  This creates a security risk for your most sensitive data
  • ISO 27001 is a location-based information security certification standard. You need to insist that the vendor have that certification, and ask for all the controls in place

What is Brightleaf’s value proposition?

  • We have our own NLP/AI software.
  • We have our own team of lawyers
  • We are a highly focused company, just performing the task of extraction
  • We have a stringent Six Sigma process focused on extraction
  • We control people, process and technology dimensions
  • We guarantee up to Six Sigma level of extraction quality
  • We q/c each and every element against the original document
  • Our team of q/c lawyers are solely trained on q/c
  • Our q/c team of lawyers sits next to the configurers of the software, who sit next to the software developers. This gives us the Six Sigma quality.

Why is checking EACH and EVERY data point extracted by the software so important? Why can it not be spot-checked?

When you are dealing with legacy documents for migration into a CLM system, these documents have many issues: missing dates, handwritten attributes, typos, bad document scans, mis-filings of masters, the matchup of addendums, etc. The software cannot solve all of these problems. Since it is not known in advance which documents are affected, the only way to get dependably accurate data, a must when you are running your business on contracts, is to verify EACH and EVERY extracted data element against the original contract.

Since Brightleaf has its own software, people and process, why would someone quote a price which is more competitive?

The questions that you ask are:

  • Do you have your own software? (get a demonstration)
  • Do you check EACH and EVERY data point extracted by the software against the original contract?
  • Does the software have the ability to track and see if the people have checked EACH and EVERY datapoint? (ask for a demonstration)
  • Is the team that is doing the verification of every data point comprised of lawyers (not interns)?

Companies will short-change the verification process and will “spot-check” the extracted output. This is cutting corners in a major way, and it gives rise to low-quality extraction, oftentimes less than 70% accurate.

Having 30 errors in 100 is very poor quality data.  What good is any analytics done on the data when it is inaccurate?

Can I extract and upload some of my business specific attributes?

  • Yes, Brightleaf has full control over its software allowing full flexibility in the extraction process.
  • Any CLM system can be programmed to ingest any type of business specific data.
  • You can even determine how each of the attributes is tracked/extracted:

For Example, Start date: If there is no explicit Start date in the contract, do you wish to extract it as the Effective date?  Or the Date of Signature?
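A rule like this can be written down as an ordered list of fallbacks. The sketch below is illustrative only: the attribute names and the default order are assumptions, and the order itself is the business decision you would make for your own contracts.

```python
def resolve_start_date(attributes, fallback_order=("Start Date", "Effective Date", "Date of Signature")):
    """Return (value, source attribute) for the first non-empty date in fallback_order.

    Returns (None, None) when none of the candidate attributes has a value.
    """
    for name in fallback_order:
        value = attributes.get(name)
        if value:
            return value, name
    return None, None


# Hypothetical contract with no explicit Start date.
contract = {
    "Start Date": "",
    "Effective Date": "01/02/2020",
    "Date of Signature": "12/15/2019",
}

start_date, source = resolve_start_date(contract)
print(start_date, source)

# The same contract under a rule that prefers the Date of Signature instead.
alt_date, alt_source = resolve_start_date(
    contract, fallback_order=("Start Date", "Date of Signature", "Effective Date")
)
print(alt_date, alt_source)
```

Returning the source attribute alongside the value keeps a record of which rule supplied the Start date, which is useful when the result is later questioned.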

How can Brightleaf help?

Brightleaf provides a hybrid solution to handle all your abstraction needs. With the help of its own proprietary technology and a team of legal experts, Brightleaf offers Six Sigma-level quality output.

  • Offers a custom-tailored service to clients
  • Enables you to keep your ERP or contract management system up-to-date
  • Helps you ensure compliance with existing contracts
  • Empowers you to gain strategic insights into overall legal obligations, risks, and opportunities

Save the time and money you would otherwise spend hiring expensive abstraction firms.