Document Understanding – A Doorway To Efficient Systems

A 60-year-old patient walks into a new medical facility. He carries a large pile of printed health records containing reports from all his previous medical providers. The operator at the facility starts entering patient data by manually inspecting the last few entries in his records.

A borrower submits a home loan application. The mortgage provider asks the borrower to submit financial documents and property details. The documents are scanned and sent to an offshore facility for manual data entry. The offshore operator starts a long and painful process of inspecting the documents and entering the financial data into spreadsheets. The structured data is returned to the provider the next day, and only then does the rest of the process continue. A similar manual process is used for contract review, legal case precedents, resume processing, invoice consolidation, expense reconciliation, etc.

Every day, millions of documents with unstructured data are generated, inspected, interpreted, exchanged, and used for decision making. These documents are human-readable but not machine-readable. In the process, errors are sometimes made and core information is lost. Such data is also open to interpretation by the person looking at it, which causes inconsistencies further down the chain. It is not easy to exchange such data between multiple parties, and the lack of standards for exchanging it between different systems further exacerbates the issue. As a result, most of the unstructured data in businesses is left untouched, leaving a huge gap in analytics and decision making.

The Platform

The solution is to classify unstructured documents and extract metadata from them using an intelligent, AI-powered document understanding platform. Such a platform can ingest documents in a variety of formats, at scale, and convert them into SMART Docs. A SMART Doc consists of the relevant structured data (XML, JSON) extracted from an unstructured document.
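To make the idea of a SMART Doc more concrete, here is a minimal sketch of what the structured JSON output might look like. The class names, field names, and sample values are hypothetical and chosen only for illustration; they are not taken from any particular platform or standard.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ExtractedField:
    """One value pulled out of the source document (hypothetical shape)."""
    label: str          # e.g. "borrower_name"
    value: str          # the extracted text
    page: int           # page on which the value was found
    confidence: float   # extractor's confidence, between 0 and 1


@dataclass
class SmartDoc:
    """Structured counterpart of one unstructured document (hypothetical shape)."""
    doc_type: str
    source_file: str
    fields: List[ExtractedField] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses to plain dicts
        return json.dumps(asdict(self), indent=2)


doc = SmartDoc(
    doc_type="loan_application",
    source_file="application.pdf",
    fields=[
        ExtractedField("borrower_name", "Jane Doe", page=1, confidence=0.97),
        ExtractedField("loan_amount", "250000", page=2, confidence=0.88),
    ],
)
smart_doc_json = doc.to_json()
print(smart_doc_json)
```

Keeping the page number and a confidence score next to each value is one possible design choice; it lets later steps trace a value back to the source document and decide whether it needs a human check.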

To convert unstructured documents to SMART Docs, different types of extractors are used that can accurately detect and extract information from MS DOC files, digital PDF files, scanned PDF files, images, etc. The unstructured documents may contain a variety of content such as titles, headers and footers, tables, checkboxes, images, signatures, stamps, barcodes, etc. An intelligent document understanding platform performs layout detection to identify each of these areas in the document. Depending on the document type, it then either OCRs the text or extracts it directly, and creates a structured text output that can be interpreted further. A machine learning-based interpreter can then be used to accurately detect and extract values from that text and generate the corresponding SMART Doc.
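The routing step described above, where text is either extracted directly or OCR'd depending on the document type, could be sketched roughly as follows. The extractor names, the `has_text_layer` flag, and the region dictionaries are all assumptions made for this example; a real platform would plug in its own layout detection and OCR components.

```python
from pathlib import Path

# Hypothetical extractor names; a real platform would map these to its own components.
WORD_SUFFIXES = {".doc", ".docx"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def choose_extractor(filename: str, has_text_layer: bool = False) -> str:
    """Pick a text-extraction route from the file type.

    has_text_layer is assumed to come from an earlier inspection step that
    checks whether a PDF carries embedded text (digital) or only page
    images (scanned).
    """
    suffix = Path(filename).suffix.lower()
    if suffix in WORD_SUFFIXES:
        return "word_text_extractor"
    if suffix == ".pdf":
        return "pdf_text_extractor" if has_text_layer else "ocr_extractor"
    if suffix in IMAGE_SUFFIXES:
        return "ocr_extractor"
    raise ValueError(f"Unsupported document type: {filename}")


def to_structured_text(regions):
    """Put detected layout regions in reading order: by page, then top to bottom."""
    ordered = sorted(regions, key=lambda r: (r["page"], r["top"]))
    return [{"type": r["type"], "text": r["text"]} for r in ordered]


# Sample output of a (hypothetical) layout detection step, in arbitrary order.
regions = [
    {"page": 1, "top": 120, "type": "table", "text": "Income | 5,000"},
    {"page": 1, "top": 10, "type": "title", "text": "Loan Application"},
    {"page": 2, "top": 40, "type": "signature", "text": ""},
]
structured = to_structured_text(regions)
print(choose_extractor("scan.pdf"), structured)
```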

The accuracy of detection can be further enhanced by using structured data to give meaning to the unstructured data. A domain-specific dictionary, taxonomy, or ontology can be used to support the extraction, making it much more efficient and accurate. Such structured data can be fed in along with the document, extracted from one part of the document and used to interpret other parts, or retrieved from external sources during document extraction. For example, claims data can be supplied along with the Electronic Health Record (EHR), or borrower names extracted from one part of a document can be used to search its other parts.
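As an illustration of using a value found in one part of a document to interpret other parts, the sketch below takes a borrower name (assumed to have been extracted already) and looks for it on the remaining pages, tolerating differences in case and spacing. The function name and the sample page texts are hypothetical.

```python
import re
from typing import List


def find_mentions(name: str, pages: List[str]) -> List[int]:
    """Return the 1-based page numbers on which the name appears.

    Matching ignores case and allows any run of whitespace between the
    parts of the name, since OCR output often varies in both.
    """
    tokens = [re.escape(token) for token in name.split()]
    pattern = re.compile(r"\b" + r"\s+".join(tokens) + r"\b", re.IGNORECASE)
    return [number for number, text in enumerate(pages, start=1) if pattern.search(text)]


pages = [
    "Borrower: Jane Doe\nLoan amount: 250,000",
    "Employment verification for JANE  DOE",
    "Property appraisal, 12 Elm Street",
]
mention_pages = find_mentions("Jane Doe", pages)
print(mention_pages)  # [1, 2]
```

Simple pattern matching like this is only a stand-in; a domain dictionary or ontology would let the same idea cover synonyms, abbreviations, and coded values.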

Type of data

A variety of metadata can be extracted from unstructured documents and converted to SMART Docs. It can include, but is not limited to:

  • Labels and associated values with context
  • Checkbox values
  • Tables with variable columns and rows
  • Signature areas
  • Handwriting including signatures, dates, stamps
  • Named Entities
  • Pictures and metadata in pictures – Furniture in property pictures, Anomalies in X-ray or pathology pictures, etc.

Improving the accuracy

The accuracy of extracted data is very important in the financial, medical, and legal domains, where false positives and false negatives carry a business cost. To ensure accuracy, manual intervention is needed to quickly inspect and correct any discrepancies between the document and the extracted data. An intelligent document understanding platform provides an intuitive, domain-specific, and fast operations user interface so that reviewers can bring the output data to 100% accuracy. Such a platform also has a built-in, continuous feedback loop that improves automated extraction accuracy based on the manual corrections.
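One common way to combine automated extraction with manual review is to route low-confidence values to an operator and log every correction for later retraining. The sketch below assumes each extracted field carries a confidence score; the threshold value, field layout, and function names are hypothetical.

```python
REVIEW_THRESHOLD = 0.90  # assumed cut-off; a real system would tune this per field


def split_for_review(fields, threshold=REVIEW_THRESHOLD):
    """Separate fields that can be accepted automatically from those needing a human check."""
    accepted = [f for f in fields if f["confidence"] >= threshold]
    needs_review = [f for f in fields if f["confidence"] < threshold]
    return accepted, needs_review


def apply_correction(extracted, corrected_value, feedback_log):
    """Record a reviewer's decision and return the reviewed field.

    Only real changes are logged, so the log holds exactly the examples
    where the automated extraction was wrong.
    """
    if corrected_value != extracted["value"]:
        feedback_log.append({
            "label": extracted["label"],
            "predicted": extracted["value"],
            "corrected": corrected_value,
        })
    return {**extracted, "value": corrected_value, "reviewed": True}


fields = [
    {"label": "borrower_name", "value": "Jane Doe", "confidence": 0.97},
    {"label": "loan_amount", "value": "25O000", "confidence": 0.62},  # OCR read "0" as "O"
]
feedback_log = []
accepted, needs_review = split_for_review(fields)
reviewed = [apply_correction(f, "250000", feedback_log) for f in needs_review]
print(accepted, reviewed, feedback_log)
```

The entries collected in `feedback_log` are the kind of data a feedback loop could use to retrain or adjust the extractor.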

Standards to exchange SMART Docs

The structured data can be exchanged seamlessly with other systems when it follows popular industry standards. For healthcare, the FHIR (Fast Healthcare Interoperability Resources) specification provides an interoperability standard in the US. The Mortgage Industry Standards Maintenance Organization (MISMO) provides a SMART Doc standard for exchanging mortgage documents between different systems in the US. The document understanding platform generates structured data in such a way that it can easily be transformed into the industry-standard format.
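As a rough illustration of such a transformation, the sketch below maps a few extracted values onto a small subset of a FHIR Patient resource (`resourceType`, `name`, `gender`, `birthDate`). The input field names are hypothetical, and the output is not validated against the full FHIR specification.

```python
import json

# Administrative gender codes allowed by FHIR for Patient.gender
FHIR_GENDERS = {"male", "female", "other", "unknown"}


def to_fhir_patient(fields: dict) -> dict:
    """Build a minimal FHIR-style Patient resource from extracted fields.

    The keys of `fields` (family_name, given_name, gender, birth_date)
    are assumptions made for this example.
    """
    gender = fields.get("gender", "").strip().lower()
    if gender not in FHIR_GENDERS:
        gender = "unknown"
    return {
        "resourceType": "Patient",
        "name": [{"family": fields["family_name"], "given": [fields["given_name"]]}],
        "gender": gender,
        "birthDate": fields["birth_date"],  # expected as YYYY-MM-DD
    }


patient = to_fhir_patient({
    "family_name": "Doe",
    "given_name": "John",
    "gender": "Male",
    "birth_date": "1964-03-15",
})
print(json.dumps(patient, indent=2))
```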

Enhanced Services

Unlocking the hidden metadata in an enterprise's vast store of unstructured documents opens the door to a host of new services that would not otherwise have been possible.

Some of these services include:

  • Cross-comparison of data between various documents and other data sources
  • Regulatory Audit 
  • Standards Compliance
  • Fraud Detection
  • Risk calculations for individuals and property
  • Aggregate analytics of quality scores and risks across populations
  • Predictive analytics for budgeting and diseases
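The first of these services, cross-comparison of data between documents, becomes straightforward once both documents exist in structured form. The sketch below compares the fields two documents have in common and reports any that disagree; the field names and sample values are hypothetical.

```python
def normalize(value) -> str:
    """Collapse whitespace and ignore case so cosmetic differences are not flagged."""
    return " ".join(str(value).split()).casefold()


def compare_fields(doc_a: dict, doc_b: dict) -> dict:
    """Return the shared fields whose values differ, with both original values."""
    mismatches = {}
    for key in doc_a.keys() & doc_b.keys():
        if normalize(doc_a[key]) != normalize(doc_b[key]):
            mismatches[key] = (doc_a[key], doc_b[key])
    return mismatches


application = {"borrower_name": "Jane Doe", "annual_income": "60000", "employer": "Acme Corp"}
income_statement = {"borrower_name": "JANE  DOE", "annual_income": "58000", "period": "2023"}

mismatches = compare_fields(application, income_statement)
print(mismatches)  # {'annual_income': ('60000', '58000')}
```

A discrepancy like this one could then feed the audit, compliance, or fraud detection services listed above.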

Document understanding brings in a host of services that benefit not only businesses but also borrowers, patients, educators, and lawyers. The 60-year-old patient can walk into the hospital with no papers, since his legacy data has been transformed by the document understanding platform into a digital SMART EHR and already sent to the hospital. The borrower receives the loan approval instantaneously, since all his papers have been converted to SMART Docs and automatically vetted by an intelligent Decision Management System. These benefits translate directly into cost and time savings for the entire ecosystem.

Author Biography.

Rajeev Dixit

He is an entrepreneur, doer, and passionate learner. He has been working with startups and founders to help them develop their ideas into products ready for market growth. His specialties are ideation, product management, marketing, architecture, user experience and design, multi-site agile teams, hands-on development, scalability and high availability, continuous integration and deployment, open source frameworks, big data predictive analytics and visualization, data science, and machine learning.
