CareCloud uses Machine Learning to organize unstructured documents


Company Profile

CareCloud is the leading provider of cloud-based practice management, electronic health record (EHR), patient engagement, and revenue cycle management software and services for medical groups. Through an intuitive, powerful, and integrated platform, CareCloud enables healthcare organizations to achieve operational excellence while delivering a more modern patient experience.

CareCloud helps thousands of physicians in 49 states increase collections, streamline workflows, and improve patient care, and currently manages more than $4 billion in annualized accounts receivable on behalf of its revenue cycle management clients.

Business Situation

CareCloud receives thousands of medical documents from hospitals and medical clinics, and each one needs to be classified. Its existing process extracts text from these scanned documents and then uses a proprietary tool to identify each document's type. CareCloud wanted to investigate automating this process with Machine Learning, classifying scanned documents based on their contents.

Google Cloud Implementation

The goal was to use Google's Machine Learning APIs and TensorFlow to automate the identification and extraction of key fields from documents. The solution needed to be robust, continuing to work without manual changes even as document formats change over time.

At the core of extracting text from a document is Google's Vision API and its OCR capabilities. However, to make the extraction robust, the solution could not rely on rules that map text by looking at specific blocks in the document. Instead, the following additional techniques were implemented during the POC:

  1. A document classification model was built with TensorFlow to classify documents and identify each document's type.
  2. A word2vec model was built to organize the documents and extract intelligence from them. It allowed CareCloud's users to search by synonyms and phrases, making the raw database of documents much more valuable.
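The two techniques above can be illustrated with a small, self-contained sketch. It is not CareCloud's implementation and makes several assumptions: the OCR text has already been extracted (for example, by the Vision API); a multinomial Naive Bayes classifier in plain Python stands in for the TensorFlow model; and hand-written toy vectors stand in for trained word2vec embeddings. The class and function names, the document-type labels, and the 0.8 similarity threshold are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase OCR text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesDocClassifier:
    """Multinomial Naive Bayes with add-one smoothing over word counts.

    Hypothetical stand-in for a TensorFlow document-type classifier.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(term, vectors, threshold=0.8):
    """Return the term plus words whose vectors are cosine-similar to it."""
    if term not in vectors:
        return [term]
    similar = [w for w, vec in vectors.items()
               if w != term and cosine(vectors[term], vec) >= threshold]
    return [term] + sorted(similar)


def search(docs, query, vectors, threshold=0.8):
    """Return indices of docs containing any query term or a near synonym."""
    terms = set()
    for word in tokenize(query):
        terms.update(expand_query(word, vectors, threshold))
    return [i for i, doc in enumerate(docs) if terms & set(tokenize(doc))]


# Toy training set with hypothetical document-type labels.
train_texts = [
    "lab results hemoglobin glucose panel",
    "blood test results cholesterol panel",
    "referral to cardiology specialist please evaluate",
    "referral for orthopedic consult specialist",
]
train_labels = ["lab_report", "lab_report", "referral", "referral"]
clf = NaiveBayesDocClassifier().fit(train_texts, train_labels)
print(clf.predict("glucose panel results"))  # lab_report

# Hand-written 3-d vectors standing in for trained word2vec embeddings.
vectors = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.85, 0.15, 0.05],
    "bone": [0.0, 0.2, 0.9],
}
docs = ["cardiac echo report", "bone density scan", "heart failure follow-up"]
print(search(docs, "heart", vectors))  # [0, 2]
```

A query for "heart" also matches the document that only says "cardiac", because the two words sit close together in vector space. That is the kind of synonym search a trained word2vec model makes possible over a real document corpus.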
