Seven Ways Document Analytics Can Turn Digital Clutter into an Organizational Asset

As big data technologies mature and adoption rates increase, large organizations are beginning to realize the potential value of data as a critical asset to their operations, decision-making, and ultimately their bottom line. For many large corporations and government agencies, enterprise content management (ECM) systems hold a significant portion of the organization’s overall data, especially when they are adopted enterprise-wide.

Unstructured Data in Content Repositories  

These organizations need to consider the vast amounts of unstructured data within their ECM systems. Even with a relatively effective business taxonomy in place to keep content organized, 99%+ of the data living on documents within the ECM system is unquestionably, unequivocally unstructured. A PDF document may be neatly filed away in a folder hierarchy with a few relevant metadata tags, but what if this file is an 82-page contract? What are the business impacts and implications of the content, and more importantly the context, trapped within that document? Take this example and multiply it by the scale of a large organization, and you start to get an idea of the scope of this problem (and opportunity).

To help companies and government agencies make sense of these mountains of digital clutter, we have launched a machine learning document mining platform called Ephesoft Insight that can live on top of content repositories to analyze files at big data scale and extract business meaning (and business value) from them.

Here are seven use cases of the Ephesoft Insight platform to get you thinking about how document analytics may benefit your organization.

  1. Fraud Detection – Money Laundering, Health Care Fraud, Insurance Fraud, Loan Fraud

In my experience, when you talk to anyone about the applications of big data, fraud almost always comes up in the discussion. Detecting fraud is all about finding the proverbial needle in the haystack: understanding the red flags of fraud in your industry and then searching your data for those red flags. This is what big data platforms do best. Customers across industries, from financial services to government to healthcare, are leveraging our software to analyze their document content for fraud. The red flags vary by industry and use case, and the unstructured content certainly varies, but the underlying concept is the same: Insight uses machine learning to find anomalies of interest.


  2. Human Resources – Workforce Planning and Human Capital Management

There are 70 companies on the Fortune 500 that employ over 100,000 people each, and countless other companies and government agencies have 10,000 or more employees. These numbers equate to a lot of HR files, to the tune of millions and millions of pages. There is value in this data. Insight can crawl personnel files (which include recruiting documents such as applications and resumes, records related to job offers and promotions, compensation information, policy acknowledgments, disciplinary notices, and termination records) and use the data to make recommendations related to hiring (including background investigations), workforce optimization, and budgetary planning. Insight can also help organizations understand attrition trends and plan ahead.

  3. Loan Application Processes – Mortgage, Education, Commercial and Industrial Loans

Underwriting a loan is an expensive and laborious process for both the lender and the borrower. Ephesoft Insight sits at the intersection of big data and document content, and from personal experience, the mortgage application process is the first thing I think of as sitting in the center of this Venn diagram.

Insight helps lenders automate the process of validating borrower-provided documentation against third-party sources. Its machine learning algorithms analyze the data for accuracy, consistency, and completeness, which enables lenders to assess risk and improve loan-level decision making.
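To make the idea of checking for completeness and consistency concrete, here is a minimal rule-based sketch in Python. It is not how Insight works internally (the platform uses machine learning, and nothing here reflects its API); the field names, the `validate_application` function, and the 5% income tolerance are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical field names; a real loan file would carry many more.
REQUIRED_FIELDS = ("borrower_name", "annual_income", "employer")


@dataclass
class ValidationReport:
    missing: list = field(default_factory=list)     # completeness problems
    mismatched: list = field(default_factory=list)  # consistency problems

    @property
    def passed(self) -> bool:
        return not self.missing and not self.mismatched


def validate_application(extracted: dict, third_party: dict,
                         income_tolerance: float = 0.05) -> ValidationReport:
    """Compare fields extracted from borrower documents with a reference source."""
    report = ValidationReport()
    for name in REQUIRED_FIELDS:
        value = extracted.get(name)
        if value in (None, ""):
            report.missing.append(name)
            continue
        reference = third_party.get(name)
        if reference is None:
            continue  # the reference source has nothing to compare against
        if isinstance(value, (int, float)) and isinstance(reference, (int, float)):
            # Numeric fields may differ by a small relative tolerance.
            if abs(value - reference) > income_tolerance * abs(reference):
                report.mismatched.append(name)
        elif str(value).strip().lower() != str(reference).strip().lower():
            # Text fields are compared ignoring case and surrounding spaces.
            report.mismatched.append(name)
    return report
```

A lender could route any application whose report does not pass to a human underwriter, while clean applications continue through the automated path.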

  4. Local Government Optimization – Departmental Workflow and Data Analysis

Virginia’s Fairfax County (where I’m from) has more than 70 different government agencies supporting its citizens, everything from police and fire rescue to health and human services to multiple court systems. All of these departments collect and generate a massive amount of documentation. Counties across the country, especially the larger ones, offer similar services and are ripe for automation.

Family services departments, for example, could leverage Insight to crawl their historical record of domestic violence cases to determine ways to optimize their counseling and support group services. The police department could take it a step further and analyze the unstructured text in every police report ever filed, then correlate it with crime records to build a predictive model for optimizing officer dispatch plans.

  5. The Big One: Healthcare – Real-time Alerting, Predictive Analytics, and Administrative Optimization

As the behemoth healthcare industry continues its shift from a fee-for-service model to value-based care, successful population health management is largely dependent on the data it generates. Even with standardized data formats like HL7, vast amounts of unstructured text still lurk in the data. From encounter notes to pathology reports and hospital discharge summaries, Insight can be used to put structure to the unstructured blocks of text found in these types of content. With more structure comes a slew of benefits: improved real-time alerting on individual patients’ health, predictive analytics related to cohort identification and treatment strategies, and optimization of the administrative functions of hospitals.

  6. The Wide, Wide World of Regulatory Compliance

There is nothing fun about regulations and compliance, but in many cases, they are vital to a well-functioning economy and the general well-being of the population. Insight’s ability to crawl multiple repositories can help organizations in every industry analyze their content for risk associated with regulatory compliance – everything from HIPAA, to federal records retention policies, to financial regulations governed by the SEC, FINRA, OCC, and others. For example, VA Inspector General investigations have uncovered documents containing protected health information (PHI) and other PII being stored in employees’ network folder shares. Insight can help find and eliminate these risky scenarios. Information management experts have a concept called ROT – redundant, obsolete, trivial content – that is related to this use case, and the Insight platform is fundamentally equipped to address it.
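As a rough illustration of what scanning content for PII can look like at its simplest, the Python sketch below flags documents whose text matches a couple of regular expressions. This is an assumption-laden toy, not Insight’s method: the two patterns, the `find_pii` and `flag_documents` names, and the idea that document text has already been extracted into strings are all hypothetical. Real PHI/PII detection needs far broader coverage than two patterns.

```python
import re

# Illustrative patterns only; real PHI/PII detection needs far broader coverage.
PII_PATTERNS = {
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_pii(text: str) -> dict:
    """Return the names of the patterns found in text, mapped to match counts."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        count = len(pattern.findall(text))
        if count:
            hits[name] = count
    return hits


def flag_documents(documents: dict) -> dict:
    """Map each document path to its PII hits, omitting clean documents."""
    flagged = {}
    for path, text in documents.items():
        hits = find_pii(text)
        if hits:
            flagged[path] = hits
    return flagged
```

Given a mapping of file paths to extracted text, `flag_documents` returns only the paths that warrant a closer look, which a records team could then review or move to a secured location.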


  7. Case Management and Next Generation e-Discovery

Core to mining documents for meaningful information is the analysis of text. When I describe the Insight platform to people, sometimes I get the question, “Is it an e-discovery tool?” The answer is yes and no. Insight was not built specifically as an e-discovery solution, but when you look at what our platform is doing (collecting, identifying, extracting, and analyzing text), there’s a case to be made that Insight can help with case management. Leveraging machine learning, specifically the multi-dimensional extraction and classification platforms under the hood within Insight, can help automate case management workflows. Combine that with the fact that many large organizations have adopted off-the-shelf case management modules from the top ECM vendors, and you can start to see where Insight can fit in.

Final Thoughts

So there you have it: seven broad use cases for document analytics.  There are likely hundreds more.  Insight as a platform has the scale to tackle millions and billions (and maybe one day, trillions?) of documents, and the flexibility to be applied to any industry.  Machine learning is the future of information management, and it’s ready for you today with Ephesoft Insight.