Or how to know what you don’t know…

The Brave New World of Big Document Analytics

There are two universal reasons why organizations accumulate information:

  • To make processes run more efficiently;
  • To gain insight into their operations and strategic direction.

In the imaging industry, we have grown accustomed to business process automation: capture solutions that automate document-driven processes. However, the second reason to amass unstructured information, using capture technology for analytics and data discovery, is relatively new.

There is a dichotomy in the perceived value of corporate information: if it is structured and normalized, it becomes usable data; if it is unstructured, it is seen as virtually useless. In fact, most organizations view the information contained in documents as a nuisance. Companies keep document records only because they may need to find them again, or because litigation or compliance compels them to retain records for a period of time.

When it comes to data, it’s all about mining for business insight. However, there is a lot of valuable information hidden in those documents. That information is ripe for analytics; it is just harder to mine than structured data.

You can think of document analytics as akin to offshore oil extraction. For decades, the oil and gas industry drilled in fields where resources were easily available. When drilling technology advanced, extracting oil from offshore wells became practical. Likewise, the technology for extracting insight from unstructured content is now practical and cost-effective.

Document Capture vs. Document Analytics

Ephesoft makes extracting value from documents practical. It is important to understand the difference between document capture and document analytics. Both share the goal of extracting meaning from unstructured content. With document capture, there is usually one of two goals:

  • Capture images and any information that could be helpful in finding the document again.
  • Extract the information on the document that is needed to advance a process (sales order, patient encounter, service request, employment application, and the like).

With document analytics, it gets a bit more complex, because there are more variables. The person “capturing” the document is probably not the person performing the analytics. Moreover, you may not know at the time of capture what specific information you will want to analyze. Finally, you may already have dozens of disparate document and data repositories within your organization. Extracting value and meaning from these resources on a universal level requires new business tools and a new approach to document analytics.

Certainly, the core technology behind document capture and document analytics is similar. Capture is the practical, tactical application of document classification and extraction; analytics is the strategic application of advanced capture technologies. As such, analytics opens up new applications ranging from fraud detection to legal discovery to medical research.

This October, Ephesoft is holding its second annual user conference, INNOVATE, where general sessions will cover the different approaches to document capture and document analytics. We will also have tactical breakout sessions discussing how our solutions can accelerate business processes and enable you to take large volumes of documents, regardless of whether they have been classified, indexed, or OCR/text captured, and mine them for data and context. I hope you can join us at the conference.

Tim Dubes is Vice President of Marketing at Ephesoft, Inc.