Working with raw data requires data filters, algorithms, data visualization tools and, more often than not, a data scientist. Even with the most complete and representative database, an organization’s data is only as useful as it is meaningful and consumable. Department heads and line-of-business owners are often left to their own devices – guesswork and estimations – to make decisions about company strategy, budget allocation, application rejection or approval, diagnoses and other assessments.

With the release of version 3.1, Ephesoft Insight now provides an out-of-the-box outlier detection tool for data analytics. Using supervised machine learning to compute statistical parameters, Ephesoft Insight readily identifies values that deviate from the norm by more than the expected standard deviation. For example, the screenshot below shows the outlier detection dashboard, where a user teaches Insight which medical procedure costs within a set of insurance Explanation of Benefits (EOB) documents fall above or below the median cost by Service Code. No data scientist is required to crunch the numbers.

Fig. 1: Ephesoft Insight user interface for creating an outlier learning model

Ephesoft Insight detects two types of data anomalies: nominal and categorical.

Discovering Nominal Outliers

When exploring a nominal (or numeric) dataset, it is necessary to establish norms and data averages before anomalies can be detected. Accordingly, Ephesoft Insight applies a linear regression model to a given dataset to create a predictive learning model. Insight then calculates threshold limits around the predicted values and marks any values in the selected column that fall outside those limits as outliers.

To put it simply, Insight looks at high and low values based on user input and finds unusual patterns and values outside the norm.
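The idea of fitting a regression line and flagging values that stray too far from it can be sketched in a few lines of Python. This is only an illustration of the general technique, not Insight's implementation: the function names, the sample costs, and the choice of two standard deviations of the residuals as the threshold (the parameter k) are all assumptions made for the example.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def find_outliers(xs, ys, k=2.0):
    """Return indices whose residual exceeds k standard deviations.

    k is an assumed threshold multiplier, chosen here for illustration.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    limit = k * pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > limit]


# Made-up sample: units of service versus billed cost.
units = [1, 2, 3, 4, 5, 6, 7, 8]
costs = [100, 200, 300, 400, 900, 600, 700, 800]

print(find_outliers(units, costs))  # prints [4]
```

The fifth record (index 4) is billed at 900 where the trend predicts roughly 550, so it is the only value outside the threshold.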

Use Case: Fraud Detection

Fraud is an expensive, criminal act that plagues nearly every industry and government agency. Any organization that exchanges submitted information for payment or recompense is at risk of fraudulent waste. Take expense report submissions, for example. According to the Association of Certified Fraud Examiners’ 2016 Global Fraud Study, the average medium-sized business loses $40,000 annually to fraudulent expense reimbursement, and the median time to detect an ongoing employee reimbursement scheme is 24 months.

By using Ephesoft Insight, a line-of-business owner, rather than a data scientist, can take ownership of fraud oversight and set parameters for outlier detection within a simple dashboard. For example, the Human Resources representative responsible for expense reimbursement can select key values like city or state, expense type, and expense amount to establish the average cost of a meal by geographic location. Once the outlier model has been created, Insight automatically compares all newly submitted expense report values against the norm to identify submissions that contain expenses outside of that threshold. The HR rep now has a model that identifies outlier expenses daily as a means of fraud prevention.
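As a rough picture of what such a model does behind the dashboard, the sketch below builds a per-location norm from past expenses and checks new submissions against it. Everything here is hypothetical: the function names, the cities and amounts, and the rule of flagging anything more than three sample standard deviations from the group average are assumptions for illustration, not a description of how Insight computes its thresholds.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_model(history, k=3.0):
    """Map (city, expense_type) to (low, high) bounds of normal amounts.

    history is an iterable of (city, expense_type, amount) tuples.
    k is an assumed multiplier on the sample standard deviation.
    """
    groups = defaultdict(list)
    for city, expense_type, amount in history:
        groups[(city, expense_type)].append(amount)

    model = {}
    for key, amounts in groups.items():
        if len(amounts) < 2:
            continue  # too little history to estimate a spread
        centre, spread = mean(amounts), stdev(amounts)
        model[key] = (centre - k * spread, centre + k * spread)
    return model


def is_outlier(model, city, expense_type, amount):
    """True when the amount falls outside the learned bounds."""
    bounds = model.get((city, expense_type))
    if bounds is None:
        return False  # no norm established for this group
    low, high = bounds
    return amount < low or amount > high


# Made-up historical meal expenses.
history = [("Austin", "meal", a) for a in (18, 22, 20, 19, 21)]
history += [("New York", "meal", a) for a in (38, 42, 40, 41, 39)]
model = build_model(history)

print(is_outlier(model, "Austin", "meal", 40))    # True
print(is_outlier(model, "New York", "meal", 40))  # False
```

The same 40-dollar meal is flagged in one city and accepted in the other, which is the point of establishing the norm by geographic location.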

Discovering Categorical Outliers

Insight identifies categorical outliers, which are analogous to textual inconsistencies, by creating a naive Bayes classification model. Similar to Ephesoft’s method of multi-dimensional extraction within Insight, naive Bayes classification comprises a family of algorithms for multi-variable analysis that identify values that don’t behave as expected.

Just as the Ephesoft platform for content capture only requires a few sample documents to create a learning model for categorizing and extracting data, the Insight analytics engine only needs a small amount of training data to generate the necessary parameters for data classification and outlier detection.
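To make the naive Bayes approach concrete, here is a minimal sketch of a categorical classifier trained on a handful of labelled rows and then used to flag records whose recorded label disagrees with the model's prediction. The class name, the helper function, the Laplace (add-one) smoothing, and the toy records are all assumptions for illustration; the article does not describe Insight's internals at this level.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.label_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (label, column) -> value counts
        self.vocab = defaultdict(set)             # column -> distinct values seen
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(label, i)][value] += 1
                self.vocab[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.label_counts.values())

        def log_score(label):
            n = self.label_counts[label]
            score = math.log(n / total)  # log prior
            for i, value in enumerate(row):
                count = self.value_counts[(label, i)][value]
                score += math.log((count + 1) / (n + len(self.vocab[i])))
            return score

        return max(self.label_counts, key=log_score)


def find_inconsistent(model, records):
    """Indices of records whose recorded label differs from the prediction."""
    return [i for i, (row, label) in enumerate(records)
            if model.predict(row) != label]


# Invented toy training data: (specimen, test) -> department.
rows = [("skin", "biopsy"), ("skin", "biopsy"), ("skin", "excision"),
        ("blood", "smear"), ("blood", "smear"), ("blood", "culture")]
labels = ["dermatology"] * 3 + ["hematology"] * 3
model = NaiveBayes().fit(rows, labels)

new_records = [(("skin", "biopsy"), "dermatology"),
               (("blood", "smear"), "dermatology")]
print(find_inconsistent(model, new_records))  # prints [1]
```

Six training rows are enough for the model to recognize that a blood smear filed under dermatology does not fit the pattern, which mirrors the claim that only a small amount of training data is needed.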

Use Case: Pathology Report Analysis

In the era of data-based decisions, unsupported medical diagnoses are a rarity. Categorical outlier detection allows doctors and clinicians to diagnose patients more quickly and prescribe treatment regimens based on historical pathology reports, community health data and socio-economic data. By identifying normal and abnormal data (referred to as “inconsistent” in the Ephesoft outlier detection GUI), healthcare professionals will be able to create an outlier model inclusive of all available sources of data for optimal patient care.

While many healthcare data mining applications only look at the raw data for predictive analytics and diagnosis support, Ephesoft Insight provides organizations with a tool not only for Big Data analysis, but also for the actual data extraction from patient charts and records. The value of Ephesoft Insight is in its ability to link data from one source (like an EMR system) to data from other sources (like reference sets of pathology reports) to provide greater understanding of a patient’s status and better insights into public health trends.

For more information on Ephesoft’s data analytics platform or to request a demonstration of Ephesoft Insight’s outlier detection module, contact Ephesoft Sales.