www.ephesoft.com/forums • View topic - Validation-level 'fuzzydb' integration question

General discussion about the Ephesoft architecture at a high-level

Postby dhartford » Tue May 31, 2011 7:00 am

Hey all,
Let me give a use case first to get the right mindset in place:

=======Usecase=========
Multiple invoice formats arrive as dirty faxes, with background watermarks bleeding through and other worst-case image quality. In this use case Ephesoft would help with much of the process, but not necessarily with OCR.

A single document type has these required outputs (note I said outputs, not necessarily inputs).
*Store ID
Store Name (fuzzydb or manual key only, assume no OCR other than 'instant validation' OCR)
Store City (fuzzydb or manual key only, assume no OCR other than 'instant validation' OCR)

*UPC code (1-N, tabular)
Product Description (1-N, tabular) (fuzzydb or manual key only, assume no OCR other than 'instant validation' OCR)

*Price per UPC

For the sake of a worst-case scenario, let's assume multiple different invoices faxed in an uncontrolled manner, which makes OCR rather difficult. The asterisked (*) fields above are the most effective entry/input points.
=======================
Given the above, much of what is needed to make the capture solution more efficient already exists in Ephesoft, but perhaps not to the level of supporting keying (validation) together with interactive fuzzydb lookups during the Validation phase for key-value mapped fields that exist in the fuzzydb.

For example, let's assume OCR always fails because of the conditions above (bad/dirty faxes):
*Key 'store ID' during validation and auto-populate the 'store name' and 'store city' fields. If the store ID does not exist in the fuzzydb, leave those fields blank for a keyer to key in from the image (the name might appear only in the logo, for example).

*Key 'UPC code' during validation (in this case, in the tabular entry format) and auto-populate the product description. If the UPC does not exist in the fuzzydb, leave the description blank for the keyer to key in from the image (which might be grainy from the faxing).
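A minimal sketch of the auto-populate behaviour described above, in Python. It assumes hypothetical in-memory tables (`STORES`, `PRODUCTS`) and hypothetical helper names standing in for the fuzzydb, and it uses exact key lookup as a simplification (a real fuzzy lookup would tolerate typos); it is not Ephesoft's API. Dependent fields come back blank when the key is missing, so the keyer can fill them in from the image.

```python
from typing import Dict, List

# Hypothetical sample lookup tables standing in for the fuzzydb.
# Keys and values are illustrative only, not an Ephesoft schema.
STORES: Dict[str, Dict[str, str]] = {
    "4054": {"store_name": "Middle of Nowhere Store", "store_city": "Anytown"},
}
PRODUCTS: Dict[str, Dict[str, str]] = {
    "012345678905": {"product_description": "Sample Widget 12oz"},
}


def populate_from_key(key: str, table: Dict[str, Dict[str, str]],
                      dependent_fields: List[str]) -> Dict[str, str]:
    """Look up the keyed value and return the dependent fields.

    Fields are left blank ("") when the key is not found, so the keyer
    can type them from the image instead.
    """
    record = table.get(key.strip(), {})
    return {name: record.get(name, "") for name in dependent_fields}


def populate_table_rows(upcs: List[str]) -> List[Dict[str, str]]:
    """Apply the same lookup to every row of a tabular UPC entry."""
    rows = []
    for upc in upcs:
        row = {"upc": upc}
        row.update(populate_from_key(upc, PRODUCTS, ["product_description"]))
        rows.append(row)
    return rows
```

For example, `populate_from_key("4054", STORES, ["store_name", "store_city"])` fills both store fields, while an unknown store ID returns both fields empty for manual keying.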
======================
There are a couple of options, but I wanted to bring this up in the Architecture forum because you may have already tried or tested several approaches and have more experience with which one works best:

option 1) Interactive validation - during keying, run fuzzydb lookups on specified fields (as opposed to the fuzzydb search field) and populate the other specified field(s) through some kind of key-value mapping (e.g. storeid '4054' = storedesc 'Middle of Nowhere Store'). This would be preferred because it lets keyers cross-check the returned lookups against the image. It benefits environments that want the fewest manual steps possible and immediate confirmation of lookups against the image.

option 2) Double-Validation - a two-part step after Validation. If OCR fails, the user enters the information from the image as-is; the document then goes through a 'Validation verification' module that performs the fuzzydb lookup based on what was keyed, followed by a manual 'Double-Validation' step for cases where the fuzzydb came back empty (or where someone opts to always double-validate). This benefits environments that want two different pairs of eyes checking the input for quality (and the double-validation step could be combined with option 1 as well), as well as low-bandwidth/high-latency environments where interactive lookup is not feasible.
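A rough sketch of the option 2 routing, assuming hypothetical lookup tables and a hypothetical `KeyedDocument` record for what was keyed in the first pass; none of these names come from Ephesoft. Documents whose keyed values all hit a lookup are accepted, and anything with a miss (or every document, when `always_double` is set) goes to the Double-Validation queue.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical lookup tables standing in for the fuzzydb.
STORES = {"4054": "Middle of Nowhere Store"}
PRODUCTS = {"012345678905": "Sample Widget 12oz"}


@dataclass
class KeyedDocument:
    """Values a keyer typed from the image during the first Validation pass."""
    doc_id: str
    store_id: str
    upcs: List[str]


def missing_lookups(doc: KeyedDocument) -> List[str]:
    """'Validation verification' step: list keyed values with no lookup hit."""
    missing = []
    if doc.store_id not in STORES:
        missing.append("store_id:" + doc.store_id)
    missing.extend("upc:" + upc for upc in doc.upcs if upc not in PRODUCTS)
    return missing


def route(docs: List[KeyedDocument],
          always_double: bool = False) -> Tuple[List[str], List[str]]:
    """Split documents into (accepted, needs_double_validation) by doc_id."""
    accepted, double = [], []
    for doc in docs:
        if always_double or missing_lookups(doc):
            double.append(doc.doc_id)
        else:
            accepted.append(doc.doc_id)
    return accepted, double
```

Keeping the verification step a pure function of what was keyed is what makes it usable offline, which matches the low-bandwidth motivation for this option.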

option X) other options to deal with 'lookup-only or manual entry' fields (or difficult-to-ocr/non-ocr fields).
=======================

Thoughts?
dhartford

Re: Validation-level 'fuzzydb' integration question

Postby ephesoft » Wed Jun 01, 2011 8:31 am

D,
Very good writeup. Fuzzy DB can be useful in many use cases, especially with fax processing.
Let us comment on a few areas.

1- Ephesoft Fuzzy DB matching is actually designed to run not only in Validation but also in the Extraction phase. So if faxes like the ones in your example come in and we need to find the store name, the Ephesoft Fuzzy DB plugin will look at all the words on the invoice and pass them to the DB engine to find a matching database record, without any user intervention. This means Ephesoft can find the exact match using a partial address, a phone number, or anything else that exists on the document. It lets Ephesoft find the data automatically with zero configuration; there is no need to extract the phone number and perform a database lookup the traditional way.
Most of our clients use this feature to find the Vendor ID on invoices, but Fuzzy DB can be used for any document-to-database matching scenario.
If Extraction cannot find the data, the user in Validation can simply type search keywords separated by spaces (as in the Google search box). There is no need to specify name, address, or phone separately.
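The post does not describe how the Fuzzy DB engine scores records, so the following is only an illustration of the general idea of document-to-database matching: each record's fields are flattened into one bag of words, and the record sharing the most informative words with the input wins. The weighting (smoothed inverse document frequency, so words common to many records count less) is a swapped-in technique, and `VENDORS`, `tokenize`, and `best_match` are hypothetical. The same function serves both for all OCR'd words on a page and for keywords a user types in Validation.

```python
import math
import re
from collections import Counter
from typing import Dict, Optional, Set, Tuple

# Hypothetical vendor records: each record's fields flattened into one string.
VENDORS: Dict[str, str] = {
    "V001": "Acme Supply 12 Main Street Portland ME 207-555-0100",
    "V002": "Bolt Hardware 99 Elm Street Bangor ME 207-555-0199",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric words; punctuation (e.g. phone dashes) splits tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(words: str, records: Dict[str, str],
               min_score: float = 0.0) -> Optional[Tuple[str, float]]:
    """Return (record_id, score) for the record sharing the most informative
    words with `words`, or None if no record scores above min_score."""
    record_tokens = {rid: tokenize(text) for rid, text in records.items()}
    # Document frequency: how many records contain each token.
    df = Counter(tok for toks in record_tokens.values() for tok in toks)
    n = len(records)
    query = tokenize(words)
    best: Optional[Tuple[str, float]] = None
    for rid, toks in record_tokens.items():
        # Smoothed IDF: tokens shared by many records (e.g. "street") count less.
        score = sum(math.log((1 + n) / (1 + df[tok])) + 1 for tok in toks & query)
        if best is None or score > best[1]:
            best = (rid, score)
    if best is None or best[1] <= min_score:
        return None
    return best
```

Because no field is matched positionally, a partial address or a phone number alone can be enough to pick out a record, which is the property described above.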

2- Invoice automation is a bit different from other data-entry processes, and we have never seen a double-key entry requirement in the field. I think this is because ERP systems already have many checks, such as duplicate invoice checks and rules that the invoice amount cannot exceed the PO amount. Some of these rules are also applied as Ephesoft validation rules, so sending a vendor a check with an incorrect amount is almost impossible. Most customers want to eliminate data entry, not create more of it.
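A minimal sketch of the two ERP-style checks mentioned above, expressed as a validation function. The `Invoice` fields, the `seen` set of previously processed invoices, and the `po_amounts` table are hypothetical, and the PO rule is interpreted as "the invoice amount cannot exceed the amount on the purchase order". Reporting an unknown PO number is an added assumption.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Invoice:
    vendor_id: str
    invoice_number: str
    po_number: str
    amount: Decimal


def validate_invoice(inv: Invoice, seen: Set[Tuple[str, str]],
                     po_amounts: Dict[str, Decimal]) -> List[str]:
    """Return a list of rule violations; an empty list means the invoice passes."""
    errors = []
    # Duplicate check: same vendor and invoice number already processed.
    if (inv.vendor_id, inv.invoice_number) in seen:
        errors.append("duplicate invoice")
    po_amount = po_amounts.get(inv.po_number)
    if po_amount is None:
        # Assumption: an unknown PO is flagged rather than silently accepted.
        errors.append("unknown PO number")
    elif inv.amount > po_amount:
        errors.append("amount exceeds PO amount")
    return errors
```

`Decimal` is used instead of `float` so currency comparisons such as `100.00` against `100` are exact.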

Thank you again. It looks like you have a good handle on how Ephesoft can be used in the real world.

Ephesoft Team.