Hey all,
Let me describe a use case first to set the right context:
=======Use case=========
Multiple invoice formats arrive as dirty faxes, with background watermarks bleeding through or other 'worst case' image quality. In this use case, much of what Ephesoft offers would help, though the OCR might not.
A single document type has the following required outputs (note: outputs, not necessarily inputs):
*Store ID
Store Name (fuzzydb or manual key only, assume no OCR other than 'instant validation' OCR)
Store City (fuzzydb or manual key only, assume no OCR other than 'instant validation' OCR)
*UPC code (1-N, tabular)
Product Description (1-N, tabular) (fuzzydb or manual key only, assume no OCR other than 'instant validation' OCR)
*Price per UPC
For the sake of a worst-case scenario, assume multiple invoice formats faxed in an uncontrolled manner, which makes OCR difficult. The fields marked with an asterisk (*) are the most effective entry/input points.
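To make the field roles concrete, here is a minimal Python sketch of this document type, assuming a simple in-memory model. The names `FIELD_ROLES`, `Invoice`, `LineItem` and `fields_with_role` are hypothetical and not part of Ephesoft; "entry" marks the asterisked entry points, and "lookup" marks fields that are filled only from the fuzzydb or keyed by hand.

```python
from dataclasses import dataclass, field

# Hypothetical model of the document type's required outputs.
# "entry"  = asterisked entry/input points (keyed, or OCR when it works)
# "lookup" = filled only from the fuzzydb or keyed manually
FIELD_ROLES = {
    "store_id": "entry",
    "store_name": "lookup",
    "store_city": "lookup",
    "upc": "entry",           # per table row (1-N)
    "description": "lookup",  # per table row (1-N)
    "price": "entry",         # per table row (1-N)
}


@dataclass
class LineItem:
    upc: str
    price: str             # kept as keyed text; parsing is out of scope here
    description: str = ""  # empty until a lookup or the keyer fills it


@dataclass
class Invoice:
    store_id: str
    store_name: str = ""
    store_city: str = ""
    items: list[LineItem] = field(default_factory=list)  # 1-N table rows


def fields_with_role(role: str) -> list[str]:
    """Return the field names that play the given role, in definition order."""
    return [name for name, r in FIELD_ROLES.items() if r == role]
```

With this split, `fields_with_role("entry")` lists the only fields a keyer has to read off the image in the best case; everything else is derived.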
=======================
Given the above, much of what is needed to make the capture solution more efficient already exists in Ephesoft, but perhaps not to the level of combining keying (validation) with interactive fuzzydb lookups during the validation phase for key-value mapped fields that exist in the fuzzydb.
For example, assume OCR always fails because of the conditions above (bad/dirty faxes):
*Key 'Store ID' during validation and auto-populate the 'Store Name' and 'Store City' fields. If the ID does not exist in the fuzzydb, leave those fields blank for a keyer to key in from the image (the name might appear in the logo, for example).
*Key 'UPC code' during validation (here, in the tabular entry format) and auto-populate the product description. If the UPC does not exist in the fuzzydb, leave the description blank for the keyer to key in from the image (which might be grainy from faxing).
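Both behaviours boil down to "look up the keyed value, fill the mapped fields on a hit, leave them blank on a miss". A minimal sketch, with plain Python dicts standing in for the fuzzydb tables; the function names and the table contents (including the city and the UPC row) are hypothetical sample data, not an Ephesoft API:

```python
# Hypothetical stand-ins for the fuzzydb tables (sample data only).
STORES = {"4054": {"store_name": "Middle of Nowhere Store", "store_city": "Springfield"}}
PRODUCTS = {"012345678905": "Sample product"}


def populate_store(record: dict) -> dict:
    """Fill store name/city from the keyed store ID.

    On a miss the fields are left as they are (blank unless already keyed),
    so the keyer can key them from the image, e.g. from the logo.
    """
    match = STORES.get(record.get("store_id", "").strip())
    if match:
        record.update(match)
    else:
        record.setdefault("store_name", "")
        record.setdefault("store_city", "")
    return record


def populate_rows(rows: list) -> list:
    """Fill each table row's description from its keyed UPC; blank on a miss."""
    for row in rows:
        description = PRODUCTS.get(row.get("upc", "").strip())
        if description is not None:
            row["description"] = description
        else:
            row.setdefault("description", "")
    return rows
```

Using `setdefault` on a miss means a value the keyer already typed is never wiped out by a failed lookup.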
======================
There are a couple of options, but I wanted to raise this in the Architecture forum since you may have already tried or tested several approaches and have more experience with which one works best:
option 1) Interactive validation - during keying, run fuzzydb lookups from specified fields (as opposed to the fuzzydb search field) to populate the other field(s) defined in some kind of key-value mapping (e.g. storeid '4054' = storedesc 'Middle of Nowhere Store'). This would be preferred because it lets keyers cross-check the returned lookups against the image. It benefits environments that want the fewest manual steps possible and immediate confirmation of lookups against the image.
option 2) Double-Validation - an additional two-part step after Validation. If OCR fails, the user enters the information from the image as-is; the batch then goes through a 'Validation verification' module that runs the fuzzydb step on what was keyed, followed by a manual 'Double-Validation' step whenever the fuzzydb comes back empty (or whenever someone opts to always double-validate). This benefits environments that want two different sets of eyes checking the input for quality (the double-validation step could be combined with option 1 as well), as well as low-bandwidth/high-latency environments where interactive lookup is not feasible.
option X) other approaches for 'lookup-only or manual entry' fields (or difficult-to-OCR/non-OCR fields).
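For option 1, the interactive part could look roughly like this: keying the trigger field returns candidate rows for the keyer to compare against the image, the exact match if there is one, otherwise near matches for a mis-keyed character. This sketch swaps Python's standard `difflib.get_close_matches` in for the fuzzydb's own matching; `LOOKUP_MAP`, `suggest` and the sample rows are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical sample table and key-value mapping:
# trigger field -> (lookup table, target fields it populates).
STORES = {
    "4054": {"store_name": "Middle of Nowhere Store", "store_city": "Springfield"},
    "7310": {"store_name": "Example Store", "store_city": "Shelbyville"},
}
LOOKUP_MAP = {"store_id": (STORES, ["store_name", "store_city"])}


def suggest(trigger_field: str, keyed_value: str, max_candidates: int = 3) -> list:
    """Return candidate rows for the keyer to confirm against the image.

    An exact key match is returned alone; otherwise close matches are offered,
    best first. An empty list means the target fields stay blank for keying.
    """
    table, targets = LOOKUP_MAP[trigger_field]
    key = keyed_value.strip()
    if key in table:
        keys = [key]
    else:
        keys = get_close_matches(key, table.keys(), n=max_candidates, cutoff=0.6)
    return [{trigger_field: k, **{t: table[k][t] for t in targets}} for k in keys]
```

For example, keying `4O54` (letter O instead of zero) would still offer store `4054` for confirmation, while the keyer stays the one who accepts it.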
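For option 2, the routing decision after the 'Validation verification' step is small enough to sketch. This assumes one lookup table per field (the hypothetical `TABLES`), treats `None` as "the fuzzydb came back empty", and returns plain strings rather than real Ephesoft module names:

```python
from typing import Optional

# Hypothetical per-field lookup tables standing in for the fuzzydb.
TABLES = {"store_id": {"4054": "Middle of Nowhere Store"}}


def verify(keyed: dict) -> dict:
    """'Validation verification': look up each keyed value; None marks a miss."""
    results: dict = {}
    for name, value in keyed.items():
        hit: Optional[str] = TABLES.get(name, {}).get(value.strip())
        results[name] = hit
    return results


def next_step(lookups: dict, always_double_validate: bool = False) -> str:
    """Route to manual Double-Validation on any miss, or when the policy says always."""
    if always_double_validate or any(v is None for v in lookups.values()):
        return "Double-Validation"
    return "Continue"
```

Keeping the decision in one function turns the 'always double-validate' policy into a single flag.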
=======================
Thoughts?
