Document Scanning and Data Derivation Architecture.
Abstract
The system comprises a proprietary suite of underlying document image analysis capabilities, including a novel forms enhancement, segmentation and modeling component, forms recognition and optical character recognition (OCR). A future version of the system will include form reasoning to detect and classify fields on forms with varying layouts. The product provides acquisition, modeling, recognition and processing components, and can verify recognized data against the image with a line-by-line comparison. The key enabling technologies center on the recognition and processing of the scanned forms. The system learns the positions of lines and the location of text on the preprinted form, and associates regions of the form with specific required fields in the electronic version. Once the form is recognized, the preprinted material is removed and individual regions are passed to an optical character recognition component. The current proprietary OCR engine is trained on a variety of Roman text fonts and has a back-end dictionary that can be customized per field, since the system knows which field it is recognizing. The engine performs segmentation to obtain isolated characters and computes a structure-based feature vector. The characters are normalized and classified using a cluster-centric classifier, which responds well to variations in a symbol's contour. An efficient dictionary lookup scheme provides exact and edit-distance lookup using a trie structure. An edit distance is computed, and a collection of near misses can be output in a lattice to enhance the final recognition result. The current classification rate can exceed 99% with context. The ultimate goal of the system is to enable the processing of all tax forms, including forms with handwritten material.
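The dictionary lookup described in the abstract (exact and edit-distance matching over a trie, returning near misses) can be illustrated with a minimal sketch. This is not the patented implementation: the class names `Trie` and `TrieNode`, the method `near_misses`, and the sample field lexicon are hypothetical and chosen only for illustration. The sketch assumes plain Levenshtein distance and returns a sorted list of candidates rather than a lattice.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set when a dictionary word ends here


class Trie:
    """Hypothetical per-field dictionary with exact and fuzzy lookup."""

    def __init__(self, words: List[str]) -> None:
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word

    def contains(self, word: str) -> bool:
        """Exact lookup: follow one edge per character."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.word is not None

    def near_misses(self, query: str, max_dist: int) -> List[Tuple[int, str]]:
        """Return (distance, word) pairs within max_dist edits of query."""
        results: List[Tuple[int, str]] = []
        # Row for the empty prefix: distance to query[:i] is i deletions.
        first_row = list(range(len(query) + 1))
        if self.root.word is not None and first_row[-1] <= max_dist:
            results.append((first_row[-1], self.root.word))
        for ch, child in self.root.children.items():
            self._walk(child, ch, query, first_row, max_dist, results)
        return sorted(results)

    def _walk(self, node: TrieNode, ch: str, query: str, prev_row: List[int],
              max_dist: int, results: List[Tuple[int, str]]) -> None:
        # Extend the Levenshtein table by one row for the trie edge `ch`.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1,          # skip a query character
                           prev_row[i] + 1,         # skip the trie character
                           prev_row[i - 1] + cost))  # match or substitute
        if node.word is not None and row[-1] <= max_dist:
            results.append((row[-1], node.word))
        # Prune: no descendant can come back under the threshold.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                self._walk(child, next_ch, query, row, max_dist, results)


# Illustrative lexicon for a single form field (assumed, not from the source).
lexicon = Trie(["WAGES", "TIPS", "TAXES", "TAXED"])
print(lexicon.contains("TIPS"))                 # True
print(lexicon.near_misses("WAGFS", max_dist=1))  # [(1, 'WAGES')]
```

Sharing one table row per trie edge means words with a common prefix reuse the same partial computation, and whole subtrees are skipped once every entry in a row exceeds the threshold.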
61 Citations
22 Claims
- 1. Tax form and data document scanning and derivation; tax form, box and line item; recognition, capture, extraction and processing architecture;
means to recognize scanned Internal Revenue Service (“IRS”) tax form(s); and
means to capture identification of scanned Internal Revenue Service tax form(s); and
means to organize scanned Internal Revenue Service tax form(s) electronically; and
means to recognize scanned IRS form(s) line and box item(s) data from recognized and captured scanned IRS form(s); and
means to capture scanned IRS form(s) line and box item(s); and
means to extract scanned IRS form(s) line and box item(s) into computer, electronic file or other tax preparation software or process; and
means to import scanned box and line item information directly into IRS form 1040 for filing.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
- 10. A method for digitally organizing scanned tax form(s).
- 14. A method for organizing scanned tax form data line and box item information.
- 16. A method for transferring scanned tax data into Internal Revenue Service form 1040.
- 22. A method for transferring scanned tax data into tax preparation software, such as TurboTax®, ProSystems®, TaxCut®, or any other similar tax preparation programs.
Specification