Data processing system and method for sequentially repairing character recognition errors for scanned images of document forms
Abstract
A data processing system uses a machine-generated data structure (MGDS) to dynamically record and use the character recognition and repair histories of category fields on a document form. The MGDS includes a field data segment which has a coded data buffer portion and an error buffer portion for the extracted field image. Recognition coded data is entered into the coded data buffer portion and recognition error data is entered into the error buffer portion of the field data segment. Then subsequent repair processes can be applied to the recognition coded data by augmenting the MGDS with a repair segment for each character string which is repaired. A sequence of repair stages can be applied to a particular character string, each repair step adding another repair segment to the MGDS. At each stage of repair, the best estimate of the character string is placed into the coded data buffer portion of the field data segment. This enables the best estimate of the information content of the document field to be readily available for each stage of repair and for ultimate use in the data processing system.
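The structure described in the abstract can be sketched as follows. This is a minimal illustration only, not the patent's actual layout: the class names, the field names, and the choice of a list of character positions as the error data are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldDataSegment:
    coded_data: str        # coded data buffer: current best estimate of the field
    error_data: List[int]  # error buffer: suspect character positions (assumed form)


@dataclass
class RepairSegment:
    repair_data: str       # repair data buffer: the string this repair replaced


@dataclass
class MGDS:
    field_segment: FieldDataSegment
    repair_segments: List[RepairSegment] = field(default_factory=list)

    def apply_repair(self, repaired: str) -> None:
        """Add a repair segment holding the old string, then store the new one."""
        self.repair_segments.append(RepairSegment(self.field_segment.coded_data))
        self.field_segment.coded_data = repaired

    def history(self) -> List[str]:
        """Every string the field has held, oldest first, ending with the best estimate."""
        return [s.repair_data for s in self.repair_segments] + [self.field_segment.coded_data]


mgds = MGDS(FieldDataSegment("J0hn", [1]))
mgds.apply_repair("John")
```

After the call, the coded data buffer holds the best estimate and the single repair segment preserves the string it replaced, so the repair history stays recoverable.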
10 Claims
1. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:
inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;
generating recognition coded data from said extracted field image and generating recognition error data using a character recognition process;
assembling a machine generated data structure (MGDS) which includes a field data segment including a coded data buffer portion and an error buffer portion for said extracted field image;
inserting said recognition coded data into said coded data buffer portion and inserting said recognition error data into said error buffer portion of said field data segment;
transferring said MGDS to a coded data repair process, for repairing said recognition coded data;
augmenting said MGDS with a repair segment which includes a repair data buffer portion;
accessing said recognition coded data from said coded data buffer portion and accessing said recognition error data from said error buffer portion of said field data segment and generating repaired coded data using said repair process;
inserting said repaired coded data into said coded data buffer portion of said field data segment and inserting said recognition coded data into said repair data buffer portion of said repair segment; and
transferring said MGDS to a utilization device and accessing the contents of said coded data buffer portion of said field data segment for use as a corrected form of said recognition coded data.
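The steps of claim 1 can be traced in a short sketch. Everything named here is hypothetical: `recognize` is a stand-in for a real character recognition process (it treats the "image" as text and flags digits as suspect), `CONFUSIONS` is an assumed look-alike table, and the dictionary layout of the MGDS is chosen only for brevity.

```python
CONFUSIONS = {"0": "o", "1": "l", "5": "s"}  # assumed look-alike substitutions


def recognize(field_image: str):
    """Stand-in for character recognition: returns (coded data, error positions)."""
    # Assumption: in an alphabetic field, any digit is a suspect character.
    errors = [i for i, ch in enumerate(field_image) if ch.isdigit()]
    return field_image, errors


def assemble_mgds(coded: str, errors: list) -> dict:
    """Build the field data segment with its coded data and error buffers."""
    return {"field": {"coded_data": coded, "error_data": errors}, "repairs": []}


def repair_process(mgds: dict) -> dict:
    field = mgds["field"]
    segment = {"repair_data": None}
    mgds["repairs"].append(segment)       # augment the MGDS with a repair segment
    original = field["coded_data"]        # access the recognition coded data
    chars = list(original)
    for pos in field["error_data"]:       # access the recognition error data
        chars[pos] = CONFUSIONS.get(chars[pos], chars[pos])
    field["coded_data"] = "".join(chars)  # repaired data goes to the coded data buffer
    segment["repair_data"] = original     # original data goes to the repair data buffer
    return mgds


coded, errors = recognize("J0hn")
mgds = repair_process(assemble_mgds(coded, errors))
corrected = mgds["field"]["coded_data"]   # what a utilization device would read
```

A utilization device only ever reads the coded data buffer, so it needs no knowledge of how many repairs were applied.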
2. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:
inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;
generating recognition coded data from said extracted field image and generating recognition error data using a character recognition process;
assembling a machine generated data structure (MGDS) which includes a field data segment including a coded data buffer portion and an error buffer portion for said extracted field image;
inserting said recognition coded data into said coded data buffer portion and inserting said recognition error data into said error buffer portion of said field data segment;
transferring said MGDS to a first coded data repair process, for repairing said recognition coded data;
augmenting said MGDS with a first repair segment which includes a first repair data buffer portion;
accessing said recognition coded data from said coded data buffer portion and accessing said recognition error data from said error buffer portion of said field data segment and generating first repaired coded data using said first repair process;
inserting said first repaired coded data into said coded data buffer portion of said field data segment and inserting said recognition coded data into said first repair data buffer portion of said first repair segment;
transferring said MGDS to a second coded data repair process, for repairing said first repaired coded data;
augmenting said MGDS with a second repair segment which includes a second repair data buffer portion;
accessing said first repaired coded data from said coded data buffer portion of said field data segment and generating second repaired coded data using said second repair process;
inserting said second repaired coded data into said coded data buffer portion of said field data segment and inserting said first repaired coded data into said second repair data buffer portion of said second repair segment; and
transferring said MGDS to a utilization device and accessing the contents of said coded data buffer portion of said field data segment for use as a corrected form of said recognition coded data.
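The two-stage sequence of claim 2 can be sketched as a loop over repair stages. The stages shown are assumptions chosen for illustration (a look-alike substitution followed by a lexicon match using the standard library's `difflib`); the claim itself does not say what either repair process does. `LOOKALIKES`, `KNOWN_NAMES`, and `run_stage` are hypothetical names.

```python
import difflib

LOOKALIKES = {"0": "o", "1": "l"}        # assumed confusion table
KNOWN_NAMES = ["John", "Jane", "Joan"]   # assumed lexicon for the field


def lookalike_repair(coded: str, errors: list) -> str:
    """First stage: substitute look-alike characters at flagged positions."""
    chars = list(coded)
    for pos in errors:
        chars[pos] = LOOKALIKES.get(chars[pos], chars[pos])
    return "".join(chars)


def lexicon_repair(coded: str, errors: list) -> str:
    """Second stage: snap to the closest lexicon entry (ignores the error data)."""
    matches = difflib.get_close_matches(coded, KNOWN_NAMES, n=1)
    return matches[0] if matches else coded


def run_stage(mgds: dict, repair) -> None:
    field = mgds["field"]
    previous = field["coded_data"]
    mgds["repairs"].append({"repair_data": previous})  # one new segment per stage
    field["coded_data"] = repair(previous, field["error_data"])


mgds = {"field": {"coded_data": "J0hm", "error_data": [1, 3]}, "repairs": []}
for stage in (lookalike_repair, lexicon_repair):
    run_stage(mgds, stage)
```

Each stage reads its input from the same coded data buffer and leaves its output there, which is what lets stages be chained without knowing about one another.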
3. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:
inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;
generating recognition coded data from said extracted field image and generating recognition error data using a character recognition process;
assembling a machine generated data structure (MGDS) which includes a field data segment including a coded data buffer portion and an error buffer portion for said extracted field image;
inserting said recognition coded data into said coded data buffer portion and inserting said recognition error data into said error buffer portion of said field data segment;
transferring said MGDS to a first coded data repair process, for repairing said recognition coded data;
augmenting said MGDS with a first repair segment which includes a first repair data buffer portion and an alternate data buffer portion;
accessing said recognition coded data from said coded data buffer portion and accessing said recognition error data from said error buffer portion of said field data segment and generating first repaired coded data and alternate coded data using said first repair process;
inserting said first repaired coded data into said coded data buffer portion of said field data segment, inserting said recognition coded data into said first repair data buffer portion of said first repair segment and inserting said alternate coded data into said alternate data buffer portion of said first repair segment;
transferring said MGDS to a second coded data repair process, for repairing said first repaired coded data;
augmenting said MGDS with a second repair segment which includes a second repair data buffer portion;
accessing said first repaired coded data from said coded data buffer portion of said field data segment, accessing said alternate coded data from said alternate data buffer portion of said first repair segment and generating second repaired coded data using said second repair process;
inserting said second repaired coded data into said coded data buffer portion of said field data segment and inserting said first repaired coded data into said second repair data buffer portion of said second repair segment; and
transferring said MGDS to a utilization device and accessing the contents of said coded data buffer portion of said field data segment for use as a corrected form of said recognition coded data.
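The alternate data buffer of claim 3 lets a later stage reconsider choices the first stage rejected. The sketch below assumes the first repair enumerates look-alike substitutions and keeps the first as its best guess, and that the second repair consults a lexicon; `LOOKALIKES`, `LEXICON`, and both function names are hypothetical.

```python
LOOKALIKES = {"1": ["I", "l"], "0": ["O", "o"]}  # assumed substitutions, in preference order
LEXICON = {"Clark", "Olsen"}                     # assumed list of valid field values


def first_repair(mgds: dict) -> None:
    field = mgds["field"]
    original = field["coded_data"]
    candidates = [original]
    for pos in field["error_data"]:
        subs = LOOKALIKES.get(original[pos], [original[pos]])
        # Expand every candidate with each possible substitution at this position.
        candidates = [c[:pos] + s + c[pos + 1:] for c in candidates for s in subs]
    best, alternates = candidates[0], candidates[1:]
    mgds["repairs"].append({"repair_data": original, "alternate_data": alternates})
    field["coded_data"] = best


def second_repair(mgds: dict) -> None:
    field = mgds["field"]
    current = field["coded_data"]
    alternates = mgds["repairs"][0]["alternate_data"]
    # Prefer the current value; otherwise take the first alternate found in the lexicon.
    choice = next((c for c in [current] + alternates if c in LEXICON), current)
    mgds["repairs"].append({"repair_data": current})
    field["coded_data"] = choice


mgds = {"field": {"coded_data": "C1ark", "error_data": [1]}, "repairs": []}
first_repair(mgds)
second_repair(mgds)
```

Here the first stage guesses "CIark", and the second stage recovers "Clark" from the alternates without having to regenerate them.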
4. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:
inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;
generating recognition coded data from said extracted field image and generating recognition error data using a character recognition process, said recognition error data including error location information;
assembling a machine generated data structure (MGDS) which includes a field data segment including a coded data buffer portion and an error buffer portion for said extracted field image;
inserting said recognition coded data into said coded data buffer portion and inserting said recognition error data into said error buffer portion of said field data segment;
transferring said MGDS to a coded data repair process, for repairing said recognition coded data;
augmenting said MGDS with a repair segment which includes a repair data buffer portion;
accessing said recognition coded data from said coded data buffer portion and accessing said recognition error data from said error buffer portion of said field data segment and generating repaired coded data using said repair process;
inserting said repaired coded data into said coded data buffer portion of said field data segment and inserting said recognition coded data into said repair data buffer portion of said repair segment;
transferring said MGDS and said digital document image to a workstation display device;
accessing the contents of said coded data buffer portion of said field data segment and displaying it at said workstation as a corrected form of said recognition coded data; and
accessing said error location information from said error buffer portion of said field data segment, displaying said digital document image at said workstation and highlighting a displayed portion of said field identified by said error location information.
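The highlighting step of claim 4 can be illustrated with a toy raster. This sketch assumes the error location information is a bounding box `(top, left, bottom, right)` in image coordinates and that the image is a grid of integers; neither representation comes from the claim, and `highlight` is a hypothetical helper.

```python
def highlight(image, box, marker=9):
    """Return a copy of the image with the border of the box set to marker."""
    top, left, bottom, right = box
    out = [row[:] for row in image]  # leave the stored document image untouched
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            if r in (top, bottom) or c in (left, right):
                out[r][c] = marker
    return out


# A blank 4x6 "document image" and an MGDS whose error buffer carries a location.
image = [[0] * 6 for _ in range(4)]
mgds = {
    "field": {
        "coded_data": "John",
        "error_data": [{"char_index": 1, "box": (0, 2, 3, 4)}],
    },
    "repairs": [{"repair_data": "J0hn"}],
}

shown = image
for err in mgds["field"]["error_data"]:
    shown = highlight(shown, err["box"])
displayed_text = mgds["field"]["coded_data"]  # shown next to the highlighted image
```

Drawing on a copy keeps the original image available for later stages or for re-display without the overlay.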
5. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:
inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;
generating recognition coded data from said extracted field image and generating recognition error data using a character recognition process, said recognition error data including error location information;
assembling a machine generated data structure (MGDS) which includes a field data segment including a coded data buffer portion and an error buffer portion for said extracted field image;
inserting said recognition coded data into said coded data buffer portion and inserting said recognition error data into said error buffer portion of said field data segment;
transferring said MGDS and said digital document image to a workstation display device, for repairing said recognition coded data;
augmenting said MGDS with a repair segment which includes a repair data buffer portion;
accessing the contents of said coded data buffer portion of said field data segment and displaying it at said workstation as said recognition coded data;
accessing said error location information from said error buffer portion of said field data segment, displaying said digital document image at said workstation and highlighting a displayed portion of said field identified by said error location information;
generating repaired coded data at said workstation;
inserting said repaired coded data into said coded data buffer portion of said field data segment and inserting said recognition coded data into said repair data buffer portion of said repair segment;
transferring said MGDS to a utilization device and accessing the contents of said coded data buffer portion of said field data segment for use as a corrected form of said recognition coded data.
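In claim 5 the repair is made by an operator at the workstation rather than by an automatic process. A minimal sketch, assuming the operator is modelled as a callable that receives the displayed string and the flagged positions and returns the keyed correction; `workstation_repair` and `scripted_operator` are hypothetical names, and the image display itself is omitted.

```python
def workstation_repair(mgds: dict, operator) -> None:
    field = mgds["field"]
    segment = {"repair_data": None}
    mgds["repairs"].append(segment)           # augment before the operator works
    shown = field["coded_data"]               # displayed as the recognition coded data
    flagged = list(field["error_data"])       # positions to highlight for the operator
    keyed = operator(shown, flagged)          # operator generates the repaired data
    field["coded_data"] = keyed
    segment["repair_data"] = shown


def scripted_operator(shown: str, flagged: list) -> str:
    """Stand-in for a person: retypes each flagged character as 'o'."""
    chars = list(shown)
    for pos in flagged:
        chars[pos] = "o"
    return "".join(chars)


mgds = {"field": {"coded_data": "J0hn", "error_data": [1]}, "repairs": []}
workstation_repair(mgds, scripted_operator)
```

Because the manual stage writes to the MGDS the same way an automatic stage does, downstream consumers cannot tell the two apart except through the repair history.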
6. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:
inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;
generating recognition coded data from said extracted field image and generating recognition error data using a character recognition process, said recognition error data including error location information;
assembling a machine generated data structure (MGDS) which includes a field data segment including a coded data buffer portion and an error buffer portion for said extracted field image;
inserting said recognition coded data into said coded data buffer portion and inserting said recognition error data into said error buffer portion of said field data segment;
transferring said MGDS to a first coded data repair process, for repairing said recognition coded data;
augmenting said MGDS with a first repair segment which includes a first repair data buffer portion;
accessing said recognition coded data from said coded data buffer portion and accessing said recognition error data from said error buffer portion of said field data segment and generating first repaired coded data using said first repair process;
inserting said first repaired coded data into said coded data buffer portion of said field data segment and inserting said recognition coded data into said first repair data buffer portion of said first repair segment;
transferring said MGDS and said digital document image to a workstation display device, for repairing said first repaired coded data;
augmenting said MGDS with a second repair segment which includes a second repair data buffer portion;
accessing said first repaired coded data from said coded data buffer portion of said field data segment and displaying it at said workstation;
accessing said error location information from said error buffer portion of said field data segment, displaying said digital document image at said workstation and highlighting a displayed portion of said field identified by said error location information;
generating second repaired coded data at said workstation;
inserting said second repaired coded data into said coded data buffer portion of said field data segment and inserting said first repaired coded data into said second repair data buffer portion of said second repair segment;
transferring said MGDS to a utilization device and accessing the contents of said coded data buffer portion of said field data segment for use as a corrected form of said recognition coded data.
7. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:
inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;
generating recognition coded data from said extracted field image and generating recognition error data using a character recognition process;
assembling a machine generated data structure (MGDS) which includes a field data segment including a coded data buffer portion and an error buffer portion for said extracted field image;
inserting said recognition coded data into said coded data buffer portion and inserting said recognition error data into said error buffer portion of said field data segment;
transferring said MGDS to a first coded data repair process, for repairing said recognition coded data;
augmenting said MGDS with a first repair segment which includes a first repair data buffer portion and a repair certainty buffer portion;
accessing said recognition coded data from said coded data buffer portion and accessing said recognition error data from said error buffer portion of said field data segment and generating first repaired coded data and generating a repair certainty value using said first repair process;
inserting said first repaired coded data into said coded data buffer portion of said field data segment and inserting said recognition coded data into said first repair data buffer portion of said first repair segment and inserting said repair certainty value into said repair certainty buffer portion of said first repair segment;
transferring said MGDS to a second coded data repair process, for selectively repairing said first repaired coded data;
accessing said repair certainty value from said first repair segment and in response thereto, selectively augmenting said MGDS with a second repair segment which includes a second repair data buffer portion;
selectively accessing in response to said repair certainty value, said first repaired coded data from said coded data buffer portion of said field data segment and generating second repaired coded data using said second repair process;
selectively inserting in response to said repair certainty value, said second repaired coded data into said coded data buffer portion of said field data segment and inserting said first repaired coded data into said second repair data buffer portion of said second repair segment; and
transferring said MGDS to a utilization device and accessing the contents of said coded data buffer portion of said field data segment for use as a corrected form of said recognition coded data.
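The selective second repair of claim 7 can be sketched with a threshold on the certainty value. The certainty heuristic (one minus the fraction of flagged characters), the `THRESHOLD` value, and the lexicon match are all assumptions for illustration; the claim only requires that the second stage act in response to the stored certainty.

```python
import difflib

THRESHOLD = 0.9  # assumed cut-off below which a second repair is attempted


def first_repair(mgds: dict) -> None:
    field = mgds["field"]
    original = field["coded_data"]
    repaired = original.replace("0", "o")  # simplistic: rewrites every zero
    # Assumed heuristic: fewer flagged characters means a more certain repair.
    certainty = 1.0 - len(field["error_data"]) / max(len(original), 1)
    mgds["repairs"].append({"repair_data": original, "repair_certainty": certainty})
    field["coded_data"] = repaired


def second_repair(mgds: dict, lexicon: list) -> bool:
    """Run only when the first repair was uncertain; return True if it ran."""
    if mgds["repairs"][0]["repair_certainty"] >= THRESHOLD:
        return False                       # no second segment is added
    field = mgds["field"]
    current = field["coded_data"]
    matches = difflib.get_close_matches(current, lexicon, n=1)
    mgds["repairs"].append({"repair_data": current})
    field["coded_data"] = matches[0] if matches else current
    return True


names = ["John", "Jane", "Smith"]

doubtful = {"field": {"coded_data": "J0hm", "error_data": [1, 3]}, "repairs": []}
first_repair(doubtful)
ran_doubtful = second_repair(doubtful, names)

clean = {"field": {"coded_data": "Smith", "error_data": []}, "repairs": []}
first_repair(clean)
ran_clean = second_repair(clean, names)
```

The doubtful field ends up with two repair segments and the clean field with one, so the length of the repair history itself records which stages were actually applied.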
8. In a data processing system, a computer program which, when executed in the data processing system, performs a method for repairing character recognition errors for digital images of document forms, the method comprising the steps of:
inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;
generating recognition coded data from said extracted field image and generating recognition error data using a character recognition process;
assembling a machine generated data structure (MGDS) which includes a field data segment including a coded data buffer portion and an error buffer portion for said extracted field image;
inserting said recognition coded data into said coded data buffer portion and inserting said recognition error data into said error buffer portion of said field data segment;
transferring said MGDS to a coded data repair process, for repairing said recognition coded data;
augmenting said MGDS with a repair segment which includes a repair data buffer portion;
accessing said recognition coded data from said coded data buffer portion and accessing said recognition error data from said error buffer portion of said field data segment and generating repaired coded data using said repair process;
inserting said repaired coded data into said coded data buffer portion of said field data segment and inserting said recognition coded data into said repair data buffer portion of said repair segment; and
transferring said MGDS to a utilization device and accessing the contents of said coded data buffer portion of said field data segment for use as a corrected form of said recognition coded data.
9. A data processing system for repairing character recognition errors for digital images of document forms, comprising:
an intelligent forms processor, for inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;
said intelligent forms processor generating recognition coded data from said extracted field image and generating recognition error data using a character recognition process;
said intelligent forms processor assembling a machine generated data structure (MGDS) which includes a field data segment including a coded data buffer portion and an error buffer portion for said extracted field image;
said intelligent forms processor inserting said recognition coded data into said coded data buffer portion and inserting said recognition error data into said error buffer portion of said field data segment;
a first coded data repair processor coupled to said intelligent forms processor, for receiving said MGDS and repairing said recognition coded data;
said first coded data repair processor augmenting said MGDS with a first repair segment which includes a first repair data buffer portion;
said first coded data repair processor accessing said recognition coded data from said coded data buffer portion and accessing said recognition error data from said error buffer portion of said field data segment and generating first repaired coded data using said first repair process;
said first coded data repair processor inserting said first repaired coded data into said coded data buffer portion of said field data segment and inserting said recognition coded data into said first repair data buffer portion of said first repair segment;
a utilization processor coupled to said first coded data repair processor, for receiving said MGDS and accessing the contents of said coded data buffer portion of said field data segment for use as a corrected form of said recognition coded data.
Specification