Method and system for searching form features for form identification
Abstract
A method of and system for identifying a target form for increased efficiency in an automated data capture process is described. Forms are scanned and stored as digitized images. Regions are defined on the form relative to corresponding reference points between the form and the digitized image. The regions are defined in areas that contain anticipated digitized data from data fields of the form. Digitized data is recognized through such means as optical character recognition (OCR) and the resulting string variable is compared in form to a plurality of formats expected for that data. Scoring systems are used to obtain a resultant score for a number of string variables which is compared to a predetermined confidence number. If said confidence number is reached, the form is flagged as a target form and used in the data capture process. A first step identification of certain graphical features can be added as an initial determination as to the source of the form.
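The format comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how a format is encoded, so the notation here is an assumption: each digit maps to `N`, each letter to `A`, and any other character is kept literally. The names `format_signature`, `matches_any_format`, and `DATE_FORMATS` are hypothetical.

```python
from typing import List


def format_signature(text: str) -> str:
    """Reduce an OCR string to its format (assumed notation: digit -> N, letter -> A)."""
    symbols = []
    for ch in text.strip():
        if ch.isdigit():
            symbols.append("N")
        elif ch.isalpha():
            symbols.append("A")
        else:
            symbols.append(ch)  # punctuation and spaces are kept literally
    return "".join(symbols)


def matches_any_format(text: str, format_sequences: List[str]) -> bool:
    """Return True if the string's format equals one of the expected format sequences."""
    return format_signature(text) in format_sequences


# Hypothetical formats expected for a date field.
DATE_FORMATS = ["NN/NN/NNNN", "NN-NN-NNNN", "NN/NN/NN"]

if __name__ == "__main__":
    print(format_signature("12/31/1999"))
    print(matches_any_format("12/31/1999", DATE_FORMATS))
```

Only the shape of the string is compared, not its value, so a field can be checked without knowing what data the form actually holds.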
35 Claims
1. A method of identifying a target form having a plurality of data fields, the method comprising:
storing a digitized image produced by a scanning device;
defining a region having boundaries in the digitized image;
recognizing a portion of the content of the digitized image located within the boundaries of the region;
associating the recognized content with a set of one or more characters;
comparing the format of the set of one or more characters to a plurality of format sequences; and
flagging the form for use in a data capture process if a comparison is found between the characters and one of the plurality of format sequences.
10. A method of identifying a target form having a plurality of data fields, the method comprising:
storing a digitized image;
obtaining a set of one or more characters through recognition of the content located within boundaries of a region in the digitized image;
comparing the format of the set of one or more characters to a plurality of format sequences; and
flagging the form for use in a data capture process if a comparison is found between the set of one or more characters and one of the plurality of format sequences.
18. A method of identifying a target form having a plurality of data fields, the method comprising:
(a) defining in a digitized image a region having boundaries;
(b) obtaining a set of one or more characters through recognition of the content located within boundaries of a region in the digitized image;
(c) comparing the format of the set of one or more characters to a plurality of format sequences;
(d) assigning a score based on a highest ranking comparison between the set of one or more characters and one of the plurality of format sequences;
(e) repeating acts (a) through (d) for at least one other region and adding the scores for the regions to get a total score; and
(f) comparing the total score to a confidence value whereby, if the total score equals or exceeds the confidence value, the form is identified as the target form intended for use in a data capture process.
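Acts (a) through (f) of claim 18 can be sketched as follows. The claim does not say how a comparison is ranked, so this sketch assumes a simple position-by-position similarity between a string's format and each format sequence, with the best match taken as the region's score. The format notation (digit as `N`, letter as `A`) and all function names are hypothetical.

```python
from typing import List, Tuple


def format_signature(text: str) -> str:
    """Assumed notation: digit -> N, letter -> A, other characters kept literally."""
    return "".join(
        "N" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in text.strip()
    )


def similarity(signature: str, fmt: str) -> float:
    """Fraction of positions that agree, relative to the longer of the two strings."""
    longest = max(len(signature), len(fmt))
    if longest == 0:
        return 0.0
    hits = sum(a == b for a, b in zip(signature, fmt))
    return hits / longest


def region_score(text: str, format_sequences: List[str]) -> float:
    """Act (d): score a region by its highest-ranking format comparison."""
    signature = format_signature(text)
    return max((similarity(signature, fmt) for fmt in format_sequences), default=0.0)


def is_target_form(regions: List[Tuple[str, List[str]]], confidence: float) -> bool:
    """Acts (e) and (f): sum the region scores and compare to the confidence value."""
    total = sum(region_score(text, formats) for text, formats in regions)
    return total >= confidence


if __name__ == "__main__":
    regions = [
        ("12/31/1999", ["NN/NN/NNNN", "NN-NN-NNNN"]),  # date field, read cleanly
        ("123-45-678O", ["NNN-NN-NNNN"]),  # last digit misread as the letter O
    ]
    print(is_target_form(regions, confidence=1.5))
```

Summing across regions lets one imperfectly recognized field be offset by others that match cleanly, which is the practical effect of the total-score threshold in act (f).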
26. A system for identifying a target form having a plurality of data fields, the system comprising:
a recognition module configured to transform content of a digitized image located within boundaries of a region of the digitized image into a set of one or more characters, the region having boundaries;
an identification module configured to compare the format of the set of one or more characters to a plurality of format sequences; and
a scoring module configured to assign a score to the set of one or more characters, the score being based on a highest ranking comparison between the set of one or more characters and one of the plurality of format sequences, and to compare the score to a confidence value whereby, if the score exceeds the confidence value, the form is flagged as a target form for use in a data capture process.
29. A system for identifying a target form having a plurality of data fields, the system comprising:
a recognition module configured to transform content located within boundaries of a region in a digitized image into a set of one or more characters;
an identification module configured to compare the format of the set of one or more characters to a plurality of format sequences; and
a scoring module configured to assign a score to the set of one or more characters, the score based on a highest ranking comparison between the set of one or more characters and one of the plurality of format sequences, and to compare the score to a confidence value whereby, if the score exceeds the confidence value, the form is flagged as a target form for use in a data capture process.
31. A system for identifying a target form having a plurality of data fields, the system comprising:
a recognition module configured to transform content located within boundaries of a region in a digitized image into a set of one or more characters;
an identification module configured to compare the format of the set of one or more characters to a plurality of format sequences; and
a scoring module configured to identify the form for use in a data capture process if a comparison is found between the set of one or more characters and one of the plurality of format sequences.
33. A method of identifying a target form having a plurality of data fields, the method comprising:
filtering possible candidate forms by use of graphical features;
comparing the format of recognized characters to a plurality of format sequences in a digitized image of one of the filtered candidate forms; and
flagging the one of the filtered candidate forms for use in a data capture process if a comparison is found between the recognized characters and one of the plurality of format sequences.
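The two-stage approach of claim 33 can be sketched as a graphical pre-filter followed by a format check. The claim does not define what a graphical feature is or how it is detected, so this sketch assumes each candidate already carries a set of detected feature labels. The `CandidateForm` type, the feature labels, and the format notation (digit as `N`, letter as `A`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class CandidateForm:
    name: str
    graphical_features: Set[str]  # assumed to come from an earlier detection step
    recognized_text: str  # characters recognized in one region of the image


def format_signature(text: str) -> str:
    """Assumed notation: digit -> N, letter -> A, other characters kept literally."""
    return "".join(
        "N" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in text.strip()
    )


def filter_by_graphics(
    candidates: List[CandidateForm], required: Set[str]
) -> List[CandidateForm]:
    """Stage 1: keep only candidates that show every required graphical feature."""
    return [c for c in candidates if required <= c.graphical_features]


def flag_target_forms(
    candidates: List[CandidateForm], required: Set[str], format_sequences: List[str]
) -> List[str]:
    """Stage 2: among the filtered candidates, flag those whose text matches a format."""
    return [
        form.name
        for form in filter_by_graphics(candidates, required)
        if format_signature(form.recognized_text) in format_sequences
    ]


if __name__ == "__main__":
    forms = [
        CandidateForm("a", {"logo", "box_grid"}, "12/31/1999"),
        CandidateForm("b", {"box_grid"}, "12/31/1999"),
        CandidateForm("c", {"logo", "box_grid"}, "hello"),
    ]
    print(flag_target_forms(forms, {"logo"}, ["NN/NN/NNNN"]))
```

Running the graphical filter first means character recognition and format comparison are only spent on forms that already look like they come from the expected source.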
34. A computer usable medium having computer readable program code embodied therein for identifying a target form having a plurality of data fields, the computer readable code comprising instructions for:
storing a digitized image produced by a scanning device;
defining a region having boundaries in the digitized image;
recognizing a portion of the content of the digitized image located within the boundaries of the region;
associating the recognized content with a set of one or more characters;
comparing the format of the set of one or more characters to a plurality of format sequences; and
flagging the form for use in a data capture process if a comparison is found between the characters and one of the plurality of format sequences.
35. A programmable storage medium having computer readable program code embodied therein for identifying a target form having a plurality of data fields, the computer readable code comprising instructions for:
storing a digitized image;
obtaining a set of one or more characters through recognition of the content located within boundaries of a region in the digitized image;
comparing the format of the set of one or more characters to a plurality of format sequences; and
flagging the form for use in a data capture process if a comparison is found between the set of one or more characters and one of the plurality of format sequences.
Specification