SYSTEM AND METHOD FOR LOCALIZING DATA FIELDS ON STRUCTURED AND SEMI-STRUCTURED FORMS
First Claim
1. A method for localizing data fields of a form, said method comprising:
receiving by at least one processor an image of a form, the form including data fields;
identifying by the at least one processor word boxes of the image;
grouping by the at least one processor the word boxes into candidate zones, each candidate zone including one or more of the word boxes;
forming by the at least one processor hypotheses from the data fields and the candidate zones, each hypothesis assigning one of the candidate zones to one of the data fields or a null data field; and
performing by the at least one processor a constrained optimization search of the hypotheses for an optimal set of hypotheses, the optimal set of hypotheses optimally assigning word boxes to corresponding data fields.
Abstract
A method and system to localize data fields of a form. An image of a form is received, where the form includes data fields. Word boxes of the image are identified. The word boxes are grouped into candidate zones, where each of the candidate zones includes one or more of the word boxes. Hypotheses are formed from the data fields and the candidate zones, where each hypothesis assigns one of the candidate zones to one of the data fields or a null data field. A constrained optimization search of the hypotheses is performed for an optimal set of hypotheses. The optimal set of hypotheses assigns word box groups to corresponding data fields.
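The grouping and hypothesis-forming steps described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `WordBox` record, the line-grouping tolerance `y_tol`, the rule that a candidate zone is a run of up to `max_words` consecutive word boxes on one text line, and the use of `None` to stand for the null data field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """Hypothetical word box: recognized text plus its bounding rectangle."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def group_into_lines(boxes, y_tol=5.0):
    """Group word boxes into text lines by comparing top edges (assumed rule)."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b.y0, b.x0)):
        # Compare against the first box of the current line.
        if lines and abs(lines[-1][0].y0 - box.y0) <= y_tol:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [sorted(line, key=lambda b: b.x0) for line in lines]


def candidate_zones(boxes, max_words=3):
    """Treat every run of 1..max_words consecutive boxes on a line as a zone."""
    zones = []
    for line in group_into_lines(boxes):
        for start in range(len(line)):
            last = min(start + max_words, len(line))
            for end in range(start + 1, last + 1):
                zones.append(tuple(line[start:end]))
    return zones


def form_hypotheses(zones, fields):
    """Pair each zone with every data field and with the null field (None)."""
    return [(zone, field) for zone in zones for field in [*fields, None]]


boxes = [
    WordBox("John", 10, 10, 40, 20),
    WordBox("Smith", 45, 11, 80, 21),
    WordBox("1990", 10, 50, 40, 60),
]
zones = candidate_zones(boxes)
hypotheses = form_hypotheses(zones, ["name", "year"])
```

With the three sample word boxes, the sketch yields four candidate zones ("John", "John Smith", "Smith", "1990"), each paired with the two fields and the null field. A real system would likely use a richer grouping rule; this one is chosen only to keep the example short.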
26 Citations
20 Claims
1. A method for localizing data fields of a form, said method comprising:
receiving by at least one processor an image of a form, the form including data fields;
identifying by the at least one processor word boxes of the image;
grouping by the at least one processor the word boxes into candidate zones, each candidate zone including one or more of the word boxes;
forming by the at least one processor hypotheses from the data fields and the candidate zones, each hypothesis assigning one of the candidate zones to one of the data fields or a null data field; and
performing by the at least one processor a constrained optimization search of the hypotheses for an optimal set of hypotheses, the optimal set of hypotheses optimally assigning word boxes to corresponding data fields.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A system for localizing data fields of a form, said system comprising:
at least one processor programmed to:
receive an image of a form, the form including data fields;
identify word boxes of the image;
group the word boxes into candidate zones, each candidate zone including one or more of the word boxes;
form hypotheses from the data fields and the candidate zones, each hypothesis assigning one of the candidate zones to one of the data fields or a null data field; and
perform a constrained optimization search of the hypotheses for an optimal set of hypotheses, the optimal set of hypotheses optimally assigning word boxes to corresponding data fields.
Dependent claims: 15, 16, 17, 18, 19
20. A data extraction system including:
at least one processor programmed to:
receive an image of a form and a template model of the form, the form including data fields;
identify word boxes of the image;
group the word boxes into candidate zones, each candidate zone including one or more of the word boxes;
form hypotheses from the data fields and the candidate zones, each hypothesis assigning one of the candidate zones to one of the data fields or a null data field;
determine, for each of the hypotheses, an assignment quality, wherein the assignment quality for each of the hypotheses assigning a candidate zone to one of the data fields is based on a template model of the form;
perform a constrained optimization search of the hypotheses for an optimal set of hypotheses based on assignment quality, the optimal set of hypotheses optimally assigning word boxes to corresponding data fields and the hypotheses of the optimal set of hypotheses being non-overlapping in word-box support; and
extract data from the word boxes and assign the extracted data to corresponding data fields based on the optimal set of hypotheses.
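As a rough illustration of the search recited in claim 20, the Python sketch below picks the set of hypotheses with the highest total assignment quality, subject to two constraints: no two selected hypotheses share a word box, and each data field receives at most one zone. The claims do not name a search algorithm or a quality formula, so the exhaustive backtracking search, the `Hypothesis` record, and the distance-based `template_quality` score are all assumptions chosen for brevity. Zones left unselected are treated as assigned to the null data field.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical record: a candidate zone assigned to a data field."""
    zone: frozenset   # ids of the word boxes that support the zone
    field: str        # name of the data field
    quality: float    # assignment quality, assumed positive, higher is better


def template_quality(zone_center, expected_center, scale=50.0):
    """Assumed score: 1.0 at the template's expected position, decaying with distance."""
    dist = math.hypot(zone_center[0] - expected_center[0],
                      zone_center[1] - expected_center[1])
    return 1.0 / (1.0 + dist / scale)


def best_assignment(hypotheses):
    """Backtracking search over subsets of mutually compatible hypotheses."""
    best = (0.0, [])

    def search(index, used_boxes, used_fields, total, chosen):
        nonlocal best
        if total > best[0]:
            best = (total, list(chosen))
        for i in range(index, len(hypotheses)):
            h = hypotheses[i]
            # Enforce the constraints: one zone per field, disjoint word boxes.
            if h.field in used_fields or h.zone & used_boxes:
                continue
            chosen.append(h)
            search(i + 1, used_boxes | h.zone, used_fields | {h.field},
                   total + h.quality, chosen)
            chosen.pop()

    search(0, frozenset(), frozenset(), 0.0, [])
    return best


hyps = [
    Hypothesis(frozenset({1, 2}), "name", 0.9),
    Hypothesis(frozenset({2}), "name", 0.5),
    Hypothesis(frozenset({2, 3}), "date", 0.8),
    Hypothesis(frozenset({3}), "date", 0.6),
]
total, selected = best_assignment(hyps)
```

In this example the search selects word boxes {1, 2} for "name" and {3} for "date": the higher-scoring "date" zone {2, 3} is rejected because it shares word box 2 with the "name" zone. Exhaustive search is exponential in the number of hypotheses, so it is suitable only for small illustrations like this one.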
Specification