Method and system for interactive ground-truthing of document images
Abstract
A method and a system analyze a document image to establish a searchable data structure characterizing the ground-truthed contents of the document that the image represents. The document image is segmented into a set of image objects, and the image objects are linked with fields that store metadata. Image objects identified by segmenting the document image are grouped into subsets according to characteristics suggesting that they may share common ground-truthed metadata. Grouping the image objects into subsets allows them to be indexed, which facilitates the ground-truthing process. In some embodiments, the index of representative image objects is presented to the user in table form. A database of image objects with ground-truthed metadata is formed. Interactive tools and processes facilitate ground-truthing based on paired image objects and metadata.
93 Citations
138 Claims
1. A method for analyzing a document image, comprising:
segmenting the document image to identify a set of image objects within the document image;
processing the set to group image objects within the set into a plurality of subsets, the subsets including one or more image objects;
linking reference image objects to corresponding subsets in the plurality of subsets;
creating machine readable data structures pairing the reference image objects with linked metadata fields, whereby image objects in the corresponding subsets are linked to common metadata in the linked metadata fields; and
presenting the reference image objects to a user, and accepting input from the user, to interactively populate the linked metadata fields with ground-truthed metadata, the metadata including searchable characteristics of the image objects in the corresponding subsets. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
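The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how image objects are grouped, so the sketch assumes tiny binary bitmaps and a greedy Hamming-distance grouping. The names `Bitmap`, `Subset`, `group_objects`, `ground_truth`, and `metadata_for`, and the `max_distance` threshold, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-in for a segmented image object: rows of '0'/'1' pixels.
Bitmap = Tuple[str, ...]


@dataclass
class Subset:
    reference: Bitmap                                   # reference image object
    members: List[int] = field(default_factory=list)    # indices of grouped objects
    metadata: Dict[str, str] = field(default_factory=dict)  # linked metadata fields


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count differing pixels; assumes both bitmaps have the same shape."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def group_objects(objects: List[Bitmap], max_distance: int = 1) -> List[Subset]:
    """Greedy grouping (an assumption): an object joins the first subset whose
    reference is within max_distance, otherwise it starts a new subset."""
    subsets: List[Subset] = []
    for index, obj in enumerate(objects):
        for subset in subsets:
            if hamming(obj, subset.reference) <= max_distance:
                subset.members.append(index)
                break
        else:
            subsets.append(Subset(reference=obj, members=[index]))
    return subsets


def ground_truth(subsets: List[Subset], answers: List[str]) -> None:
    """Store one user-supplied answer per reference image object."""
    for subset, text in zip(subsets, answers):
        subset.metadata["text"] = text


def metadata_for(subsets: List[Subset], index: int) -> Dict[str, str]:
    """Every member of a subset shares the metadata of its reference object."""
    for subset in subsets:
        if index in subset.members:
            return subset.metadata
    raise KeyError(index)
```

In this sketch the user answers once per reference object, and every image object in the corresponding subset inherits that answer through the shared metadata dictionary.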
23. A method for analyzing a document image, comprising:
segmenting the document image to identify a set of image objects within the document image;
creating machine readable data structures pairing the identified image objects in the set with linked metadata fields; and
presenting representations of the identified image objects to a user, and accepting audio input translated with speech recognition tools to interactively populate the linked metadata fields with ground-truthed metadata, the metadata including searchable characteristics of the image objects to which the respective metadata fields are linked. - View Dependent Claims (24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35)
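A rough Python sketch of the audio-input step of claim 23 follows. The claim names no particular speech recognition tool, so the sketch takes the recognizer as a caller-supplied function; `ImageObjectRecord`, `populate_from_audio`, and the `transcribe` parameter are hypothetical names, and a real system would pass in an actual speech-to-text engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ImageObjectRecord:
    object_id: int
    metadata: Dict[str, str] = field(default_factory=dict)  # linked metadata fields


def populate_from_audio(
    records: List[ImageObjectRecord],
    audio_clips: List[bytes],
    transcribe: Callable[[bytes], str],
    field_name: str = "text",
) -> List[ImageObjectRecord]:
    """Pair each presented image object with one spoken answer.

    `transcribe` is an assumed stand-in for the speech recognition tools;
    its output is stored as the ground-truthed value of `field_name`.
    """
    for record, clip in zip(records, audio_clips):
        record.metadata[field_name] = transcribe(clip).strip()
    return records
```

Passing the recognizer in as a parameter keeps the sketch independent of any specific speech library.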
36. A method for analyzing a document image, comprising:
segmenting the document image to identify a set of image objects within the document image;
applying text recognition tools to produce proposed text for the set of image objects;
processing the set to group image objects within the set into a plurality of subsets, the subsets including one or more image objects;
linking reference image objects to corresponding subsets in the plurality of subsets;
creating machine readable data structures pairing the reference image objects with linked metadata fields, whereby image objects in the corresponding subsets are linked to common metadata in the linked metadata fields, and populating the linked metadata fields based on the proposed text; and
presenting the reference image objects to a user, and accepting input from the user, to interactively populate the linked metadata fields with ground-truthed metadata, the metadata including searchable characteristics of the image objects in the corresponding subsets, including accepting input to verify and to edit the proposed text to establish the ground-truthed metadata. - View Dependent Claims (37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61)
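The verify-and-edit step of claim 36 can be sketched in Python as follows. This is an illustrative assumption, not the claimed implementation: `MetadataField`, `review`, and the `corrections` mapping are hypothetical, and the proposed text would in practice come from a text recognition tool rather than being typed in.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MetadataField:
    proposed_text: str                  # text proposed by the recognition tool
    ground_truth: Optional[str] = None  # set only after user review

    def verify(self) -> None:
        """The user accepts the proposed text unchanged."""
        self.ground_truth = self.proposed_text

    def edit(self, corrected: str) -> None:
        """The user replaces the proposed text with a correction."""
        self.ground_truth = corrected


def review(fields: Dict[str, MetadataField], corrections: Dict[str, str]) -> None:
    """Apply one review pass: edit the fields the user corrected, verify the rest.

    Keys identify reference image objects; `corrections` holds user edits.
    """
    for ref_id, metadata_field in fields.items():
        if ref_id in corrections:
            metadata_field.edit(corrections[ref_id])
        else:
            metadata_field.verify()
```

Keeping `proposed_text` separate from `ground_truth` lets a reviewer see which fields were confirmed as proposed and which were corrected.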
62. A method for analyzing a document image, comprising:
providing a database of representative image objects with linked metadata fields storing metadata, the metadata including searchable characteristics of image objects matching the representative image objects;
segmenting the document image to identify a set of image objects within the document image;
processing the set to match image objects in the set with representative image objects in the database, and to link matching image objects in the set with particular representative image objects in the database; and
displaying instances of image objects in the set that are linked with a particular representative image object in the database, and accepting user input to interactively undo the link of selected image objects with the particular representative image object. - View Dependent Claims (63, 64, 65, 66, 67, 68, 69)
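The matching and undo steps of claim 62 can be sketched in Python under stated assumptions. The claim does not specify a matching criterion, so the sketch assumes small binary bitmaps and nearest-neighbor matching by Hamming distance. The database key stands in for the linked metadata, and `match_to_database`, `instances_of`, `undo_links`, and `max_distance` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for an image object: rows of '0'/'1' pixels.
Bitmap = Tuple[str, ...]


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count differing pixels; assumes both bitmaps have the same shape."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match_to_database(
    objects: List[Bitmap],
    database: Dict[str, Bitmap],
    max_distance: int = 1,
) -> Dict[int, str]:
    """Link each object to its nearest representative, if close enough.

    Returns a mapping from object index to database key; objects with no
    representative within max_distance stay unlinked.
    """
    links: Dict[int, str] = {}
    if not database:
        return links
    for index, obj in enumerate(objects):
        best_key = min(database, key=lambda k: hamming(obj, database[k]))
        if hamming(obj, database[best_key]) <= max_distance:
            links[index] = best_key
    return links


def instances_of(links: Dict[int, str], key: str) -> List[int]:
    """Indices of objects linked to one representative, for display to the user."""
    return [index for index, linked in links.items() if linked == key]


def undo_links(links: Dict[int, str], selected: Iterable[int]) -> None:
    """Remove the links the user selected as incorrect."""
    for index in selected:
        links.pop(index, None)
```

After `undo_links`, the unlinked objects are free to be matched again or ground-truthed by hand.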
70. An apparatus, comprising:
a data processing system including a user input device, a display, one of memory, or access to memory, storing a document image, and resources for processing the document image, the resources including logic to:
segment the document image to identify a set of image objects within the document image;
process the set to group image objects within the set into a plurality of subsets, the subsets including one or more image objects;
link reference image objects to corresponding subsets in the plurality of subsets;
store data structures pairing the reference image objects with linked metadata fields, whereby image objects in the corresponding subsets are linked to common metadata in the linked metadata fields; and
present the reference image objects to a user on the display, and accept input from the user via the user input device, to interactively populate the linked metadata fields with ground-truthed metadata, the metadata including searchable characteristics of the image objects in the corresponding subsets. - View Dependent Claims (71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91)
92. An apparatus for analyzing a document image, comprising:
a data processing system including a user input device, a display, one of memory, or access to memory, storing a document image, and resources for processing the document image, the resources including logic to:
segment the document image to identify a set of image objects within the document image;
create and store machine readable data structures pairing the identified image objects in the set with linked metadata fields; and
present representations of the identified image objects to a user, and accept audio input translated with speech recognition tools to interactively populate the linked metadata fields with ground-truthed metadata, the metadata including searchable characteristics of the image objects to which the respective metadata fields are linked. - View Dependent Claims (93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104)
105. An apparatus, comprising:
a data processing system including a user input device, a display, one of memory, or access to memory, storing a document image, and resources for processing the document image, the resources including logic to:
segment the document image to identify a set of image objects within the document image;
apply text recognition tools to produce proposed text for the set of image objects;
process the set to group image objects within the set into a plurality of subsets, the subsets including one or more image objects;
link reference image objects to corresponding subsets in the plurality of subsets;
create and store machine readable data structures pairing the reference image objects with linked metadata fields, whereby image objects in the corresponding subsets are linked to common metadata in the linked metadata fields, and populating the linked metadata fields based on the proposed text; and
present the reference image objects to a user, and accept input from the user, to interactively populate the linked metadata fields with ground-truthed metadata, the metadata including searchable characteristics of the image objects in the corresponding subsets, including logic to accept input to verify and to edit the proposed text to establish the ground-truthed metadata. - View Dependent Claims (106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130)
131. An apparatus for analyzing a document image, comprising:
a data processing system including a user input device, a display, one of memory, or access to memory, storing a document image, and resources for processing the document image, the resources including logic to:
access a database of representative image objects with linked metadata fields storing metadata, the metadata including searchable characteristics of image objects matching the representative image objects;
segment the document image to identify a set of image objects within the document image;
process the set to match image objects in the set with representative image objects in the database, and to link matching image objects in the set with particular representative image objects in the database; and
display instances of image objects in the set that are linked with a particular representative image object in the database, and accept user input to interactively undo the link of selected image objects with the particular representative image object. - View Dependent Claims (132, 133, 134, 135, 136, 137, 138)
Specification