BLOCKWISE EXTRACTION OF DOCUMENT METADATA
First Claim
1. A method comprising:
obtaining a document image, wherein the document image includes a plurality of objects;
identifying a plurality of macroblocks within the document image;
performing microblock processing within macroblocks of the plurality of macroblocks, wherein the microblock processing includes examining content of microblocks within a macroblock for extraction of key-value pairs, the examining content including performing an ontological analysis of microblocks, wherein the microblock processing includes associating confidence levels to the extracted key-value pairs; and
outputting metadata based on the performing microblock processing within macroblocks of the plurality of macroblocks.
Abstract
Methods, computer program products, and systems are presented. The methods include, for instance: obtaining a document image, wherein the document image includes a plurality of objects; identifying a plurality of macroblocks within the document image; performing microblock processing within macroblocks of the plurality of macroblocks, wherein the microblock processing includes examining content of microblocks within a macroblock for extraction of key-value pairs, the examining content including performing an ontological analysis of microblocks, wherein the microblock processing includes associating confidence levels to the extracted key-value pairs; and outputting metadata based on the performing microblock processing within macroblocks of the plurality of macroblocks.
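The microblock processing step described above can be illustrated with a minimal sketch. It assumes macroblocks have already been identified and are given as lists of microblocks with coordinates. The names Microblock, KeyValue, ONTOLOGY, extract_pairs, and extract_metadata are hypothetical, the dictionary lookup is only a toy stand-in for an ontological analysis, and the confidence values 0.9 and 0.6 are arbitrary placeholders; none of this is taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Microblock:
    text: str
    x: float  # left edge of the microblock (assumed coordinate system)
    y: float  # top edge of the microblock


@dataclass
class KeyValue:
    key: str
    value: str
    confidence: float


# Hypothetical toy ontology: surface label -> canonical key name.
ONTOLOGY = {"invoice no": "invoice_number", "date": "date", "total": "total"}


def normalize(text):
    """Lower-case a label and drop surrounding whitespace and a trailing colon."""
    return text.strip().rstrip(":").lower()


def extract_pairs(macroblock):
    """Pair each ontology-recognized key with its nearest non-key microblock."""
    keys = [m for m in macroblock if normalize(m.text) in ONTOLOGY]
    values = [m for m in macroblock if normalize(m.text) not in ONTOLOGY]
    pairs = []
    for k in keys:
        if not values:
            continue
        # Nearest candidate by Manhattan distance between corners.
        nearest = min(values, key=lambda v: abs(v.x - k.x) + abs(v.y - k.y))
        # Aligned pairs (same row or same column) get a higher placeholder
        # confidence than unaligned ones.
        aligned = nearest.x == k.x or nearest.y == k.y
        confidence = 0.9 if aligned else 0.6
        pairs.append(KeyValue(ONTOLOGY[normalize(k.text)], nearest.text, confidence))
    return pairs


def extract_metadata(macroblocks):
    """Run microblock processing within every macroblock and collect the output."""
    return [pair for mb in macroblocks for pair in extract_pairs(mb)]


if __name__ == "__main__":
    macroblock = [
        Microblock("Invoice No:", 0, 0),
        Microblock("12345", 100, 0),
        Microblock("Total", 0, 20),
        Microblock("$40.00", 100, 23),  # slightly off-row: an unaligned microblock
    ]
    for pair in extract_metadata([macroblock]):
        print(pair)
```

In this sketch the "Total" value sits three units below its key, so the pair is still extracted but carries the lower placeholder confidence.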
25 Claims
1. A method comprising:
obtaining a document image, wherein the document image includes a plurality of objects;
identifying a plurality of macroblocks within the document image;
performing microblock processing within macroblocks of the plurality of macroblocks, wherein the microblock processing includes examining content of microblocks within a macroblock for extraction of key-value pairs, the examining content including performing an ontological analysis of microblocks, wherein the microblock processing includes associating confidence levels to the extracted key-value pairs; and
outputting metadata based on the performing microblock processing within macroblocks of the plurality of macroblocks.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method comprising:
obtaining a document image, wherein the document image includes a plurality of objects;
identifying a macroblock within the document image, wherein the macroblock includes objects of the plurality of objects;
examining content of microblocks within an area of the macroblock of the document image for extraction of one or more key-value pair, wherein the examining includes examining content of unaligned microblocks within the area of the microblock, and wherein the examining content of unaligned microblocks within the area of the microblock includes applying an ontological analysis;
associating a confidence level to a key-value pair of the one or more key-value pair; and
outputting the one or more key-value pair.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A method comprising:
obtaining a document image, wherein the document image includes a plurality of objects;
processing the document image to identify a baseline styling parameter value, the baseline styling parameter value specifying a baseline font height;
identifying for each word of a line of text of the document image a relative styling parameter, the relative styling parameter being defined in reference to the baseline styling parameter value, wherein the relative styling parameter specifies a font height of a word of text of the text line as a percentage value of the baseline styling parameter value; and
providing the relative styling parameter as output metadata for output.
Dependent claims: 17, 18, 19
20. A computer program product comprising:
a computer readable storage medium readable by one or more processing circuit and storing instructions for execution by one or more processor for performing a method comprising:
obtaining a document image, wherein the document image includes a plurality of objects;
identifying a plurality of macroblocks within the document image;
performing microblock processing within macroblocks of the plurality of macroblocks, wherein the microblock processing includes examining content of microblocks within a macroblock for extraction of key-value pairs, the examining content including performing an ontological analysis of microblocks, wherein the microblock processing includes associating confidence levels to the extracted key-value pairs; and
outputting metadata based on the performing microblock processing within macroblocks of the plurality of macroblocks.
Dependent claims: 21, 22, 23, 24
25. A system comprising:
a memory;
at least one processor in communication with memory; and
program instructions executable by one or more processor via the memory to perform a method comprising:
obtaining a document image, wherein the document image includes a plurality of objects;
identifying a plurality of macroblocks within the document image;
performing microblock processing within macroblocks of the plurality of macroblocks, wherein the microblock processing includes examining content of microblocks within a macroblock for extraction of key-value pairs, the examining content including performing an ontological analysis of microblocks, wherein the microblock processing includes associating confidence levels to the extracted key-value pairs; and
outputting metadata based on the performing microblock processing within macroblocks of the plurality of macroblocks.
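The relative styling parameter recited in claim 16 (a word's font height expressed as a percentage of a baseline font height) can be worked through with a short sketch. The claim does not say how the baseline is identified; taking the median word height across the document is an assumption made here for illustration only, and the function names baseline_font_height and relative_font_heights are hypothetical.

```python
from statistics import median


def baseline_font_height(all_word_heights):
    """Estimate a baseline font height; the median is an assumed heuristic."""
    return median(all_word_heights)


def relative_font_heights(line_word_heights, baseline_height):
    """Express each word's font height as a percentage of the baseline height."""
    if baseline_height <= 0:
        raise ValueError("baseline height must be positive")
    return [round(100.0 * h / baseline_height, 1) for h in line_word_heights]


if __name__ == "__main__":
    document_heights = [10, 10, 10, 12, 20]  # word heights, e.g. in pixels
    baseline = baseline_font_height(document_heights)
    print(relative_font_heights([10, 20, 12], baseline))
```

A word twice as tall as the baseline would be reported as 200 percent, which lets downstream consumers detect headings or emphasis without knowing absolute font sizes.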
Specification