DOCUMENT LAYOUT EXTRACTION
Abstract
Computer-readable media, systems, and methods for document layout extraction are described. In embodiments, textual data in an electronic format is received and converted from the electronic format to an independent interface format, the independent interface format including coordinates to one or more structural elements of the textual data. Further, in embodiments, a structure and layout analysis of the textual data is performed to generate a set of structure and layout information. Still further, in embodiments, the textual data and the set of structure and layout information are stored in an enriched interface format, the enriched interface format providing for search and navigation of the textual data.
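The four steps named in the abstract (receive, convert, analyze, store) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Element` record, the `convert`, `analyze`, `store`, and `search` helpers, the input tuples of `(text, x, y, font_size)`, the heading heuristic based on font size, and the use of JSON as a stand-in for the enriched interface format.

```python
import json
from dataclasses import dataclass, asdict
from statistics import median


@dataclass
class Element:
    """Hypothetical format-independent record: text plus page coordinates."""
    text: str
    x: float
    y: float
    font_size: float
    role: str = "unknown"  # filled in by analyze()
    order: int = -1        # reading order, filled in by analyze()


def convert(raw_spans):
    """Convert raw (text, x, y, font_size) tuples into Element records."""
    return [Element(text, x, y, size) for text, x, y, size in raw_spans]


def analyze(elements):
    """Toy structure and layout analysis (an assumed heuristic).

    Reading order is top-to-bottom, then left-to-right, assuming y grows
    downward. Text larger than the median font size is labeled a heading.
    """
    body_size = median(e.font_size for e in elements)
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    for index, element in enumerate(ordered):
        element.role = "heading" if element.font_size > body_size else "paragraph"
        element.order = index
    return ordered


def store(elements):
    """Serialize text plus layout metadata; JSON stands in for the enriched format."""
    return json.dumps({"elements": [asdict(e) for e in elements]})


def search(enriched, term, role=None):
    """Case-insensitive text search, optionally restricted to one structural role."""
    doc = json.loads(enriched)
    return [
        e for e in doc["elements"]
        if term.lower() in e["text"].lower() and (role is None or e["role"] == role)
    ]


raw = [
    ("Body text about layout.", 72.0, 120.0, 11.0),
    ("Introduction", 72.0, 90.0, 18.0),
    ("More body text.", 72.0, 140.0, 11.0),
]
enriched = store(analyze(convert(raw)))
hits = search(enriched, "introduction", role="heading")
```

Keeping coordinates and roles alongside the text is what lets a query be restricted to a structural role, such as headings only, which plain text extraction would not support.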
20 Claims
1. One or more computer-readable media having computer-executable instructions embodied thereon that, when executed, perform a method for extracting information from a document in an electronic format to produce a representation containing structure and layout metadata, the method comprising:
- receiving one or more textual data in the electronic format;
- converting the textual data from the electronic format to an independent interface format, the independent interface format including coordinates to one or more structural elements of the textual data;
- performing a structure and layout analysis of the textual data to generate a set of structure and layout information; and
- storing the textual data and the set of structure and layout information in an enriched interface format, the enriched interface format providing for search and navigation of the textual data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computerized system for extracting information from a document in an electronic format to produce a representation containing structure and layout metadata, the system comprising:
- a receiving component configured to receive one or more textual data in the electronic format;
- a converting component configured to convert the textual data from the electronic format to an independent interface format, the independent interface format including coordinates to one or more structural elements of the textual data;
- a processing component configured to analyze the textual data to generate a set of structure and layout information; and
- a storing component configured to store the textual data and the set of structure and layout information in an enriched interface format, the enriched interface format providing for search and navigation of the textual data.
Dependent claims: 10, 11, 12, 13
14. One or more computer-readable media having computer-executable instructions embodied thereon that, when executed, perform a method for converting a document in an electronic format into a representation containing structure and layout metadata, the method comprising:
- sending one or more textual data in the electronic format to a layout extraction engine, wherein the layout extraction engine is configured to convert the textual data from the electronic format to an independent interface format, the independent interface format including coordinates to one or more structural elements of the textual data, and wherein the layout extraction engine is configured to perform a structure and layout analysis of the textual data to generate a set of structure and layout information; and
- receiving the textual data and the set of structure and layout information in an enriched interface format, wherein the enriched interface format provides for search and navigation of the textual data.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification