UNSTRUCTURED AND SEMISTRUCTURED DOCUMENT PROCESSING AND SEARCHING
Abstract
A method for analyzing and indexing an unstructured or semistructured document according to one embodiment includes receiving an unstructured or semistructured document; converting the document to one or more text streams; analyzing the one or more text streams for identifying textual contents of the document; analyzing the one or more text streams for identifying logical sections of the document; associating the textual contents with the logical sections; indexing the textual contents and their association with the logical sections; and saving a result of the indexing in a data storage device.
22 Claims
1. A method for analyzing and indexing an unstructured or semistructured document, comprising:
receiving an unstructured or semistructured document;
converting the document to one or more text streams;
analyzing the one or more text streams for identifying textual contents of the document;
analyzing the one or more text streams for identifying logical sections of the document;
associating the textual contents with the logical sections;
indexing the textual contents and their association with the logical sections; and
saving a result of the indexing in a data storage device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14

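The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the UTF-8 decoding, the rule that a line ending in a colon opens a logical section, and the JSON file used as the storage target are all assumptions made for illustration.

```python
import json
import re
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def to_text_stream(raw: bytes) -> str:
    """Convert the received document to one text stream (assumes UTF-8 input)."""
    return raw.decode("utf-8", errors="replace")


def split_sections(stream: str) -> Dict[str, str]:
    """Identify logical sections. Assumed heuristic: a line ending in ':' is a heading."""
    sections: Dict[str, List[str]] = {"preamble": []}
    current = "preamble"
    for line in stream.splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and len(stripped) > 1:
            current = stripped[:-1].lower()
            sections.setdefault(current, [])
        elif stripped:
            # Associate the textual content with the section it appears in.
            sections[current].append(stripped)
    return {name: " ".join(lines) for name, lines in sections.items() if lines}


def build_index(sections: Dict[str, str]) -> Dict[str, List[str]]:
    """Index each term together with the names of the sections it occurs in."""
    index = defaultdict(set)
    for name, text in sections.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(name)
    return {term: sorted(names) for term, names in index.items()}


def analyze_and_index(raw: bytes, out_path: Path) -> Dict[str, List[str]]:
    """Run the whole pipeline and save the index to a file (hypothetical storage choice)."""
    index = build_index(split_sections(to_text_stream(raw)))
    out_path.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index


if __name__ == "__main__":
    sample = b"Opening remarks\nSummary:\nquarterly revenue grew\nRisks:\nsupply delays\n"
    target = Path(tempfile.mkdtemp()) / "index.json"
    print(analyze_and_index(sample, target)["revenue"])
```

Because the index maps each term to section names rather than to the document as a whole, a search can be restricted to a given section, which is the practical effect of indexing the association alongside the text.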
15. A method for analyzing an unstructured or semistructured document, comprising:
receiving an unstructured or semistructured document;
converting the document to one or more text streams;
analyzing the one or more text streams for identifying paragraphs of the document;
grouping the paragraphs into sections; and
outputting the sections, or derivative thereof, to at least one of a user, another system, and another process.
Dependent claims: 16, 17, 18, 19, 20, 21

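The paragraph-grouping method of claim 15 might look roughly like the following sketch. The claim does not say how paragraphs or sections are detected, so the blank-line paragraph boundary and the "short paragraph without closing punctuation is a heading" rule are assumptions, as are all function names.

```python
import re
from typing import List, Tuple

Section = Tuple[str, List[str]]


def split_paragraphs(stream: str) -> List[str]:
    """Identify paragraphs: runs of non-blank lines separated by blank lines (assumed)."""
    blocks = re.split(r"\n\s*\n", stream.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]


def looks_like_heading(paragraph: str) -> bool:
    """Assumed heuristic: at most five words and no sentence-ending punctuation."""
    return len(paragraph.split()) <= 5 and not paragraph.endswith((".", "!", "?", ";"))


def group_into_sections(paragraphs: List[str]) -> List[Section]:
    """Group paragraphs under the most recent heading-like paragraph."""
    sections: List[Section] = []
    title, body = "untitled", []
    for paragraph in paragraphs:
        if looks_like_heading(paragraph):
            if body or title != "untitled":
                sections.append((title, body))
            title, body = paragraph, []
        else:
            body.append(paragraph)
    if body or title != "untitled":
        sections.append((title, body))
    return sections


def render(sections: List[Section]) -> str:
    """Produce a derivative of the sections suitable for output to a user or process."""
    return "\n".join(f"[{title}] {len(body)} paragraph(s)" for title, body in sections)


if __name__ == "__main__":
    text = "Background\n\nFirst paragraph of the\nbackground.\n\nSecond paragraph.\n\nResults\n\nOnly paragraph here.\n"
    print(render(group_into_sections(split_paragraphs(text))))
```

Returning plain tuples keeps the output step open: the same structure could be printed for a user, serialized for another system, or passed to another process.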
22. A method for analyzing and indexing an unstructured or semistructured document, comprising:
receiving an unstructured or semistructured document;
converting the document to one or more text streams;
analyzing the one or more text streams for identifying textual contents of the document;
identifying logical sections of the document;
associating the textual contents with the sections;
analyzing the one or more text streams for identifying context information about each section;
indexing the textual contents, the context information, and the association of the textual contents and context information with the logical sections; and
saving a result of the indexing in a data storage device.

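Claim 22 adds per-section context information to the index. The claim text here does not define what that context is, so the sketch below assumes a simple stand-in: the section's position in the document and any ISO-formatted dates it mentions. The table layout, the function names, and the use of SQLite as the storage target are likewise hypothetical.

```python
import re
import sqlite3
from typing import Dict, List, Tuple


def section_context(position: int, text: str) -> Dict[str, object]:
    """Assumed notion of context: section order plus any YYYY-MM-DD dates in the text."""
    return {"position": position, "dates": re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)}


def index_with_context(sections: Dict[str, str], conn: sqlite3.Connection) -> None:
    """Store one row per (term, section), carrying that section's context with it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings "
        "(term TEXT, section TEXT, position INTEGER, dates TEXT)"
    )
    rows = []
    for position, (name, text) in enumerate(sections.items()):
        context = section_context(position, text)
        for term in sorted(set(re.findall(r"[a-z]+", text.lower()))):
            rows.append((term, name, context["position"], ",".join(context["dates"])))
    conn.executemany("INSERT INTO postings VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def lookup(conn: sqlite3.Connection, term: str) -> List[Tuple[str, int, str]]:
    """Return (section, position, dates) for every section containing the term."""
    cursor = conn.execute(
        "SELECT section, position, dates FROM postings WHERE term = ? ORDER BY position",
        (term.lower(),),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; a file path would persist the index.
    connection = sqlite3.connect(":memory:")
    index_with_context(
        {"minutes": "Meeting on 2020-01-15 about budget.",
         "actions": "Next meeting on 2020-02-01."},
        connection,
    )
    print(lookup(connection, "meeting"))
```

Storing the context on each posting lets a query filter or rank by it directly, for example returning only matches from sections that mention a date.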
Specification