System and method for document section segmentation
Abstract
A system and method facilitate the processing and use of documents by categorizing document section headings under a set of canonical section headings. The method for categorizing section headings may include training a database and matching methods to categorize different but equivalent document section headings under canonical headings and categories. Once trained, the system may match and categorize document sections across large sets of documents with little to no supervision.
15 Claims
1. An automated computer implemented method for categorizing document section headings comprising the steps of:
- determining a set of canonical section headings from a set of documents;
- establishing a data set containing said canonical section headings and information associating at least one of the canonical section headings with at least one other section heading that is different from the at least one of the canonical section headings but corresponds to the at least one of the canonical section headings;
- extracting at least one section heading from another document;
- transforming the extracted section heading into a plurality of n-grams; and
- associating the extracted section heading with a particular one of the canonical section headings in the data set if said plurality of n-grams have a predetermined level of similarity to said particular one of the canonical section headings.
Dependent claims: 2, 3, 4, 5.
6. At least one computer readable storage medium encoded with instructions that, when executed, perform a method for categorizing document section headings comprising acts of:
- determining a set of canonical section headings from a set of documents;
- establishing a data set containing said canonical section headings and information associating at least one of the canonical section headings with at least one other section heading that is different from the at least one of the canonical section headings but corresponds to the at least one of the canonical section headings;
- extracting at least one section heading from another document;
- transforming the extracted section heading into a plurality of n-grams; and
- associating the extracted section heading with a particular one of the canonical section headings in the data set if said plurality of n-grams have a predetermined level of similarity to said particular one of the canonical section headings.
Dependent claims: 7, 8, 9, 10.
11. A system comprising at least one processor programmed to:
- determine a set of canonical section headings from a set of documents;
- establish a data set containing said canonical section headings and information associating at least one of the canonical section headings with at least one other section heading that is different from the at least one of the canonical section headings but corresponds to the at least one of the canonical section headings;
- extract at least one section heading from another document;
- transform the extracted section heading into a plurality of n-grams; and
- associate the extracted section heading with a particular one of the canonical section headings in the data set if said plurality of n-grams have a predetermined level of similarity to said particular one of the canonical section headings.
Dependent claims: 12, 13, 14, 15.
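The steps recited in the independent claims can be illustrated with a minimal Python sketch. The claims do not specify the n-gram unit, the similarity measure, or the threshold, so the character trigrams, Jaccard similarity, 0.5 threshold, function names, and sample headings in DATA_SET below are all assumptions chosen for illustration, not the patented implementation.

```python
from typing import Dict, List, Optional, Set

# Hypothetical data set: each canonical heading maps to different but
# corresponding headings. The entries are illustrative only.
DATA_SET: Dict[str, List[str]] = {
    "introduction": ["overview", "background"],
    "references": ["bibliography", "works cited"],
}


def ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a normalized heading.

    Character trigrams are an assumption; the claims only say "n-grams".
    """
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two n-gram sets (an assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def categorize(
    heading: str,
    data_set: Dict[str, List[str]],
    threshold: float = 0.5,  # assumed "predetermined level of similarity"
) -> Optional[str]:
    """Return the best-matching canonical heading, or None below the threshold."""
    grams = ngrams(heading)
    best_label: Optional[str] = None
    best_score = 0.0
    for canonical, variants in data_set.items():
        # Compare against the canonical heading and each associated variant.
        for candidate in [canonical, *variants]:
            score = jaccard(grams, ngrams(candidate))
            if score > best_score:
                best_label, best_score = canonical, score
    return best_label if best_score >= threshold else None
```

In this sketch, a heading extracted from a new document such as "REFERENCE" or "Works  Cited" would be associated with the canonical heading "references", while a heading with no sufficiently similar entry, such as "Appendix", is left uncategorized.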
Specification