Segmenting and interpreting a document, and relocating document fragments to corresponding sections
Abstract
A method comprises receiving, using a processor, a document having multiple sections of different types. The method also comprises obtaining, using the processor, a plurality of lexicons, each lexicon being for interpreting fragments in one or more of the section types. The method further comprises interpreting, using the processor and one or more of the lexicons, fragments in a first section of the multiple sections. The method still further comprises determining, based upon the interpretation and using the processor, that a fragment in the first section is misplaced, and re-locating, using the processor, the misplaced fragment to a second section of the multiple sections in the document to generate a re-organized document. The method additionally includes storing, using the processor, the re-organized document in a hardware storage system.
17 Claims
1. A method, comprising:
receiving, by a processor, a document having multiple sections of different section types;
obtaining, by the processor, multiple lexicons;
interpreting, by the processor, multiple fragments in a first section of the multiple sections using one or more of the lexicons;
determining, by the processor, a section type for each fragment of the multiple fragments in the first section;
determining, by the processor, a first quantity of first fragments of the multiple fragments and a second quantity of second fragments of the multiple fragments, wherein the first fragments correspond to a first section type of the first section and the second fragments correspond to a second section type of a second section of the document;
determining, by the processor, that the first quantity of the first fragments exceeds the second quantity of the second fragments by a predetermined quantity; and
based on exceeding the predetermined quantity, re-locating the second fragments to the second section in the document or reclassifying the second fragments to correspond to the first section type.
Dependent claims: 2, 3, 4, 5, 6, 7.
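The quantity comparison recited in claim 1 can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `LEXICONS` contents, the word-overlap scoring in `classify_fragment`, the `relocate_minority` helper, and the reading of "exceeds by a predetermined quantity" as `first_count - second_count >= margin`. Only the re-locating alternative is shown; the reclassifying alternative is omitted.

```python
from collections import Counter

# Hypothetical lexicons: section type -> indicative terms (illustrative only).
LEXICONS = {
    "medications": {"mg", "tablet", "aspirin", "dose"},
    "allergies": {"allergy", "allergic", "rash", "penicillin"},
}

def classify_fragment(fragment, lexicons):
    """Return the section type whose lexicon shares the most words with the
    fragment, or None when no lexicon term appears in it."""
    words = set(fragment.lower().split())
    scores = {stype: len(words & terms) for stype, terms in lexicons.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def relocate_minority(document, first, second, lexicons, margin):
    """Move fragments of type `second` out of section `first` when fragments
    of type `first` outnumber them by at least `margin` (assumed reading)."""
    types = [classify_fragment(f, lexicons) for f in document[first]]
    counts = Counter(types)
    # Copy every section so the input document is left untouched.
    result = {name: list(frags) for name, frags in document.items()}
    if counts[first] - counts[second] >= margin:
        result[first] = [f for f, t in zip(document[first], types) if t != second]
        result[second] += [f for f, t in zip(document[first], types) if t == second]
    return result
```

Under these assumptions, a "medications" section holding three medication fragments and one allergy fragment would hand the allergy fragment to the "allergies" section when `margin` is 2, and would be left unchanged when `margin` is 3.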
8. A method, comprising:
receiving a first item and a second item;
determining that the first item is a fragment matching a lexicon;
placing the fragment in a first section of a document, the first section selected based on the lexicon;
segmenting the document into multiple sections, wherein each of the multiple sections corresponds to a respective section type of multiple section types;
segmenting items in a first section of multiple sections of the document into multiple fragments, wherein the first section corresponds to a first section type;
determining a section type of each of the multiple fragments in the first section;
determining whether the multiple fragments include fragments that correspond to different section types and that are interspersed among each other in approximately even proportions; and
based on the multiple fragments in the first section including fragments that correspond to different section types and that are interspersed among each other in even proportions:
determining that the fragments that correspond to different section types and that are interspersed among each other in approximately even proportions do not belong in the first section;
generating a new section corresponding to a section type that corresponds to a section type that is different than the multiple section types; and
re-locating the fragments that correspond to different section types and that are interspersed among each other in approximately even proportions to the new section.
Dependent claims: 9, 10, 11.
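Claim 8 does not say how "interspersed among each other in approximately even proportions" is measured, so the following Python sketch picks one possible test purely for illustration. The `tolerance` and `min_switch_ratio` parameters, both function names, and the `"mixed"` section name are assumptions, and the sketch simplifies by moving every fragment of the section once the mix is detected.

```python
from collections import Counter

def evenly_interspersed(types, tolerance=0.2, min_switch_ratio=0.5):
    """Return True when at least two section types occur in near-equal shares
    and neighbouring fragments change type often (assumed criteria)."""
    counts = Counter(types)
    if len(counts) < 2:
        return False
    total = len(types)
    shares = [count / total for count in counts.values()]
    even = max(shares) - min(shares) <= tolerance
    # Count adjacent pairs whose types differ; blocks of one type switch rarely.
    switches = sum(a != b for a, b in zip(types, types[1:]))
    interspersed = switches / (total - 1) >= min_switch_ratio
    return even and interspersed

def split_mixed_section(document, section, types, new_section="mixed"):
    """Move the fragments of `section` into a newly created section when their
    types are evenly interspersed; otherwise return an unchanged copy."""
    result = {name: list(frags) for name, frags in document.items()}
    if new_section not in result and evenly_interspersed(types):
        result[new_section] = result[section]
        result[section] = []
    return result
```

With these assumed thresholds, a type sequence such as a, b, a, b qualifies, while a, a, a, b fails the proportion test and a, a, b, b fails the interspersion test.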
12. A method, comprising:
receiving a document having section headers;
segmenting the document into at least first and second sections based on the section headers;
segmenting items in the first section into fragments including a first fragment and a second fragment;
identifying a section type for each of the fragments using multiple section type-specific lexicons that include a first section type-specific lexicon that corresponds to a section type of the first section and a second section type-specific lexicon that corresponds to a section type of the second section, wherein the first fragment is identified as corresponding to a different section type than the second fragment;
determining that the identified section type for at least one of the fragments corresponds to the section type of the second section;
determining a first quantity of first fragments of multiple fragments and a second quantity of second fragments of the multiple fragments, wherein the first fragments correspond to a first section type of the first section and the second fragments correspond to a second section type of a second section of the document;
determining that the first quantity of the first fragments exceeds the second quantity of the second fragments by a predetermined quantity; and
based on exceeding the predetermined quantity, relocating the at least one of the fragments to the second section based on determining that the identified section type for the at least one of the fragments corresponds to the section type of the second section.
Dependent claims: 13, 14, 15, 16, 17.
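The segmentation and type-identification steps of claim 12 can be sketched in Python as follows. The header convention (a line that exactly matches a known header name), the fragment delimiters (semicolons and sentence-ending periods), the word-overlap scoring, and all function names are assumptions made for illustration; the claim itself does not specify them. The quantity comparison and relocation are left out of this sketch.

```python
import re

def segment_by_headers(text, headers):
    """Split text into {header: [item lines]} using a set of known,
    lower-case header names (hypothetical convention)."""
    sections, current = {}, None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower() in headers:
            current = stripped.lower()
            sections[current] = []
        elif stripped and current is not None:
            sections[current].append(stripped)
    return sections

def split_fragments(items):
    """Split each item into fragments on semicolons and sentence-ending periods."""
    fragments = []
    for item in items:
        parts = re.split(r"[;.]\s+|[;.]$", item)
        fragments += [part.strip() for part in parts if part.strip()]
    return fragments

def identify_type(fragment, lexicons):
    """Return the section type whose lexicon overlaps the fragment most,
    or None when no lexicon term appears in the fragment."""
    words = set(re.findall(r"[a-z]+", fragment.lower()))
    best = max(lexicons, key=lambda stype: len(words & lexicons[stype]))
    return best if words & lexicons[best] else None
```

A caller would run `segment_by_headers` once per document, `split_fragments` once per section, and `identify_type` once per fragment, then compare each fragment's identified type with the type of the section it currently sits in.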
Specification