Segmenting and interpreting a document, and relocating document fragments to corresponding sections
First Claim
1. A computer program product comprising a non-transitory computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
receive a document having section headers;
segment the document into at least first and second sections based on the section headers;
segment items in the first section into fragments including a first fragment and a second fragment;
identify a section type for each of the fragments using multiple section type-specific lexicons that include a first section type-specific lexicon that corresponds to a section type of the first section and a second section type-specific lexicon that corresponds to a section type of the second section, wherein the first fragment is identified as corresponding to a different section type than the second fragment;
determine a first quantity of first fragments of the multiple fragments and a second quantity of second fragments of the multiple fragments, wherein the first fragments correspond to a first section type of the first section and the second fragments correspond to a second section type of a second section of the document;
determine that the first quantity of the first fragments exceeds the second quantity of the second fragments by a predetermined quantity; and
based on exceeding the predetermined quantity, re-locate the second fragments to the second section in the document or reclassify the second fragments to correspond to the first section type.
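The first steps of claim 1 (header-based segmentation, fragment segmentation, and lexicon-based section-type identification) can be illustrated with a minimal Python sketch. The claim does not say how the lexicons are built, how fragments are delimited, or how a fragment is matched to a lexicon, so the lexicon contents, the header-to-type mapping, the split-on-"." and ";" rule, and the function names `segment_sections`, `segment_fragments`, and `classify` are all hypothetical. Here a fragment is assigned the section type whose lexicon shares the most words with it.

```python
import re

# Hypothetical section type-specific lexicons; the claim does not specify their contents.
LEXICONS = {
    "experience": {"managed", "developed", "led", "engineer", "company"},
    "education": {"university", "degree", "bachelor", "gpa", "graduated"},
}

# Hypothetical mapping from normalized header text to section type.
HEADER_TYPES = {"experience": "experience", "education": "education"}


def segment_sections(lines):
    """Split lines into (section_type, items) pairs, opening a new section at each known header."""
    sections = []
    for line in lines:
        text = line.strip()
        key = text.rstrip(":").lower()
        if key in HEADER_TYPES:
            sections.append((HEADER_TYPES[key], []))
        elif sections and text:
            sections[-1][1].append(text)
    return sections


def segment_fragments(items):
    """Split items into fragments (assumed rule: split on '.' or ';', drop empty pieces)."""
    fragments = []
    for item in items:
        fragments.extend(piece.strip() for piece in re.split(r"[.;]", item) if piece.strip())
    return fragments


def classify(fragment, lexicons=LEXICONS):
    """Return the section type whose lexicon overlaps the fragment most, or None if no overlap.

    Ties go to the lexicon listed first in the dict.
    """
    words = set(re.findall(r"[a-z]+", fragment.lower()))
    scores = {stype: len(words & lexicon) for stype, lexicon in lexicons.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


doc = [
    "Experience:",
    "Led a team of five engineers; developed the billing service.",
    "Graduated from State University with a bachelor degree.",
    "Education:",
    "Bachelor degree, State University.",
]
first_type, first_items = segment_sections(doc)[0]
for fragment in segment_fragments(first_items):
    # The last fragment of the "experience" section is classified as "education".
    print(first_type, "->", classify(fragment), ":", fragment)
```

Word overlap is only one plausible matching rule; weighted terms or a trained classifier would fit the same interface.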
Abstract
A computer program product comprising a computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to receive a document having multiple section headers, segment the document into at least first and second sections based on the section headers, segment items in the first section into fragments and identify a section type for each of the fragments, determine that the identified section type for at least one of the fragments better matches a type of the second section than it matches a type of the first section, and re-locate the at least one of the fragments to the second section.
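The abstract's test, whether a fragment's identified type "better matches" another section than the one it sits in, can be sketched as a score comparison followed by a move. This is a minimal illustration under assumed rules: `LEXICONS`, `lexicon_score`, and `relocate_mismatches` are hypothetical names, and "better matches" is taken to mean a strictly higher lexicon word-overlap score.

```python
import re

# Hypothetical lexicons keyed by section type; every section key must appear here.
LEXICONS = {
    "skills": {"python", "java", "sql", "docker"},
    "experience": {"led", "managed", "developed", "company"},
}


def lexicon_score(fragment, section_type):
    """Count the fragment's words that appear in the given section type's lexicon."""
    words = set(re.findall(r"[a-z]+", fragment.lower()))
    return len(words & LEXICONS[section_type])


def relocate_mismatches(sections, source_type):
    """Move fragments out of `source_type` when another section's type scores strictly higher."""
    kept = []
    for fragment in sections[source_type]:
        best = max(sections, key=lambda t: lexicon_score(fragment, t))
        if lexicon_score(fragment, best) > lexicon_score(fragment, source_type):
            sections[best].append(fragment)  # re-locate to the better-matching section
        else:
            kept.append(fragment)
    sections[source_type] = kept
    return sections


sections = {
    "skills": ["Python, SQL, Docker", "Led migration at Acme company"],
    "experience": ["Managed a team of four"],
}
relocate_mismatches(sections, "skills")
print(sections)
```

Requiring a strictly higher score keeps a fragment in place on ties, which avoids moving text on weak evidence.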
12 Claims
7. A computer program product comprising a non-transitory computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
segment items in a first section of a document into multiple fragments;
determine a section type of each of the multiple fragments;
determine a first quantity of first fragments of the multiple fragments and a second quantity of second fragments of the multiple fragments, wherein the first fragments correspond to a first section type of the first section and the second fragments correspond to a second section type of a second section of the document;
determine that the first quantity of the first fragments exceeds the second quantity of the second fragments by a predetermined quantity; and
based on exceeding the predetermined quantity, re-locate the second fragments to the second section in the document or reclassify the second fragments to correspond to the first section type.
Dependent claims: 8, 9, 10.
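The quantity comparison and the two alternative outcomes recited in claim 7 can be sketched as follows. This is an illustration only: the function name `resolve_minority` and its `mode` switch are hypothetical, and "exceeds by a predetermined quantity" is assumed here to mean that the first count minus the second count is at least the threshold. Fragments of any other type are left untouched.

```python
from collections import Counter


def resolve_minority(fragment_types, first_type, second_type, threshold, mode="relocate"):
    """Handle second-type fragments found in the first section.

    fragment_types: list of (fragment, section_type) pairs from the first section.
    threshold: the "predetermined quantity" (assumed test: first - second >= threshold).
    mode: hypothetical switch between the claim's alternatives, "relocate"
          (move to the second section) or "reclassify" (relabel as the first type).
    Returns (fragments kept in the first section, fragments moved to the second section).
    """
    counts = Counter(stype for _, stype in fragment_types)
    if counts[first_type] - counts[second_type] < threshold:
        # Difference too small: leave the first section unchanged.
        return list(fragment_types), []
    kept, moved = [], []
    for fragment, stype in fragment_types:
        if stype != second_type:
            kept.append((fragment, stype))
        elif mode == "relocate":
            moved.append((fragment, second_type))
        else:
            kept.append((fragment, first_type))
    return kept, moved


frags = [
    ("Led a team of five", "experience"),
    ("Built the billing API", "experience"),
    ("Shipped the mobile app", "experience"),
    ("B.S. in Computer Science", "education"),
]
kept, moved = resolve_minority(frags, "experience", "education", threshold=2)
print(moved)
```

The claim leaves open which alternative to choose; exposing it as a parameter keeps both paths visible.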
11. A computer program product comprising a non-transitory computer-readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
segment the document into multiple sections, wherein each of the multiple sections corresponds to a respective section type of multiple section types;
segment items in a first section of the multiple sections into multiple fragments, wherein the first section corresponds to a first section type of the multiple section types;
determine a section type of each of the multiple fragments in the first section;
determine whether the multiple fragments include fragments that correspond to different section types and that are interspersed among each other in even proportions; and
based on the multiple fragments in the first section including fragments that correspond to different section types and that are interspersed among each other in even proportions:
determine that the fragments that correspond to different section types and that are interspersed among each other in even proportions do not belong in the first section;
generate a new section corresponding to a section type that corresponds to a section type that is different than the multiple section types; and
re-locate the fragments that correspond to different section types and that are interspersed among each other in even proportions to the new section.
Dependent claim: 12.
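Claim 11 does not define "interspersed among each other in even proportions", so the following Python sketch rests on stated assumptions. Only fragments whose type differs from the first section's type are candidates; "even proportions" means at least two candidate types whose counts differ by no more than a hypothetical `tolerance`; and "interspersed" means adjacent candidates change type in at least a hypothetical `min_switch_ratio` of transitions. The function name `split_interspersed` and the `new_section_N` naming scheme are also illustrative.

```python
from collections import Counter


def split_interspersed(fragments, first_type, existing_types, tolerance=1, min_switch_ratio=0.5):
    """Move evenly mixed, interspersed foreign fragments out of the first section.

    fragments: ordered list of (text, section_type) pairs from the first section.
    existing_types: the document's current section types; the new type must differ from all.
    Returns (remaining fragments, new_section), where new_section is None or a
    (new_type, moved_fragments) pair.
    """
    candidates = [(text, t) for text, t in fragments if t != first_type]
    counts = Counter(t for _, t in candidates)
    # "Even proportions": two or more candidate types with nearly equal counts.
    if len(counts) < 2 or max(counts.values()) - min(counts.values()) > tolerance:
        return fragments, None
    # "Interspersed": types alternate often between consecutive candidates.
    switches = sum(1 for a, b in zip(candidates, candidates[1:]) if a[1] != b[1])
    if switches / (len(candidates) - 1) < min_switch_ratio:
        return fragments, None
    # Generate a section type name not among the existing section types.
    n = 1
    while f"new_section_{n}" in existing_types:
        n += 1
    remaining = [f for f in fragments if f[1] == first_type]
    return remaining, (f"new_section_{n}", candidates)


frags = [
    ("Led a team of five", "experience"),
    ("Python", "skills"),
    ("B.S. in Computer Science", "education"),
    ("SQL", "skills"),
    ("M.S. in Statistics", "education"),
    ("Built the billing API", "experience"),
]
remaining, new_section = split_interspersed(frags, "experience", {"experience", "skills", "education"})
print(remaining)
print(new_section)
```

With these defaults, grouped runs such as two "skills" fragments followed by two "education" fragments switch type in only one of three transitions and are left in place.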
Specification