Text segmentation and label assignment with user interaction by means of topic specific language models and topic-specific label statistics
Abstract
The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by means of statistical models trained on annotated training data. The method segments the text into text sections and assigns labels to the text sections as section headings. The resulting segmentation and label assignment are presented to a user for general review. In addition, alternative segmentations and label assignments are offered to the user, who can select an alternative segmentation or label or enter a user-defined segmentation and label. In response to the user's modifications, several actions are initiated, including re-segmentation and re-labeling of successive parts of the document or of the entire document.
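The abstract does not say what form the statistical models take. As a minimal sketch only, assuming one add-one-smoothed unigram language model per section label plus a label prior estimated from how often each label occurs in the annotated training data, candidate labels for a section could be ranked as below. The top label would be the proposed heading and the remaining labels the alternatives offered to the user. The `(label, text)` training format and the functions `train_label_models` and `rank_labels` are hypothetical.

```python
import math
from collections import Counter


def train_label_models(annotated_sections):
    """Estimate per-label word counts and label frequencies.

    annotated_sections: list of (label, text) pairs, a hypothetical
    training format standing in for the annotated training data.
    """
    word_counts = {}  # label -> Counter of tokens seen under that label
    label_counts = Counter()  # label -> number of training sections
    vocab = set()
    for label, text in annotated_sections:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def rank_labels(section_text, word_counts, label_counts, vocab):
    """Return all labels sorted by log P(label) + log P(text | label), best first."""
    tokens = section_text.lower().split()
    v = len(vocab) + 1  # one extra slot for unseen words (add-one smoothing)
    n_sections = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        log_prior = math.log(label_counts[label] / n_sections)
        log_lik = sum(math.log((counts[t] + 1) / (total + v)) for t in tokens)
        scores[label] = log_prior + log_lik
    return sorted(scores, key=scores.get, reverse=True)


training = [
    ("HISTORY", "patient reports chest pain since yesterday"),
    ("HISTORY", "history of hypertension and chest pain"),
    ("MEDICATIONS", "aspirin 100 mg daily and metoprolol 50 mg"),
]
models = train_label_models(training)
print(rank_labels("aspirin 50 mg daily", *models))  # → ['MEDICATIONS', 'HISTORY']
```

One possible way to use a user's correction is to append the corrected `(label, text)` pair to the training data and retrain; this is an assumption for illustration, not a step stated in the abstract.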
20 Claims
1. A system comprising at least one processor programmed to:
segment an unstructured text into a plurality of text sections;
identify a portion of text that fully or partially identifies a section heading for a first text section of the plurality of text sections;
remove, from the first text section, the portion of text that fully or partially identifies the section heading;
create a structured text comprising the first text section and the section heading for the first text section, wherein the portion of text that fully or partially identifies the section heading has been removed from the first text section; and
provide the structured text to a user.
(Dependent claims: 2, 3, 4, 5, 6, 7)
8. A method comprising acts of:
segmenting an unstructured text into a plurality of text sections;
using at least one processor to identify a portion of text that fully or partially identifies a section heading for a first text section of the plurality of text sections;
removing, from the first text section, the portion of text that fully or partially identifies the section heading;
creating a structured text comprising the first text section and the section heading for the first text section, wherein the portion of text that fully or partially identifies the section heading has been removed from the first text section; and
providing the structured text to a user.
(Dependent claims: 9, 10, 11, 12, 13, 14)
15. At least one non-transitory computer-readable medium having stored thereon instructions that, when executed by at least one processor, perform a method comprising acts of:
segmenting an unstructured text into a plurality of text sections;
using at least one processor to identify a portion of text that fully or partially identifies a section heading for a first text section of the plurality of text sections;
removing, from the first text section, the portion of text that fully or partially identifies the section heading;
creating a structured text comprising the first text section and the section heading for the first text section, wherein the portion of text that fully or partially identifies the section heading has been removed from the first text section; and
providing the structured text to a user.
(Dependent claims: 16, 17, 18, 19, 20)
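Claims 1, 8 and 15 recite the same sequence of steps. The following is a minimal sketch of that sequence, assuming that sections are separated by blank lines and that a heading is cued by a phrase followed by a colon at the start of a section. Both assumptions are simplifications, since the abstract describes statistical models rather than fixed rules. The `HEADING_CUES` table and the function names are hypothetical; an abbreviated cue such as "Meds:" is used here to illustrate text that only partially identifies a heading.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical cue table: phrases that fully or partially identify a heading,
# mapped to the canonical heading placed in the structured text.
HEADING_CUES = {
    "history of present illness": "HISTORY OF PRESENT ILLNESS",  # full cue
    "history": "HISTORY OF PRESENT ILLNESS",  # partial cue
    "medications": "MEDICATIONS",
    "meds": "MEDICATIONS",  # abbreviated (partial) cue
}


@dataclass
class Section:
    heading: Optional[str]
    body: str


def segment(text: str) -> List[str]:
    """Split text into sections at blank lines (stand-in for a statistical segmenter)."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]


def extract_heading(section_text: str) -> Tuple[Optional[str], str]:
    """Find a leading heading cue, remove it from the body, return (heading, body)."""
    # Longer cues first, so "history of present illness" is preferred over "history".
    for cue in sorted(HEADING_CUES, key=len, reverse=True):
        match = re.match(rf"{re.escape(cue)}\s*:\s*", section_text, re.IGNORECASE)
        if match:
            return HEADING_CUES[cue], section_text[match.end():]
    return None, section_text


def structure(text: str) -> List[Section]:
    """Segment the text and attach a heading to each section where one is found."""
    return [Section(*extract_heading(part)) for part in segment(text)]


def render(sections: List[Section]) -> str:
    """Produce the structured text shown to the user."""
    blocks = [f"{s.heading}:\n{s.body}" if s.heading else s.body for s in sections]
    return "\n\n".join(blocks)


dictated = (
    "History of present illness: chest pain since yesterday.\n\n"
    "Meds: aspirin 100 mg daily.\n\n"
    "Follow up in two weeks."
)
print(render(structure(dictated)))
```

Because the cue is removed from each section body before the canonical heading is attached, "Meds:" does not appear in the output next to "MEDICATIONS:", which mirrors the removal step in the claims.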
Specification