Text segmentation and label assignment with user interaction by means of topic specific language models and topic-specific label statistics
Abstract
The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text using statistical models trained on annotated training data. The method segments the text into text sections and assigns labels to the text sections as section headings. The resulting segmentation and label assignment are presented to a user for review. Alternative segmentations and label assignments are also presented, so that the user can select an alternative segmentation or label, or enter a user-defined segmentation or label. In response to modifications introduced by the user, a plurality of different actions are initiated, including re-segmentation and re-labelling of successive parts of the document or of the entire document. The method further comprises a learning functionality that logs and analyzes user-introduced modifications in order to adapt to the user's preferences and to further train the statistical models.
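The segmentation and labelling steps summarized in the abstract can be illustrated with a minimal Python sketch. It assumes one unigram language model per topic, add-one smoothing, and sentence-level decisions; these choices and all function names are hypothetical stand-ins, since the text above only states that statistical models trained on annotated data are used.

```python
import math
from collections import Counter, defaultdict


def train_topic_models(annotated):
    """Build one unigram count table per topic from (topic, text) pairs.

    The unigram model is an assumption made for this sketch.
    """
    models = defaultdict(Counter)
    for topic, text in annotated:
        models[topic].update(text.lower().split())
    return models


def log_prob(model, words, vocab_size):
    """Add-one smoothed unigram log-probability of `words` under one topic."""
    total = sum(model.values())
    return sum(math.log((model[w] + 1) / (total + vocab_size)) for w in words)


def segment_and_label(sentences, models):
    """Assign each sentence its most likely topic, then merge neighbouring
    sentences that share a topic into one section.

    Returns a list of (topic, [sentences]) pairs.
    """
    vocab = {w for m in models.values() for w in m}
    sections = []
    for sentence in sentences:
        words = sentence.lower().split()
        topic = max(models, key=lambda t: log_prob(models[t], words, len(vocab)))
        if sections and sections[-1][0] == topic:
            sections[-1][1].append(sentence)  # same topic: extend the section
        else:
            sections.append((topic, [sentence]))  # topic change: new section
    return sections
```

A section boundary is placed wherever the most likely topic changes between consecutive sentences, so segmentation and topic assignment fall out of the same models in this simplified setting.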
42 Claims
1. A method for generating a structured text from an unstructured text, the method comprising acts of:
segmenting the unstructured text into text sections;
assigning, to at least one text section, a topic being indicative of content of the at least one text section, wherein the act of segmenting the unstructured text and/or the act of assigning a topic to the at least one text section uses at least one statistical model built from annotated training data;
providing to a user a first structured text comprising the at least one text section and a section heading for the at least one text section, the section heading corresponding to the topic assigned to the at least one text section;
receiving user input indicating at least one modification to the first structured text;
using a computer system to process the at least one modification received from the user to generate a second structured text; and
logging and analyzing the at least one modification received from the user to adapt the at least one statistical model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. An apparatus for generating a structured text from an unstructured text, the apparatus comprising a computer system configured to:
segment the unstructured text into text sections;
assign, to at least one text section, a topic being indicative of content of the at least one text section, the topic being associated with a plurality of section headings, wherein, in segmenting the unstructured text and/or assigning a topic to the at least one text section, the computer system is configured to use at least one statistical model built from annotated training data;
provide to a user a first structured text comprising the at least one text section and a section heading for the at least one text section, the section heading being selected from the plurality of section headings associated with the topic assigned to the at least one text section;
receive user input indicating at least one modification to the first structured text;
process the at least one modification received from the user to generate a second structured text; and
log and analyze the at least one modification received from the user to adapt the at least one statistical model.
Dependent claims: 11, 12, 13, 14
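The relationship in claim 10 between a topic, its plurality of section headings, and the logging of user modifications might be sketched as follows. This is a hedged illustration only: the class `HeadingStatistics`, its methods, and the simple count-based adaptation are hypothetical and are not taken from the claims, which do not say how the statistics are stored or updated.

```python
from collections import Counter


class HeadingStatistics:
    """Per-topic heading counts (hypothetical stand-in for label statistics)."""

    def __init__(self, initial_counts):
        # topic -> Counter mapping heading -> observed count
        self.counts = {t: Counter(h) for t, h in initial_counts.items()}
        self.log = []  # record of user modifications, kept for later analysis

    def select_heading(self, topic):
        """Return the most frequently observed heading for the topic."""
        return self.counts[topic].most_common(1)[0][0]

    def alternatives(self, topic):
        """Return the remaining headings, most frequent first."""
        return [h for h, _ in self.counts[topic].most_common()][1:]

    def record_modification(self, topic, chosen_heading):
        """Log a user's heading choice and adapt the counts accordingly."""
        self.log.append((topic, chosen_heading))
        self.counts[topic][chosen_heading] += 1
```

With counts like these, a heading the user repeatedly substitutes eventually becomes the default proposal for that topic, which is one simple way to read "adapt the at least one statistical model".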
15. At least one article of manufacture, comprising:
at least one computer-readable storage device having encoded thereon executable instructions that, when executed by a computer system, perform a method for generating a structured text from an unstructured text, the method comprising acts of:
segmenting the unstructured text into text sections;
assigning, to at least one text section, a topic being indicative of content of the at least one text section, the topic being associated with a plurality of section headings, wherein the act of segmenting the unstructured text and/or the act of assigning a topic to the at least one text section uses at least one statistical model built from annotated training data;
providing to a user a first structured text comprising the at least one text section and a section heading for the at least one text section, the section heading being selected from the plurality of section headings associated with the topic assigned to the at least one text section;
receiving user input indicating at least one modification to the first structured text;
processing the at least one modification received from the user to generate a second structured text; and
logging and analyzing the at least one modification received from the user to adapt the at least one statistical model.
Dependent claims: 16, 17, 18, 19, 20
21. A system comprising:
means for providing to a user a first structured text comprising at least one text section and a section heading for the at least one text section, the at least one text section being one of a plurality of text sections obtained from segmenting an unstructured text, the section heading being selected from a plurality of section headings associated with a topic assigned to the at least one text section, the topic being indicative of content of the at least one text section, wherein segmenting the unstructured text and/or assigning the topic to the at least one text section uses at least one statistical model built from annotated training data;
means for receiving user input indicating at least one modification to the first structured text;
means for processing the at least one modification received from the user to generate a second structured text; and
means for logging and analyzing the at least one modification received from the user to adapt the at least one statistical model.
Dependent claims: 22, 23, 24
25. A method for generating a structured text from an unstructured text, the method comprising acts of:
segmenting the unstructured text into text sections;
assigning, to at least one text section, a topic being indicative of content of the at least one text section;
identifying a text portion as being a full or partial verbalization of a section heading for the at least one text section, the section heading corresponding to the topic assigned to the at least one text section;
providing to a user a first structured text comprising the at least one text section and the section heading for the at least one text section, wherein the text portion identified as being a full or partial verbalization of the section heading is removed from the first structured text;
receiving user input indicating at least one modification to the first structured text; and
using a computer system to process the at least one modification received from the user to generate a second structured text.
Dependent claims: 26, 27, 28, 29, 30
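The verbalization step of claim 25 can be illustrated with a small Python sketch. It assumes the verbalization sits at the start of the section and is detected by plain word matching against the heading; both the heuristic and the function name `strip_heading_verbalization` are assumptions made here, and the claim itself does not say how the text portion is identified.

```python
import re


def strip_heading_verbalization(section_text, heading):
    """Remove a leading full or partial verbalization of `heading`.

    Heuristic sketch: leading tokens are dropped for as long as each one,
    ignoring case and punctuation, is a word of the heading.
    """
    heading_words = {w.lower() for w in re.findall(r"\w+", heading)}
    tokens = section_text.split()
    cut = 0
    for token in tokens:
        word = re.sub(r"\W+", "", token).lower()
        if word in heading_words:
            cut += 1  # token repeats part of the heading: drop it
        else:
            break  # first content word reached: stop stripping
    return " ".join(tokens[cut:])
```

For a dictated section such as "Physical examination: lungs are clear." under the heading "Physical Examination", the spoken heading is dropped so that it does not appear twice in the structured text.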
31. An apparatus for generating a structured text from an unstructured text, the apparatus comprising a computer system configured to:
segment the unstructured text into text sections;
assign, to at least one text section, a topic being indicative of content of the at least one text section, the topic being associated with a plurality of section headings;
identify a text portion as being a full or partial verbalization of a section heading for the at least one text section, the section heading being selected from the plurality of section headings associated with the topic assigned to the at least one text section;
provide to a user a first structured text comprising the at least one text section and the section heading for the at least one text section, wherein the text portion identified as being a full or partial verbalization of the section heading is removed from the first structured text;
receive user input indicating at least one modification to the first structured text; and
process the at least one modification received from the user to generate a second structured text.
Dependent claims: 32, 33, 34, 35, 36
37. At least one article of manufacture, comprising:
at least one computer-readable storage device having encoded thereon executable instructions that, when executed by a computer system, perform a method for generating a structured text from an unstructured text, the method comprising acts of:
segmenting the unstructured text into text sections;
assigning, to at least one text section, a topic being indicative of content of the at least one text section, the topic being associated with a plurality of section headings;
identifying a text portion as being a full or partial verbalization of a section heading for the at least one text section, the section heading being selected from the plurality of section headings associated with the topic assigned to the at least one text section;
providing to a user a first structured text comprising the at least one text section and the section heading for the at least one text section, wherein the text portion identified as being a full or partial verbalization of the section heading is removed from the first structured text;
receiving user input indicating at least one modification to the first structured text; and
processing the at least one modification received from the user to generate a second structured text.
Dependent claims: 38, 39, 40, 41, 42
Specification