Processing natural language text using autonomous punctuational structure
Abstract
A technique for processing natural language text uses a data structure that includes structure data in the text data. The structure data indicates an autonomous punctuational structure of the text, a punctuational structure that is independent of the lexical content of the text and therefore can be manipulated without considering the meaning of the words in the text. The data structure can be a tree in which each node has a textual type such as a paragraph, sentence, clause, phrase, or word. The data structure could alternatively be parallel data sequences, one with codes indicating the text's characters and the other with codes indicating textual types. The data structure is produced and maintained using a grammar of textual types, indicating for each textual type the textual types of units into which it can properly be divided. During editing, a text sequence is generated by applying rendering rules to the data structure, and the text is presented to the user based on the text sequence. Prior to generating the text sequence, information relating to punctuational features is propagated through the data structure. User signals requesting editing operations are applied to modify the data structure using operations rules, and the user's pointing or selecting signals are mapped onto the data structure. The modified data structure is checked with the grammar of textual types to ensure that it has an autonomous punctuational structure. A modified text sequence is then generated, and a modified text is displayed based on it.
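By way of illustration only, the tree form of the data structure, the grammar of textual types, and the rendering rules described above might be sketched in Python as follows. The names `Unit`, `GRAMMAR`, `BETWEEN`, `is_valid`, `render`, and `phrase`, the particular punctuation marks chosen for each textual type, and the treatment of sentence-initial capitalization as a punctuational feature are all assumptions made for this sketch and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Assumed grammar of textual types: each type maps to the types of units
# into which it can properly be divided.
GRAMMAR: Dict[str, Set[str]] = {
    "paragraph": {"sentence"},
    "sentence": {"clause"},
    "clause": {"phrase"},
    "phrase": {"word"},
    "word": set(),
}

# Assumed rendering rules: the mark placed between adjacent units of a type.
BETWEEN: Dict[str, str] = {
    "sentence": " ",
    "clause": "; ",
    "phrase": ", ",
    "word": " ",
}


@dataclass
class Unit:
    kind: str                                   # textual type of this node
    children: List["Unit"] = field(default_factory=list)
    text: str = ""                              # lexical content, "word" nodes only


def is_valid(node: Unit) -> bool:
    """Check every parent/child pair against the grammar of textual types."""
    allowed = GRAMMAR.get(node.kind)
    if allowed is None:
        return False
    return all(c.kind in allowed and is_valid(c) for c in node.children)


def render(node: Unit) -> str:
    """Generate the text sequence; punctuation comes only from the structure."""
    if node.kind == "word":
        return node.text
    if not node.children:
        return ""
    separator = BETWEEN[node.children[0].kind]
    body = separator.join(render(c) for c in node.children)
    if node.kind == "sentence":
        # Assumed rule: a sentence is capitalized and closed with a period.
        body = body[:1].upper() + body[1:] + "."
    return body


def phrase(words: str) -> Unit:
    """Helper: build a phrase node from space-separated words."""
    return Unit("phrase", [Unit("word", text=w) for w in words.split()])


para = Unit("paragraph", [
    Unit("sentence", [
        Unit("clause", [phrase("in this sketch"),
                        phrase("words carry no punctuation")]),
        Unit("clause", [phrase("the tree supplies it")]),
    ]),
])

print(is_valid(para))  # True
print(render(para))    # In this sketch, words carry no punctuation; the tree supplies it.
```

In this sketch the word nodes hold no punctuation marks at all; the comma, semicolon, period, and initial capital are derived entirely from the types and nesting of the units.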
31 Claims
1. A method comprising steps of:
storing first text data representing a first natural language text that includes a first set of words and a first set of punctuational features with positions relative to the first set of words;
the first text data including first structure data indicating types of a first set of text units at the boundaries of which the first set of punctuational features are positioned, the first structure data further indicating nesting relationships between the first set of words and the first set of text units such that the first set of punctuational features and their positions relative to the first set of words can be determined from the first structure data; and
operating on the first text data to produce second text data representing a second natural language text that is different from the first natural language text;
the second natural language text including a second set of words and a second set of punctuational features with positions relative to the second set of words;
the second text data including second structure data indicating types of a second set of text units at the boundaries of which the second set of punctuational features are positioned, the second structure data further indicating nesting relationships between the second set of words and the second set of text units such that the second set of punctuational features and their positions relative to the second set of words can be determined from the second structure data.
- View Dependent Claims (2, 3, 4, 5)
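As a non-limiting illustration of a method of the kind recited in claim 1, the following Python sketch operates on first structure data to produce second structure data, from which a different text with different punctuation can be determined. The nested-tuple representation, the names `JOIN`, `render`, and `delete_unit`, and the specific marks are hypothetical choices made for this sketch only.

```python
from typing import List, Tuple, Union

# A node is either a word (str) or a (textual type, children) pair.
Node = Union[str, Tuple[str, list]]

# Assumed marks placed between adjacent units of each textual type.
JOIN = {"sentence": " ", "clause": "; ", "phrase": ", ", "word": " "}


def kind_of(node: Node) -> str:
    return "word" if isinstance(node, str) else node[0]


def render(node: Node) -> str:
    """Determine words and punctuation from the structure alone."""
    if isinstance(node, str):
        return node
    kind, children = node
    if not children:
        return ""
    body = JOIN[kind_of(children[0])].join(render(c) for c in children)
    if kind == "sentence":
        # Assumed rule: capitalize the sentence and close it with a period.
        body = body[:1].upper() + body[1:] + "."
    return body


def delete_unit(node: Node, path: List[int]) -> Node:
    """Return a new tree with the unit at `path` removed; the input is not modified."""
    kind, children = node
    i = path[0]
    if len(path) == 1:
        return (kind, children[:i] + children[i + 1:])
    changed = delete_unit(children[i], path[1:])
    return (kind, children[:i] + [changed] + children[i + 1:])


first = ("sentence", [
    ("clause", [
        ("phrase", ["in", "this", "sketch"]),
        ("phrase", ["words", "carry", "no", "punctuation"]),
    ]),
])

# Delete the first phrase of the first clause.
second = delete_unit(first, [0, 0])

print(render(first))   # In this sketch, words carry no punctuation.
print(render(second))  # Words carry no punctuation.
```

The editing operation never touches a comma or a capital letter directly. The comma disappears and the capital moves to "Words" because both are re-derived from the modified structure.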
6. A method comprising steps of:
storing text data representing a natural language text that includes words and punctuational features with positions relative to the words;
the text data including structure data indicating types of text units at the boundaries of which the punctuational features are positioned, the structure data further indicating nesting relationships between the words and the text units such that the punctuational features and their positions relative to the words can be determined from the structure data; and
operating on the text data to produce codes indicating the words and the punctuational features in the natural language text;
the step of operating on the text data comprising a substep of determining the punctuational features and their positions relative to the words from the structure data;
the codes produced by the step of operating on the text data being in a sequence such that the codes indicate that the punctuational features are in their positions relative to the words as determined from the structure data.
- View Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14)
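One possible reading of the step of producing codes, offered only as a sketch, uses a parallel-sequence form of the text data: one sequence holds the words and a second sequence records, for each word, the type of the largest text unit that ends after it. The encoding of the second sequence, the names `MARK` and `to_codes`, the chosen marks, and the use of Unicode code points as the "codes" are assumptions of this sketch.

```python
from typing import List

# Assumed mark emitted at the closing boundary of each textual type.
MARK = {"word": " ", "phrase": ", ", "clause": "; ", "sentence": ". "}


def to_codes(words: List[str], ends: List[str]) -> List[int]:
    """Produce a code sequence in which punctuation follows from the structure data.

    `ends[i]` names the largest text unit that closes immediately after `words[i]`.
    """
    if len(words) != len(ends):
        raise ValueError("words and ends must be parallel sequences")
    pieces: List[str] = []
    at_sentence_start = True
    for word, end in zip(words, ends):
        if at_sentence_start:
            # Assumed rule: the first word of a sentence is capitalized.
            word = word[:1].upper() + word[1:]
        pieces.append(word)
        pieces.append(MARK[end])
        at_sentence_start = (end == "sentence")
    text = "".join(pieces).rstrip()
    return [ord(ch) for ch in text]


words = ["in", "this", "sketch", "words", "carry", "no", "punctuation",
         "the", "tree", "supplies", "it"]
ends = ["word", "word", "phrase", "word", "word", "word", "clause",
        "word", "word", "word", "sentence"]

codes = to_codes(words, ends)
print("".join(chr(c) for c in codes))
# In this sketch, words carry no punctuation; the tree supplies it.
```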
15. A method comprising steps of:
obtaining codes representing a natural language text that includes words and punctuational features, the codes being in a sequence such that the codes indicate that the punctuational features are in positions relative to the words; and
operating on the codes to produce text data representing the natural language text;
the text data including structure data indicating types of text units at the boundaries of which the punctuational features are positioned, the structure data further indicating nesting relationships between the words and the text units such that the punctuational features and their positions relative to the words as indicated by the codes can be determined from the structure data.
- View Dependent Claims (16, 17, 18)
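The converse direction, from a code sequence to structure data, can be sketched in the same hedged way. The parser below is deliberately naive: it assumes that periods, semicolons, and commas always mark sentence, clause, and phrase boundaries, so abbreviations, decimal numbers, and other marks are not handled. The names `LEVELS`, `parse`, and `render` are hypothetical and capitalization is left in the words rather than derived.

```python
# Assumed (textual type, boundary mark) pairs, outermost level first.
LEVELS = [("sentence", "."), ("clause", ";"), ("phrase", ",")]


def parse(text, depth=0):
    """Turn a character sequence into nested (type, children) structure data."""
    if depth == len(LEVELS):
        return text.split()  # innermost level: a plain list of words
    kind, mark = LEVELS[depth]
    pieces = [p for p in text.split(mark) if p.strip()]
    return [(kind, parse(p, depth + 1)) for p in pieces]


def render(units, depth=0):
    """Recover the punctuation and its positions from the structure data."""
    if depth == len(LEVELS):
        return " ".join(units)
    kind, mark = LEVELS[depth]
    parts = [render(children, depth + 1) for _, children in units]
    if kind == "sentence":
        return " ".join(part + mark for part in parts)  # the mark closes each unit
    return (mark + " ").join(parts)                     # the mark separates units


text = ("In this sketch, words carry no punctuation; the tree supplies it. "
        "Editing changes the tree.")
tree = parse(text)

print(tree[1])
# ('sentence', [('clause', [('phrase', ['Editing', 'changes', 'the', 'tree'])])])
print(render(tree) == text)  # True
```

The round trip shows that, for this restricted input, the punctuation indicated by the codes can be determined again from the structure data alone, since the stored words contain no marks.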
19. A data structure produced for use in a system that includes:
memory for storing the data structure; and
a processor connected for accessing the data structure when stored in the memory;
the data structure comprising text data representing a natural language text that includes words and punctuational features with positions relative to the words;
the text data comprising structure data indicating types of text units at the boundaries of which the punctuational features are positioned, the structure data further indicating nesting relationships between the words and the text units such that the processor can access the text data and use the structure data to determine the punctuational features and their positions relative to the words when the data structure is stored in the memory.
- View Dependent Claims (20, 21, 22, 23)
24. A system comprising:
memory for storing text data representing a natural language text that includes words and punctuational features with positions relative to the words;
the text data including structure data indicating types of text units at the boundaries of which the punctuational features are positioned, the structure data further indicating nesting relationships between the words and the text units such that the structure data can be used to determine the punctuational features and their positions relative to the words; and
a processor connected for accessing the text data when stored in the memory, the processor comprising means for using the structure data to determine the punctuational features and their positions relative to words.
- View Dependent Claims (25)
- 26. The system of claim 24 in which the processor further comprises means for regenerating the natural language text from the text data by producing a sequence of codes including punctuation mark codes indicating the punctuational features as determined by the means for using the structure data.
29. The system of claim 28 in which the signals further include operation data indicating operation to be performed on the selected part of the regenerated natural language text;
- the processor further comprising means for performing the indicated operation by modifying the text data by modifying the structure data.
- View Dependent Claims (30, 31)
Specification