Method and apparatus for robust efficient parsing
First Claim
1. A method of parsing text to form a representation of the text, the representation having structures that span sub-strings of words in the text, each structure having a token at its root, the method comprising:
identifying a first structure that spans a first sub-string of words in the text and has a first token as its root, the first sub-string having a starting position and an ending position;
indexing the first structure by the first token and the starting position and ending position of the first sub-string;
identifying a second structure that spans the first sub-string of words and has the first token as its root;
using the first token and the starting position and ending position of the first sub-string to locate the first structure; and
removing one of the first structure and second structure from further consideration in the formation of the representation of the text.
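The steps of claim 1 can be illustrated with a small sketch. It is only one possible reading of the claim, not the patented implementation: the names `Structure`, `SpanIndex`, `add`, and `lookup` are hypothetical, and the `score` field used to decide which of two competing structures is kept is an assumption (the claim only says that one of the two is removed from further consideration).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Structure:
    """A parse structure rooted at `token`, spanning words [start, end]."""
    token: str
    start: int
    end: int
    score: float = 0.0   # hypothetical ranking value, not from the claim
    children: tuple = ()


Key = Tuple[str, int, int]


class SpanIndex:
    """Indexes structures by (root token, starting position, ending position)."""

    def __init__(self) -> None:
        self._by_key: Dict[Key, Structure] = {}

    def lookup(self, token: str, start: int, end: int) -> Optional[Structure]:
        """Locate a previously indexed structure with the same root and span."""
        return self._by_key.get((token, start, end))

    def add(self, new: Structure) -> Optional[Structure]:
        """Index `new`; return the structure dropped from consideration, if any."""
        existing = self.lookup(new.token, new.start, new.end)
        if existing is None:
            self._by_key[(new.token, new.start, new.end)] = new
            return None
        # Assumed policy: keep the higher-scoring structure, drop the other.
        if new.score > existing.score:
            self._by_key[(new.token, new.start, new.end)] = new
            return existing
        return new
```

Because the key combines the root token with both span boundaries, locating a competing structure is a single dictionary access rather than a scan over every structure built so far.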
Abstract
The present invention provides a method for improving the efficiency of parsing text. Aspects of the invention include representing parse tokens as integers where a portion of the integer indicates the location in which a definition for the token can be found. In a further aspect, an integer representing a token points to an array of tokens that can be activated by the token. In another aspect, a list of pointers to partial parses is created before attempting to parse a next word in the text string. The list of pointers includes pointers to partial parses that are expecting particular semantic tokens. A fourth aspect of the invention utilizes a data structure to list the semantic tokens that have been fully parsed for each span in the input text segment. When a token is fully parsed, the list is accessed to determine if the new token should be discarded.
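The first two aspects of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the layout used in the patent: the 8/24 bit split, the `DEFINITIONS` tables, the `ACTIVATES` mapping, and all function and token names are hypothetical.

```python
# Assumed layout: the high bits select a definition table, the low bits
# select an entry inside that table.
TABLE_BITS = 8
INDEX_BITS = 24
INDEX_MASK = (1 << INDEX_BITS) - 1


def make_token_id(table: int, index: int) -> int:
    """Pack a table number and an entry index into one integer token."""
    if not 0 <= table < (1 << TABLE_BITS):
        raise ValueError("table number out of range")
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("entry index out of range")
    return (table << INDEX_BITS) | index


def split_token_id(token_id: int) -> tuple:
    """Recover (table, index) from an integer token."""
    return token_id >> INDEX_BITS, token_id & INDEX_MASK


# Hypothetical definition tables; the table number says where to look.
DEFINITIONS = [
    ["<number>", "<month>"],  # table 0
    ["<date>"],               # table 1
]

NUMBER = make_token_id(0, 0)
MONTH = make_token_id(0, 1)
DATE = make_token_id(1, 0)

# For each token, the tokens it can activate (a dict stands in for the array).
ACTIVATES = {NUMBER: [DATE], MONTH: [DATE]}


def definition(token_id: int) -> str:
    """Find a token's definition using only the bits of its integer."""
    table, index = split_token_id(token_id)
    return DEFINITIONS[table][index]
```

With this encoding, comparing two tokens is an integer comparison, and finding a definition needs no string lookup: `definition(MONTH)` reads table 0, entry 1.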
5 Claims
Specification