Method and apparatus for robust efficient parsing
First Claim
1. A method of parsing text in a computing device to form a logical representation of the text, the logical representation having tokens representing other tokens and words of the text, the method comprising:
forming tokens from the text;
selecting a token;
a processor identifying an integer that represents the selected token, wherein identifying an integer comprises identifying an integer that is an offset into a pointer array of cells, the offset identifying a cell comprising a pointer that points to an identifier array of cells, each cell in the identifier array providing a token identifier for a token that is activated by the selected token according to a parsing rule for the token where the selected token is a first child node in the parsing rule;
a processor utilizing the integer to identify:
at least one token that is activated by the selected token; and
a parsing rule that licenses the activation of the token by the selected token to form an activated token where the selected token is a first child node in the parsing rule;
a processor adding at least one activated token to a chart; and
a processor using the activated token to form the logical representation of the text.
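The lookup recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the token names, the three toy rules, and the choice to store a rule index next to each activated token identifier are all assumptions made for illustration. Python list references stand in for the claim's pointers.

```python
# Hypothetical token inventory: each token is represented by a small integer
# that doubles as an offset into the pointer array.
NUMBER, DAY, DATE, MONTH, TIME = range(5)

# Hypothetical parsing rules, written as (parent token, child tokens).
RULES = [
    (DATE, (MONTH, DAY)),   # rule 0: Date -> Month Day
    (DAY, (NUMBER,)),       # rule 1: Day  -> Number
    (TIME, (NUMBER,)),      # rule 2: Time -> Number
]


def build_pointer_array(num_tokens, rules):
    """Index activations by the integer of each rule's first child."""
    pointer_array = [None] * num_tokens
    for rule_index, (parent, children) in enumerate(rules):
        first_child = children[0]
        if pointer_array[first_child] is None:
            pointer_array[first_child] = []  # the "identifier array"
        # Assumption: keep the licensing rule alongside the activated token.
        pointer_array[first_child].append((parent, rule_index))
    return pointer_array


def activate(token, pointer_array, chart):
    """Add every token activated by `token` to the chart and return the chart."""
    identifier_array = pointer_array[token] or []
    for activated_token, rule_index in identifier_array:
        chart.append((activated_token, rule_index))
    return chart


POINTER_ARRAY = build_pointer_array(5, RULES)
```

Calling `activate(NUMBER, POINTER_ARRAY, [])` in this sketch puts the Day and Time activations on the chart with a single array index. No search over the rule set is needed.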
Abstract
The present invention provides a method for improving the efficiency of parsing text. Aspects of the invention include representing parse tokens as integers where a portion of the integer indicates the location in which a definition for the token can be found. In a further aspect, an integer representing a token points to an array of tokens that can be activated by the token. In another aspect, a list of pointers to partial parses is created before attempting to parse a next word in the text string. The list of pointers includes pointers to partial parses that are expecting particular semantic tokens.
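One way to read "a portion of the integer indicates the location in which a definition for the token can be found" is as a bit-field split. The sketch below assumes the high bits select a definition table and the low 24 bits give an offset within it. The abstract does not state a bit width, and the function names are hypothetical.

```python
OFFSET_BITS = 24                       # assumed width of the offset portion
OFFSET_MASK = (1 << OFFSET_BITS) - 1   # mask selecting the low-order bits


def encode_token(location, offset):
    """Pack a table index and an in-table offset into one integer."""
    if location < 0 or not 0 <= offset <= OFFSET_MASK:
        raise ValueError("location or offset out of range")
    return (location << OFFSET_BITS) | offset


def decode_token(token_id):
    """Split a token integer back into (location, offset)."""
    return token_id >> OFFSET_BITS, token_id & OFFSET_MASK


def lookup_definition(token_id, definition_tables):
    """Fetch a token definition using only the integer itself."""
    location, offset = decode_token(token_id)
    return definition_tables[location][offset]
```

Under this assumption, a token's definition is reached by two array indexings, with no name comparison or hashing.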
54 Citations
8 Claims
1. A method of parsing text in a computing device to form a logical representation of the text, the logical representation having tokens representing other tokens and words of the text, the method comprising:
forming tokens from the text;
selecting a token;
a processor identifying an integer that represents the selected token, wherein identifying an integer comprises identifying an integer that is an offset into a pointer array of cells, the offset identifying a cell comprising a pointer that points to an identifier array of cells, each cell in the identifier array providing a token identifier for a token that is activated by the selected token according to a parsing rule for the token where the selected token is a first child node in the parsing rule;
a processor utilizing the integer to identify:
at least one token that is activated by the selected token; and
a parsing rule that licenses the activation of the token by the selected token to form an activated token where the selected token is a first child node in the parsing rule;
a processor adding at least one activated token to a chart; and
a processor using the activated token to form the logical representation of the text.
Dependent claims: 2, 3, 4
5. A method of parsing text to form a parse tree of the text, the method comprising:
selecting a word from the text;
a processor forming a partial parse for a token based on the selected word;
a processor examining the partial parse to identify an item that is needed to extend the partial parse for the token, the partial parse for the token identifying the token and items that form the token, the items that form the token comprising the item needed to extend the partial parse for the token;
a processor placing a pointer to the partial parse for the token in a table assigned to a next word in the text, such that the pointer may be located from the item that is needed to extend the partial parse for the token;
a processor selecting the next word from the text;
a processor creating the item based in part on the next word;
a processor using the item to locate the pointer to the partial parse for the token that can be extended by the item;
a processor using the pointer to locate the partial parse for the token;
a processor extending the partial parse for the token based on the item to form the token in the parse tree of the text.
Dependent claims: 6, 7, 8
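The table-of-pointers step in claim 5 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: `PartialParse`, `register`, and `extend_with` are hypothetical names, the table is modeled as a dictionary keyed by the needed item, and Python object references stand in for pointers.

```python
from dataclasses import dataclass


@dataclass
class PartialParse:
    token: str        # the token this partial parse is building
    items: tuple      # the items that form the token, in order
    found: int = 0    # how many leading items have been matched so far

    def needed(self):
        """Return the next item required to extend the parse, or None if complete."""
        return self.items[self.found] if self.found < len(self.items) else None


def register(partial, table):
    """Place a reference to `partial` in the next word's table, keyed by the needed item."""
    item = partial.needed()
    if item is not None:
        table.setdefault(item, []).append(partial)


def extend_with(item, table):
    """Extend every partial parse waiting for `item`; return the tokens completed."""
    completed = []
    for partial in table.get(item, []):
        partial.found += 1
        if partial.needed() is None:
            completed.append(partial.token)
    return completed
```

As a usage example, suppose the word "May" yields a Month item and starts `PartialParse("Date", ("Month", "Day"), found=1)`. Registering it files a reference under "Day" in the table for the next word. When the next word produces a Day item, `extend_with("Day", table)` finds that reference directly, without scanning all open partial parses.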
Specification