Parsing with controlled tokenization
First Claim
1. A computer readable medium comprising software for instructing a computer to:
a) provide a parser process having a plurality of states, each state implementing a rule in a defined grammar; and
b) provide a tokenizer process having a plurality of sub-tokenizer processes corresponding to the plurality of states in the parser process, each sub-tokenizer process adapted to tokenize a portion of an input string to generate a token corresponding to the rule implemented by one of the plurality of states in the parser process, wherein the parser process selects one of the plurality of sub-tokenizer processes to tokenize the portion of the input string based on a current one of the plurality of states.
Abstract
The present invention provides a parsing technique wherein a parsing process provides feedback to a tokenizer to select an appropriate sub-tokenizer process corresponding to a grammar rule being implemented by the current parsing state. Each parsing state will select a corresponding sub-tokenizer process to tokenize a corresponding portion of an input stream for a message to be parsed. Each sub-tokenizer process is preferably unique and configured to provide only tokens capable of being processed by the grammar rule being implemented in the corresponding parser state. If the input string cannot be tokenized as required by the corresponding grammar rule implemented by the parser state, an error message is delivered. The parser process will move from one state to another, based on processing the respective tokens, until the input stream for the message is completely parsed.
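The feedback from parser state to tokenizer described in the abstract can be illustrated with a minimal sketch. It is not taken from the specification: the three states, the "name: value" header grammar, the regular expressions, and the names `SUB_TOKENIZERS`, `TokenizeError`, and `next_token` are all assumptions chosen for illustration. Each state has its own sub-tokenizer, and a failure to tokenize under the current state's rule is reported as an error.

```python
import re

# Hypothetical parser states for a tiny "name: value" header grammar.
NAME, COLON, VALUE = "NAME", "COLON", "VALUE"

# One sub-tokenizer (here, a compiled regex) per parser state. Each one
# recognizes only the kind of token that its state's grammar rule accepts.
SUB_TOKENIZERS = {
    NAME: re.compile(r"[A-Za-z][A-Za-z0-9-]*"),
    COLON: re.compile(r"\s*:\s*"),
    VALUE: re.compile(r"[^\r\n]+"),
}


class TokenizeError(ValueError):
    """Raised when the input cannot be tokenized as the current state requires."""


def next_token(state, text, pos):
    """Tokenize text at offset pos using the sub-tokenizer selected by state.

    Returns the token text and the offset just past it.
    """
    match = SUB_TOKENIZERS[state].match(text, pos)
    if match is None or match.end() == pos:
        raise TokenizeError(f"state {state}: cannot tokenize at offset {pos}")
    return match.group(0), match.end()
```

With the input `"Max-Forwards: 70"`, the characters `70` at offset 14 form a valid token under the `VALUE` sub-tokenizer, while the same offset raises `TokenizeError` under the `NAME` sub-tokenizer. Which tokens are possible depends only on the state that asks for them.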
23 Claims
7. A system for parsing comprising:
a) an interface for receiving an input string; and
b) a control system associated with the interface and adapted to provide:
i) a parser process having a plurality of states, each state implementing a rule in a defined grammar; and
ii) a tokenizer process having a plurality of sub-tokenizer processes corresponding to the plurality of states in the parser process, each sub-tokenizer process adapted to tokenize the portion of the input string to generate a token corresponding to the rule implemented by one of the plurality of states in the parser process, wherein the parser process selects one of the plurality of sub-tokenizer processes to tokenize a portion of the input string based on a current one of the plurality of the states.
13. A method for parsing comprising:
a) providing a parser process having a plurality of states, each state implementing a rule in a defined grammar;
b) providing a tokenizer process having a plurality of sub-tokenizer processes corresponding to the plurality of states in the parser process, each sub-tokenizer process adapted to tokenize a portion of an input string to generate a token corresponding to the rule implemented by one of the plurality of states in the parser process, wherein the parser process selects one of the plurality of sub-tokenizer processes to tokenize a portion of the input string based on a current one of the plurality of states.
19. A method for parsing comprising:
a) selecting one of a plurality of sub-tokenizing processes to tokenize a portion of an input string based on a current parsing state;
b) tokenizing the portion of the input string to generate a token corresponding to a grammar rule implemented by the current parsing state;
c) processing the token according to the grammar rule of the current parsing state with the selected sub-tokenizing process;
d) moving to a subsequent parsing state based on processing the token according to the grammar rule of the current parsing state; and
e) repeating steps a through d until the input string is parsed.
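Steps a through e of claim 19 map naturally onto a loop. The sketch below is one possible reading under stated assumptions, not the patented implementation: the `key=value;key=value` grammar, the `STATES` table, and the `parse` function are hypothetical, and each state is simplified to have a single successor, whereas the claim allows the next state to depend on how the token was processed.

```python
import re

# Hypothetical grammar: key=value pairs separated by ';'.
# Each parser state maps to (sub-tokenizer, next state).
STATES = {
    "KEY": (re.compile(r"[a-z]+"), "EQUALS"),
    "EQUALS": (re.compile(r"="), "VALUE"),
    "VALUE": (re.compile(r"[0-9]+"), "SEP"),
    "SEP": (re.compile(r";"), "KEY"),
}


def parse(text):
    """Parse text into a list of (state, token) pairs."""
    state, pos, tokens = "KEY", 0, []
    while pos < len(text):                          # (e) repeat until parsed
        sub_tokenizer, next_state = STATES[state]   # (a) select by current state
        match = sub_tokenizer.match(text, pos)      # (b) tokenize the next portion
        if match is None:
            raise ValueError(f"state {state}: unexpected input at offset {pos}")
        tokens.append((state, match.group(0)))      # (c) process the token
        pos = match.end()
        state = next_state                          # (d) move to the next state
    if state != "SEP":
        # Input must end right after a VALUE token to be a complete message.
        raise ValueError(f"input ended in state {state}")
    return tokens


print(parse("a=1;b=22"))
```

Because the `VALUE` sub-tokenizer accepts only digits, an input such as `a=x` fails at offset 2 with an error naming the state, rather than producing a generic identifier token that the parser would have to reject later.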
Specification