Systems and methods for processing regular expressions
First Claim
1. A method of processing a regular expression, the method comprising:
- receiving the regular expression comprising a range asserting expression;
determining a first subexpression and a second subexpression of a range asserting expression;
storing information indicating a relationship between the first subexpression and the second subexpression as indicated in the range asserting expression;
generating a deterministic finite-state automata (DFA) corresponding to the regular expression, the generated DFA comprising terminal states corresponding to locations of each of the first and the second subexpressions in a received data stream;
in response to reaching a first terminal state indicating the location of the first subexpression;
applying the DFA to the received data stream in order to generate information regarding two or more subexpressions in the received data stream; and
processing information regarding a relationship between the two or more subexpressions and the information regarding the two or more subexpressions in the received data file stream in order to determine if the received data stream satisfies the regular expression by;
in response to reaching a second terminal state indicating the location of the second subexpression;
storing the location of the first subexpressionstoring the location of the second subexpression; and
evaluating the relationship information, wherein the evaluating further comprises determining if the relationship between the first and second subexpression is satisfied by the location of the first subexpression and the location of the second subexpression.
4 Assignments
0 Petitions
Accused Products
Abstract
A method for reducing the size of a DFA associated with a regular expression separates the functions of locating subexpressions within the DFA and determining if the located subexpressions satisfy a regular expression. For example, the functions of (1) locating subexpressions in a range asserting expression and, (2) determining whether the subexpressions satisfy the range of the range asserting expression are partitioned. In one embodiment, a first component may locate the subexpressions in a data stream using one or more DFAs, while a second component determines if the located subexpressions satisfy the range. In this embodiment, because the DFAs are not configured to determine a relationship between subexpressions, such as a range between subexpressions, the size of the resultant DFA may be significantly reduced.
-
Citations
2 Claims
-
1. A method of processing a regular expression, the method comprising:
-
receiving the regular expression comprising a range asserting expression; determining a first subexpression and a second subexpression of a range asserting expression; storing information indicating a relationship between the first subexpression and the second subexpression as indicated in the range asserting expression; generating a deterministic finite-state automata (DFA) corresponding to the regular expression, the generated DFA comprising terminal states corresponding to locations of each of the first and the second subexpressions in a received data stream; in response to reaching a first terminal state indicating the location of the first subexpression; applying the DFA to the received data stream in order to generate information regarding two or more subexpressions in the received data stream; and processing information regarding a relationship between the two or more subexpressions and the information regarding the two or more subexpressions in the received data file stream in order to determine if the received data stream satisfies the regular expression by; in response to reaching a second terminal state indicating the location of the second subexpression; storing the location of the first subexpression storing the location of the second subexpression; and evaluating the relationship information, wherein the evaluating further comprises determining if the relationship between the first and second subexpression is satisfied by the location of the first subexpression and the location of the second subexpression. - View Dependent Claims (2)
-
Specification