Apparatus and method for parallel regular expression matching
First Claim
1. A method of matching a stream of characters against a predetermined regular expression, implemented using computational hardware comprising at least two hardware engines, the at least two hardware engines comprising a lookup engine and a regex engine, the method comprising:
- obtaining a transition table representing the regular expression as a graph comprising one or more input-conditional state transition specifications not limited to single-character inputs;
generating a lookup table based on the transition table, the lookup table containing exactly one entry for each state of a state machine, each entry specifying a number n of characters to obtain from the character stream, wherein n is a non-negative integer not limited to 0 or 1;
initializing the state machine;
executing the lookup engine operative to, at each state of the state machine, retrieve the n characters specified in the lookup table for that state, and provide the n characters to the regex engine; and
executing the regex engine operative to, at each state of the state machine, perform one of;
calculating a next state of the state machine based on the current state, the n characters received from the lookup engine, and the graph of state transition specifications; and
terminating the method if all characters received from the lookup engine fail to match input conditions for all active state transition specifications or if a match succeeds.
1 Assignment
0 Petitions
Accused Products
Abstract
A regular expression matching hardware implementation comprises two tightly coupled hardware engines. A regex engine performs state transitions and accepts (as matching) or rejects (as not matching) an input string. The regex engine takes also care of the logic of the operators and deals with the complexity of the state machine. A lookup engine reads characters from an input (e.g., tape, memory, network packets, or the like), and provides them to the regex engine. A preprocessing procedure transforms a regular expression into a regex state transition table and a lookup table, for use by the regex engine and lookup engine, respectively. The two hardware engines are synchronized by a global state machine. The regex engine advances the state machine, and the lookup engine reads it.
20 Citations
13 Claims
-
1. A method of matching a stream of characters against a predetermined regular expression, implemented using computational hardware comprising at least two hardware engines, the at least two hardware engines comprising a lookup engine and a regex engine, the method comprising:
-
obtaining a transition table representing the regular expression as a graph comprising one or more input-conditional state transition specifications not limited to single-character inputs; generating a lookup table based on the transition table, the lookup table containing exactly one entry for each state of a state machine, each entry specifying a number n of characters to obtain from the character stream, wherein n is a non-negative integer not limited to 0 or 1; initializing the state machine; executing the lookup engine operative to, at each state of the state machine, retrieve the n characters specified in the lookup table for that state, and provide the n characters to the regex engine; and executing the regex engine operative to, at each state of the state machine, perform one of; calculating a next state of the state machine based on the current state, the n characters received from the lookup engine, and the graph of state transition specifications; and terminating the method if all characters received from the lookup engine fail to match input conditions for all active state transition specifications or if a match succeeds. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A regular expression matching apparatus, comprising:
-
an input operative to provide characters to be matched to a regular expression in groups of zero, one, and multiple characters; memory operative to store; a first lookup table containing exactly one entry for each state of a state machine, each entry specifying a number n of characters to retrieve from the input, wherein n is a non-negative integer not limited to 0 or 1; and a first transition table representing a first regular expression as a graph comprising a plurality of input-conditional state transition specifications for the first state machine, wherein the input-conditional state transition specifications are not limited to zero- or single-character inputs; computational hardware comprising at least two hardware engines, the at least two hardware engines comprising; a lookup engine operative to, at each step of the first state machine, retrieve the n characters specified in the first lookup table from the input and provide the n characters to a first regex engine; and a first regex engine operative to, at each state of the first state machine, perform one of; calculating a next state of the first state machine based on its current state, any characters received from the lookup engine, and the graph of input-conditional state transition specifications; and terminating the matching process if all characters received from the lookup engine fail to match input conditions for all active state transition specifications or if a match succeeds. - View Dependent Claims (10, 11, 12, 13)
-
Specification