Method and Device for High Performance Regular Expression Pattern Matching
First Claim
1. In a device for matching an input string to a pattern via a deterministic finite automaton (DFA), the DFA comprising a plurality of states including a current state and a plurality of possible next states, the input string comprising a plurality of input symbols, the improvement comprising:
- the device comprising at least two parallel pipeline stages;
a first one of the pipeline stages being configured to retrieve a plurality of transitions to a possible next state of the DFA from a pre-populated memory; and
a second one of the pipeline stages that is configured to choose, based at least in part upon the DFA'"'"'s current state, one of said retrieved transitions from which the integrated circuit will determine the next state of the DFA, wherein the second one of the pipeline stages is downstream from the first one of the pipeline stages.
4 Assignments
0 Petitions
Accused Products
Abstract
Disclosed herein is an improved architecture for regular expression pattern matching. Improvements to pattern matching deterministic finite automatons (DFAs) that are described by the inventors include a pipelining strategy that pushes state-dependent feedback to a final pipeline stage to thereby enhance parallelism and throughput, augmented state transitions that track whether a transition is indicative of a pattern match occurring thereby reducing the number of necessary states for the DFA, augmented state transition that track whether a transition is indicative of a restart to the matching process, compression of the DFA'"'"'s transition table, alphabet encoding for input symbols to equivalence class identifiers, the use of an indirection table to allow for optimized transition table memory, and enhanced scalability to facilitate the ability of the improved DFA to process multiple input symbols per cycle.
214 Citations
36 Claims
-
1. In a device for matching an input string to a pattern via a deterministic finite automaton (DFA), the DFA comprising a plurality of states including a current state and a plurality of possible next states, the input string comprising a plurality of input symbols, the improvement comprising:
-
the device comprising at least two parallel pipeline stages; a first one of the pipeline stages being configured to retrieve a plurality of transitions to a possible next state of the DFA from a pre-populated memory; and a second one of the pipeline stages that is configured to choose, based at least in part upon the DFA'"'"'s current state, one of said retrieved transitions from which the integrated circuit will determine the next state of the DFA, wherein the second one of the pipeline stages is downstream from the first one of the pipeline stages. - View Dependent Claims (2, 3)
-
-
4. A method for matching an input string to a pattern in a device that realizes a deterministic finite automaton (DFA), the DFA comprising a plurality of states including a current state and a plurality of possible next states, the input string comprising a plurality of input symbols, the device comprising at least two parallel pipeline stages, the method comprising:
-
within one of the pipeline stages, retrieving from a memory a plurality of stored transitions to a next possible state of the DFA; and within another of the pipeline stages that is downstream in the pipeline from the pipeline stage from which the plurality of transitions were retrieved, selecting one of the retrieved transitions to identify the next state of the DFA, wherein the selecting is based at least in part upon the current state of the DFA. - View Dependent Claims (5)
-
-
6. A device comprising:
a multi-stage pattern matching pipeline configured to receive an input data stream comprising a plurality of input symbols and process the received data stream, wherein the pipeline realizes a pattern matching deterministic finite automaton (DFA) that is configured to determine whether any string of input symbols within the data stream matches a pattern, the DFA having a plurality of states including a current state, the pipeline comprising a plurality of stages including a final stage, each stage having at least one stage input and at least one stage output, the final stage being configured to provide as outputs a next state for the DFA, wherein the only stage of the pipeline that receives a feedback loop based on a stage output is the final stage. - View Dependent Claims (7, 8, 9, 10, 11)
-
12. A device comprising:
a data processing pipeline configured to receive an input data stream comprising a plurality of input symbols and process the received data stream through a logic circuit to determine whether any input symbol string within the input data stream matches a pattern, the logic circuit being configured to realize a deterministic finite automaton (DFA), the DFA comprising a plurality of states, an alphabet of input symbols, an alphabet of equivalence class identifiers (ECIs), a set of functions that map groups of m input symbols to corresponding ECIs, and a set of functions that define transitions between states of the DFA, each state transition function being indexed by an ECI and comprising a next state identifier and a match flag, wherein m is an integer that is greater than or equal to one. - View Dependent Claims (13, 14, 15, 16)
-
17. A device for defining a deterministic finite automaton (DFA), the DFA comprising a plurality of states, the DFA being configured to detect whether a pattern exists within a stream of input symbols as it progresses through the plurality of states, the device comprising:
a processor configured to (1) receive data describing the pattern that is to be matched against the stream of input symbols, (2) process the received pattern data to determine the plurality of DFA states and determine a plurality of transitions defining how the DFA is to progress from a current state to a next state as each group of m input symbols is processed by the DFA, wherein m is an integer that is greater than or equal to one, wherein each transition corresponds to a group of m input symbols, wherein each transition comprises a next state identifier and a match flag, the match flag of each transition being indicative of whether the input symbol group corresponding to that transition caused a match to occur between the pattern and the stream of input symbols, and wherein the determined states and the determined transitions at least partially defining the DFA. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
-
30. A method of compiling a set of transitions for a deterministic finite automaton (DFA), the DFA comprising a plurality of states, the DFA being configured to detect whether a pattern exists within a stream of input symbols as it progresses through the plurality of states, the method comprising:
-
receiving data describing the pattern from the user; responsive to the received pattern data, defining a plurality of transitions, each transition corresponding a group of m input symbols, wherein m is an integer greater than or equal to 1, each transition comprising a next state identifier and a match flag, the match flag being indicative of whether the input symbol group corresponding thereto caused a match to occur between the stream and the pattern. - View Dependent Claims (31)
-
-
32. A method for creating a reduced state deterministic finite automaton (DFA) from a canonical DFA, each DFA being configured to match an input string to a pattern, the input string comprising a plurality of input symbols, each DFA comprising a plurality of states and a transition table, the transition table comprising a plurality of transitions that define how the DFA progresses from a current state to a next state in response to at least one new input symbol being received by the DFA, and wherein at least one of the states of the canonical DFA is an accept state, the method comprising:
-
eliminating each accept state from the canonical DFA to thereby create the reduced state DFA; and augmenting the transition table such that each transition includes a flag indicative of whether the at least one new input symbol received by the reduced state DFA has caused a match to occur between the input string and the pattern. - View Dependent Claims (33, 34)
-
-
35. A computer readable storage medium for determining how a plurality of state transitions are to be ordered in a transition table for a deterministic finite automaton (DFA) having a plurality of states, each transition being indexed by a DFA state, the computer readable storage medium comprising:
-
a code segment executable by a processor for reading the transition table to determine the plurality of transitions therein; and a code segment executable by a processor that is configured to perform a state re-ordering algorithm on the determined transitions to optimize the order of the states in the transition table; wherein the code segments are resident on the computer readable storage medium.
-
-
36. A computer readable storage medium for determining how a plurality of run-length coded state transitions are to be stored in a transition table memory, the transition table memory comprising a plurality of words, each word having a width sufficient to store a plurality of run-length coded state transitions, the computer readable storage medium comprising:
-
a code segment executable by a processor for reading the run-length coded state transitions; and a code segment executable by a processor that is configured to perform an optimal memory word packing algorithm on the read run-length coded state transitions to determine how the run-length coded state transitions are to be stored in the words of the transition table memory; wherein the code segments are resident on the computer readable storage medium.
-
Specification