Method and device for high performance regular expression pattern matching
First Claim
1. A device for matching an input string to a pattern via a deterministic finite automaton (DFA), the DFA comprising a plurality of states including a current state, the input string comprising a plurality of input symbols, the device comprising:
- a pattern matching circuit that implements the DFA, the pattern matching circuit being configured to receive and serially process the input symbols of the input string in groups of m input symbols, wherein m is an integer that is greater than or equal to 1, the pattern matching circuit comprising;
(1) a transition table memory comprising a plurality of stored transitions, each stored transition being indexed by data corresponding to at least one input symbol, each transition comprising a next state identifier, (2) transition retrieval logic configured to retrieve from the transition table memory each transition that is indexed by data corresponding to the at least one input symbol of the received input symbol group without consideration of the current state, and (3) state selection logic configured to receive each retrieved transition, determine which of the retrieved transitions corresponds to the current state, determine a next state for the DFA based on the next state identifier of the determined transition, and determine whether a match exists between the input string and the pattern.
4 Assignments
0 Petitions
Accused Products
Abstract
Disclosed herein is an improved architecture for regular expression pattern matching. Improvements to pattern matching deterministic finite automatons (DFAs) that are described by the inventors include a pipelining strategy that pushes state-dependent feedback to a final pipeline stage to thereby enhance parallelism and throughput, augmented state transitions that track whether a transition is indicative of a pattern match occurring thereby reducing the number of necessary states for the DFA, augmented state transition that track whether a transition is indicative of a restart to the matching process, compression of the DFA'"'"'s transition table, alphabet encoding for input symbols to equivalence class identifiers, the use of an indirection table to allow for optimized transition table memory, and enhanced scalability to facilitate the ability of the improved DFA to process multiple input symbols per cycle.
-
Citations
67 Claims
-
1. A device for matching an input string to a pattern via a deterministic finite automaton (DFA), the DFA comprising a plurality of states including a current state, the input string comprising a plurality of input symbols, the device comprising:
a pattern matching circuit that implements the DFA, the pattern matching circuit being configured to receive and serially process the input symbols of the input string in groups of m input symbols, wherein m is an integer that is greater than or equal to 1, the pattern matching circuit comprising;
(1) a transition table memory comprising a plurality of stored transitions, each stored transition being indexed by data corresponding to at least one input symbol, each transition comprising a next state identifier, (2) transition retrieval logic configured to retrieve from the transition table memory each transition that is indexed by data corresponding to the at least one input symbol of the received input symbol group without consideration of the current state, and (3) state selection logic configured to receive each retrieved transition, determine which of the retrieved transitions corresponds to the current state, determine a next state for the DFA based on the next state identifier of the determined transition, and determine whether a match exists between the input string and the pattern.- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
-
25. In a device for matching an input string to a pattern via a deterministic finite automaton (DFA), the DFA comprising a plurality of states including a current state and a plurality of possible next states, the input string comprising a plurality of input symbols, the device comprising at least two parallel pipeline stages, the improvement comprising:
-
a first one of the pipeline stages being configured to retrieve a plurality of transitions to a possible next state of the DFA from a pre-populated memory; and
a second one of the pipeline stages that is configured to choose, based at least in part upon the DFA'"'"'s current state, one of said retrieved transitions from which the integrated circuit will determine the next state of the DFA, wherein the second one of the pipeline stages is downstream from the first one of the pipeline stages. - View Dependent Claims (26, 27)
-
-
28. A method for matching an input string to a pattern in a device that realizes a deterministic finite automaton (DFA), the DFA comprising a plurality of states including a current state and a plurality of possible next states, the input string comprising a plurality of input symbols, the integrated circuit comprising at least two parallel pipeline stages, the method comprising:
-
within one of the pipeline states, retrieving from a memory a plurality of stored transitions to a next possible state of the DFA; and
within another of the pipeline states that is downstream in the pipeline from the pipeline state from which the plurality of transitions were retrieved, selecting one of the retrieved transitions to identify the next state of the DFA, wherein the selecting is based at least in part upon the current state of the DFA. - View Dependent Claims (29)
-
-
30. A device comprising:
a pipelined architecture configured to receive an input data stream comprising a plurality of input symbols and process the received data stream through a multi-stage pattern matching pipeline, wherein the pipeline realizes a pattern matching deterministic finite automaton (DFA) that is configured to determine whether any string of input symbols within the data stream matches a pattern, the DFA having a plurality of states including a current state, the pipeline comprising a plurality of stages including a final stage, each stage having at least one stage input and at least one stage output, the final stage being configured to provide as outputs a next state for the DFA, wherein the only stage of the pipeline that receives a feedback loop based on a stage output is the final stage. - View Dependent Claims (31, 32, 33, 34, 35)
-
36. A device comprising:
a pipelined architecture configured to receive an input data stream comprising a plurality of input symbols and process the received data stream through a logic circuit to determine whether any input symbol string within the input data stream matches a pattern, the logic circuit being configured to realize a deterministic finite automaton (DFA), the DFA comprising a plurality of states, an alphabet of input symbols, an alphabet of equivalence class identifiers (ECIs), a set of functions that map groups of m input symbols to corresponding ECIs, and a set of functions that define transitions between states of the DFA, each state transition function being indexed by an ECI and comprising a next state identifier and a match flag, wherein m is an integer that is greater than or equal to one. - View Dependent Claims (37, 38, 39, 40)
-
41. A device for matching an input string to a pattern via a deterministic finite automaton (DFA), the DFA comprising a plurality of states including a current state, the input string comprising a plurality input symbols, the device comprising:
-
a memory that stores a transition table in a plurality of addressable memory words, each memory word storing at least one transition, each transition corresponding to at least one input symbol, each transition comprising a next state identifier and a match flag, the match flag being indicative of whether the transition indicates that a match has occurred between the input string and the pattern;
an input for receiving a group of m input symbols, wherein m is an integer that is greater than or equal to 1;
a logic circuit in communication with the input and the memory, wherein the logic circuit is configured to (1) resolve each received input symbol group to at least one memory word that stores at least one transition corresponding to the received input symbol group, (2) retrieve from memory the at least one memory word to which the received input symbol group was resolved, (3) determine a next state for the DFA based on the at least one retrieved memory word, and (4) determine whether the received input symbol group caused a match to the pattern based on a full match flag within the at least one retrieved memory word. - View Dependent Claims (42, 43, 44, 45, 46, 47, 48, 49, 50)
-
-
51. A method of matching an input string to a pattern using a state machine, the input string comprising a plurality of input symbols, each input symbol comprising a plurality of bits, the method comprising:
-
defining a plurality of states for the state machine, the state machine having a current state and being configured to progress through the plurality of states as input symbols of the input string are received and processed to thereby detect whether a plurality of the input symbols of the input string match a pattern;
defining a plurality of transitions between the states of the state machine, each transition being indexed by data corresponding to a current state of the state machine and by data corresponding to at least one input symbol, each transition comprising an identifier for a next state of the state machine and a match flag to indicate whether a match exists between a plurality of the input symbols and the pattern;
receiving at least one input symbol of the input string;
retrieving at least one defined transition based on the at least one received input symbol;
determining which of the at least one retrieved transition is indexed by the current state;
updating the current state of the state machine as the state identified by the next state identifier of the determined retrieved transition;
determining whether a match exists between a plurality of the input symbols and the pattern based on the match flag of the retrieved transition; and
repeating the retrieving, transition determining, updating and match determining steps as additional input symbols of the input string are received. - View Dependent Claims (52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62)
-
-
63. A method for creating a reduced state deterministic finite automaton (DFA) from a canonical DFA, each DFA being configured to match an input string to a pattern, the input string comprising a plurality of input symbols, each DFA comprising a plurality of states and a transition table, the transition table comprising a plurality of transitions that define how the DFA progresses from a current state to a next state in response to at least one new input symbol being received by the DFA, and wherein at least one of the states of the canonical DFA is an accept state, the method comprising:
-
eliminating each accept state from the canonical DFA to thereby create the reduced state DFA; and
augmenting the transition table such that each transition includes a flag indicative of whether the at least one new input symbol received by the reduced state DFA has caused a match to occur between the input string and the pattern. - View Dependent Claims (64, 65)
-
-
66. A computer readable medium for determining how a plurality of state transitions are to be ordered in a transition table for a deterministic finite automaton (DFA) having a plurality of states, each transition being indexed by a DFA state, the computer readable medium comprising:
-
a code segment executable by a processor for reading the transition table to determine the plurality of transitions therein; and
a code segment executable by a processor that is configured to perform a state re-ordering algorithm on the determined transitions to optimize the order of the states in the transition table.
-
-
67. A computer readable medium for determining how a plurality of run-length coded state transitions are to be stored in a transition table memory, the transition table memory comprising a plurality of words, each word having a width sufficient to store a plurality of run-length coded state transitions, the computer readable medium comprising:
-
a code segment executable by a processor for reading the run-length coded state transitions; and
a code segment executable by a processor that is configured to perform an optimal memory word packing algorithm on the read run-length coded state transitions to determine how the run-length coded state transitions are to be stored in the words of the transition table memory.
-
Specification