System and method for determining the start of a match of a regular expression
First Claim
1. A method for determining the start states of each rule of a plurality of rules, each rule being utilized for pattern recognition of an input character string, and generating a multi-rule deterministic finite state automaton (DFA), which comprises the steps of:
- prepending to each rule the metacharacters “
.*” and
transforming each rule prepended with the metacharacters to a non-deterministic finite state automaton (NFA) using a Thompson Construction;
analyzing the NFA for each rule to determine NFA start states by the following substeps;
a) producing an epsilon closure starting at the initial state of the NFA;
b) producing a 1-closure of the initial epsilon closure;
c) comparing the states in the initial epsilon closure with the states in the 1 closure; and
d) determining as NFA start states all states in the 1-closure which are not in the initial epsilon closure;
converting the NFA for each rule into a DFA using an NFA to DFA algorithm, thereby creating a DFA state from one or more NFA states;
for each DFA state that contains an NFA start state, determining the distance of the DFA state from the global start of the DFA for each rule;
comparing the distances of the DFA start states that contain an NFA start state from the global start state;
choosing as a DFA start state the DFA state containing an NFA start state which is closest to the global start state;
if more than one DFA state containing an NFA start state have the same closest distance to the global start state, accepting as DFA start states each of said closest DFA states having the same closest distance to the global start state;
creating a new NFA start state and inserting an epsilon transition from the new NFA start state to each of the DFA'"'"'s for each rule, thereby creating a meta-NFA;
converting the meta-NFA to a final multi-rule DFA storing said multi-rule DFA in a memory; and
applying said final multi-rule DFA to conduct pattern matching on said input character string.
9 Assignments
0 Petitions
Accused Products
Abstract
A method for determining the start of a match of a regular expression using the special state table, the set of start state registers and the DFA next state table, includes the step of determining from the regular expression each start-of-match start state and each end-of-match terminal state. For each start state, a start state entry is loaded into the special state table. For each terminal state, a terminal state entry is loaded into each special state table. The next state table is used to return the next state from the current state and an input character. When a start state is encountered, the current offset from the beginning of the input character string is loaded into the start state register. When a terminal state is encountered, the terminal state entry is retrieved from the special state table, and the value of the start state register corresponding to the rule number of the terminal entry in the special state table is further retrieved. The value of the start state register which is retrieved indicates the location in the character string where the start-of-match occurred for a particular rule.
161 Citations
1 Claim
-
1. A method for determining the start states of each rule of a plurality of rules, each rule being utilized for pattern recognition of an input character string, and generating a multi-rule deterministic finite state automaton (DFA), which comprises the steps of:
-
prepending to each rule the metacharacters “
.*” and
transforming each rule prepended with the metacharacters to a non-deterministic finite state automaton (NFA) using a Thompson Construction;analyzing the NFA for each rule to determine NFA start states by the following substeps; a) producing an epsilon closure starting at the initial state of the NFA; b) producing a 1-closure of the initial epsilon closure; c) comparing the states in the initial epsilon closure with the states in the 1 closure; and d) determining as NFA start states all states in the 1-closure which are not in the initial epsilon closure; converting the NFA for each rule into a DFA using an NFA to DFA algorithm, thereby creating a DFA state from one or more NFA states; for each DFA state that contains an NFA start state, determining the distance of the DFA state from the global start of the DFA for each rule; comparing the distances of the DFA start states that contain an NFA start state from the global start state;
choosing as a DFA start state the DFA state containing an NFA start state which is closest to the global start state;if more than one DFA state containing an NFA start state have the same closest distance to the global start state, accepting as DFA start states each of said closest DFA states having the same closest distance to the global start state; creating a new NFA start state and inserting an epsilon transition from the new NFA start state to each of the DFA'"'"'s for each rule, thereby creating a meta-NFA; converting the meta-NFA to a final multi-rule DFA storing said multi-rule DFA in a memory; and applying said final multi-rule DFA to conduct pattern matching on said input character string.
-
Specification