System and method for performing regular expression matching with high parallelism
First Claim
1. A method for operating a pattern matching engine having a plurality of information storage entries with two or more regular expressions, each regular expression including a plurality of characters and each regular expression associated with a corresponding action to be applied when matching strings are found, the method comprising the steps of:
- identifying one or more borders within each regular expression, the one or more borders separating the regular expression into a plurality of sub-expressions, each sub-expression having a plurality of sequential characters;
loading each entry of the plurality of entries of the pattern matching engine with the plurality of the sequential characters from one of the sub-expressions of the plurality of sub-expressions, wherein the borders are defined by a predetermined sequence of regular expression metacharacters, and the entries stored in content addressable memory (CAM);
applying a string from a network message to the entries of the pattern matching engine to search the string simultaneously in parallel for sub-expressions from each of the two or more regular expressions;
determining that the sub-expressions of at least one regular expression match the string; and
executing the corresponding action associated with that at least one regular expression on the network message.
1 Assignment
0 Petitions
Accused Products
Abstract
A system and method for searching data strings, such as network messages, for one or more predefined regular expressions is provided. Regular expressions are programmed into a pattern matching engine so that multiple characters, e.g., 32, of the data strings can be searched at the same time. The pattern matching engine includes a regular expression storage device having one or more content-addressable memories (CAMs) whose rows may be divided into sections. Each predefined regular expression is analyzed so as to identify the “borders” within the regular expression. A border is preferably defined to exist at each occurrence of one or more predefined metacharacters, such as “.*”, which finds any character zero, one or more times. The borders separate the regular expression into a sequence of sub-expressions each of which may be one or more characters in length. Each sub-expression is preferably programmed into a corresponding section of the pattern matching engine. The system may also be configured so as to search multiple regular expressions in parallel.
-
Citations
35 Claims
-
1. A method for operating a pattern matching engine having a plurality of information storage entries with two or more regular expressions, each regular expression including a plurality of characters and each regular expression associated with a corresponding action to be applied when matching strings are found, the method comprising the steps of:
-
identifying one or more borders within each regular expression, the one or more borders separating the regular expression into a plurality of sub-expressions, each sub-expression having a plurality of sequential characters; loading each entry of the plurality of entries of the pattern matching engine with the plurality of the sequential characters from one of the sub-expressions of the plurality of sub-expressions, wherein the borders are defined by a predetermined sequence of regular expression metacharacters, and the entries stored in content addressable memory (CAM); applying a string from a network message to the entries of the pattern matching engine to search the string simultaneously in parallel for sub-expressions from each of the two or more regular expressions; determining that the sub-expressions of at least one regular expression match the string; and executing the corresponding action associated with that at least one regular expression on the network message. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. An apparatus comprising:
-
means for pattern matching having a plurality of information storage entries with two or more regular expressions, each regular expression including a plurality of characters and each regular expression associated with a corresponding action to be applied when matching strings are found; means for identifying one or more borders within each regular expression, the one or more borders separating the regular expression into a plurality of sub-expressions, each sub-expression having a plurality of sequential characters; means for loading each of the plurality of entries of the means for pattern matching with the plurality of the sequential characters from one of the sub-expressions of the plurality of sub-expressions, the entries stored in content addressable memory (CAM); means for applying a string from a network message to the entries of the pattern matching engine to search the string simultaneously in parallel for sub-expressions from each of the two or more regular expressions; means for determining that the sub-expressions of at least one regular expression match the string; and means for executing the corresponding action associated with that at least one regular expression on the network message. - View Dependent Claims (13, 14, 15, 16)
-
-
17. An apparatus comprising:
-
a pattern matching engine having a plurality of information storage entries configured to store two or more regular expressions, each regular expression including a plurality of characters and associated with a corresponding action to be applied to a network message, the pattern matching engine configured to identify one or more borders within a regular expression, the one or more borders separating the regular expression into a plurality of sub-expressions, each sub-expression having a plurality of sequential characters, each entry of the plurality of entries of the pattern matching engine including the plurality of sequential characters from one of the sub-expressions of the plurality of subexpressions, the pattern matching engine configured to apply a string from a network message to the entries to search the string simultaneously in parallel for sub-expressions from each of the two or more regular expression and to determine that the plurality of sequential characters from the sub-expressions of at least one regular expression match the string, and then to execute the corresponding action associated with that at least one regular expression on the network message; and a content addressable memory (CAM), the CAM configured to store the plurality of sequential characters from the plurality of sub-expressions. - View Dependent Claims (18, 19, 20, 21)
-
-
22. A method for operating a pattern matching engine comprising the steps of:
-
obtaining two or more regular expressions; identifying one or more borders within each regular expression using a deterministic finite state machine, the one or more borders separating the regular expression into a plurality of sub expressions, each sub expression including one or more characters, each border indicated by one or more predetermined metacharacters; loading separate portions of a memory of the pattern matching engine with the sequential characters from each of the plurality of sub expressions of the two or more regular expressions; applying a string from a network message to the memory to search the string in parallel for sub expressions from each of the two or more regular expression; and determining that at least one of the two or more regular expressions matches the string and executing in response thereto a corresponding action associated with that regular expression; wherein the memory is a content addressable memory (CAM). - View Dependent Claims (23, 24, 25, 26, 27, 28)
-
-
29. An apparatus comprising:
-
a pattern matching engine configured to obtain two or more regular expressions and to identify one or more borders within each regular expression using a deterministic finite state machine, the one or more borders separating the regular expression into a plurality of sub expressions, each sub expression including one or more characters, each border indicated by one or more predetermined metacharacters; a memory of the pattern matching engine configured to store the sequential characters from each of the plurality of sub expressions of the two or more regular expressions in separate portions of memory, the memory further configured to, in response to application of a string from a network message, search the string in parallel for sub expressions from each of the two or more regular expression, and to determine that at least one of the two or more regular expressions matches the string; and the pattern matching engine further configured to cause the execution of a corresponding action associated with that regular expression that matches; wherein the memory is a content addressable memory (CAM). - View Dependent Claims (30, 31, 32, 33, 34, 35)
-
Specification