Method and apparatus for compilation of finite automata
First Claim
1. A security appliance operatively coupled to a network, the security appliance comprising:
- at least one memory and at least one network interface;
at least one processor operatively coupled to the at least one memory and the at least one network interface, the at least one processor configured to;
select a subpattern from each pattern in a set of one or more regular expression patterns based on at least one heuristic;
generate a unified deterministic finite automata (DFA) using the subpatterns selected from all patterns in the set;
generate at least one non-deterministic finite automata (NFA) for at least one pattern in the set, a portion of the at least one pattern used for generating the at least one NFA, and at least one walk direction selected from a reverse and forward walk direction for run time processing of the at least one NFA, being determined based on whether a length of the subpattern selected from the at least one pattern is fixed or variable and a location of the subpattern selected within the at least one pattern; and
store the unified DFA and the at least one NFA generated in the at least one memory for run time processing by the at least one processor with a payload received via the at least one network interface, to determine pattern matches in the payload prior to forwarding the payload, the subpatterns selected based on the at least one heuristic to minimize a number of false positives identified in the at least one NFA to reduce the run time processing of the at least one processor.
6 Assignments
0 Petitions
Accused Products
Abstract
A method and corresponding apparatus are provided implementing run time processing using Deterministic Finite Automata (DFA) and Non-Deterministic Finite Automata (NFA) to find the existence of a pattern in a payload. A subpattern may be selected from each pattern in a set of one or more regular expression patterns based on at least one heuristic and a unified deterministic finite automata (DFA) may be generated using the subpatterns selected from all patterns in the set, and at least one non-deterministic finite automata (NFA) may be generated for at least one pattern in the set, optimizing run time performance of the run time processing.
-
Citations
73 Claims
-
1. A security appliance operatively coupled to a network, the security appliance comprising:
-
at least one memory and at least one network interface; at least one processor operatively coupled to the at least one memory and the at least one network interface, the at least one processor configured to; select a subpattern from each pattern in a set of one or more regular expression patterns based on at least one heuristic; generate a unified deterministic finite automata (DFA) using the subpatterns selected from all patterns in the set; generate at least one non-deterministic finite automata (NFA) for at least one pattern in the set, a portion of the at least one pattern used for generating the at least one NFA, and at least one walk direction selected from a reverse and forward walk direction for run time processing of the at least one NFA, being determined based on whether a length of the subpattern selected from the at least one pattern is fixed or variable and a location of the subpattern selected within the at least one pattern; and store the unified DFA and the at least one NFA generated in the at least one memory for run time processing by the at least one processor with a payload received via the at least one network interface, to determine pattern matches in the payload prior to forwarding the payload, the subpatterns selected based on the at least one heuristic to minimize a number of false positives identified in the at least one NFA to reduce the run time processing of the at least one processor. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36)
-
2. The security appliance of claim 1, wherein the at least one heuristic includes maximizing a number of unique subpatterns selected and length of each subpattern selected.
-
3. The security appliance of claim 1, wherein the processor is further configured to determine whether the length of the subpattern selected from the at least one pattern is fixed or variable.
-
4. The security appliance of claim 1, wherein the at least one processor is further configured to:
-
compute a hash value of each subpattern selected; and store each hash value computed in association with an identifier of a pattern from which the subpattern was selected.
-
-
5. The security appliance of claim 1, wherein the at least one heuristic includes maximizing a number of unique subpatterns selected, and to maximize the number of unique subpatterns selected, the at least one processor is further configured to, for each pattern in the set:
-
compute a hash value of the subpattern selected; compare the hash value computed to a list of hash values of subpatterns selected for other patterns in the set; and if the hash value computed is found in the list, determine whether to replace (i) the subpattern selected with another subpattern from the pattern or (ii) the subpattern selected for another pattern in the set with an alternate subpattern selected from the other pattern in the set, the other pattern in the set being identified based on an association with the hash value computed in the list, wherein the determination for whether to replace (i) or (ii) is based on comparing lengths of subpatterns being considered for the replacement in order to maximize lengths of the unique subpatterns being selected.
-
-
6. The security appliance of claim 1, wherein the at least one heuristic includes identifying subpatterns of each pattern and disregarding a given subpattern of the subpatterns identified of each pattern, if the given subpattern has a length less than a minimum threshold.
-
7. The security appliance of claim 1, wherein the at least one heuristic includes accessing a knowledge base of subpatterns associated with historical frequency of use indicators and disregarding a given subpattern of the subpatterns identified of each pattern, if a historical frequency of use indicator for the given subpattern in the knowledge base accessed is greater than or equal to a frequency use threshold.
-
8. The security appliance of claim 1, wherein the at least one heuristic includes identifying subpatterns of each pattern, and for each pattern, maximizing a number of consecutive text characters in the subpattern selected by selecting a given subpattern of the subpatterns identified based on the given subpattern having a largest number of consecutive text characters of the subpatterns identified and based on the given subpattern being unique among all subpatterns selected for the set of one or more regular expressions.
-
9. The security appliance of claim 1, wherein the at least one heuristic includes:
-
prioritizing given subpatterns of each pattern based on a subpattern type of each of the given subpatterns and lengths of the given subpatterns, a longer length subpattern being prioritized higher than another subpattern of lesser length; selecting a unique subpattern as the subpattern selected, based on the prioritizing, the unique subpattern selected having a length of at least a minimum length threshold; and selecting a non-unique subpattern as the subpattern selected, based on the prioritizing, if none of the given subpatterns are unique and have a length of at least the minimum length threshold.
-
-
10. The security appliance of claim 1, wherein the at least one heuristic includes prioritizing given subpatterns of each pattern based on a subpattern type of each of the given subpatterns, the subpattern type being text only, alternation, single character repetition, or multi-character repetition, and a priority order from highest to lowest for the subpattern type is:
- text only, alternation, single character repetition, multi-character repetition.
-
11. The security appliance of claim 1, wherein if a first element of the subpattern selected is a first element of the at least one pattern and the length of the subpattern selected is fixed, the location of the subpattern selected is a beginning-location of the at least one pattern, the portion of the at least one pattern used for generating the at least one NFA is the at least one pattern excluding the subpattern selected, the at least one NFA is a single NFA, and the at least one walk direction of the at least one NFA is a forward walk direction.
-
12. The security appliance of claim 11, wherein to generate the unified DFA, the at least one processor is further configured to:
associate a DFA node, of the unified DFA generated, that is associated with the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the at least one NFA generated, an instruction to transition to walk the at least one NFA in a forward walk direction, the starting node of the at least one NFA associated with a first element of the portion of the at least one pattern used for generating the at least one NFA, a payload starting offset of the at least one NFA being associated with a byte subsequent to another byte at the end offset of the subpattern selected, and to report a match of the subpattern selected, a lead offset within the payload, of a lead character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, and a length of the subpattern selected.
-
13. The security appliance of claim 11, wherein to generate the at least one NFA, the at least one processor is further configured to:
associate an NFA node, of the at least one NFA generated, with metadata indicating to the walker an instruction to terminate the walk, the NFA node associated with a last element of the at least one pattern, and to report a lag offset within the payload, of a lag character matching at the NFA node, as an end offset of the at least one pattern and a final match of the at least one pattern.
-
14. The security appliance of claim 1, wherein if a first element of the subpattern selected is not a first element of the at least one pattern and a last element of the subpattern selected is not a last element of the at least one pattern, the location of the subpattern selected is a mid-location of the at least one pattern, and if the length of the subpattern selected is fixed:
-
the portion of the at least one pattern for generating the at least one NFA includes a lag portion and a lead portion of the at least one pattern, the lag portion of the at least one pattern is the at least one pattern excluding the subpattern selected and the lead portion of the at least one pattern, the lead portion of the at least one pattern excludes the subpattern selected and the lag portion of the at least one pattern; and the at least one NFA includes a lag NFA and a lead NFA, the at least one walk direction includes a forward walk direction and a reverse walk direction, the lag NFA having the forward walk direction, the lead NFA having the reverse walk direction, the lag portion of the at least one pattern used for generating the lag NFA and the lead portion of the at least one pattern used for generating the lead NFA.
-
-
15. The security appliance of claim 14, wherein to generate the unified DFA, the at least one processor is further configured to:
associate a DFA node of the unified DFA generated, the DFA node associated with the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the lag NFA, an instruction to transition to walk the lag NFA in the forward walk direction, the starting node of the lag NFA associated with a first element of the lag portion, a payload starting offset of the lag NFA associated with a byte subsequent to another byte at the end offset of the subpattern selected, a pointer to a starting node of the lead NFA, an instruction to transition to walk the lead NFA in the reverse walk direction, the starting node of the lead NFA associated with a last element of the lead portion, a payload starting offset of the lead NFA determined by subtracting the length of subpattern selected from the end offset of the subpattern selected, and to report an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, a match of the subpattern selected, and a length of the subpattern selected.
-
16. The security appliance of claim 14, wherein to generate the at least one NFA, the at least one processor is further configured to:
-
associate a lag node of the lag NFA, the lag node associated with the last element of the at least one pattern, with metadata indicating to the walker an instruction to terminate walking the lag NFA generated, and to report a lag offset within the payload, of a lag character of the payload matching the last element at the lag node, and a match of the lag portion of the at least one pattern; and associate a lead node of the lead NFA, the lead node associated with the first element of the at least one pattern, with metadata indicating to the walker, an instruction to terminate walking the lead NFA generated and to report a match of the lead portion of the at least one pattern and a lead offset within the payload, of a lead character of the payload matching the first element at the lead node, as a start offset of the at least one pattern, if required by a qualifier associated with the at least one pattern.
-
-
17. The security appliance of claim 1, wherein if a first element of the subpattern selected is not a first element of the at least one pattern, and a last element of the subpattern selected is not a last element of the at least one pattern, the location of the subpattern selected is a mid-location of the at least one pattern, and if the first element of the subpattern selected is the first element of the at least one pattern, the location of the subpattern selected is the beginning-location of the at least one pattern, and if the length of the subpattern selected from the at least one pattern is fixed or variable:
-
the portion of the at least one pattern for generating the at least one NFA includes a lag portion and an entire portion of the at least one pattern, the lag portion of the at least one pattern is the at least one pattern excluding a lead portion of the at least one pattern, the lead portion including the first element of the at least one pattern, the last element of the subpattern selected, and all elements in the at least one pattern therebetween, the entire portion of the at least one pattern is the at least one pattern, the lead portion being the subpattern selected if the location of the subpattern selected is a beginning-location; and the at least one NFA includes a lag NFA and an umbrella NFA, the at least one walk direction includes a forward walk direction and a reverse walk direction, the lag NFA having the forward walk direction, the umbrella NFA having the reverse walk direction, the lag portion of the at least one pattern used for generating the lag NFA and the entire portion of the at least one pattern used for generating the umbrella NFA.
-
-
18. The security appliance of claim 17, wherein to generate the unified DFA, the at least one processor is further configured to:
associate a DFA node of the unified DFA generated, the DFA node associated with the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the lag NFA, an instruction to transition to walk the lag NFA in the forward walk direction, the starting node of the lag NFA associated with a first element of the lag portion, and to report a match of the subpattern selected and an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, and a length of the subpattern selected if the length is fixed.
-
19. The security appliance of claim 17, wherein to generate the at least one NFA, the at least one processor is further configured to:
-
associate a lag node of the lag NFA generated, the lag node associated with the last element of the at least one pattern, with metadata indicating to the walker a pointer to a starting node of the umbrella NFA, an instruction to transition to walk the umbrella NFA in the reverse walk direction, the starting node of the umbrella NFA associated with the last element of the at least one pattern, and to optionally report an offset within the payload, of a character matching the last element of the at least one pattern at the lag node, and to optionally report a match of the lag portion of the at least one pattern; and associate an umbrella node of the umbrella NFA generated, the umbrella node associated with the first element of the at least one pattern, with metadata indicating to the walker, an instruction to terminate the walk and to report a final match of the at least one pattern and a start offset within the payload, of a start character matching the first element of the at least one pattern at the umbrella node, as a start offset of the at least one pattern, if required by a qualifier associated with the at least one pattern.
-
-
20. The security appliance of claim 1, wherein if a first element of the subpattern selected is not a first element of the at least one pattern, and a last element of the subpattern selected is not a last element of the at least one pattern, the location of the subpattern selected is a mid-location of the at least one pattern, and if the first element of the subpattern selected is the first element of the at least one pattern, the location of the subpattern selected is a beginning-location of the at least one pattern, and if the length of the subpattern selected from the at least one pattern is fixed or variable:
-
the portion of the at least one pattern for generating the at least one NFA includes a lag portion and a lead portion of the at least one pattern, the lag portion of the at least one pattern is the at least one pattern excluding the lead portion of the at least one pattern, the lead portion including the first element of the at least one pattern, the last element of the subpattern selected, and all elements in the at least one pattern therebetween, the lag portion being the subpattern selected if the location of the subpattern selected is the beginning-location; and the at least one NFA includes a lag NFA and a lead NFA, the at least one walk direction includes a forward walk direction and a reverse walk direction, the lag NFA having the forward walk direction, the lead NFA having the reverse walk direction, the lag portion of the at least one pattern used for generating the lag NFA and the lead portion of the at least one pattern used for generating the lead NFA.
-
-
21. The security appliance of claim 20, wherein to generate the unified DFA, the at least one processor is further configured to:
associate a DFA node of the unified DFA generated, the DFA node associated with the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the lag NFA, an instruction to transition to walk the lag NFA in the forward walk direction, the starting node of the lag NFA associated with a first element of the lag portion, a pointer to a starting node of the lead NFA, an instruction to transition to walk the lead NFA in the reverse walk direction, the starting node of the lead NFA associated with a last element of the subpattern selected, and to report a match of the subpattern selected and an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, and a length of the subpattern selected if the length is fixed.
-
22. The security appliance of claim 20, wherein to generate the at least one NFA, the at least one processor is further configured to:
-
associate a lag node of the lag NFA generated, the lag node associated with the last element of the at least one pattern, with metadata indicating to the walker an instruction to terminate walking the lag NFA, and to report a lag offset within the payload, of a lag character matching the last element of the at least one pattern at the lag node, and to report a match of the lag portion of the at least one pattern; and associate a lead node of the lead NFA generated, the lead node associated with the first element of the at least one pattern, with metadata indicating to the walker, an instruction to terminate walking the lead NFA and to report a match of the lead portion and a lead offset within the payload, of a lead character matching the first element of the at least one pattern at the lead node.
-
-
23. The security appliance of claim 1, wherein if a first element of the subpattern selected is not a first element of the at least one pattern, and a last element of the subpattern selected is not a last element of the at least one pattern, the location of the subpattern selected is a mid-location of the at least one pattern, and if the length of the subpattern selected from the at least one pattern is fixed or variable:
the at least one NFA is a single NFA, and the at least one walk direction includes a forward walk direction, for run time processing nodes of the single NFA associated with elements of a lag portion of the at least one pattern, and a reverse walk direction, for run time processing nodes of the single NFA associated with all elements of the at least one pattern, the lag portion of the at least one pattern being the at least one pattern excluding a lead portion of the at least one pattern, the lead portion including the first element of the at least one pattern, the last element of the subpattern selected, and all elements in the at least one pattern therebetween.
-
24. The security appliance of claim 23, wherein to generate the unified DFA, the at least one processor is further configured to:
associate a DFA node of the unified DFA generated, the node associated with the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the single NFA, an instruction to transition to walk the single NFA generated in the forward walk direction, the starting node associated with a next element in the at least one pattern immediately following the last element of the subpattern selected, and to report a match of the subpattern selected, an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, and a length of the subpattern selected if the length is fixed.
-
25. The security appliance of claim 23, wherein to generate the at least one NFA, the at least one processor is further configured to:
-
associate a lag node of the single NFA, the lag node associated with a last element of the at least one pattern, with metadata indicating to the walker an instruction to transition to walk the single NFA in the reverse walk direction with a payload starting offset associated with the end offset of the subpattern selected; and associate a lead node of the single NFA, the lead node associated with the first element of the at least one pattern, with metadata indicating to the walker, an instruction to terminate the walk, and to report an offset within the payload, of a character matching the first element of the at least one pattern at the lead node, as a start offset of the at least one pattern, if required by a qualifier associated with the at least one pattern, and a final match of the at least one pattern.
-
-
26. The security appliance of claim 1, wherein if a first element of the subpattern selected is not a first element of the at least one pattern, and a last element of the subpattern selected is not a last element of the at least one pattern, the location of the subpattern selected is a mid-location of the at least one pattern, and if the length of the subpattern selected is fixed:
the at least one NFA is a single NFA, and the at least one walk direction includes a reverse walk direction, for run time processing nodes of the single NFA associated with a lead portion of the at least one pattern, and a forward walk direction, for run time processing nodes of the single NFA associated with all elements of the at least one pattern, the lead portion being the at least one pattern excluding a lag portion of the at least one pattern, the lag portion including the first element of the subpattern selected, the last element of the at least one pattern, and all elements in the at least one pattern therebetween.
-
27. The security appliance of claim 26, wherein to generate the unified DFA, the at least one processor is further configured to:
associate a DFA node of the unified DFA generated, the node associated with the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the single NFA, an instruction to transition to walk the single NFA generated in the reverse walk direction, the starting node associated with a last element of the lead portion, a payload starting offset being determined by subtracting the length of subpattern selected from the end offset of the subpattern selected, and to report a match of the subpattern selected, an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, and a length of the subpattern selected.
-
28. The security appliance of claim 26, wherein to generate the at least one NFA, the at least one processor is further configured to:
-
associate a lead node of the single NFA, the lead node associated with a first element of the at least one pattern, with metadata indicating to the walker an instruction to transition to walk the single NFA in the forward walk direction; and associate a lag node of the single NFA, the lag node associated with the last element of the at least one pattern, with metadata indicating to the walker, an instruction to terminate the walk, and to report an offset within the payload, of a character matching the last element of the at least one pattern at the lag node, and a final match of the at least one pattern.
-
-
29. The security appliance of claim 1, wherein if a last element of the subpattern selected is a last element of the at least one pattern, the location of the subpattern selected is an end-location of the at least one pattern, and if the length of the subpattern selected is fixed, the portion of the at least one pattern for generating the at least one NFA is the at least one pattern excluding the subpattern selected, and the at least one walk direction is a reverse walk direction.
-
30. The security appliance of claim 29, wherein to generate the unified DFA the processor is further configured to:
associate a DFA node, the DFA node corresponding to the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the at least one NFA generated, an instruction to transition to walk the at least one NFA generated in a reverse walk direction, the starting node of the at least one NFA generated associated with a last element of the portion, a payload starting offset of the at least one NFA generated being determined by subtracting the length of subpattern selected from the end offset of the subpattern selected, and to report a match of the subpattern selected and an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected.
-
31. The security appliance of claim 29, wherein to generate the at least one NFA the processor is further configured to:
associate an NFA node associated with a first element of the portion, with metadata indicating to the walker to terminate the walk and to report a final match of the at least one pattern and an offset within the payload, of a character matching the first element of the portion at the NFA node, as a start offset of the at least one pattern, if required by a qualifier associated with the at least one pattern.
-
32. The security appliance of claim 1, wherein if a last element of the subpattern selected is a last element of the at least one pattern, the location of the subpattern selected is an end-location of the at least one pattern, and if the length of the subpattern selected from the at least one pattern is variable or fixed, the portion of the at least one pattern for generating the at least one NFA is the at least one pattern, and the at least one walk direction is a reverse walk direction.
-
33. The security appliance of claim 32, wherein to generate the unified DFA the processor is further configured to:
associate a DFA node, the DFA node corresponding to the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the at least one NFA generated, an instruction to transition to walk the at least one NFA generated in a reverse walk direction, the starting node of the at least one NFA generated associated with a last element of the subpattern selected, a payload starting offset of the at least one NFA generated associated with the end offset of the subpattern selected, and to report a match of the subpattern selected and an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, and a length of the subpattern selected if the length is fixed.
-
34. The security appliance of claim 32, wherein to generate the at least one NFA the processor is further configured to:
associate an NFA node associated with a first element of the portion, with metadata indicating to the walker to terminate the walk and to report a final match of the at least one pattern and an offset within the payload, of a character matching the first element of the portion at the NFA node, as a start offset of the at least one pattern, if required by a qualifier associated with the at least one pattern.
-
35. The security appliance of claim 1, wherein to store the unified DFA and the at least one NFA generated in the at least one memory the processor is further configured to generate a binary image including the unified DFA and the at least one NFA generated.
-
36. The security appliance of claim 1, wherein the at least one processor includes a DFA co-processor and an NFA co-processor configured as an acceleration unit to offload DFA and NFA run time processing, respectively.
-
2. The security appliance of claim 1, wherein the at least one heuristic includes maximizing a number of unique subpatterns selected and length of each subpattern selected.
-
-
37. A method comprising:
in at least one processor operatively coupled to at least one memory and at least one network interface in a security appliance operatively coupled to a network; selecting a subpattern from each pattern in a set of one or more regular expression patterns based on at least one heuristic; generating a unified deterministic finite automata (DFA) using the subpatterns selected from all patterns in the set; generating at least one non-deterministic finite automata (NFA) for at least one pattern in the set, a portion of the at least one pattern used for generating the at least one NFA, and at least one walk direction selected from a reverse and forward walk direction for run time processing of the at least one NFA, being determined based on whether a length of the subpattern selected from the at least one pattern is fixed or variable and a location of the subpattern selected within the at least one pattern; and storing the unified DFA and the at least one NFA generated in the at least one memory for run time processing by the at least one processor with a payload received via the at least one network interface, to determine pattern matches in the payload prior to forwarding the payload, the subpatterns selected based on the at least one heuristic to minimize a number of false positives identified in the at least one NFA to reduce the run time processing of the at least one processor. - View Dependent Claims (38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72)
-
38. The method of claim 37, wherein the at least one heuristic includes maximizing a number of unique subpatterns selected and length of each subpattern selected.
-
39. The method of claim 37, further including determining whether the length of the subpattern selected from the at least one pattern is fixed or variable.
-
40. The method of claim 37, further including:
-
computing a hash value of each subpattern selected; and storing each hash value computed in association with an identifier of a pattern from which the subpattern was selected.
-
-
41. The method of claim 37, wherein the at least one heuristic includes maximizing a number of unique subpatterns selected, and further including, for each pattern in the set:
-
computing a hash value of the subpattern selected; comparing the hash value computed to a list of hash values of subpatterns selected for other patterns in the set; and if the hash value computed is found in the list, determining whether to replace (i) the subpattern selected with another subpattern from the pattern or (ii) the subpattern selected for another pattern in the set with an alternate subpattern selected from the other pattern in the set, the other pattern in the set being identified based on an association with the hash value computed in the list, wherein the determination for whether to replace (i) or (ii) is based on comparing lengths of subpatterns being considered for the replacement in order to maximize lengths of the unique subpatterns being selected.
-
-
42. The method of claim 37, wherein the at least one heuristic includes identifying subpatterns of each pattern and disregarding a given subpattern of the subpatterns identified of each pattern, if the given subpattern has a length less than a minimum threshold.
-
43. The method of claim 37, wherein the at least one heuristic includes accessing a knowledge base of subpatterns associated with historical frequency of use indicators and disregarding a given subpattern of the subpatterns identified of each pattern, if a historical frequency of use indicator for the given subpattern in the knowledge base accessed is greater than or equal to a frequency use threshold.
-
44. The method of claim 37, wherein the at least one heuristic includes identifying subpatterns of each pattern, and for each pattern, maximizing a number of consecutive text characters in the subpattern selected by selecting a given subpattern of the subpatterns identified based on the given subpattern having a largest number of consecutive text characters of the subpatterns identified and based on the given subpattern being unique among all subpatterns selected for the set of one or more regular expressions.
-
45. The method of claim 37, wherein the at least one heuristic includes:
-
prioritizing given subpatterns of each pattern based on a subpattern type of each of the given subpatterns and lengths of the given subpatterns, a longer length subpattern being prioritized higher than another subpattern of lesser length; selecting a unique subpattern as the subpattern selected, based on the prioritizing, the unique subpattern selected having a length of at least a minimum length threshold; and selecting a non-unique subpattern as the subpattern selected, based on the prioritizing, if none of the given subpatterns are unique and have a length of at least the minimum length threshold.
-
-
46. The method of claim 37, wherein the at least one heuristic includes prioritizing given subpatterns of each pattern based on a subpattern type of each of the given subpatterns, the subpattern type being text only, alternation, single character repetition, or multi-character repetition, and a priority order from highest to lowest for the subpattern type is:
- text only, alternation, single character repetition, multi-character repetition.
-
47. The method of claim 37, wherein if a first element of the subpattern selected is a first element of the at least one pattern and the length of the subpattern selected is fixed, the location of the subpattern selected is a beginning-location of the at least one pattern, the portion of the at least one pattern used for generating the at least one NFA is the at least one pattern excluding the subpattern selected, the at least one NFA is a single NFA, and the at least one walk direction of the at least one NFA is a forward walk direction.
-
48. The method of claim 47, wherein generating the unified DFA includes:
associating a DFA node, of the unified DFA generated, that is associated with the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the at least one NFA generated, an instruction to transition to walk the at least one NFA in a forward walk direction, the starting node of the at least one NFA associated with a first element of the portion of the at least one pattern used for generating the at least one NFA, a payload starting offset of the at least one NFA associated with a byte subsequent to another byte at the end offset of the subpattern selected, and to report a match of the subpattern selected, a lead offset within the payload, of a lead character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, and a length of the subpattern selected.
-
49. The method of claim 47, wherein generating the at least one NFA includes:
associating an NFA node, of the at least one NFA generated, with metadata indicating to the walker an instruction to terminate the walk, the NFA node associated with a last element of the at least one pattern, and to report a lag offset within the payload, of a lag character matching at the NFA node, as an end offset of the at least one pattern and a final match of the at least one pattern.
-
50. The method of claim 37, wherein if a first element of the subpattern selected is not a first element of the at least one pattern and a last element of the subpattern selected is not a last element of the at least one pattern, the location of the subpattern selected is a mid-location of the at least one pattern, and if the length of the subpattern selected is fixed:
-
the portion of the at least one pattern for generating the at least one NFA includes a lag portion and a lead portion of the at least one pattern, the lag portion of the at least one pattern is the at least one pattern excluding the subpattern selected and the lead portion of the at least one pattern, the lead portion of the at least one pattern excludes the subpattern selected and the lag portion of the at least one pattern; and the at least one NFA includes a lag NFA and a lead NFA, the at least one walk direction includes a forward walk direction and a reverse walk direction, the lag NFA having the forward walk direction, the lead NFA having the reverse walk direction, the lag portion of the at least one pattern used for generating the lag NFA and the lead portion of the at least one pattern used for generating the lead NFA.
-
-
51. The method of claim 50, generate the unified DFA includes:
associating a DFA node of the unified DFA generated, the DFA node associated with the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the lag NFA, an instruction to transition to walk the lag NFA in the forward walk direction, the starting node of the lag NFA associated with a first element of the lag portion, a payload starting offset of the lag NFA associated with byte subsequent to the end offset of the subpattern selected, a pointer to a starting node of the lead NFA, an instruction to transition to walk the lead NFA in the reverse walk direction, the starting node of the lead NFA associated with a last element of the lead portion, a payload starting offset of the lead NFA determined by subtracting the length of subpattern selected from the end offset of the subpattern selected, and to report an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, a match of the subpattern selected, and a length of the subpattern selected.
-
52. The method of claim 50, wherein generating the at least one NFA includes:
-
associating a lag node of the lag NFA, the lag node associated with the last element of the at least one pattern, with metadata indicating to the walker an instruction to terminate walking the lag NFA generated, and to report a lag offset within the payload, of a lag character of the payload matching the last element at the lag node, and a match of the lag portion of the at least one pattern; and associating a lead node of the lead NFA, the lead node associated with the first element of the at least one pattern, with metadata indicating to the walker, an instruction to terminate walking the lead NFA generated and to report a match of the lead portion of the at least one pattern and a lead offset within the payload, of a lead character of the payload matching the first element at the lead node, as a start offset of the at least one pattern, if required by a qualifier associated with the at least one pattern.
-
-
53. The method of claim 37, wherein if a first element of the subpattern selected is not a first element of the at least one pattern, and a last element of the subpattern selected is not a last element of the at least one pattern, the location of the subpattern selected is a mid-location of the at least one pattern, and if the first element of the subpattern selected is the first element of the at least one pattern, the location of the subpattern selected is the beginning-location of the at least one pattern, and if the length of the subpattern selected from the at least one pattern is fixed or variable:
-
the portion of the at least one pattern for generating the at least one NFA includes a lag portion and an entire portion of the at least one pattern, the lag portion of the at least one pattern is the at least one pattern excluding a lead portion of the at least one pattern, the lead portion including the first element of the at least one pattern, the last element of the subpattern selected, and all elements in the at least one pattern therebetween, the entire portion of the at least one pattern is the at least one pattern, the lead portion being the subpattern selected if the location of the subpattern selected is a beginning-location; and the at least one NFA includes a lag NFA and an umbrella NFA, the at least one walk direction includes a forward walk direction and a reverse walk direction, the lag NFA having the forward walk direction, the umbrella NFA having the reverse walk direction, the lag portion of the at least one pattern used for generating the lag NFA and the entire portion of the at least one pattern used for generating the umbrella NFA.
-
-
54. The method of claim 53, wherein generating the unified DFA includes:
associating a DFA node of the unified DFA generated, the DFA node associated with the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the lag NFA, an instruction to transition to walk the lag NFA in the forward walk direction, the starting node of the lag NFA associated with a first element of the lag portion, and to report a match of the subpattern selected and an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, and a length of the subpattern selected if the length is fixed.
-
55. The method of claim 53, wherein generating the at least one NFA includes:
-
associating a lag node of the lag NFA generated, the lag node associated with the last element of the at least one pattern, with metadata indicating to the walker a pointer to a starting node of the umbrella NFA, an instruction to transition to walk the umbrella NFA in the reverse walk direction, the starting node of the umbrella NFA associated with the last element of the at least one pattern, and to optionally report an offset within the payload, of a character matching the last element of the at least one pattern at the lag node, and to optionally report a match of the lag portion of the at least one pattern; and associating an umbrella node of the umbrella NFA generated, the umbrella node associated with the first element of the at least one pattern, with metadata indicating to the walker, an instruction to terminate the walk and to report a final match of the at least one pattern and a start offset within the payload, of a start character matching the first element of the at least one pattern at the umbrella node, as a start offset of the at least one pattern, if required by a qualifier associated with the at least one pattern.
-
-
56. The method of claim 37, wherein if a first element of the subpattern selected is not a first element of the at least one pattern, and a last element of the subpattern selected is not a last element of the at least one pattern, the location of the subpattern selected is a mid-location of the at least one pattern, and if the first element of the subpattern selected is the first element of the at least one pattern, the location of the subpattern selected is a beginning-location of the at least one pattern, and if the length of the subpattern selected from the at least one pattern is fixed or variable:
-
the portion of the at least one pattern for generating the at least one NFA includes a lag portion and a lead portion of the at least one pattern, the lag portion of the at least one pattern is the at least one pattern excluding the lead portion of the at least one pattern, the lead portion including the first element of the at least one pattern, the last element of the subpattern selected, and all elements in the at least one pattern therebetween, the lag portion being the subpattern selected if the location of the subpattern selected is the beginning-location; and the at least one NFA includes a lag NFA and a lead NFA, the at least one walk direction includes a forward walk direction and a reverse walk direction, the lag NFA having the forward walk direction, the lead NFA having the reverse walk direction, the lag portion of the at least one pattern used for generating the lag NFA and the lead portion of the at least one pattern used for generating the lead NFA.
-
-
57. The method of claim 56, wherein generating the unified DFA includes:
associating a DFA node of the unified DFA generated, the DFA node associated with the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the lag NFA, an instruction to transition to walk the lag NFA in the forward walk direction, the starting node of the lag NFA associated with a first element of the lag portion, a pointer to a starting node of the lead NFA, an instruction to transition to walk the lead NFA in the reverse walk direction, the starting node of the lead NFA associated with a last element of the subpattern selected, and to report a match of the subpattern selected and an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, and a length of the subpattern selected if the length is fixed.
-
58. The method of claim 56, wherein to generating the at least one NFA includes:
-
associating a lag node of the lag NFA generated, the lag node associated with the last element of the at least one pattern, with metadata indicating to the walker an instruction to terminate walking the lag NFA, and to report a lag offset within the payload, of a lag character matching the last element of the at least one pattern at the lag node, and to report a match of the lag portion of the at least one pattern; and associating a lead node of the lead NFA generated, the lead node associated with the first element of the at least one pattern, with metadata indicating to the walker, an instruction to terminate walking the lead NFA and to report a match of the lead portion and a lead offset within the payload, of a lead character matching the first element of the at least one pattern at the lead node.
-
-
59. The method of claim 37, wherein if a first element of the subpattern selected is not a first element of the at least one pattern, and a last element of the subpattern selected is not a last element of the at least one pattern, the location of the subpattern selected is a mid-location of the at least one pattern, and if the length of the subpattern selected from the at least one pattern is fixed or variable:
the at least one NFA is a single NFA, and the at least one walk direction includes a forward walk direction, for run time processing nodes of the single NFA associated with elements of a lag portion of the at least one pattern, and a reverse walk direction, for run time processing nodes of the single NFA associated with all elements of the at least one pattern, the lag portion of the at least one pattern being the at least one pattern excluding a lead portion of the at least one pattern, the lead portion including the first element of the at least one pattern, the last element of the subpattern selected, and all elements in the at least one pattern therebetween.
-
60. The method of claim 59, wherein generating the unified DFA includes:
associating a DFA node of the unified DFA generated, the node associated with the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the single NFA, an instruction to transition to walk the single NFA generated in the forward walk direction, the starting node associated with a next element in the at least one pattern immediately following the last element of the subpattern selected, and to report a match of the subpattern selected, an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, and a length of the subpattern selected if the length is fixed.
-
61. The method of claim 59, wherein generating the at least one NFA includes:
-
associating a lag node of the single NFA, the lag node associated with a last element of the at least one pattern, with metadata indicating to the walker an instruction to transition to walk the single NFA in the reverse walk direction with a payload starting offset associated with the end offset of the subpattern selected; and associating a lead node of the single NFA, the lead node associated with the first element of the at least one pattern, with metadata indicating to the walker, an instruction to terminate the walk, and to report an offset within the payload, of a character matching the first element of the at least one pattern at the lead node, as a start offset of the at least one pattern, if required by a qualifier associated with the at least one pattern, and a final match of the at least one pattern.
-
-
62. The method of claim 37, wherein if a first element of the subpattern selected is not a first element of the at least one pattern, and a last element of the subpattern selected is not a last element of the at least one pattern, the location of the subpattern selected is a mid-location of the at least one pattern, and if the length of the subpattern selected is fixed:
the at least one NFA is a single NFA, and the at least one walk direction includes a reverse walk direction, for run time processing nodes of the single NFA associated with a lead portion of the at least one pattern, and a forward walk direction, for run time processing nodes of the single NFA associated with all elements of the at least one pattern, the lead portion being the at least one pattern excluding a lag portion of the at least one pattern, the lag portion including the first element of the subpattern selected, the last element of the at least one pattern, and all elements in the at least one pattern therebetween.
-
63. The method of claim 62, wherein generating the unified DFA includes:
associating a DFA node of the unified DFA generated, the node associated with the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the single NFA, an instruction to transition to walk the single NFA generated in the reverse walk direction, the starting node associated with a last element of the lead portion, a payload starting offset being determined by subtracting the length of subpattern selected from the end offset of the subpattern selected, and to report a match of the subpattern selected, an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, and a length of the subpattern selected.
-
64. The method of claim 62, wherein to generating the at least one NFA includes:
-
associating a lead node of the single NFA, the lead node associated with a first element of the at least one pattern, with metadata indicating to the walker an instruction to transition to walk the single NFA in the forward walk direction; and associating a lag node of the single NFA, the lag node associated with the last element of the at least one pattern, with metadata indicating to the walker, an instruction to terminate the walk, and to report an offset within the payload, of a character matching the last element of the at least one pattern at the lag node, and a final match of the at least one pattern.
-
-
65. The method of claim 37, wherein if a last element of the subpattern selected is a last element of the at least one pattern, the location of the subpattern selected is an end-location of the at least one pattern, and if the length of the subpattern selected is fixed, the portion of the at least one pattern for generating the at least one NFA is the at least one pattern excluding the subpattern selected, and the at least one walk direction is a reverse walk direction.
-
66. The method of claim 65, wherein generating the unified DFA includes:
associating a DFA node, the DFA node corresponding to the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the at least one NFA generated, an instruction to transition to walk the at least one NFA generated in a reverse walk direction, the starting node of the at least one NFA generated associated with a last element of the portion, and to report a match of the subpattern selected and an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected.
-
67. The method of claim 65, wherein generating the at least one NFA includes:
associating an NFA node associated with a first element of the portion, with metadata indicating to the walker to terminate the walk and to report a final match of the at least one pattern and an offset within the payload, of a character matching the first element of the portion at the NFA node, as a start offset of the at least one pattern, if required by a qualifier associated with the at least one pattern.
-
68. The method of claim 37, wherein if a last element of the subpattern selected is a last element of the at least one pattern, the location of the subpattern selected is an end-location of the at least one pattern, and if the length of the subpattern selected from the at least one pattern is variable or fixed, the portion of the at least one pattern for generating the at least one NFA is the at least one pattern, and the at least one walk direction is a reverse walk direction.
-
69. The method of claim 68, wherein generating the unified DFA includes:
associating a DFA node, the DFA node corresponding to the last element of the subpattern selected, with metadata indicating to the walker a pointer to a starting node of the at least one NFA generated, an instruction to transition to walk the at least one NFA generated in a reverse walk direction, the starting node of the at least one NFA generated associated with a last element of the subpattern selected, a payload starting offset of the at least one NFA generated associated with the end offset of the subpattern selected, and to report a match of the subpattern selected and an offset within the payload, of a character matching the last element of the subpattern selected at the DFA node, as an end offset of the subpattern selected, and a length of the subpattern selected if the length is fixed.
-
70. The method of claim 68, wherein to generating the at least one NFA includes:
associating an NFA node associated with a first element of the portion, with metadata indicating to the walker to terminate the walk and to report a final match of the at least one pattern and an offset within the payload, of a character matching the first element of the portion at the NFA node, as a start offset of the at least one pattern, if required by a qualifier associated with the at least one pattern.
-
71. The method of claim 37, wherein storing the unified DFA and the at least one NFA generated in the at least one memory includes generating a binary image including the unified DFA and the at least one NFA generated.
-
72. The method of claim 37, wherein the at least one processor includes a DFA co-processor and an NFA co-processor configured as an acceleration unit to offload DFA and NFA run time processing, respectively.
-
38. The method of claim 37, wherein the at least one heuristic includes maximizing a number of unique subpatterns selected and length of each subpattern selected.
-
73. A non-transitory computer-readable medium having stored thereon a sequence of instructions which, when loaded and executed by a processor operatively coupled to at least one memory and at least one network interface in a security appliance, causes the processor to:
-
select a subpattern from each pattern in a set of one or more regular expression patterns based on at least one heuristic; generate a unified deterministic finite automata (DFA) using the subpatterns selected from all patterns in the set; generate at least one non-deterministic finite automata (NFA) for at least one pattern in the set, a portion of the at least one pattern used for generating the at least one NFA, and at least one walk direction selected from a reverse and forward walk direction for run time processing of the at least one NFA, being determined based on whether a length of the subpattern selected from the at least one pattern is fixed or variable and a location of the subpattern selected within the at least one pattern; and store the unified DFA and the at least one NFA generated in the at least one memory for run time processing by the at least one processor with a payload received via the at least one network interface, to determine pattern matches in the payload prior to forwarding the payload, the subpatterns selected based on the at least one heuristic to minimize a number of false positives identified in the at least one NFA to reduce the run time processing of the at least one processor.
-
Specification
- Resources
-
Current AssigneeMarvell Asia Pte Limited (Marvell Technology Group Limited)
-
Original AssigneeCavium, LLC (Marvell Technology Group Limited)
-
InventorsBilla, Satyanarayana Lakshmipathi, Goyal, Rajan, Dikshit, Abhishek
-
Primary Examiner(s)Hirl, Joseph P
-
Assistant Examiner(s)MURPHY, JOSEPH B
-
Application NumberUS14/015,248Publication NumberTime in Patent Office1,089 DaysField of SearchUS Class Current1/1CPC Class CodesH04L 63/0245 Filtering by information in...H04L 63/1408 by monitoring network traff...