SYSTEMS AND METHODS OF UTF-8 PATTERN MATCHING
First Claim
Patent Images
1. A method for case insensitive searching of a variable width encoded pattern in a block of text, the method comprising:
- (a) determining, by a device, for each character in a pattern for which to search for a match within a block of text, a corresponding lower case Unicode value, the pattern comprising variable-width encoded characters;
(b) establishing, by the device, an index table of jump values for the pattern, the index table comprising a hash to each corresponding lower case Unicode value that identifies a number of byte lengths for the corresponding character;
(c) jumping, by the device responsive to the index table of jump values, a pointer to the block of text to a pivot element in the block of text based on a byte length of the pattern and the byte length of a last character of the pattern; and
(d) comparing, by the device, a lower case Unicode value of the pivot element to the corresponding lower case Unicode value of the character of the last character of the pattern.
8 Assignments
0 Petitions
Accused Products
Abstract
Systems and methods are described for efficiently processing, searching and/or rewriting variable width encoded data, such as UTF-8 encoded data, will be described. Embodiments of the systems and methods modify and adapt search algorithms, such as the Horspool and Wu-Manber algorithms, to efficiently process and manage searching of variable width encoded text in large blocks of text, such as text that may be carried via a stream of packets thru a network device, such as an intermediary device.
-
Citations
20 Claims
-
1. A method for case insensitive searching of a variable width encoded pattern in a block of text, the method comprising:
-
(a) determining, by a device, for each character in a pattern for which to search for a match within a block of text, a corresponding lower case Unicode value, the pattern comprising variable-width encoded characters; (b) establishing, by the device, an index table of jump values for the pattern, the index table comprising a hash to each corresponding lower case Unicode value that identifies a number of byte lengths for the corresponding character; (c) jumping, by the device responsive to the index table of jump values, a pointer to the block of text to a pivot element in the block of text based on a byte length of the pattern and the byte length of a last character of the pattern; and (d) comparing, by the device, a lower case Unicode value of the pivot element to the corresponding lower case Unicode value of the character of the last character of the pattern. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 12)
-
-
11. A method for simultaneously performing case insensitive searches of variable width encoded patterns in a block of text, the method comprising:
-
(a) converting, by a device, each of the patterns to be searched within a block of text to a corresponding lower case pattern, each pattern comprising variable-width encoded characters; (b) establishing, by the device for each of the patterns, a shift table comprising a hash of a predetermined number of bytes of the corresponding lower case pattern and a jump value; (c) jumping, by the device responsive to the shift table, a pointer to a pivot block in the block of text; (d) identifying, by the device, an encoded string within the pivot block that comprises bytes from the predetermined number of bytes of the pivot block; (e) computing, by the device, a hash of the bytes of the lower case of the encoded string corresponding to the predetermined number of bytes; and (f) obtaining, by the device using the hash of the bytes, the jump value from the shift table. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20)
-
Specification