Searching for sequences of character data
First Claim
1. A computer program product comprising a computer program operable to control a computer to search within a sequence of character data for occurrence of any of a plurality strings each formed of a predetermined sequence of characters, said computer program comprising:
- character identifying logic operable to identify a target character at a target search position within said sequence of character data;
string identifying logic operable to compare character data preceding said target character with data indicative of any of said plurality of strings that terminate with said target character to identify an occurrence of any of said plurality of strings that terminate with said target character; and
target search position advancing logic operable to advance said target search position within said sequence of character data by a predetermined number of character positions equal to a smallest number of character positions separating said target character from a last character within any of said plurality of strings.
9 Assignments
0 Petitions
Accused Products
Abstract
A modified Boyer-Moore searching algorithm used within an E-mail filtering system for detecting the presence of a plurality of target band strings during a single traversal of the character data to be searched. A single jump table for the combined set of strings for which a search is being made is used. A hierarchical match table starting with the possible terminating letters of strings for which a search is being made is traversed to identify any strings as they are encountered.
117 Citations
24 Claims
-
1. A computer program product comprising a computer program operable to control a computer to search within a sequence of character data for occurrence of any of a plurality strings each formed of a predetermined sequence of characters, said computer program comprising:
-
character identifying logic operable to identify a target character at a target search position within said sequence of character data;
string identifying logic operable to compare character data preceding said target character with data indicative of any of said plurality of strings that terminate with said target character to identify an occurrence of any of said plurality of strings that terminate with said target character; and
target search position advancing logic operable to advance said target search position within said sequence of character data by a predetermined number of character positions equal to a smallest number of character positions separating said target character from a last character within any of said plurality of strings. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 10)
(i) a node higher in said hierarchy corresponding to a character succeeding said match character in one or more of said plurality of strings; and
(ii) a node lower in said hierarchy corresponding to a character preceding said match character in one or more of said plurality of strings.
-
-
7. A computer program product as claimed in claim 6, wherein at a lowest level in said hierarchy traced along a path through said hierarchy a lowest node specifies at least one of said plurality of strings.
-
8. A computer program product as claimed in claim 7, wherein said lowest node specifies a longest character length of any string specified by said lowest node such that said string identifying logic can read a number of characters corresponding to said longest character length from said sequence of character data to be compared with any string specified by said lowest node.
-
10. A method as claimed in claim 1, wherein said steps of identifying, comparing and advancing operate in a repeating sequence until said target search position reaches an end of said sequence of character data.
-
9. A method of searching within a sequence of character data for occurrence of any of a plurality strings each formed of a predetermined sequence of characters, said method comprising the steps of:
-
identifying a target character at a target search position within said sequence of character data;
comparing character data preceding said target character with data indicative of any of said plurality of strings that terminate with said target character to identify an occurrence of any of said plurality of strings that terminate with said target character; and
advancing said target search position within said sequence of character data by a predetermined number of character positions equal to a smallest number of character positions separating said target character from a last character within any of said plurality of strings. - View Dependent Claims (11, 12, 13, 14, 15, 16)
(i) a node higher in said hierarchy corresponding to a character succeeding said match character in one or more of said plurality of strings; and
(ii) a node lower in said hierarchy corresponding to a character preceding said match character in one or more of said plurality of strings.
-
-
15. A method as claimed in claim 14, wherein at a lowest level in said hierarchy traced along a path through said hierarchy a lowest node specifies at least one of said plurality of strings.
-
16. A method as claimed in claim 15, wherein said lowest node specifies a longest character length of any string specified by said lowest node such that said string identifying logic can read a number of characters corresponding to said longest character length from said sequence of character data to be compared with any string specified by said lowest node.
-
17. Apparatus for processing data operable to search within a sequence of character data for occurrence of any of a plurality strings each formed of a predetermined sequence of characters, said apparatus comprising:
-
a character identifier operable to identify a target character at a target search position within said sequence of character data;
a string identifier operable to compare character data preceding said target character with data indicative of any of said plurality of strings that terminate with said target character to identify an occurrence of any of said plurality of strings that terminate with said target character; and
a target search position advancer operable to advance said target search position within said sequence of character data by a predetermined number of character positions equal to a smallest number of character positions separating said target character from a last character within any of said plurality of strings. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24)
(i) a node higher in said hierarchy corresponding to a character succeeding said match character in one or more of said plurality of strings; and
(ii) a node lower in said hierarchy corresponding to a character preceding said match character in one or more of said plurality of strings.
-
-
23. Apparatus as claimed in claim 22, wherein at a lowest level in said hierarchy traced along a path through said hierarchy a lowest node specifies at least one of said plurality of strings.
-
24. Apparatus as claimed in claim 23, wherein said lowest node specifies a longest character length of any word specified by said lowest node such that said string identifying logic can read a number of characters corresponding to said longest character length from said sequence of character data to be compared with any string specified by said lowest node.
Specification