High-speed single-pass textual search processor for locating exact and inexact matches of a search pattern in a textual stream
First Claim
1. A special-purpose search processor, comprising:
- a plurality of serially-connected cells, each cell includinga pattern register for storing a character of a pattern to be searched for,a character register for storing a character of a data stream to be searched,match logic, including a comparator for generating a cell match signal by comparing the character stored in the character register with the character stored in the pattern register, anda match register for storing a match value indicative of a match between a search pattern and the data stream, and for registering any mismatch indicated by the cell match signal;
means for connecting the character registers of the plurality of cells serially together to form a character line;
means for connecting the match registers of the plurality of cells serially together to form a match line;
means for initializing the cells to contain the search pattern; and
clock means for gating the data stream from cell to cell through the character line such that the search pattern and the data stream are oppositely oriented, and a first character of the search pattern is first encountered by a first character in the data stream;
and wherein characters in the search pattern are compared with characters in the data stream in a sequential manner, and match values are propagated along the match line in synchronism with movement of the data stream along the character line, to indicate exact and inexact matches between the search pattern and strings of characters in the data stream, and wherein inexact matches involve incorrect, missing or extra characters;
and wherein the match logic includesmeans for loading a tolerance value onto the match line, at a selected cell position in the search pattern, for transmission onto the match line as an initial match value,a delay match register connected to receive input from the match register and to provide output on the match line to a next cell in series, wherein the presence of the delay match register synchronizes movement of match values from cell to cell with movement of the data stream through successive potentially matching positions with respect to the search pattern,means interposed between the match register and the delay match register, for selectively decrementing the match value as it is transmitted to the delay match register, whenever a non-match is detected in a cell, wherein presence of a non-zero match value on the match line is indicative of a match,means for detecting missing characters in the data stream and adjusting the match value to record a degree of mismatch with the search pattern, wherein the means for detecting missing characters generates a first alternate match value based on an assumption that a character is missing from the data stream, and discards the first alternate match if the original match value is greater, andmeans for detecting extra characters in the data stream and adjusting the match value to record a degree of mismatch with the search pattern, wherein the means for detecting extra characters generates a second alternate match value based on an assumption that there is an extra character in the data stream, and discards the second alternate match if the original match value is greater.
2 Assignments
0 Petitions
Accused Products
Abstract
A high speed search processor capable of performing a wide variety of search functions, including simple and complex searches, either within an entire text stream or within predefined fixed or sliding windows in the text stream. The processor is made up of multiple interconnected cells, each of which has a pattern register for storing part of a pattern to be searched for, a character register for storing a character of the data stream to be searched, a match register for storing a match value indicative of a match between the search pattern and the text stream, and match logic for modifying an incoming match value in accordance with conditions within the cell. The data stream and the search pattern are oppositely oriented, such that a first character of the search pattern is first encountered by a first character in the data stream, and the pattern is successively compared with an equal number of characters in the data stream as it is moved through the search pattern. The match logic includes means for detecting missing and extra characters in the data stream. The processor can therefore tolerate incorrect, missing or extra characters in the text stream, and can handle multiple levels of nesting and arbitrary boolean expressions within the search pattern. Another novel aspect of the processor is its ability to locate an enumerated subset of search terms or patterns within fixed or sliding windows.
-
Citations
38 Claims
-
1. A special-purpose search processor, comprising:
-
a plurality of serially-connected cells, each cell including a pattern register for storing a character of a pattern to be searched for, a character register for storing a character of a data stream to be searched, match logic, including a comparator for generating a cell match signal by comparing the character stored in the character register with the character stored in the pattern register, and a match register for storing a match value indicative of a match between a search pattern and the data stream, and for registering any mismatch indicated by the cell match signal; means for connecting the character registers of the plurality of cells serially together to form a character line; means for connecting the match registers of the plurality of cells serially together to form a match line; means for initializing the cells to contain the search pattern; and clock means for gating the data stream from cell to cell through the character line such that the search pattern and the data stream are oppositely oriented, and a first character of the search pattern is first encountered by a first character in the data stream; and wherein characters in the search pattern are compared with characters in the data stream in a sequential manner, and match values are propagated along the match line in synchronism with movement of the data stream along the character line, to indicate exact and inexact matches between the search pattern and strings of characters in the data stream, and wherein inexact matches involve incorrect, missing or extra characters; and wherein the match logic includes means for loading a tolerance value onto the match line, at a selected cell position in the search pattern, for transmission onto the match line as an initial match value, a delay match register connected to receive input from the match register and to provide output on the match line to a next cell in series, wherein the presence of the delay match register synchronizes movement of match values from cell to cell with movement of the data stream through successive potentially matching positions with respect to the search pattern, means interposed between the match register and the delay match register, for selectively decrementing the match value as it is transmitted to the delay match register, whenever a non-match is detected in a cell, wherein presence of a non-zero match value on the match line is indicative of a match, means for detecting missing characters in the data stream and adjusting the match value to record a degree of mismatch with the search pattern, wherein the means for detecting missing characters generates a first alternate match value based on an assumption that a character is missing from the data stream, and discards the first alternate match if the original match value is greater, and means for detecting extra characters in the data stream and adjusting the match value to record a degree of mismatch with the search pattern, wherein the means for detecting extra characters generates a second alternate match value based on an assumption that there is an extra character in the data stream, and discards the second alternate match if the original match value is greater. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A special-purpose search processor, comprising:
-
a plurality of serially-connected cells, each cell including a pattern register for storing a character of a pattern to be search for, a character register for storing a character of a data stream to be searched, match logic, including a comparator for generating a cell match signal by comparing the character stored in the character register with the character stored in the pattern register, a match register for storing a match value indicative of a match between a search pattern and the data stream, and for registering any mismatch indicated by the cell match signal; an accumulator register for storing a binary value derived from an accumulator register of an adjacent cell, and accumulator logic for modifying a value contained in the accumulator register for output to a next cell, by combining the accumulator register with a selected condition of the cell, in accordance with a preselected logical function; means for connecting the character registers of the plurality of cells serially together to form a character line; means for connecting the match registers of the plurality of cells serially together to form a match line; means for connecting the accumulator registers of the plurality of cells serially together to form a serial accumulator line; means for initializing the cells to contain the search pattern; and clock means for gating the data stream from cell to cell through the character line; and wherein characters in the search pattern are compared with characters in the data stress in a sequential manner, and match values are propagated along the match line in synchronism with movement of the data stream along the character line, to indicate exact and inexact matches between the search pattern and strings of characters in the data stream, and wherein inexact matches involve incorrect, missing or extra characters; and further comprising a plurality of additional registers in each cell, corresponding registers being serially connected to form a plurality of additional lines through the cells for the storage and propagation of intermediate match signals; and wherein the match logic is configured to detect boundaries between predefined segments of text, and the match logic and accumulator logic cooperate to provide means for locating selected search patterns in combination within one of the predefined segments of text. - View Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14)
-
-
15. A method of searching for selected patterns in a text stream, using a plurality of serially-connected comparison cells, the method comprising:
-
initializing each cell to contain a pattern character in a pattern register and various control flags, the pattern characters of the cells together forming a serial string of characters defining a search pattern; applying a clocking signal to the cells to cause propagation of the text stream from cell to cell along a character line formed by a serial connection of character registers in the respective cells, wherein the search pattern and the text stream are oppositely oriented, and a first character of the search pattern is first encountered by a first character in the text stream; comparing the character register and the pattern register in each cell, on each application of the clocking signal, to generate cell match signals indicative of matching character and pattern registers; loading an initial match value, referred to as a tolerance value, onto a match line connecting the cells, at a selected cell location, the tolerance value being indicative of the degree of mismatch that will be tolerated between the text stream and the search pattern; propagating match values from cell to cell along the match line; selectively modifying the match value in each cell in response to the result of the comparing step; and outputting a match result indicative of the location of matching data in the text stream;
wherein the step of modifying the match value includes decrementing the value on detection of incorrect, missing or extra characters in the text stream;storing a first match result associated with a first search term of the search pattern on an additional line connecting the cells, wherein the storing step is performed in a cell corresponding to an end of the first search term; generating a second match result relating to a second search term of the search pattern, wherein the generating step produces the second match result from a cell corresponding to an end of the second search term; logically combining the first and second match results in a cell corresponding to a position downstream of the first and second search terms, for output from the serially connected cells; detecting boundaries of predefined text segments within the text stream, using cells of which the pattern registers contain selected text boundary characters; generating enabling signals in the cells in the step of detecting boundaries; and applying the enabling signals to other cells, thereby enabling search functions to be performed only within a fixed window of one text segment defined by the detected boundaries.
-
-
16. A method for searching a stream of text for a search pattern, comprising:
-
first storing a search pattern of textual characters in a serially connected plurality of comparison cells; shifting the text stream character-by-character through the comparison cells in synchronism with a clocking cycle, wherein the search pattern and the text stream are oppositely oriented, and a first character of the search pattern is first encountered by a first character in the text stream; comparing characters in the search pattern with characters in the text stream at each position of the text stream as it is shifted through the comparison cells; producing cell match signals in the comparison cells as a result of the comparing step; setting a match value initially to a selected tolerance value in a selected one of the cells; receiving a match value from a neighboring cell at the same time as a text stream character; decrementing the received match value if there is not cell match signal in the cell; storing the received and possibly decremented match value for one clocking cycle; then transmitting the stored match value to the next neighboring cell in sequence, wherein the match values are propagated through the comparison cells at half the rate of shifting the text stream through the comparison cells; detecting exact matches between the search pattern and strings of characters in the text stream, by detecting match values that are propagated out of the cells with a value not diminished from the tolerance value; and detecting inexact matches in the form of incorrect, extra or missing characters in each character string compared with the search pattern, wherein the detection of incorrect, extra and missing characters is performed simultaneously and in each cell, and further includes the steps of selectively decrementing a match value in each of three alternate paths corresponding to the detection of incorrect, extra and missing characters, and selecting the greatest match value of three for transmission to the next neighboring cell in sequence. - View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26)
-
-
27. A special purpose search processor for searching a stream of text for specified search patterns, the processor comprising:
-
means for first storing a search pattern of textual characters in a serially connected plurality of comparison cells; means for shifting the text stream character-by-character through the comparison cells in synchronism with a clocking cycle, wherein the search pattern and the text stream are oppositely oriented, and a first character of the search pattern is first encountered by a first character in the text stream; means for comparing characters in the search pattern with characters in the text stream at each position of the text stream as it is shifted through the comparison cells, and producing cell match signals in the comparison cells; means for setting a match value initially to a selected tolerance value in a selected one of the cells; means for receiving a match value from a neighboring cell at the same time as a text stream character; means for decrementing the received match value if there is no cell match signal in the cell; means for storing the received and possibly decremented match value for one clocking cycle; means for transmitting the stored match value to the next neighboring cell in sequence, wherein the match values are propagated through the comparison cells at half the rate of shifting the text stream through the comparison cells wherein exact matches between the search pattern and strings of characters in the text stream are indicated by match values that are propagated out of the cells with a value not diminished from the tolerance value, and inexact matches are indicated by diminished match values; and means for detecting inexact matches in the form of extra or missing characters in each character string compared with the search pattern, wherein the detection of extra and missing characters is performed simultaneously with the detection of incorrect characters, and in each cell, and further includes means for computing two additional match values in each cell, based on the detection of an extra or missing character in the text stream, and means for selecting the greatest match value from the stored match value and the two additional match values, for transmission to the next neighboring cell in sequence. - View Dependent Claims (28, 29, 30, 31, 32, 33, 34, 35, 36, 37)
-
-
38. A special-purpose search processor, comprising:
-
a plurality (n) of serially connected pattern registers for storing a search pattern; an equal plurality (n) of serially connected character registers, each for storing a character of a data stream to be searched; means for shifting the data stream being searched, character-by-character through the serially connected plurality of character registers; an equal plurality (n) of serially connected match registers, for storing match value input at a first of the match registers; means for shifting match values along the serially connected match registers, at half the rate at which the data stream characters are shifted through the character registers, whereby a match detected between a character in the data stream and a character in the search pattern is preserved in the form of a match value that is positioned for use in a subsequent comparison between the next following character in the data stream and the next following character in the search pattern; and match logic, including a plurality of comparators for comparing the contents of the character registers with the contents of the corresponding pattern registers, and modifying the contents of the corresponding match registers upon detection of a mismatch, whereby the match values passed through the plurality of match registers are indicative of exact and inexact matches between the search pattern and strings of characters in the data stream; and wherein the match logic includes means for detecting inexact matches in the form of extra or missing characters in each character string compared with the search pattern, wherein the detection of extra and missing characters is performed simultaneously with the detection of incorrect characters, and in each cell, and further includes means for computing two additional match values in each cell, based on the detection of an extra or missing character in the text stream, and means for selecting the greatest match value from the stored match value and the two additional match values, for transmission to the next neighboring cell in sequence.
-
Specification