High speed data searching for information in a computer system
First Claim
1. A method of searching information in a computer system comprising the steps of:
- identifying a parent string and a child string from data input, and a first value that represents the number of initial characters that said parent string and said child string have in common;
comparing a subsequent string with said parent string to generate a second value that represents the number of initial characters that said parent and said subsequent string have in common;
examining said first and second values to identify the extent of the comparison between said subsequent string and said child string;
identifying a next string for comparison with said subsequent string when said first value is not equal to said second value;
selecting a grandchild string using a first branch of a dictionary tree associated with said child string when said first value is greater than said second value.
10 Assignments
0 Petitions
Accused Products
Abstract
Efficiencies in searching and matching information in a computer system are achieved using embodiments of the invention. The invention can be used, for example, to build and utilize a dictionary of data for string replication compression. The data matching mechanism can also be applied to other situations where it is necessary to find a sequence of data in a data buffer (e.g. looking for a particular series of words, letters, or numbers in an online document). As a result of processing a current string using the data dictionary, it is possible to find a previously-processed dictionary string that has the greatest number of initial characters in common with the current string, and a location at which the current string can be inserted into the dictionary tree. A count field is used to improve the speed of searching for matched strings.
-
Citations
16 Claims
-
1. A method of searching information in a computer system comprising the steps of:
-
identifying a parent string and a child string from data input, and a first value that represents the number of initial characters that said parent string and said child string have in common; comparing a subsequent string with said parent string to generate a second value that represents the number of initial characters that said parent and said subsequent string have in common; examining said first and second values to identify the extent of the comparison between said subsequent string and said child string; identifying a next string for comparison with said subsequent string when said first value is not equal to said second value; selecting a grandchild string using a first branch of a dictionary tree associated with said child string when said first value is greater than said second value. - View Dependent Claims (3)
-
-
2. A method of searching information in a computer system comprising the steps of:
-
identifying a parent string and a child string from data input, and a first value that represents the number of initial characters that said parent string and said child string have in common; comparing a subsequent string with said parent string to generate a second value that represents the number of initial characters that said parent and said subsequent string have in common; examining said first and second values to identify the extent of the comparison between said subsequent string and said child string; identifying a next string for comparison with said subsequent string when said first value is not equal to said second value; selecting a grandchild string using a second branch of a dictionary tree associated with said child string when said first value is less than said second value.
-
-
4. An article of manufacture comprising:
a computer usable medium having computer readable program code embodied therein for searching information in a computer system comprising; computer readable program code configured to cause a computer to identify a parent string and a child string from data input, and a first value that represents the number of initial characters that said parent string and said child string have in common; computer readable program code configured to cause a computer to compare a subsequent string with said parent string to generate a second value that represents the number of initial characters that said parent and said subsequent string have in common; computer readable program code configured to cause a computer to examine said first and second values to identify the extent of the comparison between said subsequent string and said child string; computer readable program code configured to cause a computer to identify a next string for comparison with said subsequent string when said first value is not equal to said second value; computer readable program code configured to cause a computer to select a grandchild string using a first branch of a dictionary tree when said first value is greater than said second value. - View Dependent Claims (6)
-
5. An article of manufacture comprising:
a computer usable medium having computer readable program code embodied therein for searching information in a computer system comprising; computer readable program code configured to cause a computer to identify a parent string and a child string from data input, and a first value that represents the number of initial characters that said parent string and said child string have in common; computer readable program code configured to cause a computer to compare a subsequent string with said parent string to generate a second value that represents the number of initial characters that said parent and said subsequent string have in common; computer readable program code configured to cause a computer to examine said first and second values to identify the extent of the comparison between said subsequent string and said child string; computer readable program code configured to cause a computer to identify a next string for comparison with said subsequent string when said first value is not equal to said second value; computer readable program code configured to cause a computer to select a grandchild string using a second branch of said dictionary tree when said first value is less than said second value.
-
7. A method of searching information in a computer system comprising the steps of:
-
creating a first array of pointers to strings contained in an input buffer; creating a second array of pointers to strings that include a high occurrence sequence;
said step comprising the steps of;determining whether a first sequence of bits of one of said strings matches said high occurrence sequence; accessing an entry in said second array using a second sequence of bits of said one of said strings; determining whether a pointer exists in said entry of said second array; and storing in said entry a pointer to said one of said strings, if a pointer does not exist in said entry; accessing said first array of pointers to identify a previous string, if a current string does not contain said high occurrence sequence; accessing said second array of pointers to identify said previous string, if said current string does contain said high occurrence sequence; comparing said previous string and said current string to determine whether a match exists; and generating a compressed representation that includes a pointer to said previous string, if a match does exist. - View Dependent Claims (8, 9, 10)
-
-
11. An article of manufacturing comprising:
a computer usable medium having computer readable program code embodied therein for searching information comprising; computer readable program code configured to cause a computer to create a first array of pointers to strings contained in an input buffer; computer readable program code configured to cause a computer to create a second array of pointers to strings that include a high occurrence sequence; computer readable program code configured to cause a computer to access said first array of pointers to identify a previous string, if a current string does not contain said high occurrence sequence; computer readable program code configured to cause a computer to access said second array of pointers to identify said previous string, if said current string does contain said high occurrence sequence; computer readable program code configured to cause a computer to compare said previous string and said current string to determine whether a match exists; computer readable program code configured to cause a computer to generate a compressed representation that includes a pointer to said previous string, if a match does exist; computer readable program code configured to cause a computer to determine whether a first sequence of bits of said current string matches said high occurrence sequence; computer readable program code configured to cause a computer to access an entry in said second array using a second sequence of bits of said current string; and computer readable program code configured to cause a computer to retrieve from said entry a pointer to said previous string.
-
12. An article of manufacturing comprising:
a computer usable medium having computer readable program code embodied therein for searching information comprising; computer readable program code configured to cause a computer to create a first array of pointers to strings contained in an input buffer; computer readable program code configured to cause a computer to create a second array of pointers to strings that include a high occurrence sequence; computer readable program code configured to cause a computer to access said first array of pointers to identify a previous string, if a current string does not contain said high occurrence sequence; computer readable program code configured to cause a computer to access said second array of pointers to identify said previous string, if said current string does contain said high occurrence sequence; computer readable program code configured to cause a computer to compare said previous string and said current string to determine whether a match exists; computer readable program code configured to cause a computer to generate a compressed representation that includes a pointer to said previous string, if a match does exist; computer readable program code configured to cause a computer to identify another previous string; and computer readable program code configured to cause a computer to compare said another previous string and said current string to determine the number of initial characters in said another previous string and said current string. - View Dependent Claims (13, 14, 15)
-
16. An article of manufacturing comprising:
a computer usable medium having computer readable program code embodied therein for searching information comprising; computer readable program code configured to cause a computer to create a first array of pointers to strings contained in an input buffer; computer readable program code configured to cause a computer to create a second array of pointers to strings that include a high occurrence sequence; computer readable program code configured to cause a computer to access said first array of pointers to identify a previous string, if a current string does not contain said high occurrence sequence; computer readable program code configured to cause a computer to access said second array of pointers to identify said previous string, if said current string does contain said high occurrence sequence; computer readable program code configured to cause a computer to compare said previous string and said current string to determine whether a match exists; and computer readable program code configured to cause a computer to generate a compressed representation that includes a pointer to said previous string, if a match does exist, wherein said compressed representation includes a header field, said header field having a variable length and identifying the type for an associated content field.
Specification