Computer system and method for data base indexing and information retrieval
First Claim
1. In a computer system comprising a CPU, an input/output terminal connected to said CPU, a main CPU memory and a secondary storage means containing a data base, a method of indexing individual records of said data base, and rapidly searching and retrieving selected records corresponding to one or more keywords input to said CPU, said method comprising the steps of:
- said CPU forming a vector for each said keyword, each said vector comprising one or more array elements which together comprise a numerically sorted list of all record numbers where the keyword for that vector is found;
said CPU transforming each said vector so as to form a data base index comprising a bit string for each said vector, said step of transforming each said vector comprising the steps of;
(a) transforming said numerically sorted list of record numbers into a binary matrix wherein each row of said matrix corresponds to a binary representation of one of said vector array elements, and wherein each column of said matrix corresponds to a level of said hierarchal tree;
(b) determining the first column of said matrix where both ones and zeros are present;
(c) grouping said ones and zeros to identify the number of bits in each such group;
(d) determining whether the first and last bit in each said group are both ones, are zero and one or both zeros, and outputting a "01," "11" or "10," respectively, so as to form one bit pair of said bit string;
(e) splitting the next column of said matrix into groups of bits based on the number of bits in each group determined in step (c);
(f) repeating steps (c) and (d) for each said group of said next column; and
(g) repeating steps (b) through (f) until each column of said matrix has been done.said CPU storing said data base index in said secondary storage means;
inputting at said input/output terminal at least one keyword;
said CPU searching said data base index and retrieving the bit string for said keyword input at said terminal;
said CPU transforming said retrieving bit string back into the vector for said input keyword; and
said CPU identifying at said input/output terminal the records of said data base identified by said list of record numbers associated with the vector for said input keyword.
1 Assignment
0 Petitions
Accused Products
Abstract
A computer system and method for data base indexing and information retrieval. A number of keywords are selected and each record of a data base is searched to determine in which records each keyword appears. The central processing unit of the system then creates a vector for each keyword which identifies each record number of the data base where the keyword appears and numerically sorts the record numbers. A special bit processor next transforms each vector into a bit string that is identified by one of the keywords. The bit strings are returned to the central processing unit and are stored in secondary storage so as to form an index for the data base. To retrieve information, one or more keywords are input to the central processing unit. The input keywords are used by the central processing unit to identify the bit string for each keyword. The keywords may be logically joined using "AND," "OR" and/or "NOT" commands. Each bit string retrieved from the index is then sent to the special purpose bit processor, which combines the bit strings according to the particular command. The resultant bit string is transformed by the bit processor into a vector which is returned to the central processing unit and which then is used to identify the individual records which contain the combined keywords.
-
Citations
38 Claims
-
1. In a computer system comprising a CPU, an input/output terminal connected to said CPU, a main CPU memory and a secondary storage means containing a data base, a method of indexing individual records of said data base, and rapidly searching and retrieving selected records corresponding to one or more keywords input to said CPU, said method comprising the steps of:
-
said CPU forming a vector for each said keyword, each said vector comprising one or more array elements which together comprise a numerically sorted list of all record numbers where the keyword for that vector is found; said CPU transforming each said vector so as to form a data base index comprising a bit string for each said vector, said step of transforming each said vector comprising the steps of; (a) transforming said numerically sorted list of record numbers into a binary matrix wherein each row of said matrix corresponds to a binary representation of one of said vector array elements, and wherein each column of said matrix corresponds to a level of said hierarchal tree; (b) determining the first column of said matrix where both ones and zeros are present; (c) grouping said ones and zeros to identify the number of bits in each such group; (d) determining whether the first and last bit in each said group are both ones, are zero and one or both zeros, and outputting a "01," "11" or "10," respectively, so as to form one bit pair of said bit string; (e) splitting the next column of said matrix into groups of bits based on the number of bits in each group determined in step (c); (f) repeating steps (c) and (d) for each said group of said next column; and (g) repeating steps (b) through (f) until each column of said matrix has been done. said CPU storing said data base index in said secondary storage means; inputting at said input/output terminal at least one keyword; said CPU searching said data base index and retrieving the bit string for said keyword input at said terminal;
said CPU transforming said retrieving bit string back into the vector for said input keyword; andsaid CPU identifying at said input/output terminal the records of said data base identified by said list of record numbers associated with the vector for said input keyword. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
-
-
23. A method of indexing, searching and retrieving individual records contained in a computer data base, said method comprising the steps of:
-
inputting to a computer a plurality of records to be stored as said data base, and assigning a record number to each said record, each said record number corresponding to a terminal node of a hierarchal tree; identifying a plurality of keywords contained in said records and listing for each keyword those record numbers where that keyword is found; forming a vector for each said keyword, each said vector containing one or more array elements which together comprise a numerically sorted list of all said record numbers which identify the records where the keyword for that vector is found; transforming each said vector by converting each said array element into a corresponding binary representation and thereafter deriving from said binary representations a bit string comprising a plurality of bit pairs, each bit pair representing a level and a node of a hierarchal tree, and each said bit string being identified by the keyword for the vector from which said bit string was formed; storing each said keyword and its associated bit string so as to form an index to said data base; inputting to said computer at least two keywords and command instructing said computer to logically join said keywords in a specified manner; searching said index and retrieving therefrom the bit strings identified by said keywords; merging said retrieved bit strings to form a resultant bit string for said logically joined keywords, said resultant bit string comprising a plurality of bit pairs, each bit pair representing a level and a node of said hierarchal tree; transforming said resultant bit string into a vector containing one or more array elements which together comprise a numerically sorted list of all record numbers for those records in which said logically joined keywords appear; and outputting from said computer an identification of all said records in which said logically joined keywords apear. - View Dependent Claims (24, 25, 26, 27, 28, 29, 30, 31, 32, 33)
-
-
34. A computer system for indexing and rapidly searching and retrieving individual records contained in a data base, said system comprising:
-
a main CPU memory for controlling a CPU; a secondary storage means comprising; (a) a plurality of individual data base records each containing one or more keywords from which a vector for each said keyword is formed, each said vector comprising one or more array elements which together comprise a numerically sorted list of all record numbers which identify the terminal nodes of a hierarchal tree where the records containing the keyword for that vector are found; and (b) an index to said data base records comprising a plurality of randomly linked bit strings derived from said vectors, each said bit string being identified by one of said keywords and comprising a plurality of bit pairs, each said bit pair representing a level and a node of said hierarchal tree; an input/output terminal for entering one or more keywords to be logically joined; and a CPU adapted to (a) receive said entered keywords, (b) locate corresponding keywords in said index, (c) retrieve the bit string for each keyword found in said index, (d) combine the retrieved bit strings to form a new bit string, (e) transform the new bit string into a vector, and (f) output an identification of all records containing the logically joined keywords input at said terminal. - View Dependent Claims (35, 36, 37)
-
-
38. In a computer system comprising a CPU, an input/output terminal connected to said CPU, a main CPU memory and a secondary storage means containing a data base, a method of indexing individual records of said data base, and rapidly searching and retrieving selected records corresponding to one or more keywords input to said CPU, said method comprising the steps of:
-
inputting to said CPU a plurality of records to be stored as said data base, and assigning a record number to each said record, each said record number corresponding to a terminal node of a hierarchal tree; identifying a plurality of keywords contained in said records; said CPU forming a vector for each said keyword, each said vector containing one or more array elements which together comprise a numerically sorted list of all said record numbers which identify the records where the keyword for that vector is found; said CPU transforming each said vector by converting each said array element into a corresponding binary representation and thereafter deriving from said binary representations a bit string comprising a plurality of bit pairs, each bit pair representing a level and a node of said hierarchal tree, each said bit string being identified by the keyword for the vector from which said bit string was formed, and said bit strings being randomly linkned so as to form said hierarchal tree; said CPU storing said data base index in said secondary storage means in the form of said randomly linked bit strings which form said hierarchal tree; inputting at said input/output terminal at least one keyword; said CPU searching said data base index and retrieving the bit string for said keyword input at said terminal; said CPU transforming said retrieved bit string back into the vector for said keyword; and said CPU identifying at said input/output terminal the records of said data base associated with the vector for said keyword.
-
Specification