Document retrieval system and search method using word set and character look-up tables

  • US 6,741,985 B2
  • Filed: 07/31/2001
  • Issued: 05/25/2004
  • Est. Priority Date: 03/12/2001
  • Status: Expired due to Fees
First Claim

1. A method of matching a search string according to a predetermined set of matching criteria to a set of words contained in a collection of words, comprising:

  • creating and storing a lexicon containing the collection of words and associating each of the stored words with a unique identifying number;

    creating and storing a word look-up table identifying sets of word numbers associated with words of the lexicon that have a common set of characteristics;

    assigning the word numbers to the words of the lexicon such that each of the word number sets identified by the word look-up table consists of consecutive numbers;

    creating and storing a character look-up table identifying for a specified word number and a specified character whether the word associated with the specified word number contains the specified character and wherein the character look-up table is a two-dimensional boolean array with one dimension corresponding to character values and the other dimension corresponding to word numbers;

    selecting from the word look-up table a target set of word numbers whose associated words have a set of characteristics corresponding to the search string;

    refining the target set, the refining comprising selecting a set of characters from the search string, accessing the character look-up table to identify which of the selected characters are contained in each of the words associated with the target set, and in response to the character identification excluding from the target set those word numbers whose associated words do not contain a predetermined number of the selected characters;

    comparing each of the words associated with the refined target set directly with the search string and excluding from the target set any word number whose associated word fails to match the search string according to the predetermined set of matching criteria;

    each element of the array consists of a single bit, and the elements of the array corresponding to any character value of the one dimension are stored side-by-side in a row;

    the selecting of the target set comprises composing a set of consecutive word numbers from one or more word sets identified by the word look-up table;

    the accessing of the character look-up table comprises generating a boolean match value for each of the word numbers in the target set which indicates whether all of the selected characters are contained in the word associated with the word number; and

    the generating of the boolean values comprising performing a logical AND operation that combines sections of each of the rows of the character look-up table corresponding to the selected characters and to the word numbers of the target set thereby to produce a resulting bit string in which each bit is associated with one of the word numbers of the target set and contains the boolean value for the associated word number, the performing of the logical AND operation comprising simultaneously combining n-bit segments of the row sections;

    whereby, the boolean match values are generated simultaneously for n words associated with the target set.
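
The steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the "common set of characteristics" is taken to be the pair (first letter, word length), the matching criterion is a single-character `?` wildcard checked with Python's `fnmatch.fnmatchcase`, and the function names `build_index` and `search` are hypothetical. Each row of the character look-up table is held in one Python integer, so a single `&` combines many bits at once and stands in for the n-bit segment AND described in the claim.

```python
from fnmatch import fnmatchcase


def build_index(words):
    """Build the lexicon, word look-up table and character look-up table."""
    # Sorting by (first letter, length) gives every group consecutive word
    # numbers. ASSUMPTION: this grouping key is chosen for illustration only.
    lexicon = sorted({w for w in words if w}, key=lambda w: (w[0], len(w), w))

    # Word look-up table: group key -> (first number, one past last number).
    word_table = {}
    for number, word in enumerate(lexicon):
        key = (word[0], len(word))
        start, _ = word_table.get(key, (number, number))
        word_table[key] = (start, number + 1)

    # Character look-up table: one bit row per character; bit i of a row is
    # set when word number i contains that character.
    char_table = {}
    for number, word in enumerate(lexicon):
        for ch in set(word):
            char_table[ch] = char_table.get(ch, 0) | (1 << number)
    return lexicon, word_table, char_table


def search(query, lexicon, word_table, char_table):
    """Return lexicon words matching `query`, where '?' matches one character."""
    # Step 1: select the target set of consecutive word numbers.
    # ASSUMPTION: the first character of the query is a literal, not '?'.
    key = (query[0], len(query))
    if key not in word_table:
        return []
    start, stop = word_table[key]
    width = stop - start
    mask = (1 << width) - 1

    # Step 2: refine by ANDing the row sections of the selected characters.
    selected = {ch for ch in query if ch != "?"}
    bits = mask
    for ch in selected:
        bits &= (char_table.get(ch, 0) >> start) & mask

    # Step 3: compare each surviving word directly with the search string.
    hits = []
    for offset in range(width):
        if (bits >> offset) & 1:
            word = lexicon[start + offset]
            if fnmatchcase(word, query):
                hits.append(word)
    return hits


words = ["cat", "cot", "cut", "car", "cab", "dog", "coat", "cart"]
index = build_index(words)
print(search("c?t", *index))  # ['cat', 'cot', 'cut']
```

In this sketch the query `c?a` shows why the final direct comparison is still needed: the bit filter keeps `cab`, `car` and `cat` because each contains both `c` and `a`, but none of them has `a` in the third position, so the result is empty.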
