×

Method and apparatus for improving performance of approximate string queries using variable length high-quality grams

  • US 7,996,369 B2
  • Filed: 12/14/2008
  • Issued: 08/09/2011
  • Est. Priority Date: 11/14/2008
  • Status: Active Grant
First Claim
Patent Images

1. An improvement in an indexing method for efficient approximate string search of a query string s against a collection of data strings S corresponding to a gram dictionary D in a computer system comprising:

  • preprocessing the dictionary D into a plurality of grams of varying length between qmin and qmax;

    starting from a current position in the query string s, searching for the longest substring that matches a gram in the dictionary D, if no such gram exists in the dictionary D, then materializing a substring of length gmin starting from the current position;

    checking if the found or materialized substring is a positional substring already found in the query string s, and if so, then not producing a positional gram corresponding to the found or materialized substring, otherwise producing a positional gram corresponding to the found or materialized substring; and

    indexing the current position by one to the right in the query string and repeating the searching and checking until the current position in the query string S is greater than |s|−

    qmin+1, where |s| is the length of query string s, so that a gram index list for query string s having variable gram length is generated denoted as the set of positional grams VG(s, D, qmin, qmax).

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×