×

Method of performing approximate substring indexing

  • US 7,010,522 B1
  • Filed: 06/17/2002
  • Issued: 03/07/2006
  • Est. Priority Date: 06/17/2002
  • Status: Expired due to Fees
First Claim
Patent Images

1. A method of indexing a query substring Q against a collection of data strings in a database D, the method comprising the steps of:

  • a) preprocessing each string σ

    in database D to generate a plurality of overlapping q-grams of a predetermined length q augmenting each q-gram with information indicating its position within string σ

    to form a tuple comprising the position information and the q-gram, and creating an index of the positional q-gram tuple;

    b) parsing the query substring Q into a plurality of overlapping positional q-grams of length q;

    c) searching each index in database D to retrieve potential matches between the query Q substring plurality of overlapping q-grams and the preprocessed database D plurality of overlapping q-grams, a potential match defined as having a predetermined number of matching overlapping q-grams;

    d) applying position-directed filtering to the potential matches retrieved in step c) to form a candidate set including only those potential matches with at least some q-grams in the same position order as substring query Qe) defining a predetermined maximum edit distance k between the query substring Q and database D;

    f) after applying the position-directed filtering, calculating the edit distance between each candidate substring and the query substring; and

    g) verifying the candidate set by removing from the candidate set each candidate substring having an edit distance greater than k.

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×