Suffix array candidate selection and index data structure

  • US 20120117076A1
  • Filed: 06/30/2011
  • Published: 05/10/2012
  • Est. Priority Date: 11/09/2010
  • Status: Active Grant
First Claim

1. A method for identifying a candidate subset of a data set, the data set comprising a plurality of records structured with a data field, each record's data field comprising a data field value, the data field value comprising a sequence of one or more unigrams, the method comprising:

  • recognizing a query field value, the query field value comprising a sequence of N unigrams beginning with U1 and ending with UN, wherein U symbolizes a unigram and N symbolizes a non-negative integer value; and

    performing a first step, a second step, a third step, and a fourth step of a candidate generation iterative loop, wherein

    the first step comprises identifying a query field value suffix comprising a sequence of N−J unigrams beginning with U1+J and ending with UN, wherein J symbolizes a non-negative integer value less than N,

    the second step comprises identifying a qualifying subset of the data set, wherein each record in the qualifying subset satisfies a similarity criterion when the record's data field value is compared to the query field value suffix,

    the third step comprises including, in the candidate subset, the identified qualifying subset records, and

    the fourth step comprises if the number of records in the candidate subset is less than a satisfactory number of candidates, and if N−J is greater than a minimum suffix length, incrementing J and performing the first step, the second step, the third step, and the fourth step of the candidate generation iterative loop.
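
The four steps of the claimed loop can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `generate_candidates` and `ends_with_suffix` are hypothetical, the claim does not specify the similarity criterion (an exact trailing-unigram match is assumed here), and a plain linear scan stands in for the suffix array index named in the title. Indices follow the claim, so with J starting at 0 the suffix U1+J..UN is `query[j:]` in zero-based Python terms.

```python
from typing import Callable, List, Sequence

Unigrams = Sequence[str]


def ends_with_suffix(value: Unigrams, suffix: Unigrams) -> bool:
    """Assumed similarity criterion: the record value ends with the suffix."""
    k = len(suffix)
    return 0 < k <= len(value) and list(value[-k:]) == list(suffix)


def generate_candidates(
    records: Sequence[Unigrams],
    query: Unigrams,
    satisfactory_count: int,
    min_suffix_len: int,
    is_similar: Callable[[Unigrams, Unigrams], bool] = ends_with_suffix,
) -> List[int]:
    """Return indices of candidate records, in the order they were found."""
    n = len(query)
    candidates: List[int] = []
    seen = set()
    j = 0
    while j < n:
        # Step 1: the query suffix of N - J unigrams, U(1+J) .. U(N).
        suffix = query[j:]
        # Step 2: find records that satisfy the similarity criterion.
        # (Linear scan; an index structure would replace this in practice.)
        for idx, value in enumerate(records):
            if idx not in seen and is_similar(value, suffix):
                # Step 3: include qualifying records in the candidate subset.
                seen.add(idx)
                candidates.append(idx)
        # Step 4: shorten the suffix only if more candidates are needed
        # and N - J is still greater than the minimum suffix length.
        if len(candidates) < satisfactory_count and n - j > min_suffix_len:
            j += 1
        else:
            break
    return candidates


records = [
    ["acme", "widget", "corp"],
    ["widget", "corp"],
    ["best", "corp"],
    ["acme", "inc"],
]
query = ["acme", "widget", "corp"]
result = generate_candidates(records, query, satisfactory_count=3, min_suffix_len=1)
```

In this example the full query matches only the first record, so the loop drops the leading unigram twice, each time picking up one more record, and stops once three candidates are collected. Raising `min_suffix_len` to 2 would stop the loop after the two-unigram suffix, leaving two candidates.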

  • 14 Assignments