
Set similarity selection queries at interactive speeds

  • US 7,921,100 B2
  • Filed: 01/02/2008
  • Issued: 04/05/2011
  • Est. Priority Date: 01/02/2008
  • Status: Expired due to Fees
First Claim

1. A method for calculating a similarity score of a query set comprising a query set of tokens and a first database set comprising a first database set of tokens, wherein the first database set is one of a plurality of database sets in a data collection set stored on a non-transitory computer readable medium, comprising the steps of:

    for each specific token in the query set, determining the number of database sets that contain the specific token;

    for each specific token in the query set, calculating an inverse document frequency (idf) weight, based at least in part on the number of database sets that contain the specific token and on the total number of database sets in the data collection set;

    calculating a normalized length of the first database set;

    calculating a normalized length of the query set; and

    calculating a similarity score based at least in part on the normalized length of the first database set, the normalized length of the query set, and the idf weight of each of the tokens in the query set.
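
The steps recited in the claim can be illustrated with a minimal Python sketch. The claim does not fix any particular formula, so everything concrete here is an assumption made for illustration: the idf weight is taken as log2(1 + N / n(t)), the normalized length of a set as the square root of the sum of its squared idf weights, and the score as the sum of squared idf weights of the shared tokens divided by the product of the two lengths. The function names (`idf`, `normalized_length`, `similarity`) and the sample data are hypothetical and are not taken from the patent.

```python
import math


def idf(token, collection):
    """Assumed idf weight: log2(1 + N / n(t)).

    N is the total number of database sets and n(t) is the number of
    database sets that contain the token. Tokens that appear in no
    database set get weight 0.0 (an assumption of this sketch).
    """
    containing = sum(1 for db_set in collection if token in db_set)
    if containing == 0:
        return 0.0
    return math.log2(1 + len(collection) / containing)


def normalized_length(tokens, collection):
    """Assumed normalized length: sqrt of the sum of squared idf weights."""
    return math.sqrt(sum(idf(t, collection) ** 2 for t in tokens))


def similarity(query, db_set, collection):
    """Assumed score: squared idf of shared tokens over the product of lengths."""
    q_len = normalized_length(query, collection)
    s_len = normalized_length(db_set, collection)
    if q_len == 0.0 or s_len == 0.0:
        return 0.0
    shared = query & db_set
    return sum(idf(t, collection) ** 2 for t in shared) / (q_len * s_len)


# Hypothetical data collection: each database set is a set of tokens.
collection = [{"main", "st"}, {"main", "ave"}, {"oak", "st"}]
query = {"main", "st"}

for db_set in collection:
    print(sorted(db_set), round(similarity(query, db_set, collection), 3))
```

Under these assumptions a database set identical to the query scores 1, a set sharing no tokens with the query scores 0, and rarer shared tokens contribute more than common ones. Recomputing document frequencies by scanning the collection on every call keeps the sketch short; a real system would precompute them.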
