Automatically ranking answers to database queries

US 7,251,648 B2
Filed: 06/28/2002
Issued: 07/31/2007
Est. Priority Date: 06/28/2002
Status: Expired due to Fees

2 Assignments

0 Petitions

Abstract

A method for automatically ranking database records by relevance to a given query. A similarity function is derived from data in the database and/or queries in a workload. The derived similarity function is applied to a given query and records in the database to rank the records. The records are returned in a ranked order.

118 Citations

32 Claims

1. A computer implemented method for automatically ranking data records by relevance to a query on a database, the method comprising:
- deriving a similarity function of the form $\mathrm{SIM}(t, Q) = \sum_{k=1}^{m} w_{k} \min_{v \in T_{k}} S_{k}(t_{k}, v)$ from at least one of data records in a database and a workload of queries, wherein t represents a tuple, Q represents a query, w represents an attribute weight, T_k represents a set of constraints on values for categorical attributes or a range for numeric attributes, S represents a similarity coefficient, and v represents a value of an attribute, wherein the similarity function corresponds to an inverse frequency of categorical attribute values in records of the database and an inverse frequency of numeric attribute values that is determined by considering a frequency of numeric attribute values specified in the given query and nearby numeric attribute values in the database;
  
  applying the similarity function to a given query and records in the database to determine a query frequency between the given query and the records in the database;
  
  ranking the records in the database based on the similarity between the given query and the records, wherein the similarity function ranks a first record having a same similarity score as a second record higher than the second record when values in the first record for attributes that are not specified in the given query occur more frequently in the database than corresponding attribute values in the second record; and
  
  returning the records in a ranked order.
  - 2. The method of claim 1 wherein the similarity function is derived only from at least one of data in the database and the workload of queries.
  - 3. The method of claim 1 wherein the given query is a conjunctive condition.
  - 4. The method of claim 1 wherein the similarity function corresponds to an inverse frequency of attribute values in records of the database.
  - 5. The method of claim 1 wherein the similarity function includes a cosine similarity between attributes specified in the given query and the database records.
  - 6. The method of claim 1 wherein the similarity function corresponds to a frequency an attribute value is specified in queries in a workload.
  - 7. The method of claim 1 wherein the similarity function corresponds to a frequency that a categorical attribute value is specified in queries in a workload and a frequency that a numeric attribute value and nearby numeric attribute values are specified in queries in the workload.
  - 8. The method of claim 1 wherein the similarity function assigns an importance weight to an attribute based on a frequency at which the attribute is specified by queries in the workload.
  - 9. The method of claim 1 wherein the similarity function ranks a first record having a same similarity score as a second record higher than the second record when values in the first record for attributes that are not specified in the given query occur more frequently in the workload queries than corresponding attribute values in the second record.
  - 10. The method of claim 1 further comprising filtering the returned records using the given query when a condition of the given query includes an inflexible condition to remove records that do not satisfy the inflexible condition.
  - 11. The method of claim 1 wherein the given query is an inflexible conjunctive condition and the returned records are filtered using the conjunctive condition to remove records that do not satisfy the given query.
  - 12. The method of claim 1 wherein database records and the similarity function are provided to a top-K algorithm that returns a top-K number of records in the ranked order.
  - 13. The method of claim 12 wherein a threshold algorithm is used to return the top-K results.
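
The scoring and tie-breaking steps of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it evaluates the recited form of SIM exactly as written (a weighted sum over the specified attributes, taking the minimum of S_k over the constraint set T_k), and it breaks ties by summing raw database counts of the values the query did not mention. The names `sim`, `rank`, `exact_match`, the 0/1 coefficient, and the summed-count tie-break are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

Record = Dict[str, str]
Query = Dict[str, Set[str]]  # attribute name -> constraint set T_k


def sim(record: Record, query: Query, weights: Dict[str, float],
        coeff: Callable[[str, str, str], float]) -> float:
    """Evaluate SIM(t, Q) = sum_k w_k * min_{v in T_k} S_k(t_k, v) as recited."""
    return sum(
        weights[attr] * min(coeff(attr, record[attr], v) for v in allowed)
        for attr, allowed in query.items()
    )


def rank(records: List[Record], query: Query, weights: Dict[str, float],
         coeff: Callable[[str, str, str], float]) -> List[Record]:
    """Sort by SIM; among equal scores, prefer more common unspecified values."""
    # Count how often each (attribute, value) pair occurs in the database.
    freq = Counter((a, v) for r in records for a, v in r.items())

    def tie_break(r: Record) -> int:
        # Assumption: sum the raw counts of values for attributes not in the query.
        return sum(freq[(a, v)] for a, v in r.items() if a not in query)

    return sorted(records,
                  key=lambda r: (sim(r, query, weights, coeff), tie_break(r)),
                  reverse=True)


def exact_match(attr: str, record_value: str, query_value: str) -> float:
    """Toy coefficient S_k (assumption): 1.0 on an exact match, 0.0 otherwise."""
    return 1.0 if record_value == query_value else 0.0


cars = [
    {"make": "Honda", "color": "teal"},
    {"make": "Honda", "color": "black"},
    {"make": "Ford", "color": "black"},
]
ranked = rank(cars, {"make": {"Honda"}}, {"make": 1.0}, exact_match)
```

Both Hondas receive the same similarity score, so the black one is ranked first because black occurs more often in the table than teal; the Ford follows with a score of zero.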

14. A method for automatically ranking data records by relevance to a query on a database wherein the database has data records arranged in one or more database tables, the method comprising:
- deriving an inverse document frequency similarity function of the form $\mathrm{SIM}(t, Q) = \sum_{k=1}^{m} w_{k} \min_{v \in T_{k}} S_{k}(t_{k}, v)$ from the data records in a database, wherein t represents a tuple, Q represents a query, w represents an attribute weight, T_k represents a set of constraints on values for categorical attributes or a range for numeric attributes, S represents a similarity coefficient, and v represents a value of an attribute, wherein the similarity function corresponds to an inverse frequency of categorical attribute values in records of the database and an inverse frequency of numeric attribute values that is determined by considering a frequency of numeric attribute values specified in the given query and nearby numeric attribute values in the database;
  
  applying the similarity function to a given query and records in the database to determine a similarity between the given query and the records;
  
  ranking the records based on the similarity between the given query and the records, wherein the similarity function ranks a first record having a same similarity score as a second record higher than the second record when values in the first record for attributes that are not specified in the given query occur more frequently in the database than corresponding attribute values in the second record; and
  
  returning the records in a ranked order.
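
One way to picture the inverse frequencies recited in claim 14 is sketched below. The categorical case uses the familiar log(n / count) form. For the numeric case the claim only says that nearby database values are considered, so the Gaussian kernel, the `bandwidth` parameter, and the function names here are assumptions chosen for illustration, not details taken from the claims.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def categorical_idf(values: Sequence[str]) -> Dict[str, float]:
    """IDF(v) = log(n / count(v)): rare values receive higher scores."""
    n = len(values)
    return {v: math.log(n / c) for v, c in Counter(values).items()}


def numeric_idf(values: Sequence[float], q: float, bandwidth: float) -> float:
    """Inverse frequency of q, counting nearby database values as partial hits.

    Assumption: a Gaussian kernel of width `bandwidth` defines "nearby".
    """
    n = len(values)
    soft_count = sum(math.exp(-0.5 * ((x - q) / bandwidth) ** 2) for x in values)
    # Guard against underflow to zero when q is far from every stored value.
    soft_count = max(soft_count, 1e-12)
    return math.log(n / soft_count)


makes = ["Honda", "Honda", "Honda", "Ferrari"]
idf = categorical_idf(makes)

prices = [10_000.0, 10_500.0, 11_000.0, 90_000.0]
crowded = numeric_idf(prices, 10_500.0, bandwidth=1_000.0)
sparse = numeric_idf(prices, 90_000.0, bandwidth=1_000.0)
```

In this toy data the single Ferrari gets a higher inverse frequency than the three Hondas, and a price of 90,000, which has no close neighbours, scores higher than 10,500, which sits in a crowded part of the range.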

15. A method for automatically ranking data records by relevance to a query on a database wherein the database has data records arranged in one or more database tables, and wherein the database has a given workload comprising a set of queries, the method comprising:
- deriving a query frequency similarity function of the form $\mathrm{SIM}(t, Q) = \sum_{k=1}^{m} w_{k} \min_{v \in T_{k}} S_{k}(t_{k}, v)$ from the queries in the workload, wherein t represents a tuple, Q represents a query, w represents an attribute weight, T_k represents a set of constraints on values for categorical attributes or a range for numeric attributes, S represents a similarity coefficient, and v represents a value of an attribute, wherein the similarity function corresponds to an inverse frequency of categorical attribute values in records of the database and an inverse frequency of numeric attribute values that is determined by considering a frequency of numeric attribute values specified in the given query and nearby numeric attribute values in the database;
  
  applying the similarity function to a given query and records in the database to determine a similarity between the given query and the records;
  
  ranking the records based on the similarity between the given query and the records, wherein the similarity function ranks a first record having a same similarity score as a second record higher than the second record when values in the first record for attributes that are not specified in the given query occur more frequently in the database than corresponding attribute values in the second record; and
  
  returning the records in a ranked order.
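
The workload-derived quantities in claim 15 (and in dependent claims 6 and 8) might be computed along these lines. This is a hedged sketch: the workload is modelled as a list of queries that map attributes to requested values, the query frequency is normalised by the most requested value of the same attribute, and the attribute weight is the share of attribute mentions. Both normalisations and the names `query_frequencies` and `attribute_weights` are assumptions for the example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Query = Dict[str, Set[str]]  # attribute name -> values requested by one query


def query_frequencies(workload: List[Query]) -> Dict[Tuple[str, str], float]:
    """QF(attr, v): request count of v, scaled by the attribute's top request count."""
    raw = Counter((a, v) for q in workload for a, vals in q.items() for v in vals)
    peak: Dict[str, int] = {}
    for (a, _), c in raw.items():
        peak[a] = max(peak.get(a, 0), c)
    return {(a, v): c / peak[a] for (a, v), c in raw.items()}


def attribute_weights(workload: List[Query]) -> Dict[str, float]:
    """w_k: share of all attribute mentions in the workload that name attribute k."""
    mentions = Counter(a for q in workload for a in q)
    total = sum(mentions.values())
    return {a: c / total for a, c in mentions.items()}


workload = [
    {"make": {"Honda"}},
    {"make": {"Honda"}, "color": {"black"}},
    {"make": {"Ford"}},
    {"make": {"Honda", "Toyota"}},
]
qf = query_frequencies(workload)
weights = attribute_weights(workload)
```

Here "Honda" is requested three times and "Ford" once, so their query frequencies are 1.0 and one third, and "make" carries four fifths of the attribute weight because four of the five attribute mentions name it.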

16. A method for automatically ranking data records by relevance to a query on a database wherein the database has data records arranged in one or more database tables, and wherein the database has a given workload comprising a set of queries, the method comprising:
- deriving a similarity function of the form $\mathrm{SIM}(t, Q) = \sum_{k=1}^{m} w_{k} \min_{v \in T_{k}} S_{k}(t_{k}, v)$ that corresponds to an inverse frequency of attribute values in records of the database and a frequency an attribute value is specified in queries in a workload, wherein t represents a tuple, Q represents a query, w represents an attribute weight, T_k represents a set of constraints on values for categorical attributes or a range for numeric attributes, S represents a similarity coefficient, and v represents a value of an attribute, wherein the similarity function corresponds to an inverse frequency of categorical attribute values in records of the database and an inverse frequency of numeric attribute values that is determined by considering a frequency of numeric attribute values specified in the given query and nearby numeric attribute values in the database;
  
  applying the similarity function to a given query and records in the database to determine a similarity between the given query and the records;
  
  ranking the records based on the similarity between the given query and the records, wherein the similarity function ranks a first record having a same similarity score as a second record higher than the second record when values in the first record for attributes that are not specified in the given query occur more frequently in the database than corresponding attribute values in the second record; and
  
  returning the records in a ranked order.
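
Claim 16 asks for a similarity function that reflects both the inverse frequency in the data and the frequency in the workload, without saying how the two are combined. The sketch below assumes a simple product for a categorical attribute, with add-one smoothing so that values never requested in the workload keep a non-zero score. The product, the smoothing, and the name `combined_coefficient` are illustrative assumptions only.

```python
import math
from collections import Counter
from typing import Sequence


def combined_coefficient(record_value: str, query_value: str,
                         column: Sequence[str], requested: Sequence[str]) -> float:
    """S_k(t_k, v) = QF(v) * IDF(v) on an exact match, else 0 (assumed combination)."""
    if record_value != query_value:
        return 0.0
    data_counts = Counter(column)
    if data_counts[query_value] == 0:
        return 0.0  # value absent from the column: nothing to score
    asked_counts = Counter(requested)
    # Inverse frequency of the value among the stored records.
    idf = math.log(len(column) / data_counts[query_value])
    # Assumption: add-one smoothing keeps unrequested values above zero.
    qf = (asked_counts[query_value] + 1) / (max(asked_counts.values(), default=0) + 1)
    return qf * idf


column = ["Honda", "Honda", "Honda", "Ford"]   # values stored in the table
requested = ["Ford", "Ford", "Honda"]          # values named by workload queries
ford = combined_coefficient("Ford", "Ford", column, requested)
honda = combined_coefficient("Honda", "Honda", column, requested)
mismatch = combined_coefficient("Honda", "Ford", column, requested)
```

In this toy data "Ford" is both rarer in the table and more often requested than "Honda", so a matching Ford record receives the larger coefficient.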

17. A computer readable medium having computer executable instructions stored thereon for performing a method for automatically ranking data records by relevance to a query on a database wherein the database has data records arranged in one or more database tables, the method comprising:
- deriving a similarity function of the form $\mathrm{SIM}(t, Q) = \sum_{k=1}^{m} w_{k} \min_{v \in T_{k}} S_{k}(t_{k}, v)$ from at least one of data in a database and a workload of queries, wherein t represents a tuple, Q represents a query, w represents an attribute weight, T_k represents a set of constraints on values for categorical attributes or a range for numeric attributes, S represents a similarity coefficient, and v represents a value of an attribute, wherein the similarity function corresponds to an inverse frequency of categorical attribute values in records of the database and an inverse frequency of numeric attribute values that is determined by considering a frequency of numeric attribute values specified in the given query and nearby numeric attribute values in the database;
  
  applying the similarity function to a given query and records in the database to determine a similarity between the given query and the records;
  
  ranking the records based on the similarity between the given query and the records, wherein the similarity function ranks a first record having a same similarity score as a second record higher than the second record when values in the first record for attributes that are not specified in the given query occur more frequently in the database than corresponding attribute values in the second record; and
  
  returning the records in a ranked order.
  - 18. The computer readable medium of claim 17 wherein the similarity function is derived only from at least one of data in the database and the workload of queries.
  - 19. The computer readable medium of claim 17 wherein the given query is a conjunctive condition.
  - 20. The computer readable medium of claim 17 wherein the similarity function corresponds to an inverse frequency of attribute values in records of the database.
  - 21. The computer readable medium of claim 17 wherein the similarity function includes a cosine similarity between attributes specified in the given query and the database records.
  - 22. The computer readable medium of claim 17 wherein the similarity function corresponds to a frequency an attribute value is specified in queries in a workload.
  - 23. The computer readable medium of claim 17 wherein the similarity function corresponds to a frequency that a categorical attribute value is specified in queries in a workload and a frequency that a numeric attribute value and nearby numeric attribute values are specified in queries in the workload.
  - 24. The computer readable medium of claim 17 wherein the similarity function assigns an importance weight to an attribute based on a frequency at which the attribute is specified by queries in the workload.
  - 25. The computer readable medium of claim 17 wherein the similarity function ranks a first record having a same similarity score as a second record higher than the second record when values in the first record for attributes that are not specified in the given query occur more frequently in the workload queries than corresponding attribute values in the second record.
  - 26. The computer readable medium of claim 17 further comprising filtering the returned records using the given query when a condition of the given query includes an inflexible condition to remove records that do not satisfy the inflexible condition.
  - 27. The computer readable medium of claim 17 wherein the given query is an inflexible conjunctive condition and the returned records are filtered using the conjunctive condition to remove records that do not satisfy the given query.
  - 28. The computer readable medium of claim 17 wherein database records and the similarity function are provided to a top-K algorithm that returns a top-K number of records in the ranked order.
  - 29. The computer readable medium of claim 28 wherein a threshold algorithm is used to return the top-K results.
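
Claims 12, 13, 28 and 29 pass the records and the similarity function to a top-K algorithm, optionally a threshold algorithm. The sketch below assumes this means the well-known Fagin-style procedure: one list per attribute sorted by score, sorted access in lockstep, random access to complete each newly seen record, and an early stop once the K-th best total reaches the threshold. The function name, the input layout (`scores[attr][record_id]` holding the already weighted per-attribute score), and the use of a plain sum are assumptions for the example.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_top_k(scores: Dict[str, Dict[int, float]], k: int) -> List[Tuple[float, int]]:
    """Return up to k (total, record_id) pairs with the largest summed scores."""
    # Sorted access: one list per attribute, highest score first.
    lists = [sorted(col.items(), key=lambda kv: kv[1], reverse=True)
             for col in scores.values()]
    totals: Dict[int, float] = {}
    n = len(lists[0])  # assumption: every list scores every record
    for depth in range(n):
        threshold = 0.0
        for lst in lists:
            rid, s = lst[depth]
            threshold += s  # best total any still-unseen record could reach
            if rid not in totals:
                # Random access: look up this record's score on every attribute.
                totals[rid] = sum(col[rid] for col in scores.values())
        best = heapq.nlargest(k, ((t, rid) for rid, t in totals.items()))
        if len(best) == k and best[-1][0] >= threshold:
            return best  # no unseen record can beat the current top k
    return heapq.nlargest(k, ((t, rid) for rid, t in totals.items()))


scores = {
    "make": {1: 0.9, 2: 0.8, 3: 0.1, 4: 0.0},
    "price": {1: 0.7, 2: 0.2, 3: 0.6, 4: 0.1},
}
top2 = threshold_top_k(scores, k=2)
```

With these numbers the loop stops at the third position of each list and returns records 1 and 2; record 4 is never fetched, which is the point of using a threshold instead of scoring every record.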

30. A computer readable medium having computer executable instructions stored thereon for performing a method for automatically ranking data records by relevance to a query on a database wherein the database has data records arranged in one or more database tables, the method comprising:
- deriving an inverse document frequency similarity function of the form $\mathrm{SIM}(t, Q) = \sum_{k=1}^{m} w_{k} \min_{v \in T_{k}} S_{k}(t_{k}, v)$ from the data records in a database, wherein t represents a tuple, Q represents a query, w represents an attribute weight, T_k represents a set of constraints on values for categorical attributes or a range for numeric attributes, S represents a similarity coefficient, and v represents a value of an attribute, wherein the similarity function corresponds to an inverse frequency of categorical attribute values in records of the database and an inverse frequency of numeric attribute values that is determined by considering a frequency of numeric attribute values specified in the given query and nearby numeric attribute values in the database;
  
  applying the similarity function to a given query and records in the database to determine a similarity between the given query and the records;
  
  ranking the records based on the similarity between the given query and the records, wherein the similarity function ranks a first record having a same similarity score as a second record higher than the second record when values in the first record for attributes that are not specified in the given query occur more frequently in the database than corresponding attribute values in the second record; and
  
  returning the records in a ranked order.

31. A computer readable medium having computer executable instructions stored thereon for performing a method for automatically ranking data records by relevance to a query on a database wherein the database has data records arranged in one or more database tables, and wherein the database has a given workload comprising a set of queries, the method comprising:
- deriving a query frequency similarity function of the form $\mathrm{SIM}(t, Q) = \sum_{k=1}^{m} w_{k} \min_{v \in T_{k}} S_{k}(t_{k}, v)$ from the queries in the workload, wherein t represents a tuple, Q represents a query, w represents an attribute weight, T_k represents a set of constraints on values for categorical attributes or a range for numeric attributes, S represents a similarity coefficient, and v represents a value of an attribute, wherein the similarity function corresponds to an inverse frequency of categorical attribute values in records of the database and an inverse frequency of numeric attribute values that is determined by considering a frequency of numeric attribute values specified in the given query and nearby numeric attribute values in the database;
  
  applying the similarity function to a given query and records in the database to determine a similarity between the given query and the records;
  
  ranking the records based on the similarity between the given query and the records, wherein the similarity function ranks a first record having a same similarity score as a second record higher than the second record when values in the first record for attributes that are not specified in the given query occur more frequently in the database than corresponding attribute values in the second record; and
  
  returning the records in a ranked order.

32. A computer readable medium having computer executable instructions stored thereon for performing a method for automatically ranking data records by relevance to a query on a database wherein the database has data records arranged in one or more database tables, and wherein the database has a given workload comprising a set of queries, the method comprising:
- deriving a similarity function of the form $\mathrm{SIM}(t, Q) = \sum_{k=1}^{m} w_{k} \min_{v \in T_{k}} S_{k}(t_{k}, v)$ that corresponds to an inverse frequency of attribute values in records of the database and a frequency an attribute value is specified in queries in a workload, wherein t represents a tuple, Q represents a query, w represents an attribute weight, T_k represents a set of constraints on values for categorical attributes or a range for numeric attributes, S represents a similarity coefficient, and v represents a value of an attribute, wherein the similarity function corresponds to an inverse frequency of categorical attribute values in records of the database and an inverse frequency of numeric attribute values that is determined by considering a frequency of numeric attribute values specified in the given query and nearby numeric attribute values in the database;
  
  applying the similarity function to a given query and records in the database to determine a similarity between the given query and the records;
  
  ranking the records based on the similarity between the given query and the records, wherein the similarity function ranks a first record having a same similarity score as a second record higher than the second record when values in the first record for attributes that are not specified in the given query occur more frequently in the database than corresponding attribute values in the second record; and
  
  returning the records in a ranked order.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Chaudhuri, Surajit; Das, Gautam; Gionis, Aris
Primary Examiner(s): Wong, Don
Assistant Examiner(s): Dang, Thanh Ha T

Application Number: US10/186,027
Publication Number: US 20040002973A1
Time in Patent Office: 1,859 Days
Field of Search: 707/3, 707/5
US Class Current: 707/749
CPC Class Codes:
G06F 16/24578   using ranking
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99935   Query augmenting and refini...
