GLOBAL AND TOPICAL RANKING OF SEARCH RESULTS USING USER CLICKS

US 20110029517A1
Filed: 07/31/2009
Published: 02/03/2011
Est. Priority Date: 07/31/2009
Status: Abandoned Application

Abstract

To estimate, or predict, the relevance of items, or documents, in a set of search results, relevance information is extracted from user click data, and relational information among the documents as manifested by an aggregation of user clicks is determined from the click data. A supervised approach uses judgment information, such as human judgment information, as part of the training data used to generate a relevance predictor model, which minimizes the inherent noisiness of the click data collected from a commercial search engine.
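
The relational information mentioned above can be illustrated with a small sketch. It assumes a hypothetical `Click` record (document id plus click time) and hypothetical feature names; the source only states that a feature relates a document to another document in the result set by whether a user click sequence involving both exists, so the exact encoding below is an assumption, not the applicants' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Click:
    doc_id: str
    time: float  # hypothetical unit: seconds since the query was issued


def click_sequence_exists(clicks: List[Click], first_doc: str, second_doc: str) -> bool:
    """True if some click on first_doc happens before some click on second_doc."""
    first_times = [c.time for c in clicks if c.doc_id == first_doc]
    second_times = [c.time for c in clicks if c.doc_id == second_doc]
    return bool(first_times) and bool(second_times) and min(first_times) < max(second_times)


def relational_features(ranking: List[str], clicks: List[Click]) -> Dict[str, Dict[str, bool]]:
    """Build per-document features that relate each document to its neighbors."""
    clicked = {c.doc_id for c in clicks}
    features: Dict[str, Dict[str, bool]] = {}
    for pos, doc in enumerate(ranking):
        below: Optional[str] = ranking[pos + 1] if pos + 1 < len(ranking) else None
        above: Optional[str] = ranking[pos - 1] if pos > 0 else None
        features[doc] = {
            # Was the document itself clicked?
            "clicked": doc in clicked,
            # Were both the document and its immediate neighbor clicked?
            "clicked_with_next": doc in clicked and below in clicked,
            "clicked_with_prev": doc in clicked and above in clicked,
            # Did a click on this document precede a click on the one below it?
            "seq_to_next": below is not None and click_sequence_exists(clicks, doc, below),
            # Did a click on the document above precede a click on this one?
            "seq_from_prev": above is not None and click_sequence_exists(clicks, above, doc),
        }
    return features
```

For a ranking `["a", "b", "c"]` in which `a` is clicked before `b`, the sketch marks `seq_to_next` for `a` and `seq_from_prev` for `b`, while `c` receives no relational signal.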

48 Claims

1. A method comprising:
- training, by at least one processor, a relevance prediction model using data for a plurality of queries, the data for a query comprising information identifying the query and documents of a result set retrieved using the query, the data further comprising user click information identifying each user click and corresponding document in the result set and a time of the user click, the training comprising:
  
  determining a plurality of feature vector sets corresponding to the plurality of queries, a feature vector set for a query comprising a feature vector for each document in the result set of the query, the feature vector identifying a plurality of features and a corresponding plurality of feature values, the plurality of features for a document comprising at least one feature that relates the document to at least one other document in the result set of the query using the user click information to determine whether or not a user click sequence involving the document and the at least one other document exists;
  
  determining a plurality of label sets corresponding to the plurality of queries, a label set for a query comprising a label for each document in the result set of the query, the label comprising an assessment of the document's relevance to the query;
  
  generating the relevance prediction model using the feature vector and label sets; and
  
  obtaining, by the at least one processor and using the generated relevance prediction model, ranking predictions for documents in a result set of a query.
- Dependent claims (2 to 16):
  - 2. The method of claim 1, the label for a document comprising a human judge's assessment of the document's relevance to the query.
  - 3. The method of claim 1, the label for a document clicked on in the result set and positioned below another document not clicked on in the result set is based on a relative relevance determined in accordance with a skip above strategy, the relative relevance indicating that the clicked-on document positioned below the other document not clicked on is more relevant than the other document.
  - 4. The method of claim 1, the label for a document clicked on in the result set and positioned immediately above another document not clicked on in the result set is based on a relative relevance determined in accordance with a skip next strategy, the relative relevance indicating that the clicked-on document positioned immediately above the other document not clicked on is more relevant than the other document.
  - 5. The method of claim 1, the data for a query comprising data from a plurality of query sessions, each query session involving the query and having a result set of document and user click information, training a relevance prediction model further comprising:
    - aggregating the data from the plurality of query sessions for the query; and
      
      using the aggregated data to determine the feature vector and label sets for the query.
  - 6. The method of claim 1, the at least one other document is positioned immediately below the document in the result set, and the feature that relates the document to the at least one other document identifying whether user clicks exist in the click information for the document and the at least one other document in the click information.
  - 7. The method of claim 1, the at least one other document is positioned immediately above the document in the result set, and the feature that relates the document to the at least one other document identifying whether user clicks exist in the click information for the document and the at least one other document in the click information.
  - 8. The method of claim 1, the at least one other document is positioned below the document in the result set, and the feature that relates the document to the at least one other document identifying whether user clicks exist in the click information for the document and the at least one other document in the click information.
  - 9. The method of claim 1, the at least one other document is positioned above the document in the result set, and the feature that relates the document to the at least one other document identifying whether user clicks exist in the click information for the document and the at least one other document in the click information.
  - 10. The method of claim 1, generating the relevance prediction model using the feature vector and label sets further comprising:
    - generating the relevance prediction model using the feature vector and label sets using a global ranking training method.
  - 11. The method of claim 10, the global ranking training method comprises a conditional random fields training method.
  - 12. The method of claim 10, the global ranking training method comprises a sliding window training method.
  - 13. The method of claim 10, the global ranking training method comprises a recurrent window training method.
  - 14. The method of claim 10, the global ranking training method comprises a GBrank training method.
  - 15. The method of claim 1, the relevance prediction model comprises a plurality of topical relevance prediction models, each topical relevance prediction model corresponding to a category of queries.
  - 16. The method of claim 15, obtaining ranking predictions for documents in a result set of a query further comprising:
    - identifying, by the at least one processor, a category for the query;
      
      selecting, by the at least one processor, a topical relevance prediction model from the plurality based on the category identified for the query; and
      
      obtaining, by the at least one processor and using the selected topical relevance prediction model, ranking predictions for the documents in the result set of the query.
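
The skip above and skip next strategies of claims 3 and 4 can be sketched as follows. The function names and the `(more_relevant, less_relevant)` pair format are assumptions made for illustration; the claims only state which document is treated as more relevant.

```python
from typing import List, Set, Tuple

Preference = Tuple[str, str]  # (more_relevant_doc, less_relevant_doc)


def skip_above_pairs(ranking: List[str], clicked: Set[str]) -> List[Preference]:
    """A clicked document is preferred over every unclicked document ranked above it."""
    pairs: List[Preference] = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            for higher in ranking[:pos]:
                if higher not in clicked:
                    pairs.append((doc, higher))
    return pairs


def skip_next_pairs(ranking: List[str], clicked: Set[str]) -> List[Preference]:
    """A clicked document is preferred over the unclicked document immediately below it."""
    pairs: List[Preference] = []
    for pos in range(len(ranking) - 1):
        doc, next_doc = ranking[pos], ranking[pos + 1]
        if doc in clicked and next_doc not in clicked:
            pairs.append((doc, next_doc))
    return pairs
```

With the ranking `["a", "b", "c", "d"]` and a single click on `c`, skip above prefers `c` over `a` and `b`, and skip next prefers `c` over `d`. Such relative preferences could then feed a pairwise training method, although how the labels are consumed is not specified in these claims.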

17. A system comprising:
- at least one server, the at least one server comprising:
  
  a training data generator that uses data for a plurality of queries to determine a plurality of feature vector sets and a plurality of label sets corresponding to the plurality of queries, the data for a query comprising information identifying the query and documents of a result set retrieved using the query, the data further comprising user click information identifying each user click and corresponding document in the result set and a time of the user click, a feature vector set for a query comprising a feature vector for each document in the result set of the query, the feature vector identifying a plurality of features and a corresponding plurality of feature values, the plurality of features for a document comprising at least one feature that relates the document to at least one other document in the result set of the query using the user click information to determine whether or not a user click sequence involving the document and the at least one other document exists, and a label set for a query comprising a label for each document in the result set of the query, the label comprising an assessment of the document's relevance to the query;
  
  a relevance predictor model generator that generates a relevance prediction model using the plurality of feature vector and label sets;
  
  a relevance predictor that obtains, using the generated relevance prediction model, ranking predictions for documents in a result set of a query.
- Dependent claims (18 to 32):
  - 18. The system of claim 17, the label for a document comprising a human judge's assessment of the document's relevance to the query.
  - 19. The system of claim 17, the label for a document clicked on in the result set and positioned below another document not clicked on in the result set is based on a relative relevance determined in accordance with a skip above strategy, the relative relevance indicating that the clicked-on document positioned below the other document not clicked on is more relevant than the other document.
  - 20. The system of claim 17, the label for a document clicked on in the result set and positioned immediately above another document not clicked on in the result set is based on a relative relevance determined in accordance with a skip next strategy, the relative relevance indicating that the clicked-on document positioned immediately above the other document not clicked on is more relevant than the other document.
  - 21. The system of claim 17, the data for a query comprising data from a plurality of query sessions, each query session involving the query and having a result set of document and user click information, the training data generator:
    - aggregates the data from the plurality of query sessions for the query; and
      
      uses the aggregated data to determine the feature vector and label sets for the query.
  - 22. The system of claim 17, the at least one other document is positioned immediately below the document in the result set, and the feature that relates the document to the at least one other document identifying whether user clicks exist in the click information for the document and the at least one other document in the click information.
  - 23. The system of claim 17, the at least one other document is positioned immediately above the document in the result set, and the feature that relates the document to the at least one other document identifying whether user clicks exist in the click information for the document and the at least one other document in the click information.
  - 24. The system of claim 17, the at least one other document is positioned below the document in the result set, and the feature that relates the document to the at least one other document identifying whether user clicks exist in the click information for the document and the at least one other document in the click information.
  - 25. The system of claim 17, the at least one other document is positioned above the document in the result set, and the feature that relates the document to the at least one other document identifying whether user clicks exist in the click information for the document and the at least one other document in the click information.
  - 26. The system of claim 17, wherein the relevance predictor model generator generates the relevance prediction model using the feature vector and label sets using a global ranking training method.
  - 27. The system of claim 26, the global ranking training method comprises a conditional random fields training method.
  - 28. The system of claim 26, the global ranking training method comprises a sliding window training method.
  - 29. The system of claim 26, the global ranking training method comprises a recurrent window training method.
  - 30. The system of claim 26, the global ranking training method comprises a GBrank training method.
  - 31. The system of claim 17, the relevance prediction model comprises a plurality of topical relevance prediction models, each topical relevance prediction model corresponding to a category of queries.
  - 32. The system of claim 31, the relevance predictor:
    - identifies a category for the query;
      
      selects a topical relevance prediction model from the plurality based on the category identified for the query; and
      
      obtains, using the selected topical relevance prediction model, ranking predictions for the documents in the result set of the query.
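
The topical selection described in claims 31 and 32 (identify a category, select the matching model, rank with it) might look like the sketch below. The class name, the `categorize` callable, the representation of a model as a scoring function, and the `fallback` model for unknown categories are all assumptions introduced here; the claims do not say how a query is categorized or what happens when no topical model matches.

```python
from typing import Callable, Dict, List, Sequence

FeatureVector = Sequence[float]
Model = Callable[[FeatureVector], float]  # hypothetical: a model is a scoring function


class TopicalRelevancePredictor:
    """Routes each query to the relevance model trained for its category."""

    def __init__(
        self,
        categorize: Callable[[str], str],
        models: Dict[str, Model],
        fallback: Model,
    ) -> None:
        self.categorize = categorize
        self.models = models
        self.fallback = fallback  # assumption: used when the category has no model

    def rank(self, query: str, docs: Dict[str, FeatureVector]) -> List[str]:
        # Step 1: identify a category for the query.
        category = self.categorize(query)
        # Step 2: select the topical model for that category.
        model = self.models.get(category, self.fallback)
        # Step 3: score each document and order by predicted relevance.
        return sorted(docs, key=lambda doc_id: model(docs[doc_id]), reverse=True)


# Usage with toy scoring functions standing in for trained models.
predictor = TopicalRelevancePredictor(
    categorize=lambda q: "travel" if "flight" in q else "other",
    models={"travel": lambda fv: fv[0]},
    fallback=lambda fv: fv[1],
)
docs = {"x": (0.9, 0.1), "y": (0.2, 0.8)}
travel_order = predictor.rank("cheap flight", docs)
other_order = predictor.rank("python tutorial", docs)
```

Keeping the category lookup separate from scoring means a new topical model can be added by registering one more entry in `models`, which is one plausible reading of "a plurality of topical relevance prediction models".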

33. A computer-readable medium tangibly storing thereon computer-executable process steps, the process steps comprising:
- training a relevance prediction model using data for a plurality of queries, the data for a query comprising information identifying the query and documents of a result set retrieved using the query, the data further comprising user click information identifying each user click and corresponding document in the result set and a time of the user click, the training comprising:
  
  determining a plurality of feature vector sets corresponding to the plurality of queries, a feature vector set for a query comprising a feature vector for each document in the result set of the query, the feature vector identifying a plurality of features and a corresponding plurality of feature values, the plurality of features for a document comprising at least one feature that relates the document to at least one other document in the result set of the query using the user click information to determine whether or not a user click sequence involving the document and the at least one other document exists;
  
  determining a plurality of label sets corresponding to the plurality of queries, a label set for a query comprising a label for each document in the result set of the query, the label comprising an assessment of the document's relevance to the query;
  
  generating the relevance prediction model using the feature vector and label sets; and
  
  obtaining, using the generated relevance prediction model, ranking predictions for documents in a result set of a query.
- Dependent claims (34 to 48):
  - 34. The medium of claim 33, the label for a document comprising a human judge's assessment of the document's relevance to the query.
  - 35. The medium of claim 33, the label for a document clicked on in the result set and positioned below another document not clicked on in the result set is based on a relative relevance determined in accordance with a skip above strategy, the relative relevance indicating that the clicked-on document positioned below the other document not clicked on is more relevant than the other document.
  - 36. The medium of claim 33, the label for a document clicked on in the result set and positioned immediately above another document not clicked on in the result set is based on a relative relevance determined in accordance with a skip next strategy, the relative relevance indicating that the clicked-on document positioned immediately above the other document not clicked on is more relevant than the other document.
  - 37. The medium of claim 33, the data for a query comprising data from a plurality of query sessions, each query session involving the query and having a result set of document and user click information, the process step of training a relevance prediction model further comprising:
    - aggregating the data from the plurality of query sessions for the query; and
      
      using the aggregated data to determine the feature vector and label sets for the query.
  - 38. The medium of claim 33, the at least one other document is positioned immediately below the document in the result set, and the feature that relates the document to the at least one other document identifying whether user clicks exist in the click information for the document and the at least one other document in the click information.
  - 39. The medium of claim 33, the at least one other document is positioned immediately above the document in the result set, and the feature that relates the document to the at least one other document identifying whether user clicks exist in the click information for the document and the at least one other document in the click information.
  - 40. The medium of claim 33, the at least one other document is positioned below the document in the result set, and the feature that relates the document to the at least one other document identifying whether user clicks exist in the click information for the document and the at least one other document in the click information.
  - 41. The medium of claim 33, the at least one other document is positioned above the document in the result set, and the feature that relates the document to the at least one other document identifying whether user clicks exist in the click information for the document and the at least one other document in the click information.
  - 42. The medium of claim 33, the process step of generating the relevance prediction model using the feature vector and label sets further comprising:
    - generating the relevance prediction model using the feature vector and label sets using a global ranking training method.
  - 43. The medium of claim 42, the global ranking training method comprises a conditional random fields training method.
  - 44. The medium of claim 42, the global ranking training method comprises a sliding window training method.
  - 45. The medium of claim 42, the global ranking training method comprises a recurrent window training method.
  - 46. The medium of claim 42, the global ranking training method comprises a GBrank training method.
  - 47. The medium of claim 33, the relevance prediction model comprises a plurality of topical relevance prediction models, each topical relevance prediction model corresponding to a category of queries.
  - 48. The medium of claim 47, the process step of obtaining ranking predictions for documents in a result set of a query further comprising:
    - identifying a category for the query;
      
      selecting a topical relevance prediction model from the plurality based on the category identified for the query; and
      
      obtaining, using the selected topical relevance prediction model, ranking predictions for the documents in the result set of the query.
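
The session aggregation recited in claims 5, 21 and 37 can be sketched as below. The claims only say that data from several query sessions is aggregated before feature vectors and labels are determined; the choice of a per-document click-through rate as the aggregate, and the `(query, ranking, clicked_docs)` session tuple, are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Session = Tuple[str, List[str], Set[str]]  # (query, ranking shown, documents clicked)


def aggregate_sessions(sessions: Iterable[Session]) -> Dict[str, Dict[str, float]]:
    """Pool sessions by query and return a click-through rate per document."""
    impressions: Dict[str, Counter] = defaultdict(Counter)
    clicks: Dict[str, Counter] = defaultdict(Counter)
    for query, ranking, clicked in sessions:
        for doc in ranking:
            impressions[query][doc] += 1  # the document was shown in this session
            if doc in clicked:
                clicks[query][doc] += 1  # and it was clicked
    return {
        query: {doc: clicks[query][doc] / shown for doc, shown in docs.items()}
        for query, docs in impressions.items()
    }
```

Pooling many sessions before deriving features is one way to dampen the per-session noise in click data that the abstract refers to; a single stray click then shifts an aggregate rate only slightly.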

Current Assignee: Oath Inc. (Verizon Communications Inc.)
Original Assignee: Oath Inc. (Verizon Communications Inc.)
Inventors: Chang, Yi; Sun, Gordon Guo-Zheng; Ji, Shihao; Dong, Anlei; Zheng, Zhaohui; Liao, Ciya; Zha, Hongyuan; Chapelle, Olivier
Application Number: US12/533,564
Publication Number: US 20110029517A1
US Class Current: 707/734
CPC Class Codes:
- G06F 16/951   Indexing; Web crawling tech...
- G06F 16/9535   Search customisation based ...
- G06F 16/9538   Presentation of query results
