Training a ranking component

US 7,783,629 B2
Filed: 01/05/2006
Issued: 08/24/2010
Est. Priority Date: 12/13/2005
Status: Expired due to Fees

Abstract

A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.
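
The retrieval flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory `index`, the helper names `add_passage`, `score` and `search`, and the word-overlap score are all assumptions made for illustration, since the abstract does not say how the score is calculated.

```python
from collections import defaultdict

# Hypothetical in-memory index: factoid type -> list of passage texts.
index = defaultdict(list)

def add_passage(factoid_type, text):
    """Store a passage under the factoid type it expresses."""
    index[factoid_type].append(text)

def score(query, passage):
    """Toy score (an assumption): fraction of query words found in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def search(query, factoid_type):
    """Return (score, passage) pairs of the selected type, best first."""
    candidates = index.get(factoid_type, [])
    scored = [(score(query, p), p) for p in candidates]
    related = [sp for sp in scored if sp[0] > 0]  # drop unrelated passages
    return sorted(related, key=lambda sp: sp[0], reverse=True)

add_passage("time", "Microsoft was founded in 1975")
add_passage("time", "The Eiffel Tower opened in 1889")
add_passage("place", "Microsoft is based in Redmond")
results = search("when was Microsoft founded", "time")
```

Here only passages indexed under the selected type ("time") are considered, so the "place" passage is never scored even though it shares a word with the query.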

20 Claims

1. A method of training a ranking component configured to rank text passages retrieved from a corpus based on a factoid type selection input and based on a textual input, the method comprising:
- accessing a training corpus, on a computer readable data storage medium, storing training data including a predefined set of factoid-based queries and documents, wherein each factoid-based query of the predefined set comprises content and an associated factoid type indicator, separate from the content, directly identifying an associated factoid type sought by the associated factoid-based query, the factoid-based query requesting at least one of individual types of information and categories of information;
  
  identifying a plurality of factoid types to be indexed and, for each factoid type of the plurality of factoid types constructing, with a processor, a plurality of passages from the documents stored in the training corpus by identifying expressions of the factoid type in the training corpus and extracting text for the identified expressions;
  
  matching, with the processor, the predefined set of factoid-based queries against the documents in the training corpus, the matching being performed by matching the predefined set of factoid-based queries against the constructed passages using the factoid type indicators, stored in the training data, associated with the predefined set of factoid-based queries;
  
  calculating an accuracy measure based on how closely the constructed passages match a type and content of a factoid-based query in the predefined set of factoid-based queries; and
  
  training, with the processor, the ranking component based on the accuracy measure indicative of how accurately the types and contents of factoid-based queries in the predefined set of factoid-based queries match against the constructed passages in the training corpus, the ranking component being trained to rank passages in the documents based on how closely the passages match a user-input query and user-specified factoid type.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1 wherein identifying matching passages comprises:
    - for a selected constructed passage, generating a feature vector indicative of whether a content word in the selected constructed passage resides in a selected potentially matching passage, and whether a content word in the selected potentially matching passage resides in the selected constructed passage.
  - 3. The method of claim 2 wherein training the ranking component comprises:
    - calculating a ranking function corresponding to each constructed passage given a training query.
  - 4. The method of claim 3 wherein training the ranking component further comprises:
    - calculating a ranking score for each constructed passage based on the corresponding ranking function.
  - 5. The method of claim 4 wherein training the ranking component further comprises:
    - ranking each constructed passage relative to each training query.
  - 6. The method of claim 5 wherein the accuracy measure is indicative of an accuracy of the ranking of the constructed passages relative to a given training query and wherein training the ranking component further comprises:
    - applying a weighting vector to the ranking scores based on the accuracy measure to obtain desired ranking results for the given training query.
  - 7. The method of claim 6 wherein training the ranking component comprises:
    - learning the weighting vector based on the ranking of each constructed passage relative to the given training query and the accuracy measure.
  - 8. The method of claim 7 wherein learning the weighting vector comprises:
    - conducting machine learning by iterating through a plurality of different values for components of the weighting vector; and
      
      for each different value, calculating the accuracy measure.
  - 9. The method of claim 8 wherein learning the weighting vector comprises:
    - choosing values for the components of the weighting vector based on whether the accuracy measure indicates an improvement in accuracy that meets a threshold value.
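
As a rough illustration of claims 2 through 5, the sketch below builds a two-way word-overlap feature vector and ranks passages by a weighted score. The stopword list, the two specific features, the linear form of the ranking function, and every function name are assumptions; the claims only require that the features indicate overlap in both directions and that a ranking score be calculated for each passage.

```python
# Small illustrative stopword list (an assumption, not from the patent).
STOPWORDS = {"a", "an", "the", "of", "in", "was", "is", "when", "who", "where"}

def content_words(text):
    """Lowercased tokens with stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def feature_vector(query, passage):
    """Two overlap features, one per direction."""
    q, p = content_words(query), content_words(passage)
    shared = q & p
    return [
        len(shared) / len(q) if q else 0.0,  # query words found in the passage
        len(shared) / len(p) if p else 0.0,  # passage words found in the query
    ]

def ranking_score(weights, features):
    """Linear ranking function: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))

def rank(query, passages, weights):
    """Passages sorted by descending ranking score."""
    return sorted(
        passages,
        key=lambda p: ranking_score(weights, feature_vector(query, p)),
        reverse=True,
    )

query = "when was Microsoft founded"
passages = ["Microsoft released Windows in 1985", "Microsoft was founded in 1975"]
ranked = rank(query, passages, weights=[1.0, 1.0])
```

With equal weights, the passage that shares both content words with the query is ranked first; changing the weights changes how much each direction of overlap counts.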

10. A computer readable storage medium storing computer readable instructions which, when executed by a computer, cause the computer to perform a method of training a ranking component that ranks passages in a corpus relative to input passages based on a factoid type selection input, input by a user, the method comprising:
- accessing a corpus storing passages and a set of factoid-based training queries, each of the factoid-based training queries in the set having content separate from an associated factoid type indicator that directly identifies a factoid type sought by the associated factoid-based training query;
  
  identifying the passages, having a possible factoid type corresponding to a factoid type directly identified by a factoid type indicator selected in the factoid type selection input, in the corpus based on the set of factoid-based training queries having the factoid type identified in the factoid type selection input;
  
  ranking the identified passages relative to each factoid-based training query based on a ranking score calculated by applying a weighting vector to a ranking function including a multi-dimensional feature vector; and
  
  using a processor of a computer to perform machine learning to learn desired values for components of the weighting vector based on an accuracy measure indicative of how accurately the ranking of the identified passages matches against the factoid types of the factoid-based training queries, such that the weighting vector and ranking function rank passages, identified based on a user input query and the factoid type selection input provided by the user, in an order indicative of a relationship between the passages and both content of the user input query and the factoid type indicated by the factoid type indicator selected by the user, wherein using a processor to perform machine learning comprises;
  
  setting the component values in the weighting vector corresponding to a given dimension in the feature vector to a desired value; and
  
  calculating the accuracy measure using the desired value of the component values of the weighting vector.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18, 19, 20):
  - 11. The computer readable medium of claim 10 wherein using machine learning comprises:
    - repeating steps of setting the component values to a desired value, ranking the identified passages relative to the factoid-based training queries, and calculating the accuracy measure for the ranking using the desired value, a plurality of times for each dimension in the feature vector.
  - 12. The computer readable medium of claim 10 wherein identifying the passages in the corpus comprises:
    - constructing the passages from text in the corpus based on the input passages and a factoid type selection input.
  - 13. The computer readable medium of claim 10 wherein matching comprises:
    - obtaining the training queries; and
      
      matching the factoid-based training queries against documents in a training corpus based on the identified passages.
  - 14. The computer readable medium of claim 10, wherein using a processor of a computer to perform machine learning comprises:
    - conducting machine learning by iterating through a plurality of different values for components of the weighting vector; and
      
      for each different value, calculating the accuracy measure.
  - 15. The computer readable medium of claim 10, wherein using a processor of a computer to perform machine learning comprises:
    - choosing values for the components of the weighting vector based on whether the accuracy measure indicates an improvement in accuracy that meets a threshold value.
  - 16. The computer readable medium of claim 10, and further comprising:
    - training a ranking component comprising calculating a ranking function corresponding to each passage given a training query.
  - 17. The computer readable medium of claim 16, wherein training the ranking component further comprises:
    - calculating a ranking score for each passage based on the corresponding ranking function.
  - 18. The computer readable medium of claim 17, wherein training the ranking component further comprises:
    - ranking each passage relative to each of a plurality of training queries.
  - 19. The computer readable medium of claim 18, wherein the accuracy measure is indicative of an accuracy of the ranking of the passages relative to a given training query and wherein training the ranking component further comprises:
    - applying the weighting vector to the ranking scores based on the accuracy measure to obtain desired ranking results for the given training query.
  - 20. The computer readable medium of claim 19, wherein training the ranking component comprises:
    - learning the weighting vector based on the ranking of each passage relative to the given training query and the accuracy measure.
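
The weight-learning loop described in claims 10, 11, 14 and 15 (and in claims 8 and 9) might look something like the following sketch: try several values for one component of the weighting vector at a time, recompute the accuracy measure, and keep a value only if accuracy improves by at least a threshold. The top-1 accuracy measure, the candidate values, the number of sweeps, and the names `top1_accuracy` and `learn_weights` are assumptions; the claims do not fix any of them.

```python
def top1_accuracy(weights, examples):
    """Fraction of queries whose top-scored passage is labelled correct.

    Each example is a list of (features, is_correct) pairs for one query.
    """
    hits = 0
    for candidates in examples:
        best = max(
            candidates,
            key=lambda c: sum(w * f for w, f in zip(weights, c[0])),
        )
        hits += 1 if best[1] else 0
    return hits / len(examples)

def learn_weights(examples, dims, values=(0.0, 0.5, 1.0, 2.0),
                  threshold=1e-9, sweeps=3):
    """Coordinate-wise search over weight values (hypothetical settings)."""
    weights = [1.0] * dims
    best = top1_accuracy(weights, examples)
    for _ in range(sweeps):
        for d in range(dims):            # one feature dimension at a time
            for value in values:         # iterate through candidate values
                trial = list(weights)
                trial[d] = value
                acc = top1_accuracy(trial, examples)
                if acc - best >= threshold:  # keep only real improvements
                    weights, best = trial, acc
    return weights, best

# Two toy training queries, each with one correct and one incorrect passage.
examples = [
    [([1.0, 0.0], False), ([0.0, 1.0], True)],
    [([0.5, 0.9], True), ([0.6, 0.1], False)],
]
weights, accuracy = learn_weights(examples, dims=2)
```

On this toy data the search lowers the weight of the first feature, because that feature favours the incorrect passage for the first query.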

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Li, Hang, Cao, Yunbo, Gao, Jianfeng
Primary Examiner(s): Rones; Charles
Assistant Examiner(s): Mahmood; Rezwanul
Application Number: US11/326,283
Publication Number: US 20070136281A1
Time in Patent Office: 1,692 Days
Field of Search: 704/9, 707/3, 707/7, 707/102, 707/6, 707/101
US Class Current: 707/723
CPC Class Codes: G06F 16/313 Selection or weighting of t...
