Training a ranking component

  • US 7,783,629 B2
  • Filed: 01/05/2006
  • Issued: 08/24/2010
  • Est. Priority Date: 12/13/2005
  • Status: Expired due to Fees
First Claim

1. A method of training a ranking component configured to rank text passages retrieved from a corpus based on a factoid type selection input and based on a textual input, the method comprising:

    accessing a training corpus, on a computer readable data storage medium, storing training data including a predefined set of factoid-based queries and documents, wherein each factoid-based query of the predefined set comprises content and an associated factoid type indicator, separate from the content, directly identifying an associated factoid type sought by the associated factoid-based query, the factoid-based query requesting at least one of individual types of information and categories of information;

    identifying a plurality of factoid types to be indexed and, for each factoid type of the plurality of factoid types constructing, with a processor, a plurality of passages from the documents stored in the training corpus by identifying expressions of the factoid type in the training corpus and extracting text for the identified expressions;

    matching, with the processor, the predefined set of factoid-based queries against the documents in the training corpus, the matching being performed by matching the predefined set of factoid-based queries against the constructed passages using the factoid type indicators, stored in the training data, associated with the predefined set of factoid-based queries;

    calculating an accuracy measure based on how closely the constructed passages match a type and content of a factoid-based query in the predefined set of factoid-based queries; and

    training, with the processor, the ranking component based on the accuracy measure indicative of how accurately the types and contents of factoid-based queries in the predefined set of factoid-based queries match against the constructed passages in the training corpus, the ranking component being trained to rank passages in the documents based on how closely the passages match a user-input query and user-specified factoid type.
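
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything concrete in it is an assumption made for illustration: the regular-expression patterns in `FACTOID_PATTERNS`, the fixed character window used to extract passage text, the token-overlap score, the gold `answer` field used to compute accuracy, and the grid search over a single `type_weight` parameter standing in for "training". The claim itself does not specify any of these.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for two factoid types; the claim does not say
# how expressions of a factoid type are identified.
FACTOID_PATTERNS = {
    "year": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),
    "percent": re.compile(r"\b\d+(?:\.\d+)?%"),
}

@dataclass
class Query:
    content: str        # textual content of the query
    factoid_type: str   # type indicator, stored separately from the content
    answer: str         # assumed gold answer, used only to measure accuracy

@dataclass
class Passage:
    factoid_type: str
    expression: str     # the matched expression, e.g. "1932"
    text: str           # text extracted around the expression

def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def construct_passages(documents, window=40):
    """For each factoid type, find its expressions and extract nearby text."""
    passages = []
    for doc in documents:
        for ftype, pattern in FACTOID_PATTERNS.items():
            for m in pattern.finditer(doc):
                start = max(0, m.start() - window)
                end = min(len(doc), m.end() + window)
                passages.append(Passage(ftype, m.group(), doc[start:end]))
    return passages

def score(query, passage, type_weight):
    """Content overlap plus a weighted bonus when the factoid types agree."""
    overlap = len(tokens(query.content) & tokens(passage.text))
    type_match = 1.0 if passage.factoid_type == query.factoid_type else 0.0
    return overlap + type_weight * type_match

def accuracy(queries, passages, type_weight):
    """Fraction of queries whose top-ranked passage holds the gold answer."""
    hits = 0
    for q in queries:
        best = max(passages, key=lambda p: score(q, p, type_weight))
        hits += int(best.expression == q.answer)
    return hits / len(queries)

def train(queries, passages, candidates=(0.0, 0.5, 1.0, 2.0, 4.0)):
    """Pick the type weight that maximizes accuracy on the training queries."""
    return max(candidates, key=lambda w: accuracy(queries, passages, w))

# Toy training corpus (invented for this example).
documents = [
    "The bridge opened in 1932 and carried 12% of city traffic.",
    "Revenue grew 8% after the merger was completed in 2004.",
]
queries = [
    Query("when did the bridge open", "year", "1932"),
    Query("how much city traffic did the bridge carry", "percent", "12%"),
]

passages = construct_passages(documents)
best_weight = train(queries, passages)
print(best_weight, accuracy(queries, passages, best_weight))
```

In this toy corpus both expressions in the first document yield the same passage text, so content overlap alone cannot separate them; only the factoid type indicator does. That is the role the claim gives to matching "using the factoid type indicators", although a real ranking component would presumably learn far more than one weight.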
