Training a ranking component
First Claim
1. A method of training a ranking component configured to rank text passages retrieved from a corpus based on a factoid type selection input and based on a textual input, the method comprising:
accessing a training corpus, on a computer readable data storage medium, storing training data including a predefined set of factoid-based queries and documents, wherein each factoid-based query of the predefined set comprises content and an associated factoid type indicator, separate from the content, directly identifying an associated factoid type sought by the associated factoid-based query, the factoid-based query requesting at least one of individual types of information and categories of information;
identifying a plurality of factoid types to be indexed and, for each factoid type of the plurality of factoid types, constructing, with a processor, a plurality of passages from the documents stored in the training corpus by identifying expressions of the factoid type in the training corpus and extracting text for the identified expressions;
matching, with the processor, the predefined set of factoid-based queries against the documents in the training corpus, the matching being performed by matching the predefined set of factoid-based queries against the constructed passages using the factoid type indicators, stored in the training data, associated with the predefined set of factoid-based queries;
calculating an accuracy measure based on how closely the constructed passages match a type and content of a factoid-based query in the predefined set of factoid-based queries; and
training, with the processor, the ranking component based on the accuracy measure indicative of how accurately the types and contents of factoid-based queries in the predefined set of factoid-based queries match against the constructed passages in the training corpus, the ranking component being trained to rank passages in the documents based on how closely the passages match a user-input query and user-specified factoid type.
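The training steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not say how factoid expressions are detected, how passages are scored, or what form the accuracy measure takes, so every one of those pieces is an assumption here: the regular-expression detectors in the hypothetical `FACTOID_PATTERNS` table, a fixed character window for extracting passage text, a word-overlap score with a bonus when the passage type matches the query's type indicator, top-1 accuracy against an assumed gold answer per query, and a grid search over a single type-match weight standing in for "training the ranking component".

```python
import re
from dataclasses import dataclass

# Hypothetical detectors: the claim does not say how expressions of a
# factoid type are identified, so simple regular expressions stand in here.
FACTOID_PATTERNS = {
    "YEAR": re.compile(r"\b(?:1[5-9]\d\d|20\d\d)\b"),
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?%"),
}


@dataclass
class Passage:
    factoid_type: str
    expression: str  # the matched factoid expression
    text: str        # text extracted around the expression


@dataclass
class TrainingQuery:
    content: str       # the query text
    factoid_type: str  # type indicator, stored separately from the content
    answer: str        # expected factoid (assumed gold label)


def construct_passages(documents, window=40):
    """For each factoid type, find its expressions and extract surrounding text."""
    passages = []
    for ftype, pattern in FACTOID_PATTERNS.items():
        for doc in documents:
            for m in pattern.finditer(doc):
                start, end = max(0, m.start() - window), m.end() + window
                passages.append(Passage(ftype, m.group(0), doc[start:end]))
    return passages


def words(text):
    return set(re.findall(r"\w+", text.lower()))


def score(query, passage, type_weight):
    """Shared-word count plus a bonus when the passage type matches the query's indicator."""
    overlap = len(words(query.content) & words(passage.text))
    return overlap + (type_weight if passage.factoid_type == query.factoid_type else 0.0)


def accuracy(queries, passages, type_weight):
    """Assumed accuracy measure: share of queries whose top passage carries the expected factoid."""
    hits = 0
    for q in queries:
        best = max(passages, key=lambda p: score(q, p, type_weight))
        if best.expression == q.answer:
            hits += 1
    return hits / len(queries)


def train(queries, passages, candidates=(0.0, 0.5, 1.0, 2.0)):
    """Stand-in for training: keep the first type weight with the best accuracy."""
    return max(candidates, key=lambda w: accuracy(queries, passages, w))


docs = ["Founded in 1998.", "Revenue grew 12% in 2005."]
queries = [
    TrainingQuery("how much did revenue grow", "PERCENT", "12%"),
    TrainingQuery("when was it founded", "YEAR", "1998"),
]
passages = construct_passages(docs)
best_weight = train(queries, passages)
```

In this toy data, the percent query ties between the YEAR and PERCENT passages from the same document when the type indicator is ignored (weight 0.0), and the YEAR passage wins the tie because it is listed first. Any positive weight breaks the tie in favor of the PERCENT passage, so the search settles on 0.5, the first candidate that reaches full accuracy.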
Abstract
A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.
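The retrieval flow summarized in the abstract (receive a query and a factoid type, consult an index of passages organized by factoid, keep the related passages of the selected type, return them in score order) might look like the following sketch. The `FactoidIndex` class, its per-type inverted index, and the placeholder score (the number of query terms a passage shares) are illustrative assumptions standing in for whatever trained ranking score a real system would calculate.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


class FactoidIndex:
    """Hypothetical index of passages, with one inverted index per factoid type."""

    def __init__(self):
        self.passages = []  # passage text, indexed by passage id
        # factoid type -> term -> ids of passages of that type containing the term
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, text, factoid_type):
        pid = len(self.passages)
        self.passages.append(text)
        for term in set(tokenize(text)):
            self.postings[factoid_type][term].add(pid)
        return pid

    def search(self, query, factoid_type):
        """Return (passage, score) pairs of the selected type, highest score first."""
        type_postings = self.postings.get(factoid_type, {})
        scores = defaultdict(int)
        for term in set(tokenize(query)):
            for pid in type_postings.get(term, ()):
                scores[pid] += 1  # placeholder score: shared query terms
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [(self.passages[pid], s) for pid, s in ranked]


index = FactoidIndex()
index.add("Mozart was born in 1756 in Salzburg.", "YEAR")
index.add("Mozart was born in Salzburg, Austria.", "LOCATION")
index.add("Beethoven was born in 1770 in Bonn.", "YEAR")
results = index.search("When was Mozart born?", "YEAR")
```

Filtering by factoid type happens before scoring, so the LOCATION passage is never considered for a YEAR query even though it shares three terms with the query.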
54 Citations
20 Claims
10. A computer readable storage medium storing computer readable instructions which, when executed by a computer, cause the computer to perform a method of training a ranking component that ranks passages in a corpus relative to input passages based on a factoid type selection input, input by a user, the method comprising:
accessing a corpus storing passages and a set of factoid-based training queries, each of the factoid-based training queries in the set having content separate from an associated factoid type indicator that directly identifies a factoid type sought by the associated factoid-based training query;
identifying the passages, having a possible factoid type corresponding to a factoid type directly identified by a factoid type indicator selected in the factoid type selection input, in the corpus based on the set of factoid-based training queries having the factoid type identified in the factoid type selection input;
ranking the identified passages relative to each factoid-based training query based on a ranking score calculated by applying a weighting vector to a ranking function including a multi-dimensional feature vector; and
using a processor of a computer to perform machine learning to learn desired values for components of the weighting vector based on an accuracy measure indicative of how accurately the ranking of the identified passages matches against the factoid types of the factoid-based training queries, such that the weighting vector and ranking function rank passages, identified based on a user input query and the factoid type selection input provided by the user, in an order indicative of a relationship between the passages and both content of the user input query and the factoid type indicated by the factoid type indicator selected by the user, wherein using a processor to perform machine learning comprises:
setting the component values in the weighting vector corresponding to a given dimension in the feature vector to a desired value; and
calculating the accuracy measure using the desired value of the component values of the weighting vector.
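Claim 10 describes a linear ranking function (a weighting vector applied to a multi-dimensional feature vector) whose weights are learned by setting the component for a given dimension to a candidate value and recomputing the accuracy measure. One reading of that step is a coordinate-wise search similar to coordinate ascent, and the sketch below assumes that reading. The two features (content match, factoid-type match), the candidate grid, the starting weights of 1.0, and top-1 accuracy as the accuracy measure are all hypothetical choices.

```python
from typing import List, Sequence, Tuple

# One training query: a (feature_vector, is_correct) pair per identified passage.
# Feature extraction is assumed to have happened already.
Example = List[Tuple[Sequence[float], bool]]


def rank_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear ranking function: weighting vector applied to the feature vector."""
    return sum(w * f for w, f in zip(weights, features))


def top1_accuracy(weights: Sequence[float], examples: List[Example]) -> float:
    """Assumed accuracy measure: share of queries whose top-ranked passage is correct."""
    hits = 0
    for passages in examples:
        _, is_correct = max(passages, key=lambda p: rank_score(weights, p[0]))
        if is_correct:
            hits += 1
    return hits / len(examples)


def coordinate_search(examples: List[Example], dims: int,
                      candidates=(0.0, 0.5, 1.0, 2.0), sweeps: int = 3):
    """Set one weight component at a time to each candidate value and keep it
    only if the accuracy measure improves (a coordinate-ascent-style search)."""
    weights = [1.0] * dims
    best_acc = top1_accuracy(weights, examples)
    for _ in range(sweeps):
        for d in range(dims):
            for value in candidates:
                trial = list(weights)
                trial[d] = value  # set the component for dimension d
                acc = top1_accuracy(trial, examples)  # recompute the accuracy measure
                if acc > best_acc:
                    weights, best_acc = trial, acc
    return weights, best_acc


# Hypothetical features: [content match, factoid-type match].
examples = [
    [([2.0, 0.0], False), ([1.0, 1.0], True)],
    [([1.0, 1.0], False), ([4.0, 0.0], True)],
]
weights, acc = coordinate_search(examples, dims=2)
```

With the starting weights, the first training query ties and the first-listed (incorrect) passage wins. Lowering the content-match component to 0.5 lets the type-match feature decide that query while the second query still ranks its correct passage first, so both are ranked correctly. A real system would likely use more features and a finer or adaptive search over each dimension.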
Specification