Method and apparatus for machine learning a document relevance function
Abstract
Provided is a method and computer program product for determining a document relevance function for estimating a relevance score of a document in a database with respect to a query. For each of a plurality of test queries, a respective set of result documents is collected. For each test query, a subset of the documents in the respective result set is selected, and a set of training relevance scores is assigned to documents in the subset. In one embodiment, at least some of the training relevance scores are assigned by human subjects who determine individual relevance scores for submitted documents with respect to the corresponding queries. Finally, a relevance function is determined based on the plurality of test queries, the subsets of documents, and the sets of training relevance scores.
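The three steps summarized in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the toy term-overlap retrieval in `collect_result_set`, the single feature `overlap_feature`, the `assign_score` callback standing in for human judges, and the one-variable least-squares fit, which is swapped in for whatever learning procedure the specification actually describes.

```python
import random
from statistics import mean


def collect_result_set(database, query):
    """Step (a): toy retrieval, keep documents sharing a term with the query."""
    terms = set(query.split())
    return [doc for doc in database if terms & set(doc.split())]


def select_subset(result_set, k, rng):
    """Step (b), first half: sample up to k documents from the result set."""
    return rng.sample(result_set, min(k, len(result_set)))


def overlap_feature(query, doc):
    """Hypothetical feature: fraction of query terms found in the document."""
    terms = set(query.split())
    return len(terms & set(doc.split())) / len(terms)


def fit_relevance_function(examples):
    """Step (c): fit score = a + b * feature by ordinary least squares."""
    if not examples:
        raise ValueError("no training examples were collected")
    xs = [x for x, _ in examples]
    ys = [y for _, y in examples]
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = cov / var if var else 0.0  # flat fit when the feature never varies
    a = y_bar - b * x_bar
    return lambda query, doc: a + b * overlap_feature(query, doc)


def train(database, test_queries, assign_score, k=3, seed=0):
    """Run steps (a) to (c) over all test queries."""
    rng = random.Random(seed)
    examples = []
    for query in test_queries:
        result_set = collect_result_set(database, query)
        for doc in select_subset(result_set, k, rng):
            # Step (b), second half: assign a training relevance score.
            score = assign_score(query, doc)
            examples.append((overlap_feature(query, doc), score))
    return fit_relevance_function(examples)
```

In this sketch `assign_score` is any callable taking a query and a document; in the embodiment mentioned in the abstract, at least some of those scores would come from human subjects rather than from code.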
56 Claims
1. A method for determining a document relevance function for estimating a relevance score of a document in a database with respect to a query, comprising:
(a) collecting a respective result set of documents from the database for each of a plurality of test queries;
(b) for each test query of the plurality of test queries, selecting a subset of the documents in the respective result set; and
assigning a set of training relevance scores to the documents in the subset; and
(c) determining a relevance function based on the plurality of test queries, the subsets of documents, and the sets of training relevance scores.
Dependent claims: 2-28.
29. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism therein, the computer program mechanism comprising:
(a) a collecting module for collecting a respective result set of documents from the database for each of a plurality of test queries;
(b) a sampling module for selecting, for each test query of the plurality of test queries, a subset of the documents in the respective result set;
(c) a scoring module for assigning a set of training relevance scores to the documents in each selected subset; and
(d) a relevance function generation module for determining a relevance function based on the plurality of test queries, the subsets of documents, and the sets of training relevance scores.
Dependent claims: 30, 31, 32, 34, 43, 44, 45, 46, 47, 48, 49, 50, 52, 53, 54, 55, 56.
33. The computer program product of claim 33, wherein the collecting module further includes instructions for:
selecting combinations of two or more words sampled from the lexicon and assigning the combinations to the plurality of test queries.
35. The computer program product of claim 35, wherein the sampling module further includes instructions for:
assigning the selected document to each tier in the plurality of relevance tiers for which the surrogate relevance score is greater than a respective predetermined threshold value of the tier.
Dependent claims: 36, 38, 39, 41, 42.
37. The computer program product of claim 37, wherein the respective predetermined ranges of relevance scores associated with the plurality of tiers are nonoverlapping.
40. The computer program product of claim 40, wherein the individual relevance scores are numbers selected from a predetermined range and said assigning comprises computing an arithmetic mean of the individual relevance scores.
51. The computer program product of claim 51, wherein the determining module further includes instructions for:
defining the partial error, the partial error defined at least in part by a ratio, the ratio being a ratio of the relevance function to a difference, the difference being a difference between one and the relevance function.
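Several of the dependent claims above describe small, concrete computations, which the following Python sketch illustrates. All function names, parameters, and default values are hypothetical and chosen only for this example. In particular, `relevance_ratio` computes only the ratio f / (1 - f) named in claim 51; how that ratio enters the partial error is not stated in the claim and is not modeled here.

```python
import random
from statistics import mean


def sample_test_queries(lexicon, n_queries, words_per_query=2, seed=0):
    """Claim 33: form test queries from combinations of two or more lexicon words."""
    if words_per_query < 2:
        raise ValueError("each query combines two or more words")
    rng = random.Random(seed)
    # rng.sample draws distinct words, so no word repeats within a query.
    return [" ".join(rng.sample(lexicon, words_per_query)) for _ in range(n_queries)]


def tiers_for(surrogate_score, tier_thresholds):
    """Claim 35: every tier whose threshold the surrogate score exceeds."""
    return [tier for tier, threshold in tier_thresholds.items()
            if surrogate_score > threshold]


def training_score(individual_scores):
    """Claim 40: arithmetic mean of the individual relevance scores."""
    return mean(individual_scores)


def relevance_ratio(f):
    """Claim 51: ratio of the relevance function value f to (1 - f)."""
    if f == 1:
        raise ValueError("ratio is undefined when the relevance value equals one")
    return f / (1.0 - f)
```

For example, with assumed thresholds `{"high": 0.8, "medium": 0.5, "low": 0.2}`, a surrogate score of 0.7 exceeds the thresholds of the "medium" and "low" tiers only, so the document would be assigned to both of those tiers.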
Specification