Method and apparatus for score normalization for information retrieval applications
Abstract
A method and apparatus for normalizing a score associated with a document are presented. Statistics relating to scores assigned to a set of training documents not relevant to a topic are determined. Scores represent a measure of relevance to the topic. After the various statistics have been collected, a score assigned to a testing document is normalized based on those statistics. The normalized score is then compared to a threshold score. Subsequently, the testing document is designated as relevant or not relevant to the topic based on the comparison.
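To illustrate why normalizing against off-topic statistics can make scores comparable, consider two topics whose raw scores sit on different scales. The sketch below assumes a z-score style normalization (mean and standard deviation of the off-topic scores); the abstract does not say which statistics are used. The topic names, the zscore helper, and all numbers are hypothetical.

```python
from statistics import mean, pstdev

def zscore(score, off_topic_scores):
    """Express a raw score in standard deviations above the off-topic mean."""
    mu = mean(off_topic_scores)
    sigma = pstdev(off_topic_scores)
    return (score - mu) / sigma

# Hypothetical raw scores of off-topic training documents for two topics.
# Raw scores for topic_b happen to be about ten times larger than for topic_a.
off_topic = {
    "topic_a": [1.0, 3.0, 1.0, 3.0],      # mean 2.0, std dev 1.0
    "topic_b": [10.0, 30.0, 10.0, 30.0],  # mean 20.0, std dev 10.0
}

# One testing document per topic, with its raw relevance score.
raw = {"topic_a": 5.0, "topic_b": 25.0}

normalized = {topic: zscore(raw[topic], off_topic[topic]) for topic in raw}

# A single threshold can now be applied to both topics.
threshold = 2.0
relevant = {topic: z >= threshold for topic, z in normalized.items()}
```

The raw score for topic_b is larger, yet after normalization only the topic_a document stands well clear of its off-topic background, so only it passes the shared threshold.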
29 Claims
1. A method facilitated by a human annotator and performed in a computer environment for normalizing a score associated with a document, the method comprising the steps of:
(a) establishing (1) through the computer environment a set of training documents most of which are believed not to be relevant to a topic (off-topic) and (2) through the human annotator a query relevant to the topic (on-topic);
(b) assigning, through the computer environment, a training document relevance score to each one of the training documents, each training document relevance score representing a measure of relevance of its respective document to the topic;
(c) determining, through the computer environment, statistics relating to all training document relevance scores and thereby obtaining determined statistics;
(d) receiving a testing document;
(e) calculating, through the computer environment, a score of relevance of the testing document to the topic to obtain a testing document relevance score;
(f) normalizing, through the computer environment and based on the statistics, the testing document relevance score to obtain a normalized score wherein;
normalizing adjusts the testing document relevance score based on the statistics to be comparable to other scores from which the statistics were determined, and the normalized score is a better predictor of probability of the testing document being relevant than the testing document relevant score;
(g) establishing, through the computer environment, a threshold score representing a relevance threshold for the topic;
(h) comparing the normalized score to the threshold score to obtain a comparison; and
(i) designating the testing document as relevant or not relevant to the topic based on the comparison.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
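Steps (c) through (i) of claim 1 can be sketched in Python as follows. The claim does not state which statistics are determined or how the normalization uses them; this sketch assumes the mean and standard deviation of the training scores and a z-score style adjustment. The function names, the threshold value, and the sample scores are hypothetical.

```python
from statistics import mean, pstdev

def determine_statistics(training_scores):
    """Step (c): summarize the training scores (assumed: mean and std dev)."""
    return mean(training_scores), pstdev(training_scores)

def normalize(score, stats):
    """Step (f): distance of a raw score from the training mean, in std devs."""
    mu, sigma = stats
    if sigma == 0:
        raise ValueError("training scores have no spread; cannot normalize")
    return (score - mu) / sigma

def designate(score, stats, threshold):
    """Steps (h)-(i): True if the normalized score reaches the threshold."""
    return normalize(score, stats) >= threshold

# Step (b), hypothetical: raw scores already assigned to mostly off-topic
# training documents for one topic.
training_scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = determine_statistics(training_scores)

# Step (g), hypothetical: require 2.5 standard deviations above the mean.
threshold = 2.5

# Steps (d)-(e), hypothetical: two testing documents with raw scores 11.0 and 6.0.
first_is_relevant = designate(11.0, stats, threshold)
second_is_relevant = designate(6.0, stats, threshold)
```

With these sample values the training mean is 5.0 and the standard deviation is 2.0, so the first testing document normalizes to 3.0 and is designated relevant, while the second normalizes to 0.5 and is not.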
15. A computer-readable storage medium containing instructions for performing a method in a computer environment for normalizing a score associated with a document, the method facilitated by a human annotator comprising:
(a) establishing (1) through the computer environment a set of training documents most of which are believed not to be relevant to a topic (off-topic) and (2) through the human annotator a query relevant to the topic (on-topic);
(b) assigning, through the computer environment, a training document relevance score to each one of the training documents, each training document relevance score representing a measure of relevance of its respective document to the topic;
(c) determining, through the computer environment, statistics relating to all training document relevance scores;
(d) receiving a testing document;
(e) calculating, through the computer environment, a score of relevance of the testing document to the topic to obtain a testing document relevance score;
(f) normalizing, through the computer environment and based on the statistics, the testing document relevance score to obtain a normalized score wherein;
normalizing adjusts the testing document relevance score based on the statistics to be comparable to other scores from which the statistics were determined, and the normalized score is a better predictor of probability of the testing document being relevant than the testing document relevant score;
(g) establishing, through the computer environment, a threshold score representing a relevance threshold for the topic;
(h) comparing the normalized score to the threshold score to obtain a comparison; and
(i) designating the testing document as relevant or not relevant to the topic based on the comparison.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25
26. A method, facilitated by a human annotator and performed in a computer environment for normalizing a score associated with a document, the method comprising the steps of:
(a) receiving (1) through the computer environment a set of training documents not relevant to a topic (off-topic) and (2) through the human annotator a query including the topic (on-topic);
(b) assigning, through the computer environment, a training document relevance score to each one of the training documents, each training document relevance score representing a measure of relevance of its respective document to the topic;
(c) determining, through the computer environment, statistics relating to all training document relevance scores;
(d) receiving a testing document;
(e) calculating, through the computer environment, a score of relevance of the testing document to the topic to obtain a testing document relevance score; and
(f) normalizing, through the computer environment and based on the statistics, the testing document relevance score to obtain a normalized score wherein;
normalizing adjusts the testing document relevance score based on the statistics to be comparable to other scores from which the statistics were determined, and the normalized score is a better predictor of probability of the testing document being relevant than the testing document relevant score.
Dependent claims: 27, 28
29. A method facilitated by a human annotator and performed by a processor in a computer environment for searching for documents relevant to a topic comprising the steps of:
establishing through the computer environment a set of training documents not relevant to the topic (off-topic);
the human annotator sending a query including the topic (on-topic) to the processor; and
the human annotator receiving results from the processor indicating a document relevant to the topic, wherein the processor;
assigns, through the computer environment, a training document relevance score to each one of the training documents, each training document relevance score representing a measure of relevance of its respective document to the topic;
determines, through the computer environment, statistics relating to all training document relevance scores;
receives a testing document;
calculates, through the computer environment, a score of relevance of the testing document to the topic to obtain a testing document relevance score;
normalizes, through the computer environment and based on the statistics, the testing document relevance score to obtain a normalized score wherein;
normalizing adjusts the testing document relevance score based on the statistics to be comparable to other scores from which the statistics were determined, and the normalized score is a better predictor of probability of the testing document being relevant than the testing document relevant score;
establishes, through the computer environment, a threshold score representing a relevance threshold for the topic;
compares the normalized score to the threshold score to obtain a comparison; and
designates the testing document as relevant or not relevant to the topic based on the comparison.
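The search flow of claim 29, in which a query goes in and the documents designated relevant come back, might look like the following Python sketch. The claim does not specify how raw relevance scores are computed, so the term-overlap scorer here is a toy stand-in. The z-score normalization, the function names, the threshold, and the sample texts are likewise assumptions made for illustration.

```python
from statistics import mean, pstdev

def raw_score(query, document):
    """Toy relevance score: count of document words that occur in the query."""
    query_terms = set(query.lower().split())
    return sum(1 for word in document.lower().split() if word in query_terms)

def search(query, off_topic_docs, testing_docs, threshold):
    """Return the testing documents designated relevant to the query's topic."""
    # Score the off-topic training documents and collect their statistics.
    training_scores = [raw_score(query, doc) for doc in off_topic_docs]
    mu = mean(training_scores)
    sigma = pstdev(training_scores) or 1.0  # guard against zero spread
    results = []
    for doc in testing_docs:
        # Normalize each testing score, then compare it to the threshold.
        normalized = (raw_score(query, doc) - mu) / sigma
        if normalized >= threshold:
            results.append(doc)
    return results

# Hypothetical off-topic training set and testing documents.
off_topic_docs = [
    "the weather was mild and sunny today",
    "stock prices fell after the earnings report",
    "the score of the football match was close",
    "a new recipe for tomato soup",
]
testing_docs = [
    "score normalization improves document retrieval for a query",
    "the team won the match",
]

query = "score normalization for document retrieval"
relevant = search(query, off_topic_docs, testing_docs, threshold=2.0)
```

In this toy data the off-topic documents share at most one word with the query, so the first testing document, which shares five, stands far above the off-topic background and is the only one returned.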
Specification