Machine-learned approach to determining document relevance for search over large electronic collections of documents
Abstract
The present invention relates to a system and methodology that applies automated learning procedures for determining document relevance and assisting information retrieval activities. A system is provided that facilitates a machine-learned approach to determine document relevance. The system includes a storage component that receives a set of human selected items to be employed as positive test cases of highly relevant documents. A training component trains at least one classifier with the human selected items as positive test cases and one or more other items as negative test cases in order to provide a query-independent model, wherein the other items can be selected by a statistical search, for example. Also, the trained classifier can be employed to aid an individual in identifying and selecting new positive cases or utilized to filter or re-rank results from a statistical-based search.
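The training and filtering flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes a small multinomial naive Bayes text classifier with Laplace smoothing (the source does not specify the classifier type), and the names `RelevanceClassifier`, `prob_relevant`, and `rerank` are hypothetical. The model is query-independent in the sense that it scores a document from its own words only, never from the query that retrieved it.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption made for this sketch)."""
    return text.lower().split()


class RelevanceClassifier:
    """Hypothetical naive Bayes model trained on positive and negative items."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}
        self.vocab = set()

    def train(self, positives, negatives):
        # Positives: human or machine selected items.
        # Negatives: other items, e.g. drawn from a statistical search.
        for label, docs in ((True, positives), (False, negatives)):
            for doc in docs:
                tokens = tokenize(doc)
                self.counts[label].update(tokens)
                self.vocab.update(tokens)
                self.docs[label] += 1

    def _log_score(self, tokens, label):
        total_docs = self.docs[True] + self.docs[False]
        log_prob = math.log(self.docs[label] / total_docs)
        denom = sum(self.counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # ignore words never seen in training
                log_prob += math.log((self.counts[label][tok] + 1) / denom)
        return log_prob

    def prob_relevant(self, doc):
        tokens = tokenize(doc)
        pos = self._log_score(tokens, True)
        neg = self._log_score(tokens, False)
        top = max(pos, neg)  # subtract the max for numerical stability
        e_pos, e_neg = math.exp(pos - top), math.exp(neg - top)
        return e_pos / (e_pos + e_neg)


def rerank(classifier, results, threshold=0.0):
    """Drop results scoring below threshold, then sort by relevance probability."""
    scored = [(classifier.prob_relevant(doc), doc) for doc in results]
    kept = [(p, doc) for p, doc in scored if p >= threshold]
    return [doc for p, doc in sorted(kept, key=lambda pair: pair[0], reverse=True)]
```

With `threshold=0.0` the function only re-ranks the search results; raising the threshold turns it into a filter, which corresponds to the two uses of the trained classifier named in the abstract.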
30 Claims
1. A computer-implemented system that facilitates a machine-learned approach to determine document relevance, comprising:
- a storage component that receives a set of human or machine selected items to be employed as positive test cases; and
- a training component that trains at least one classifier with the human or machine selected items as positive test cases and one or more other items as negative test cases in order to provide a query-independent model, the trained classifier is employed to filter documents obtained from statistical-based or probabilistic-based searches.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
22. A computer-based information retrieval system, comprising:
- means for determining a training set for data terms;
- means for automatically classifying the training set;
- means for determining new items from the classified training set; and
- means for presenting the new items in accordance with an information retrieval request.
Dependent claims: 23
24. A computer-implemented method to facilitate automated information retrieval, comprising:
- processing n queries from a data log, n being an integer;
- identifying relevant candidates from the n queries; and
- training classifiers to identify other relevant candidates for subsequent search activities.
Dependent claims: 25, 26, 27, 28, 29
30. A computer readable medium having a data structure stored thereon, comprising:
- a first data field related to a training data set for a relevance category;
- a second data field that relates to a new set of data items pertaining to the relevance category; and
- a third data field that relates to a probability ranking for the new set of data items.
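As an illustration only, the three data fields recited in claim 30 could be represented along these lines. The class name `RelevanceRecord`, its field names, and the choice of a mapping from item to probability are all assumptions made for this sketch; the claim itself does not prescribe any layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RelevanceRecord:
    """Hypothetical record for one relevance category."""

    category: str
    # First data field: training data set for the relevance category.
    training_items: List[str]
    # Second data field: new data items pertaining to the category.
    new_items: List[str]
    # Third data field: probability used to rank the new items
    # (stored here as item -> probability, an assumption).
    probabilities: Dict[str, float]

    def ranked_new_items(self) -> List[str]:
        """Return new items ordered from most to least probable."""
        return sorted(
            self.new_items,
            key=lambda item: self.probabilities.get(item, 0.0),
            reverse=True,
        )
```

Items with no stored probability default to 0.0 here, so they sort last; that default is a design choice of the sketch, not something stated in the source.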
Specification