Term-statistics modification for category-based search
First Claim
1. A method for searching a document collection that includes a plurality of documents that are respectively associated with one or more categories and contain terms, the method comprising:
providing an index of the terms indicating the documents in which the terms appear;
estimating a first statistical distribution of each of at least some of the terms in the index and a second statistical distribution of each of at least some of the categories over the documents in the collection;
accepting a query comprising one or more of the terms and a category restriction referring to at least one of the categories;
operating on the first estimated statistical distribution of at least one of the terms in the query using the second estimated statistical distribution of the at least one of the categories, responsively to the category restriction, so as to produce a modified term distribution; and
applying the query to the index so as to return a response in which occurrences of the at least one of the terms are scored responsively to the modified term distribution.
Abstract
A method for searching a document collection includes providing an index of terms indicating the documents in which the terms appear. A first statistical distribution of each of at least some of the terms in the index and a second statistical distribution of each of at least some of the categories are estimated over the documents in the collection. A query including one or more of the terms and a category restriction referring to at least one of the categories is accepted. A modified term distribution is produced by operating on the first estimated statistical distribution of at least one of the terms in the query using the second estimated statistical distribution of the at least one of the categories, responsively to the category restriction. The query is applied to the index so as to return a response, in which occurrences of the at least one of the terms are scored responsively to the modified term distribution.
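The flow described in the abstract can be illustrated with a minimal sketch. It is not the formula of the patent, which this section does not state. As an assumption for illustration only, the sketch treats terms and categories as statistically independent, so the modified document frequency of a term under a category restriction is estimated as df(term) * df(category) / N, and scoring uses a smoothed IDF weight. All function names, the toy collection DOCS, and the smoothing constants are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy collection: doc id -> (text, set of categories).
DOCS = {
    1: ("java coffee beans", {"food"}),
    2: ("java virtual machine", {"software"}),
    3: ("python virtual environment", {"software"}),
    4: ("coffee machine review", {"food"}),
}


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, (text, _categories) in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def estimate_statistics(docs, index):
    """Collection-wide document frequencies of terms and of categories."""
    term_df = {term: len(ids) for term, ids in index.items()}
    cat_df = defaultdict(int)
    for _text, categories in docs.values():
        for category in categories:
            cat_df[category] += 1
    return len(docs), term_df, dict(cat_df)


def modified_df(term, category, stats):
    """Estimated document frequency of `term` inside `category`.

    ASSUMPTION (not taken from the source): term and category occurrences
    are independent, so the estimate is df(term) * df(category) / N.
    """
    n_docs, term_df, cat_df = stats
    return term_df.get(term, 0) * cat_df.get(category, 0) / n_docs


def search(query_terms, category, docs, index, stats):
    """Score documents in `category` using the modified term statistics."""
    _n_docs, _term_df, cat_df = stats
    n_cat = cat_df.get(category, 0)
    scores = defaultdict(float)
    for term in query_terms:
        df_mod = modified_df(term, category, stats)
        # Smoothed IDF over the restricted subset; the constants are arbitrary.
        weight = math.log((n_cat + 1) / (df_mod + 0.5))
        for doc_id in index.get(term, ()):
            if category in docs[doc_id][1]:  # apply the category restriction
                scores[doc_id] += weight
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


INDEX = build_index(DOCS)
STATS = estimate_statistics(DOCS, INDEX)
RESULTS = search(["java", "virtual"], "software", DOCS, INDEX, STATS)
print(RESULTS)
```

In this sketch the per-category statistics are derived from two collection-wide tables rather than by recounting term occurrences inside each category subset, which is one plausible reading of "operating on the first estimated statistical distribution ... using the second estimated statistical distribution".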
27 Claims
1. A method for searching a document collection that includes a plurality of documents that are respectively associated with one or more categories and contain terms, the method comprising:
providing an index of the terms indicating the documents in which the terms appear;
estimating a first statistical distribution of each of at least some of the terms in the index and a second statistical distribution of each of at least some of the categories over the documents in the collection;
accepting a query comprising one or more of the terms and a category restriction referring to at least one of the categories;
operating on the first estimated statistical distribution of at least one of the terms in the query using the second estimated statistical distribution of the at least one of the categories, responsively to the category restriction, so as to produce a modified term distribution; and
applying the query to the index so as to return a response in which occurrences of the at least one of the terms are scored responsively to the modified term distribution.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. Apparatus for searching a document collection, comprising:
a memory, which is arranged to store a plurality of documents that are respectively associated with one or more categories and contain terms;
a search processor, which is arranged to provide an index of the terms indicating the documents in which the terms appear, to estimate a first statistical distribution of each of at least some of the terms in the index and a second statistical distribution of each of at least some of the categories over the documents in the collection, to accept a query comprising one or more of the terms and a category restriction referring to at least one of the categories, to operate on the first estimated statistical distribution of at least one of the terms in the query using the second estimated statistical distribution of the at least one of the categories, responsively to the category restriction, so as to produce a modified term distribution, and to apply the query to the index so as to return a response in which occurrences of the at least one of the terms are scored responsively to the modified term distribution.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A computer software product for searching a document collection that includes a plurality of documents that are respectively associated with one or more categories and contain terms, the product comprising a computer-readable medium, in which program instructions are stored, which instructions, when read by the computer, cause the computer to store an index of the terms indicating the documents in which the terms appear, to estimate a first statistical distribution of each of at least some of the terms in the index and a second statistical distribution of each of at least some of the categories over the documents in the collection, to accept a query comprising one or more of the terms and a category restriction referring to at least one of the categories, to operate on the first estimated statistical distribution of at least one of the terms in the query using the second estimated statistical distribution of the at least one of the categories, responsively to the category restriction, so as to produce a modified term distribution, and to apply the query to the index so as to return a response in which occurrences of the at least one of the terms are scored responsively to the modified term distribution.
Specification