Automatic index term augmentation in document retrieval
First Claim
1. A medium storing instructions executable by at least one processor, the instructions configured to cause the at least one processor to:
create a search query comprised of at least one term in a specific document;
apply the search query to a collection of documents;
select from the collection of documents a subset of documents, the subset of documents achieving the highest scores upon application of the search query;
select at least one term for use as at least one index term for the specific document from among terms in the subset of documents based upon the co-occurrence of terms in the subset of documents with terms in the specific document.
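The four steps of claim 1 can be sketched in a few lines of Python. This is only an illustrative reading, not the patented implementation: the whitespace tokenizer, the term-count relevance score, the co-occurrence weighting, and the names `augment_index_terms`, `top_k`, and `n_terms` are all assumptions, since the claim does not specify them.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the claim names none)."""
    return text.lower().split()


def score(query_terms, doc_tokens):
    """Toy relevance score: total occurrences of query terms in the document."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms)


def augment_index_terms(specific_doc, collection, top_k=2, n_terms=3):
    """Hypothetical helper following the four steps of claim 1."""
    # Step 1: create a search query from terms in the specific document.
    query_terms = set(tokenize(specific_doc))

    # Step 2: apply the query to every document in the collection.
    scored = [(score(query_terms, tokenize(d)), i)
              for i, d in enumerate(collection)]

    # Step 3: keep the highest-scoring documents (ties broken by position).
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:top_k]
    subset = [tokenize(collection[i]) for sc, i in top if sc > 0]

    # Step 4: weight each candidate term by how many of the specific
    # document's terms it co-occurs with, summed over the subset.
    cooccurrence = Counter()
    for tokens in subset:
        present = set(tokens)
        shared = len(present & query_terms)
        for term in present - query_terms:
            cooccurrence[term] += shared

    ranked = sorted(cooccurrence.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_terms]]
```

With a collection of two documents about jaguars in the rainforest and one about stock prices, a specific document containing only "jaguar cat" would pick up "rainforest" as its strongest new index term, because that term appears in both top-scoring documents alongside the query terms.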
Abstract
Disclosed are methods and systems for automatically assigning index terms to electronic documents such as Web pages or sites in a manner which may be used to facilitate the retrieval of electronic documents of interest. The method involves scoring the co-occurrence of terms in other documents with terms in the electronic document, and selecting terms as index terms based upon those scores. The method permits the efficient retrieval of electronic documents.
9 Claims
4. A medium storing instructions executable by at least one processor, the instructions configured to cause the at least one processor to:
select one or more index terms from a plurality of index terms;
identify one or more documents of a plurality of documents to which each of the one or more index terms has been assigned;
compare, for each of the one or more index terms, each of the identified documents to a specific document;
determine a score for each of the one or more index terms based on the comparing; and
assign the index term associated with the highest score to the specific document.
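As a rough sketch of claim 4, each candidate index term can be scored by comparing the documents already carrying that term to the specific document. Claim 4 does not say how documents are compared or how comparisons become a score, so the Jaccard similarity and the averaging used here are assumptions, as are the names `best_index_term` and `assignments`.

```python
def jaccard(a, b):
    """Set-overlap similarity between two token lists (assumed comparison)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_index_term(specific_doc, assignments, candidate_terms):
    """Hypothetical helper following claim 4.

    `assignments` maps an index term to the list of documents (strings)
    to which that term has already been assigned.
    """
    target = specific_doc.lower().split()
    scores = {}
    for term in candidate_terms:
        docs = assignments.get(term, [])
        if not docs:
            continue  # nothing to compare against for this term
        # Compare each identified document to the specific document.
        sims = [jaccard(target, d.lower().split()) for d in docs]
        # Score the term from the comparisons (mean similarity, assumed).
        scores[term] = sum(sims) / len(sims)
    if not scores:
        return None, scores
    # Assign the term with the highest score; ties go to the
    # alphabetically first term so the result is deterministic.
    best = max(sorted(scores), key=scores.get)
    return best, scores
```

The function returns both the winning term and the full score table, which makes it easy to inspect why a term was chosen.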
5. A computer-implemented method comprising:
selecting one or more index terms from a plurality of index terms;
identifying one or more documents of a plurality of documents to which each of the one or more index terms has been assigned;
comparing, for each of the one or more index terms, each of the identified documents to a specific document;
determining a score for each of the one or more index terms based on the comparing;
assigning the index term associated with the highest score to the specific document;
selecting a first category;
selecting one or more second categories assigned to a first supercategory;
comparing the first category and the one or more second categories;
computing a first score for the first supercategory based on the comparisons;
selecting one or more third categories assigned to a second supercategory;
comparing the first category and the one or more third categories;
computing a second score for the second supercategory based on the comparisons;
assigning the first category to the first supercategory when the first score is higher than the second score;
assigning the first category to the second supercategory when the second score is higher than the first score;
wherein the comparing of each of the identified documents to a specific document and the comparing the first category, comparing of the first category and the one or more second categories, comparing the one or more second categories, and the assigning the index term associated with the highest score to the specific document each include computing a likelihood ratio.
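The supercategory steps of claim 5 can be illustrated with a small sketch. The claim requires "a likelihood ratio" but does not say which one; the code below swaps in the log-likelihood ratio (G-squared) statistic over a 2x2 contingency table as one plausible choice. Representing a category as a set of terms, signing the statistic by the direction of association, and averaging over member categories are all assumptions, as are the function names.

```python
import math


def _xlogx(x):
    return x * math.log(x) if x > 0 else 0.0


def llr(k11, k12, k21, k22):
    """G-squared log-likelihood ratio for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (cells - rows - cols + _xlogx(n)))


def compare(cat_a, cat_b, vocab):
    """Signed LLR between two categories given as term sets within vocab."""
    k11 = len(cat_a & cat_b)                 # terms in both categories
    k12 = len(cat_a - cat_b)                 # terms only in cat_a
    k21 = len(cat_b - cat_a)                 # terms only in cat_b
    k22 = len(vocab) - len(cat_a | cat_b)    # terms in neither
    g2 = llr(k11, k12, k21, k22)
    # G-squared is unsigned; negate it when the overlap is below chance.
    return g2 if k11 * k22 >= k12 * k21 else -g2


def assign_supercategory(first, super_a, super_b, vocab):
    """super_a and super_b are (name, [non-empty list of term sets]) pairs."""
    def score(members):
        return sum(compare(first, m, vocab) for m in members) / len(members)

    (name_a, members_a), (name_b, members_b) = super_a, super_b
    score_a, score_b = score(members_a), score(members_b)
    if score_a > score_b:
        return name_a
    if score_b > score_a:
        return name_b
    return None  # the claim does not say what happens on a tie
```

Returning `None` on a tie mirrors the claim language, which only covers the cases where one score is strictly higher than the other.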
Specification