Semi-automatic index term augmentation in document retrieval
First Claim
1. A non-transitory machine-readable medium having executable instructions to cause one or more processing units to perform a method to assign terms to a first document, the method comprising:
- selecting the first document;
- generating a query containing at least one term, from the selected first document;
- applying the generated query to a plurality of documents to define a subset of the plurality of documents, wherein the defined subset of the plurality of documents constitutes those documents of the plurality of documents that contain the at least one term and meet a predetermined threshold of query relevance;
- determining additional terms based on the defined subset of the plurality of documents, including;
  - determining a co-occurrence metric of the at least one term from the selected first document with each term of the defined subset of the plurality of documents,
  - determining a frequency score using the determined co-occurrence metric for each term of the defined subset of the plurality of documents, and
  - selecting a subset of terms of the defined subset of the plurality of documents based on the determined frequency score for each term;
- assigning the selected subset of terms to the selected first document; and
- storing the selected first document in a storage system that is remotely accessible via a network.
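As a reading aid only, the query-generation and subset-definition steps recited above could be sketched roughly as follows. The claim does not say how the query terms are chosen or how query relevance is measured, so everything concrete here is an assumption made for illustration: the stopword list, picking the most frequent tokens as the query, scoring relevance as the share of a document's tokens that are query terms, and the names `tokenize`, `generate_query`, `relevance`, and `define_subset`.

```python
import re
from collections import Counter

# Hypothetical stopword list; the claim does not specify one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def generate_query(first_document, max_terms=3):
    """Assumed heuristic: use the document's most frequent tokens as the query."""
    counts = Counter(tokenize(first_document))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_terms]]


def relevance(query, document):
    """Assumed relevance measure: fraction of document tokens that are query terms."""
    tokens = tokenize(document)
    if not tokens:
        return 0.0
    query_terms = set(query)
    return sum(1 for t in tokens if t in query_terms) / len(tokens)


def define_subset(query, corpus, threshold=0.2):
    """Keep documents that contain a query term and meet the relevance threshold."""
    subset = []
    for document in corpus:
        tokens = set(tokenize(document))
        contains_term = any(q in tokens for q in query)
        if contains_term and relevance(query, document) >= threshold:
            subset.append(document)
    return subset


first_document = "solar panel efficiency in solar cells"
corpus = [
    "solar cells convert light",
    "cooking pasta at home",
    "the history of solar energy and many other unrelated long topics here",
]
query = generate_query(first_document, max_terms=2)
subset = define_subset(query, corpus, threshold=0.2)
```

In this toy run the third document mentions a query term but falls below the assumed 0.2 threshold, so only the first document remains in the subset; a real system would likely use a different relevance measure and threshold.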
Abstract
Disclosed are methods and systems for indexing or retrieving materials accessible through computer networks.
191 Citations
20 Claims
1. A non-transitory machine-readable medium having executable instructions to cause one or more processing units to perform a method to assign terms to a first document, the method comprising:
- selecting the first document;
- generating a query containing at least one term, from the selected first document;
- applying the generated query to a plurality of documents to define a subset of the plurality of documents, wherein the defined subset of the plurality of documents constitutes those documents of the plurality of documents that contain the at least one term and meet a predetermined threshold of query relevance;
- determining additional terms based on the defined subset of the plurality of documents, including;
  - determining a co-occurrence metric of the at least one term from the selected first document with each term of the defined subset of the plurality of documents,
  - determining a frequency score using the determined co-occurrence metric for each term of the defined subset of the plurality of documents, and
  - selecting a subset of terms of the defined subset of the plurality of documents based on the determined frequency score for each term;
- assigning the selected subset of terms to the selected first document; and
- storing the selected first document in a storage system that is remotely accessible via a network.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
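To make the term-determination steps of claim 1 more concrete, here is a minimal sketch of one possible co-occurrence metric and frequency score. The claim leaves both undefined, so the choices below are assumptions: co-occurrence is counted as the number of subset documents in which a query term and a candidate term both appear, the frequency score is that count divided by the subset size (summed over query terms), and the helper names and the `index_terms` field are hypothetical. The storing step is omitted.

```python
from collections import Counter


def cooccurrence_counts(query, subset_tokens):
    """Count, per (query term, other term) pair, the documents containing both."""
    counts = Counter()
    for tokens in subset_tokens:
        present = set(tokens)
        for q in query:
            if q in present:
                for term in present:
                    if term != q:
                        counts[(q, term)] += 1
    return counts


def frequency_scores(query, subset_tokens):
    """Assumed score: co-occurrence count over subset size, summed across query terms."""
    n_docs = len(subset_tokens) or 1  # avoid division by zero on an empty subset
    scores = Counter()
    for (_, term), count in cooccurrence_counts(query, subset_tokens).items():
        scores[term] += count / n_docs
    return scores


def select_terms(query, subset_tokens, top_n=2):
    """Pick the highest-scoring terms that are not already query terms."""
    scores = frequency_scores(query, subset_tokens)
    candidates = [(t, s) for t, s in scores.items() if t not in query]
    # Descending score, then alphabetical order so ties are deterministic.
    candidates.sort(key=lambda ts: (-ts[1], ts[0]))
    return [t for t, _ in candidates[:top_n]]


def assign_terms(document_record, terms):
    """Append the selected terms to the record's (hypothetical) index_terms list."""
    index_terms = document_record.setdefault("index_terms", [])
    for term in terms:
        if term not in index_terms:
            index_terms.append(term)
    return document_record


# Already-tokenized documents standing in for the defined subset.
subset_tokens = [
    ["solar", "cells", "light"],
    ["solar", "cells", "energy"],
    ["solar", "panel", "energy"],
]
selected = select_terms(["solar"], subset_tokens, top_n=2)
record = assign_terms({"id": "doc-1", "index_terms": ["solar"]}, selected)
```

Here "cells" and "energy" each co-occur with "solar" in two of the three documents, so they outrank "light" and "panel" and are added to the record's index terms.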
15. An apparatus for assigning terms to a first document, the apparatus comprising:
- means for selecting the first document;
- means for generating a query containing at least one term, from the selected first document;
- means for applying the generated query to a plurality of documents to define a subset of the plurality of documents, wherein the defined subset of the plurality of documents constitutes those documents of the plurality of documents that contain the at least one term and meet a predetermined threshold of query relevance;
- means for determining additional terms based on the defined subset of the plurality of documents, wherein the means for determining additional terms includes;
  - means for determining a co-occurrence metric of the at least one term from the selected first document with each term of the defined subset of the plurality of documents,
  - means for determining a frequency score using the determined co-occurrence metric for each term of the defined subset of the plurality of documents, and
  - means for selecting a subset of terms of the defined subset of the plurality of documents based on the determined frequency score for each term;
- means for assigning the selected subset of terms to the selected first document; and
- means for storing the selected first document in a storage system that is remotely accessible via a network.
Dependent claims: 16, 17, 18, 19, 20
Specification