Semi-automatic index term augmentation in document retrieval
First Claim
1. A method implemented by one or more computing devices, wherein the one or more computing devices are configured to perform the following:
selecting a first document;
generating a query containing at least one term from the selected first document;
applying the generated query to a plurality of documents to define a subset of the plurality of documents, wherein the defined subset of the plurality of documents constitutes those documents of the plurality of documents that contain the at least one term and meet a predetermined threshold of query relevance;
determining additional terms based on the defined subset of the plurality of documents, including:
  determining a co-occurrence metric of the at least one term from the selected first document with each term of the defined subset of the plurality of documents;
  determining a frequency score using the determined co-occurrence metric for each term of the defined subset of the plurality of documents; and
  selecting a subset of terms of the defined subset of the plurality of documents based on the determined frequency score for each term;
assigning the selected subset of terms to the selected first document; and
storing the selected first document in a storage system that is remotely accessible via a network.
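Below is a minimal Python sketch of the claimed steps. The claim leaves the query-generation rule, the relevance measure, the co-occurrence metric and the frequency score undefined, so each concrete choice here is an assumption. The query is the k most frequent terms of the selected document. Relevance is the fraction of query terms a document contains. Co-occurrence is the number of subset documents that contain both a query term and a candidate term. The frequency score sums those counts and normalizes by subset size. All function names and parameters (`tokenize`, `generate_query`, `augment_index_terms`, `k`, `threshold`, `n_terms`) are hypothetical. The final step, storing the document in a network-accessible storage system, is replaced by an in-memory record.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphanumeric words only (no stemming or
# stop-word removal, which a real system would likely add).
def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def generate_query(doc, k=2):
    """Assumed rule: the query is the k most frequent terms of the selected document."""
    counts = Counter(tokenize(doc))
    return [term for term, _ in counts.most_common(k)]

def relevance(query, doc_terms):
    """Assumed relevance measure: fraction of query terms present in a document."""
    return sum(t in doc_terms for t in query) / len(query)

def augment_index_terms(first_doc, corpus, k=2, threshold=0.5, n_terms=3):
    query = generate_query(first_doc, k)
    # Define the subset: documents that contain at least one query term AND
    # meet the (assumed) relevance threshold.
    subset = []
    for doc in corpus:
        terms = set(tokenize(doc))
        if any(t in terms for t in query) and relevance(query, terms) >= threshold:
            subset.append(terms)
    if not subset:
        return []
    # Assumed co-occurrence metric: for each (query term, candidate term) pair,
    # the number of subset documents containing both.
    cooc = Counter()
    for terms in subset:
        for q in query:
            if q in terms:
                for t in terms:
                    if t not in query:
                        cooc[(q, t)] += 1
    # Assumed frequency score: co-occurrence summed over query terms,
    # normalized by the number of documents in the subset.
    scores = Counter()
    for (q, t), c in cooc.items():
        scores[t] += c / len(subset)
    # Propose only terms the selected document does not already contain;
    # ties are broken alphabetically so the ranking is deterministic.
    existing = set(tokenize(first_doc))
    ranked = sorted((t for t in scores if t not in existing),
                    key=lambda t: (-scores[t], t))
    return ranked[:n_terms]

first_doc = "solar panel efficiency solar cells"
corpus = [
    "solar panel installation photovoltaic inverter",
    "solar cells photovoltaic silicon",
    "wind turbine blade design",
    "panel discussion on photovoltaic policy",
]
augmented = augment_index_terms(first_doc, corpus)
# Assign the selected terms to the first document (in-memory stand-in for storage).
record = {"text": first_doc, "index_terms": sorted(set(tokenize(first_doc))) + augmented}
print(augmented)  # ['photovoltaic', 'installation', 'inverter']
```

In this example the query is `["solar", "panel"]`, and the wind-turbine document falls outside the subset. "photovoltaic" ranks first because it co-occurs with both query terms in two subset documents each. "cells" is excluded because the selected document already contains it.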
Abstract
Disclosed are methods and systems for indexing or retrieving materials accessible through computer networks.
14 Claims
Specification