Method for performing a search of a plurality of documents for similarity to a plurality of query words
First Claim
1. A computer implemented method for performing a search of a plurality of documents for similarity to a query having multiple terms using an inverted index of the plurality of documents, the method comprising the sequential steps of:
(a) searching the inverted index for each of the query terms and determining a number of occurrences of the query terms in a first document;
(b) calculating a similarity score for the first document based on the number of occurrences of the query terms in the first document;
(c) searching the inverted index for each of the query terms and determining a number of occurrences of the query terms in a subsequent document;
(d) calculating a similarity score for the subsequent document based on the number of occurrences of the query terms in the subsequent document; and
(e) repeating steps (c) and (d) until a similarity score has been calculated for each of the plurality of documents.
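The document-by-document loop recited in steps (a) through (e) can be illustrated with a minimal sketch. It is not the patented implementation: the index layout (term mapped to per-document counts), the helper names `build_inverted_index` and `score_documents`, and the use of a plain sum of occurrences as the similarity score are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: occurrence count}. `docs` is {doc_id: text}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def score_documents(index, doc_ids, query_terms):
    """Score one document at a time, looking up every query term for it."""
    scores = {}
    for doc_id in doc_ids:  # step (e): move on to the next document
        # Steps (a)/(c): look up each query term's count for this document.
        counts = [index.get(term, {}).get(doc_id, 0) for term in query_terms]
        # Steps (b)/(d): assumed score, the total number of occurrences.
        scores[doc_id] = sum(counts)
    return scores


docs = {1: "red apple red", 2: "green apple", 3: "blue sky"}
index = build_inverted_index(docs)
scores = score_documents(index, sorted(docs), ["red", "apple"])
print(scores)
```

The outer loop runs over documents and the inner lookup runs over query terms, so each document's score is complete before the next document is examined.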
Abstract
A method for performing a search of a plurality of documents for similarity to at least one query word includes retrieving a first document and determining a number of occurrences of the at least one query word in the first document. Then, a next document is retrieved and a number of occurrences of the at least one query word in the next document is determined. The steps are repeated until each of the plurality of documents has been retrieved and the number of occurrences of the at least one query word has been determined in each of the plurality of documents. The at least one query word can include a plurality of query words, all of which are searched in each document, in turn, rather than being searched word by word in the whole collection of documents. The documents are then ranked according to the number of occurrences of the query words determined in each document, and a list of documents is produced according to the document ranking.
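The final ranking step described in the abstract can be sketched as a sort over per-document occurrence counts. The function name `rank_documents`, the sample counts, and the tie-breaking rule (by document identifier) are assumptions for illustration; the abstract does not specify how ties are handled.

```python
def rank_documents(occurrences):
    """Return doc ids ordered by descending query-word count.

    Ties are broken by doc id, which is an assumption of this sketch.
    """
    return sorted(occurrences, key=lambda doc_id: (-occurrences[doc_id], doc_id))


# Hypothetical per-document totals of query-word occurrences.
occurrences = {"doc_a": 1, "doc_b": 4, "doc_c": 0, "doc_d": 4}
ranking = rank_documents(occurrences)
print(ranking)
```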
176 Citations
20 Claims
1. A computer implemented method for performing a search of a plurality of documents for similarity to a query having multiple terms using an inverted index of the plurality of documents, the method comprising the sequential steps of:
(a) searching the inverted index for each of the query terms and determining a number of occurrences of the query terms in a first document;
(b) calculating a similarity score for the first document based on the number of occurrences of the query terms in the first document;
(c) searching the inverted index for each of the query terms and determining a number of occurrences of the query terms in a subsequent document;
(d) calculating a similarity score for the subsequent document based on the number of occurrences of the query terms in the subsequent document; and
(e) repeating steps (c) and (d) until a similarity score has been calculated for each of the plurality of documents.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer implemented method for performing a search of a plurality of documents for similarity to a query having multiple terms using an inverted index of the plurality of documents, comprising the sequential steps of:
(a) searching the inverted index for each of the query terms and determining a number of occurrences of the query terms in a first document;
(b) calculating a similarity score for the first document based on the number of occurrences of the query terms in the first document;
(c) storing the first document's similarity score in a memory;
(d) searching the inverted index for each of the query terms and determining a number of occurrences of the query terms in a subsequent document;
(e) calculating a similarity score for the subsequent document based on the number of occurrences of the query terms in the subsequent document;
(f) storing the subsequent document's similarity score in the memory when fewer than a predetermined number of similarity scores are stored in the memory;
(g) comparing the subsequent document's similarity score to the similarity scores stored in the memory, deleting the lowest similarity score stored in the memory, and storing the subsequent document's similarity score in the memory when the subsequent document's similarity score is higher than the lowest similarity score stored in the memory and more than a predetermined number of similarity scores are stored in the memory; and
(h) repeating steps (d) through (g) until a similarity score has been calculated for each of the plurality of documents.
Dependent claims: 9, 10, 11, 12, 13
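The bounded score memory of claim 8 resembles a fixed-size "top k" selection, which can be sketched with a min-heap. This is an illustrative reading rather than the claimed implementation: the function name `top_k_scores`, the `(doc_id, score)` input format, and the choice to treat the memory as full once exactly `k` scores are stored are assumptions.

```python
import heapq


def top_k_scores(scored_docs, k):
    """Keep only the k highest (score, doc_id) pairs seen so far."""
    memory = []  # min-heap: memory[0] always holds the lowest stored score
    for doc_id, score in scored_docs:
        if len(memory) < k:
            # Step (f): memory not yet full, so store the score.
            heapq.heappush(memory, (score, doc_id))
        elif score > memory[0][0]:
            # Step (g): drop the lowest stored score and store the new one.
            heapq.heapreplace(memory, (score, doc_id))
    return sorted(memory, reverse=True)


# Hypothetical stream of (doc_id, similarity score) pairs.
stream = [(1, 3), (2, 1), (3, 0), (4, 5), (5, 2)]
best = top_k_scores(stream, k=3)
print(best)
```

Because only `k` entries are ever held, memory use stays constant no matter how many documents are scored, and each comparison against the lowest stored score is a constant-time look at the heap root.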
14. A computer implemented method for performing a search of a corpus of documents for similarity to a query having multiple terms using an inverted index of the corpus of documents, the method comprising the sequential steps of:
(a) retrieving a first portion of the inverted index corresponding to a first plurality of documents from a first memory into a second memory;
(b) searching the first portion of the inverted index for each of the query terms and determining a number of occurrences of the query terms in each of the documents in the first portion of the inverted index;
(c) calculating similarity scores for each of the documents in the first portion of the inverted index based on the number of occurrences of the query terms in the documents;
(d) retrieving a subsequent portion of the inverted index corresponding to a subsequent plurality of documents from the first memory into the second memory;
(e) searching the subsequent portion of the inverted index for each of the query terms and determining a number of occurrences of the query terms in each of the documents in the subsequent portion of the inverted index;
(f) calculating similarity scores for each of the documents in the subsequent portion of the inverted index based on the number of occurrences of the query terms in the documents; and
(g) repeating steps (d) through (f) until a similarity score has been calculated for each of the documents in the corpus of documents.
Dependent claims: 15, 16, 17, 18, 19, 20
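Claim 14 processes the inverted index one portion at a time, moving each portion from a first memory into a second memory before scoring the documents it covers. The sketch below simulates that under stated assumptions: the "first memory" is a list of JSON strings, the "second memory" is the decoded Python dictionary, and the portion layout (`doc_ids` plus `postings`), the function name `score_in_portions`, and the sum-of-occurrences score are hypothetical.

```python
import json


def score_in_portions(stored_portions, query_terms):
    """Load one index portion at a time and score the documents it covers."""
    scores = {}
    for blob in stored_portions:  # step (g): repeat for each later portion
        # Steps (a)/(d): bring one portion into working memory.
        portion = json.loads(blob)
        postings = portion["postings"]
        for doc_id in portion["doc_ids"]:
            # Steps (b)/(e) and (c)/(f): count query-term occurrences and score.
            scores[doc_id] = sum(
                postings.get(term, {}).get(doc_id, 0) for term in query_terms
            )
    return scores


# Hypothetical serialized portions, each covering a distinct group of documents.
stored_portions = [
    json.dumps({
        "doc_ids": ["d1", "d2"],
        "postings": {"red": {"d1": 2}, "apple": {"d1": 1, "d2": 1}},
    }),
    json.dumps({
        "doc_ids": ["d3", "d4"],
        "postings": {"apple": {"d4": 3}, "sky": {"d3": 1}},
    }),
]
scores = score_in_portions(stored_portions, ["red", "apple"])
print(scores)
```

Only one decoded portion is held at a time, so the working memory needed depends on the portion size rather than on the size of the whole index.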
Specification