Method for Determining Document Relevance
Abstract
The relevance of a document to a given word or phrase is determined by calculating a function of whether the word or phrase occurs in the document and whether each member of a set of words or phrases related to the given word or phrase occurs in the document. A phrase may be included in this set if, out of all the documents in a collection that contain all the words of the phrase, the proportion of documents containing the phrase is greater than a predetermined value. Document relevance can be used to search for a document.
51 Citations
47 Claims
1. A computer-implemented method of determining the relevance, to a given word or phrase, of a document from a source collection of documents, the method comprising:
- accessing a predetermined set of words and/or phrases that are related to the given word or phrase; and
- calculating a document relevance score as a function of:
  - whether the word or phrase occurs in the document; and
  - for each word and phrase from the predetermined set, whether the related word or phrase occurs in the document.
Dependent claims: 2-32, 38

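The scoring step of claim 1 can be illustrated with a minimal sketch. The claim only requires that the score be a function of binary occurrence indicators. The weighted sum, the two weights, the tokenizer, and the names `tokenize`, `occurs`, and `relevance_score` are all assumptions made for illustration and are not taken from the patent.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def occurs(term, doc_tokens):
    """Return True if the word or phrase appears as consecutive tokens."""
    term_tokens = tokenize(term)
    n = len(term_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == term_tokens
               for i in range(len(doc_tokens) - n + 1))


def relevance_score(document, term, related_terms,
                    term_weight=1.0, related_weight=0.5):
    """Score a document from binary occurrence indicators.

    Assumption: a weighted sum with illustrative weights. Any other
    function of the same indicators would also fit the claim wording.
    """
    doc_tokens = tokenize(document)
    # Indicator for the given word or phrase itself.
    score = term_weight * occurs(term, doc_tokens)
    # One indicator per member of the predetermined related set.
    for related in related_terms:
        score += related_weight * occurs(related, doc_tokens)
    return score


related = ["big cat", "rain forest", "sports car"]
doc = "The jaguar is a big cat found in the rain forest."
print(relevance_score(doc, "jaguar", related))  # 2.0
```

In this sketch a document that never mentions the given term can still receive a non-zero score through its related terms, which is one way to read the claim's second indicator.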
33. A computer-implemented method of building a database of phrases occurring in a phrase-analysis document collection, comprising, for each of a plurality of sequences of consecutive words:
- determining whether, out of all the documents in the phrase-analysis collection that contain all the words of the sequence, the proportion of documents containing the sequence consecutively is greater than a predetermined value; and
- including the sequence in the database only if said determination is made.
Dependent claims: 34-37, 39-43

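The proportion test of claim 33 can be sketched as follows. The candidate generation (every run of two consecutive words seen in the collection), the default threshold of 0.5, the tokenizer, and the names `contains_sequence` and `build_phrase_database` are hypothetical choices for illustration. The claim itself fixes only the test applied to each sequence.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_sequence(tokens, seq):
    """Return True if the words of seq appear consecutively in tokens."""
    n = len(seq)
    return any(tuple(tokens[i:i + n]) == seq
               for i in range(len(tokens) - n + 1))


def build_phrase_database(documents, threshold=0.5, max_len=2):
    """Keep each candidate sequence whose consecutive-occurrence ratio
    exceeds the threshold (both parameters are illustrative)."""
    tokenized = [tokenize(d) for d in documents]
    word_sets = [set(t) for t in tokenized]

    # Assumed candidate set: every run of 2..max_len consecutive words.
    candidates = set()
    for tokens in tokenized:
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                candidates.add(tuple(tokens[i:i + n]))

    phrases = set()
    for seq in candidates:
        # Documents containing all the words of the sequence, in any order.
        with_all_words = [t for t, w in zip(tokenized, word_sets)
                          if set(seq) <= w]
        # Of those, the documents containing the sequence consecutively.
        with_sequence = sum(contains_sequence(t, seq) for t in with_all_words)
        if with_sequence / len(with_all_words) > threshold:
            phrases.add(" ".join(seq))
    return phrases


docs = [
    "new york is a large city",
    "i love new york",
    "a new day in york",
    "the city is new",
    "is the city new",
    "a new city",
]
db = build_phrase_database(docs)
print("new york" in db)  # True: 2 of the 3 documents with both words
print("city is" in db)   # False: 1 of the 3 documents with both words
```

The division is safe because every candidate is drawn from at least one document, so the set of documents containing all its words is never empty.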
44. Data-processing apparatus for determining the relevance, to a given word or phrase, of a document from a source collection of documents, comprising:
- apparatus configured to access a predetermined set of words and/or phrases that are related to the given word or phrase; and
- logic configured to calculate a document relevance score as a function of:
  - whether the word or phrase occurs in the document; and
  - for each word and phrase from the predetermined set, whether the related word or phrase occurs in the document.

45. Data-processing apparatus for building a database of phrases occurring in a phrase-analysis document collection comprising:
- logic configured to determine, for each of a plurality of sequences of consecutive words, whether, out of all the documents in the phrase-analysis collection that contain all the words of the sequence, the proportion of documents containing the sequence consecutively is greater than a predetermined value; and
- logic configured to include the sequence in the database only if said determination is made.

46. A machine-readable storage device storing a computer program comprising instructions operable to cause a data-processing apparatus to determine the relevance, to a given word or phrase, of a document from a source collection of documents, by:
- accessing a predetermined set of words and/or phrases that are related to the given word or phrase; and
- calculating a document relevance score as a function of:
  - whether the word or phrase occurs in the document; and
  - for each word and phrase from the predetermined set, whether the related word or phrase occurs in the document.

47. A machine-readable storage device storing a computer program comprising instructions operable to cause a data-processing apparatus to build a database of phrases occurring in a phrase-analysis document collection, by, for each of a plurality of sequences of consecutive words:
- determining whether, out of all the documents in the phrase-analysis collection that contain all the words of the sequence, the proportion of documents containing the sequence consecutively is greater than a predetermined value; and
- including the sequence in the database only if said determination is made.

Specification