Integrating external related phrase information into a phrase-based indexing information retrieval system
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents, analyzing documents and storing the results of the analysis as phrase data. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Related phrases and phrase extensions are also identified. Changes to existing phrase data about a document collection that are submitted by a user are captured and analyzed, and the existing phrase data is updated to reflect the additional knowledge gained through the analysis.
43 Claims
1. A computer-implemented method for updating phrases associated with a limited document collection, comprising:
determining, by at least one processor of a computer system, a list of top phrases for the limited document collection, at least in part based on presence, in documents of the document collection, of both the top phrases and related phrases of the top phrases;
receiving, by at least one processor of the computer system, a replacement phrase for at least one of the top phrases; and
updating, by at least one processor of the computer system, related phrase data for the replacement phrase from the related phrase data of the top phrase that is being replaced.
2. A computer-implemented method of determining top phrases of a limited document collection, comprising:
determining, by at least one processor of a computer system, top phrases for a plurality of documents in the limited document collection, wherein determining the top phrases of a document includes:
identifying phrases in the document;
for each identified phrase in the document, determining an importance score for the identified phrase based on occurrences of related phrases of the identified phrase, which are also in the document;
associating each top phrase with the importance score for the document;
for each top phrase, determining, by at least one processor of the computer system, an aggregate score of the top phrase for the limited document collection based on the top phrase's scores for individual documents of the limited document collection in which the top phrase appears; and
selecting, by at least one processor of the computer system, a set of top phrases with the highest aggregate scores.
Dependent claims: 3, 4, 5, 6, 7, 8
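The scoring and aggregation steps recited in claim 2 can be illustrated with a minimal Python sketch. The claim does not say how the importance score is computed, how a document's top phrases are chosen, or how per-document scores are combined, so the choices below are assumptions made only for illustration: the score is a count of related phrases present in the same document, each document keeps its `per_doc` highest-scoring phrases, and the aggregate is a plain sum. The function names, the `related` mapping, and the sample phrases are all hypothetical.

```python
from collections import defaultdict


def importance_scores(doc_phrases, related):
    """Score each phrase in one document.

    Assumption: the score is the number of the phrase's related phrases
    that also occur in the same document.
    """
    present = set(doc_phrases)
    return {p: len(related.get(p, set()) & present) for p in present}


def top_phrases_for_collection(docs, related, per_doc=2, top_n=3):
    """Return the top_n phrases of the collection by aggregate score."""
    aggregate = defaultdict(int)
    for doc_phrases in docs:
        scores = importance_scores(doc_phrases, related)
        # Assumption: a document's top phrases are its per_doc best-scoring
        # phrases; ties are broken alphabetically to keep results stable.
        doc_top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:per_doc]
        for phrase, score in doc_top:
            # Assumption: the aggregate score is the sum of per-document scores.
            aggregate[phrase] += score
    ranked = sorted(aggregate.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_n]]


# Hypothetical related-phrase data and a three-document collection.
related = {
    "dog": {"leash", "bark"},
    "leash": {"dog"},
    "bark": {"dog"},
    "cat": {"purr"},
    "purr": {"cat"},
}
docs = [["dog", "leash", "bark"], ["dog", "bark"], ["cat", "purr"]]
print(top_phrases_for_collection(docs, related, per_doc=2, top_n=2))
# prints ['dog', 'bark']
```

In this toy collection "dog" scores 2 in the first document and 1 in the second, for an aggregate of 3, which places it ahead of "bark" (aggregate 2). A different scoring or aggregation rule would still fit the claim language; the sum is simply the easiest to follow.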
9. A computer-implemented method comprising:
receiving, by at least one processor of a computer system, a user request to change a current top phrase for a limited document collection to a replacement top phrase;
associating, by at least one processor of the computer system, the replacement top phrase with a root document of the document collection;
associating, by at least one processor of a computer system, the current top phrase and the replacement top phrase with each other;
adding, by at least one processor of the computer system, to phrase information for the replacement top phrase, phrase information for the current top phrase; and
adding, by at least one processor of the computer system, to related phrase information of the replacement top phrase, related phrase information of the current top phrase.
Dependent claims: 10, 11, 12, 13, 14, 15
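The replacement flow recited in claim 9 can be sketched as a small in-memory update. The claim does not define what "phrase information" contains or how the association between the two phrases is stored. As an assumption for this sketch only, phrase information is modeled as the set of document identifiers in which a phrase appears, and the mutual association as an alias set on each record. `PhraseRecord`, `replace_top_phrase`, and the sample values are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class PhraseRecord:
    # Assumption: "phrase information" is the set of documents containing the phrase.
    documents: set = field(default_factory=set)
    # Related phrase information for this phrase.
    related: set = field(default_factory=set)
    # Assumption: the phrase-to-phrase association is stored as aliases.
    aliases: set = field(default_factory=set)


def replace_top_phrase(index, current, replacement, root_doc):
    """Apply a user request to change `current` to `replacement`."""
    cur = index[current]
    new = index.setdefault(replacement, PhraseRecord())

    # Associate the replacement top phrase with the collection's root document.
    new.documents.add(root_doc)

    # Associate the current and replacement top phrases with each other.
    cur.aliases.add(replacement)
    new.aliases.add(current)

    # Add the current phrase's phrase information to the replacement's.
    new.documents |= cur.documents

    # Add the current phrase's related phrase information to the replacement's.
    new.related |= cur.related
    new.related.discard(replacement)  # a phrase is not listed as related to itself
    return new


index = {
    "car rental": PhraseRecord(documents={"d1", "d2"}, related={"airport pickup"}),
}
record = replace_top_phrase(index, "car rental", "vehicle hire", root_doc="root")
print(sorted(record.documents), sorted(record.related))
# prints ['d1', 'd2', 'root'] ['airport pickup']
```

The current phrase's record is left in place rather than deleted, since the claim recites adding its data to the replacement and associating the two phrases, not removing the original.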
16. A system comprising:
a tangible computer readable medium storing instructions that when executed by one or more processors cause the system to:
determine top phrases for a plurality of documents in a limited document collection, wherein determining the top phrases of a document includes:
identifying phrases in the document;
for each identified phrase in the document, determining an importance score for the identified phrase based on occurrences of related phrases of the identified phrase, which are also in the document;
associate each top phrase with the importance score for the document;
for each top phrase, determine an aggregate score of the top phrase for the limited document collection based on the top phrase's scores for individual documents of the limited document collection in which the top phrase appears; and
select a set of top phrases with the highest aggregate scores; and
one or more processors configured for executing the instructions stored on the computer readable storage medium.
Dependent claims: 17, 18, 19, 20, 21, 22
23. A system comprising:
a tangible computer readable medium storing instructions that when executed by one or more processors cause the system to:
receive a user request to change a current top phrase for a limited document collection to a replacement top phrase;
associate the replacement top phrase with a root document of the document collection;
associate the current top phrase and the replacement top phrase with each other;
add to phrase information for the replacement top phrase, phrase information for the current top phrase; and
add to related phrase information of the replacement top phrase, related phrase information of the current top phrase; and
one or more processors configured for executing the instructions stored on the computer readable storage medium.
Dependent claims: 24, 25, 26, 27, 28, 29
30. A tangible computer readable storage medium comprising instructions that when executed by one or more processors cause a computer system to:
determine top phrases for a plurality of documents in a limited document collection, wherein determining the top phrases of a document includes:
identifying phrases in the document;
for each identified phrase in the document, determining an importance score for the identified phrase based on occurrences of related phrases of the identified phrase, which are also in the document;
associate each top phrase with the importance score for the document;
for each top phrase, determine an aggregate score of the top phrase for the limited document collection based on the top phrase's scores for individual documents of the limited document collection in which the top phrase appears; and
select a set of top phrases with the highest aggregate scores.
Dependent claims: 31, 32, 33, 34, 35, 36
37. A tangible computer readable storage medium comprising instructions that when executed by one or more processors cause a computer system to:
receive a user request to change a current top phrase for a limited document collection to a replacement top phrase;
associate the replacement top phrase with a root document of the document collection;
associate the current top phrase and the replacement top phrase with each other;
add to phrase information for the replacement top phrase, phrase information for the current top phrase; and
add to related phrase information of the replacement top phrase, related phrase information of the current top phrase.
Dependent claims: 38, 39, 40, 41, 42, 43
Specification