Phrase-based personalization of searches in an information retrieval system
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Related phrases and phrase extensions are also identified. Phrases in a query are identified and used to retrieve and rank documents. Phrases are also used to cluster documents in the search results, create document descriptions, and eliminate duplicate documents from the search results and from the index.
15 Claims
1. A method of personalizing a search of a document collection to a user, the method comprising:
monitoring a plurality of documents accessed by the user;
identifying a plurality of first phrases present in one or more of the accessed documents;
receiving a query from the user, the query including one or more second phrases;
selecting a plurality of documents selected from the document collection that are responsive to the query;
identifying one or more of the first phrases as related phrases that are related to the one or more second phrases, wherein a particular related phrase is related to a particular second phrase when an information gain exceeds a threshold, the information gain being a ratio of an actual co-occurrence rate of the particular related phrase and the particular second phrase in documents of the document collection and an expected occurrence rate of the particular related phrase and the particular second phrase in the documents;
weighting a plurality of scores, each score corresponding to a respective document of the plurality of selected documents responsive to the query, wherein the score of the respective document that includes the one or more related phrases is boosted by the weighting;
ranking the plurality of selected documents for presentation to the user based on their corresponding weighted scores, to provide personalized search results; and
presenting the personalized search results to the user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
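The relatedness test recited in claim 1 can be illustrated with a small sketch. The claim defines information gain only as the ratio of an actual co-occurrence rate to an expected rate. This example assumes that rates are counted per document and that the expected rate is the product of the two phrases' individual document frequencies (that is, the rate under independence). The function names, the toy collection `DOCS`, and the threshold of 1.5 are hypothetical and do not come from the patent text.

```python
from typing import Iterable, Set


def information_gain(docs: Iterable[Set[str]], a: str, b: str) -> float:
    """Ratio of the actual co-occurrence rate of phrases a and b to the
    rate expected if they occurred independently (an assumption)."""
    docs = list(docs)
    n = len(docs)
    if n == 0:
        return 0.0
    count_a = sum(1 for d in docs if a in d)
    count_b = sum(1 for d in docs if b in d)
    count_ab = sum(1 for d in docs if a in d and b in d)
    actual = count_ab / n
    expected = (count_a / n) * (count_b / n)
    return actual / expected if expected > 0 else 0.0


def is_related(docs: Iterable[Set[str]], a: str, b: str,
               threshold: float = 1.5) -> bool:
    """A phrase pair counts as related when the gain exceeds the threshold.
    The default threshold is an illustrative value only."""
    return information_gain(docs, a, b) > threshold


# Toy collection: each document is modeled as the set of phrases it contains.
DOCS = [
    {"australian shepherd", "herding dog"},
    {"australian shepherd", "herding dog", "dog breeds"},
    {"dog breeds", "cat breeds"},
    {"stock market"},
]

gain = information_gain(DOCS, "australian shepherd", "herding dog")
related = is_related(DOCS, "australian shepherd", "herding dog")
unrelated = is_related(DOCS, "australian shepherd", "dog breeds")
```

In this toy collection, "australian shepherd" and "herding dog" co-occur in half the documents against an expected quarter, so the pair clears the threshold. "australian shepherd" and "dog breeds" co-occur exactly as often as independence would predict, so they do not.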
11. A computer system for personalizing a search of a document collection to a user, comprising:
a processor; and
memory storing instructions that, when executed by the processor, cause the computer system to:
monitor a plurality of documents accessed by the user,
identify a plurality of first phrases present in one or more of the accessed documents,
receive a query from the user, the query including one or more second phrases,
select a plurality of documents responsive to the query as a search result,
identify one or more of the first phrases as related phrases that are related to the one or more second phrases, wherein a particular related phrase is related to a particular second phrase when an information gain exceeds a threshold, the information gain being a ratio of an actual co-occurrence rate of the particular related phrase and the particular second phrase in documents of the document collection and an expected occurrence rate of the particular related phrase and the particular second phrase in the documents,
weight a plurality of scores, each score corresponding to a respective document in the plurality of documents responsive to the query, wherein the score of the respective document that includes the one or more related phrases is boosted,
rank the plurality of documents responsive to the query for presentation to the user according to the weighted scores to provide personalized search results, and
present the personalized search results to the user.
Dependent claims: 12, 13, 14, 15
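The weighting and ranking steps common to the independent claims can be sketched in the same spirit. The claims state only that the score of a document containing a related phrase is boosted; they do not say how. This sketch assumes a multiplicative boost applied once per related phrase found in a document. The function `personalize`, the `boost` factor of 1.2, and the sample scores and phrases are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def personalize(base_scores: Dict[str, float],
                doc_phrases: Dict[str, Set[str]],
                related: Set[str],
                boost: float = 1.2) -> List[Tuple[str, float]]:
    """Boost each query-responsive document's score once per related phrase
    it contains, then rank by the weighted score (highest first)."""
    weighted = {}
    for doc_id, score in base_scores.items():
        hits = len(doc_phrases.get(doc_id, set()) & related)
        weighted[doc_id] = score * (boost ** hits)
    return sorted(weighted.items(), key=lambda kv: kv[1], reverse=True)


# Scores of documents already selected as responsive to the query.
base_scores = {"d1": 1.0, "d2": 0.9, "d3": 0.5}
# Phrases found in each selected document.
doc_phrases = {
    "d1": {"stock market"},
    "d2": {"herding dog"},
    "d3": {"herding dog", "blue merle"},
}
# Phrases from the user's accessed documents that passed the relatedness test.
related = {"herding dog", "blue merle"}

ranked = personalize(base_scores, doc_phrases, related)
order = [doc_id for doc_id, _ in ranked]
```

With these inputs, `d2` overtakes `d1` because it contains one related phrase. `d3` is boosted twice but still ranks last, which shows that under this assumed scheme the boost reweights the base relevance score without overriding it.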
Specification