Method and system for information retrieval with clustering
First Claim
1. A non-transitory computer readable medium having instructions stored thereon that cause a processor to retrieve documents in response to at least one search term from a user, the retrieving the documents comprising:
receiving the search term for searching a plurality of text documents, wherein each text document is associated with one or more salient terms extracted from the document and each text document is associated with one or more properties that represent the one or more extracted salient terms;
retrieving a first set of retrieved documents from a query of the plurality of text documents, wherein each of the retrieved documents comprises the search term;
retrieving the associated salient terms for each of the retrieved documents and the associated properties;
grouping based on a distance metric the retrieved salient terms into one or more clusters of salient terms and providing the clusters of salient terms to the user, wherein each of the cluster of salient terms corresponds to one of the properties associated with the retrieved documents and each cluster displays the associated salient terms;
receiving a selection of a first cluster of the clusters of salient terms from the user, wherein the first cluster comprises first salient terms;
selecting a second set of retrieved documents from the first set of retrieved documents, wherein each second set document of the second set includes at least one of the first salient terms of the first cluster of salient terms;
retrieving associated second salient terms for each of the second set documents; and
grouping the second salient terms into one or more second clusters of salient terms and providing the second clusters of salient terms to the user.
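The steps of this claim can be read as a two-round retrieve, cluster, and refine loop. The following is a minimal Python sketch of that loop, not the patented implementation. Everything in it is an assumption made for illustration: the toy corpus, the function names, the choice of Jaccard distance over the sets of documents in which terms occur, the single-link grouping rule, and the 0.6 threshold. The claim only requires "a distance metric". The sketch also omits the labeling of each cluster with a property.

```python
from collections import defaultdict

# Hypothetical toy corpus: each document has text and pre-extracted salient terms.
DOCS = {
    1: {"text": "jaguar speed in the wild", "salient": ["cat", "predator"]},
    2: {"text": "jaguar habitat", "salient": ["cat", "rainforest"]},
    3: {"text": "jaguar engine review", "salient": ["car", "engine"]},
    4: {"text": "jaguar dealership", "salient": ["car", "dealer"]},
    5: {"text": "python tutorial", "salient": ["code"]},
}

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of document ids."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def search(docs, term):
    """First set: ids of documents whose text contains the search term."""
    return {d for d, doc in docs.items() if term.lower() in doc["text"].lower().split()}

def salient_postings(docs, doc_ids):
    """Map each salient term to the retrieved documents that carry it."""
    postings = defaultdict(set)
    for d in doc_ids:
        for term in docs[d]["salient"]:
            postings[term].add(d)
    return postings

def cluster_terms(postings, threshold=0.6):
    """Single-link grouping: a term joins every cluster holding a term closer than threshold."""
    clusters = []
    for term in sorted(postings):
        linked = [c for c in clusters
                  if any(jaccard_distance(postings[term], postings[o]) < threshold for o in c)]
        merged = {term}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return sorted(sorted(c) for c in clusters)

def refine(docs, doc_ids, cluster):
    """Second set: documents of the first set with at least one term of the chosen cluster."""
    chosen = set(cluster)
    return {d for d in doc_ids if chosen & set(docs[d]["salient"])}

# Round one: query, then cluster the salient terms of the hits.
first_set = search(DOCS, "jaguar")
first_clusters = cluster_terms(salient_postings(DOCS, first_set))
print(first_clusters)  # [['car', 'dealer', 'engine'], ['cat', 'predator', 'rainforest']]

# Round two: the user picks a cluster; narrow the documents and cluster again.
second_set = refine(DOCS, first_set, first_clusters[0])
second_clusters = cluster_terms(salient_postings(DOCS, second_set))
print(sorted(second_set), second_clusters)  # [3, 4] [['car', 'dealer', 'engine']]
```

In this toy run the second round returns a single cluster because only two documents remain. On a larger collection the second clustering would normally expose finer groupings within the selected topic.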
Abstract
Methods and systems that enable searching with clustering in information access systems are described. The methods of clustering operate on a collection of materials wherein each item in the collection may be associated with one or more properties. An original subset of materials is selected from the collection and relevant properties associated with the subset of materials are clustered into property clusters. Each property cluster generally contains properties that are more similar to each other than to properties in a different property cluster. The property clusters can be used to respond to the query. A mapping function can be used to identify a set of materials that correspond to each property cluster based on the associations between individual items and properties. The property clusters can also be used for iterative query refinement.
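The mapping function mentioned in the abstract can be illustrated with a short Python sketch. The abstract does not say whether an item must match one property of a cluster or all of them, so both readings are shown behind a hypothetical `require_all` flag. The function name, the flag, and the sample associations are assumptions for illustration only.

```python
def map_cluster_to_items(associations, property_cluster, require_all=False):
    """Return the items whose properties overlap the given property cluster.

    associations: dict mapping item id -> set of properties (hypothetical data model).
    require_all:  if True, an item must carry every property in the cluster;
                  otherwise a single shared property is enough.
    """
    cluster = set(property_cluster)
    if require_all:
        return {item for item, props in associations.items() if cluster <= props}
    return {item for item, props in associations.items() if cluster & props}

# Hypothetical item-to-property associations.
associations = {
    "doc_a": {"feline", "wildlife"},
    "doc_b": {"wildlife"},
    "doc_c": {"automotive"},
}

any_match = map_cluster_to_items(associations, {"feline", "wildlife"})
all_match = map_cluster_to_items(associations, {"feline", "wildlife"}, require_all=True)
print(sorted(any_match))  # ['doc_a', 'doc_b']
print(sorted(all_match))  # ['doc_a']
```

The looser "any property" reading gives broader recall, which suits iterative refinement where the user narrows results in later rounds.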
33 Claims
1. Independent claim, set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer implemented method for retrieving text documents, the method comprising:
receiving at least one search term for searching a plurality of text documents, wherein each text document is associated with one or more salient terms extracted from the document and each text document is associated with one or more properties that represent the one or more extracted salient terms;
retrieving a first set of retrieved documents from a query of the plurality of text documents, wherein each of the retrieved documents comprises the search term;
retrieving the associated salient terms for each of the retrieved documents and the associated properties;
grouping based on a distance metric the retrieved salient terms into one or more clusters of salient terms and providing the clusters of salient terms to the user, wherein each of the cluster of salient terms corresponds to one of the properties associated with the retrieved documents and each cluster displays the associated salient terms;
receiving a selection of a first cluster of the clusters of salient terms from the user, wherein the first cluster comprises first salient terms;
selecting a second set of retrieved documents from the first set of retrieved documents, wherein each second set document of the second set includes at least one of the first salient terms of the first cluster of salient terms;
retrieving associated second salient terms for each of the second set documents; and
grouping the second salient terms into one or more second clusters of salient terms and providing the second clusters of salient terms to the user.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. An information retrieval system comprising:
a processor coupled to a computer readable medium having instructions stored thereon, wherein the processor executing the instructions implements modules comprising;
an interface module that receives at least one search term and searches a plurality of text documents, wherein each text document is associated with one or more salient terms extracted from the document and each text document is associated with one or more properties that represent the one or more extracted salient terms;
a database access module that retrieve a first set of retrieved documents from a query of the plurality of text documents, wherein each of the retrieved documents comprises the search term;
a matching module that retrieves the associated salient terms for each of the retrieved documents and the associated properties;
a clustering module that groups based on a distance metric the retrieved salient terms into one or more clusters of salient terms and provides the clusters of salient terms to the user, wherein each of the cluster of salient terms corresponds to one of the properties associated with the retrieved documents and each cluster displays the associated salient terms;
the interface module that receives a selection of a first cluster of the clusters of salient terms from the user, wherein the first cluster comprises first salient terms;
the matching module selecting a second set of retrieved documents from the first set of retrieved documents, wherein each second set document of the second set includes at least one of the first salient terms of the first cluster of salient terms;
further comprising retrieving associated second salient terms for each of the second set documents; and
grouping the second salient terms into one or more second clusters of salient terms and providing the second clusters of salient terms to the user.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33
Specification