System and method for providing search query refinements
First Claim
1. A method performed by a data processing apparatus, the method comprising:
- accessing, by the data processing apparatus, a database that stores query-resource pairings and a respective weight for each query-resource pairing, each query-resource pairing identifying a single query and a single resource associated with the query;
receiving, by the data processing apparatus, a search query from a user device;
obtaining, by the data processing apparatus and in response to receiving the search query, search results, each search result referencing a resource and having a relevance score indicating a measure of relevance of the search result to the search query;
identifying, by the data processing apparatus, a subset of the resources referenced by the search results, each resource in the subset matching a resource that is identified by one or more of the query-resource pairings;
for each resource in the subset;
generating, by the data processing apparatus, a term vector from queries identified by stored query-resource pairings that correspond to the resource, wherein each dimension of the term vector corresponds to a distinct term of the queries identified by the stored query-resource pairings that correspond to the resource, and each dimension of the term vector has a value that is based on the respective weights of each of the query-resource pairing that identifies (i) the resource and (ii) a query that includes the distinct term that corresponds to the dimension;
multiplying, by a constant factor, the value of each dimension of the term vector that corresponds to a distinct term that matches a term included in the received search query; and
normalizing the term vector;
generating, by the data processing apparatus, one or more clusters of resources based on the term vectors of each resource in the subset;
generating, by the data processing apparatus and for each cluster, a rank score based on (i) the relevance scores of the search results that reference resources that match the resources of the cluster, and (ii) a quantity of the resources of the cluster;
selecting, by the data processing apparatus, as refinement clusters, n clusters with the highest rank score, where n is a positive integer;
generating, by the data processing apparatus, a respective centroid for each refinement cluster, each centroid representing a weighted center of a term vector that corresponds to the refinement cluster;
generating, by the data processing apparatus and for each refinement cluster, a cluster score for each unique query identified by stored query-resource pairings that correspond to the resources of the refinement cluster based upon (i) a quantity of the query-resource pairings that identify the unique query, and (ii) a distance from the unique query to the centroid of the potential refinement cluster;
for each refinement cluster, selecting, by the data processing apparatus, as a representative query for the refinement cluster, a unique query that has a highest cluster score of the cluster scores for the unique queries identified by stored query-resource pairings that correspond to the resources of the refinement cluster;
selecting, by the data processing apparatus, as a set of candidate queries, the representative queries of m refinement clusters with the highest rank score, where m is a positive integer; and
providing, by the data processing apparatus, one or more of the candidate queries to the user device.
1 Assignment
0 Petitions
Accused Products
Abstract
A system and method for providing search query refinements are presented. A stored query and a stored document are associated as a logical pairing. A weight is assigned to the logical pairing. The search query is issued and a set of search documents is produced. At least one search document is matched to at least one stored document. The stored query and the assigned weight associated with the matching at least one stored document are retrieved. At least one cluster is formed based on the stored query and the assigned weight associated with the matching at least one stored document. The stored query associated with the matching at least one stored document are scored for the at least one cluster relative to at least one other cluster. At least one such scored search query is suggested as a set of query refinements.
20 Citations
7 Claims
-
1. A method performed by a data processing apparatus, the method comprising:
-
accessing, by the data processing apparatus, a database that stores query-resource pairings and a respective weight for each query-resource pairing, each query-resource pairing identifying a single query and a single resource associated with the query; receiving, by the data processing apparatus, a search query from a user device; obtaining, by the data processing apparatus and in response to receiving the search query, search results, each search result referencing a resource and having a relevance score indicating a measure of relevance of the search result to the search query; identifying, by the data processing apparatus, a subset of the resources referenced by the search results, each resource in the subset matching a resource that is identified by one or more of the query-resource pairings; for each resource in the subset; generating, by the data processing apparatus, a term vector from queries identified by stored query-resource pairings that correspond to the resource, wherein each dimension of the term vector corresponds to a distinct term of the queries identified by the stored query-resource pairings that correspond to the resource, and each dimension of the term vector has a value that is based on the respective weights of each of the query-resource pairing that identifies (i) the resource and (ii) a query that includes the distinct term that corresponds to the dimension; multiplying, by a constant factor, the value of each dimension of the term vector that corresponds to a distinct term that matches a term included in the received search query; and normalizing the term vector; generating, by the data processing apparatus, one or more clusters of resources based on the term vectors of each resource in the subset; generating, by the data processing apparatus and for each cluster, a rank score based on (i) the relevance scores of the search results that reference resources that match the resources of the cluster, and (ii) a quantity of the resources of the cluster; selecting, by the data processing apparatus, as refinement clusters, n clusters with the highest rank score, where n is a positive integer; generating, by the data processing apparatus, a respective centroid for each refinement cluster, each centroid representing a weighted center of a term vector that corresponds to the refinement cluster; generating, by the data processing apparatus and for each refinement cluster, a cluster score for each unique query identified by stored query-resource pairings that correspond to the resources of the refinement cluster based upon (i) a quantity of the query-resource pairings that identify the unique query, and (ii) a distance from the unique query to the centroid of the potential refinement cluster; for each refinement cluster, selecting, by the data processing apparatus, as a representative query for the refinement cluster, a unique query that has a highest cluster score of the cluster scores for the unique queries identified by stored query-resource pairings that correspond to the resources of the refinement cluster; selecting, by the data processing apparatus, as a set of candidate queries, the representative queries of m refinement clusters with the highest rank score, where m is a positive integer; and providing, by the data processing apparatus, one or more of the candidate queries to the user device.
-
-
2. A method performed by a data processing apparatus, the method comprising:
-
accessing, by the data processing apparatus, a database that stores query-resource pairings and a respective weight for each query-resource pairing, each query-resource pairing identifying a single query and a single resource associated with the query; obtaining, by the data processing apparatus and in response to receiving a search query, search results, each search result referencing a respective resource and having a respective relevance score; identifying, by the data processing apparatus, a subset of the resources referenced by the search results that match at least one resource that is identified by one or more of the query-resource pairings; selecting, by the data processing apparatus, a set of the queries identified by query-resource pairings that identify a resource included in the subset based at least on the relevance score of the respective resource and the weight for each respective query-resource pairing providing, by the data processing apparatus, one or more of the queries of the set to the user device; generating, for each resource in the subset, a term vector from queries identified by stored query-resource pairings that correspond to the resource, wherein each dimension of the term vector corresponds to a distinct term of the queries identified by the stored query-resource pairings that correspond to the resource, and each dimension of the term vector has a value that is based on the respective weights of each query-resource pairing that identifies (i) the resource and (ii) a query that includes the distinct term that corresponds to the dimension; generating one or more clusters of resources based on the term vectors of each resource in the subset; for at least one cluster, selecting, as a representative query for the cluster, a query identified by stored query-resource pairings that correspond to the resources of the cluster based on the term vectors for each resource included in the cluster, wherein selecting a set of the queries identified by query resource pairings that identify a resource included in the subset comprises selecting the set of queries from the representative queries; generating, for each cluster, a rank score based on (i) the relevance scores of the search results that reference resources that match the resources of the cluster, and (ii) a quantity of the resources of the cluster; and selecting, as refinement clusters, n clusters with the highest rank score, wherein n is a positive integer, and wherein each representative query is selected from one of the refinement clusters, wherein selecting a set of the queries identified by query resource pairings that identify a resource included in the subset further comprises selecting one or more of the set of queries from the representative queries of m refinement clusters with the highest rank score, where m is a positive integer; generating a respective centroid for each refinement cluster, each centroid representing a weighted center of a term vector that corresponds to the refinement cluster; and generating, for each refinement cluster, a cluster score for each unique query identified by stored query-resource pairings that correspond to the resources of the refinement cluster based upon (i) a quantity of the query-resource pairings that identify the unique query, and (ii) a distance from the unique query to the centroid of the potential refinement cluster, wherein each representative query selected for each refinement cluster is a unique query that has a highest cluster score of the cluster scores for the unique queries identified by stored query-resource pairings that correspond to the resources of the refinement cluster. - View Dependent Claims (3)
-
-
4. A system comprising:
-
a data processing apparatus; and a data storage device storing instructions that, when executed, cause the data processing apparatus to perform operations comprising; accessing a database that stores query-resource pairings and a respective weight for each query-resource pairing, each query-resource pairing identifying a single query and a single resource associated with the query; obtaining, in response to receiving a search query, search results, each search result referencing a respective resource and having a respective relevance score; identifying a subset of the resources referenced by the search results that match at least one resource that is identified by one or more of the query-resource pairings; selecting a set of the queries identified by query-resource pairings that identify a resource included in the subset based at least on the relevance score of the respective resource and the weight for each respective query-resource pairing; providing one or more of the queries of the set to the user device; generating, for each resource in the subset, a term vector from queries identified by stored query-resource pairings that correspond to the resource, wherein each dimension of the term vector corresponds to a distinct term of the queries identified by the stored query-resource pairings that correspond to the resource, and each dimension of the term vector has a value that is based on the respective weights of each query-resource pairing that identifies (i) the resource and (ii) a query that includes the distinct term that corresponds to the dimension; generating one or more clusters of resources based on the term vectors of each resource in the subset; for at least one cluster, selecting, as a representative query for the cluster, a query identified by stored query-resource pairings that correspond to the resources of the cluster based on the term vectors for each resource included in the cluster, wherein selecting a set of the queries identified by query resource pairings that identify a resource included in the subset comprises selecting the set of queries from the representative queries; generating, for each cluster, a rank score based on (i) the relevance scores of the search results that reference resources that match the resources of the cluster, and (ii) a quantity of the resources of the cluster; and selecting, as refinement clusters, n clusters with the highest rank score, wherein n is a positive integer, and wherein each representative query is selected from one of the refinement clusters, wherein selecting a set of the queries identified by query resource pairings that identify a resource included in the subset further comprises selecting one or more of the set of queries from the representative queries of m refinement clusters with the highest rank score, where m is a positive integer; generating a respective centroid for each refinement cluster, each centroid representing a weighted center of a term vector that corresponds to the refinement cluster; and generating, for each refinement cluster, a cluster score for each unique query identified by stored query-resource pairings that correspond to the resources of the refinement cluster based upon (i) a quantity of the query-resource pairings that identify the unique query, and (ii) a distance from the unique query to the centroid of the potential refinement cluster, wherein each representative query selected for each refinement cluster is a unique query that has a highest cluster score of the cluster scores for the unique queries identified by stored query-resource pairings that correspond to the resources of the refinement cluster. - View Dependent Claims (5)
-
-
6. A computer-readable device comprising instructions that, upon execution, cause a data processing apparatus to perform operations comprising:
-
accessing a database that stores query-resource pairings and a respective weight for each query-resource pairing, each query-resource pairing identifying a single query and a single resource associated with the query; obtaining, in response to receiving a search query, search results, each search result referencing a respective resource and having a respective relevance score; identifying a subset of the resources referenced by the search results that match at least one resource that is identified by one or more of the query-resource pairings; selecting a set of the queries identified by query-resource pairings that identify a resource included in the subset based at least on the relevance score of the respective resource and the weight for each respective query-resource pairing; providing one or more of the queries of the set to the user device; generating, for each resource in the subset, a term vector from queries identified by stored query-resource pairings that correspond to the resource, wherein each dimension of the term vector corresponds to a distinct term of the queries identified by the stored query-resource pairings that correspond to the resource, and each dimension of the term vector has a value that is based on the respective weights of each query-resource pairing that identifies (i) the resource and (ii) a query that includes the distinct term that corresponds to the dimension; generating one or more clusters of resources based on the term vectors of each resource in the subset; for at least one cluster, selecting, as a representative query for the cluster, a query identified by stored query-resource pairings that correspond to the resources of the cluster based on the term vectors for each resource included in the cluster, wherein selecting a set of the queries identified by query resource pairings that identify a resource included in the subset comprises selecting the set of queries from the representative queries; generating, for each cluster, a rank score based on (i) the relevance scores of the search results that reference resources that match the resources of the cluster, and (ii) a quantity of the resources of the cluster; and selecting, as refinement clusters, n clusters with the highest rank score, wherein n is a positive integer, and wherein each representative query is selected from one of the refinement clusters, wherein selecting a set of the queries identified by query resource pairings that identify a resource included in the subset further comprises selecting one or more of the set of queries from the representative queries of m refinement clusters with the highest rank score, where m is a positive integer; generating a respective centroid for each refinement cluster, each centroid representing a weighted center of a term vector that corresponds to the refinement cluster; and generating, for each refinement cluster, a cluster score for each unique query identified by stored query-resource pairings that correspond to the resources of the refinement cluster based upon (i) a quantity of the query-resource pairings that identify the unique query, and (ii) a distance from the unique query to the centroid of the potential refinement cluster, wherein each representative query selected for each refinement cluster is a unique query that has a highest cluster score of the cluster scores for the unique queries identified by stored query-resource pairings that correspond to the resources of the refinement cluster. - View Dependent Claims (7)
-
Specification