System and method for providing search query refinements

US 8,645,407 B2
Filed: 11/04/2011
Issued: 02/04/2014
Est. Priority Date: 09/05/2003
Status: Expired due to Fees

First Claim

Patent Images

1. A method performed by a data processing apparatus, the method comprising:

accessing, by the data processing apparatus, a database that stores query-resource pairings and a respective weight for each query-resource pairing, each query-resource pairing identifying a single query and a single resource associated with the query;

receiving, by the data processing apparatus, a search query from a user device;

obtaining, by the data processing apparatus and in response to receiving the search query, search results, each search result referencing a resource and having a relevance score indicating a measure of relevance of the search result to the search query;

identifying, by the data processing apparatus, a subset of the resources referenced by the search results, each resource in the subset matching a resource that is identified by one or more of the query-resource pairings;

for each resource in the subset;

generating, by the data processing apparatus, a term vector from queries identified by stored query-resource pairings that correspond to the resource, wherein each dimension of the term vector corresponds to a distinct term of the queries identified by the stored query-resource pairings that correspond to the resource, and each dimension of the term vector has a value that is based on the respective weights of each of the query-resource pairing that identifies (i) the resource and (ii) a query that includes the distinct term that corresponds to the dimension;

multiplying, by a constant factor, the value of each dimension of the term vector that corresponds to a distinct term that matches a term included in the received search query; and

normalizing the term vector;

generating, by the data processing apparatus, one or more clusters of resources based on the term vectors of each resource in the subset;

generating, by the data processing apparatus and for each cluster, a rank score based on (i) the relevance scores of the search results that reference resources that match the resources of the cluster, and (ii) a quantity of the resources of the cluster;

selecting, by the data processing apparatus, as refinement clusters, n clusters with the highest rank score, where n is a positive integer;

generating, by the data processing apparatus, a respective centroid for each refinement cluster, each centroid representing a weighted center of a term vector that corresponds to the refinement cluster;

generating, by the data processing apparatus and for each refinement cluster, a cluster score for each unique query identified by stored query-resource pairings that correspond to the resources of the refinement cluster based upon (i) a quantity of the query-resource pairings that identify the unique query, and (ii) a distance from the unique query to the centroid of the potential refinement cluster;

for each refinement cluster, selecting, by the data processing apparatus, as a representative query for the refinement cluster, a unique query that has a highest cluster score of the cluster scores for the unique queries identified by stored query-resource pairings that correspond to the resources of the refinement cluster;

selecting, by the data processing apparatus, as a set of candidate queries, the representative queries of m refinement clusters with the highest rank score, where m is a positive integer; and

providing, by the data processing apparatus, one or more of the candidate queries to the user device.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A system and method for providing search query refinements are presented. A stored query and a stored document are associated as a logical pairing. A weight is assigned to the logical pairing. The search query is issued and a set of search documents is produced. At least one search document is matched to at least one stored document. The stored query and the assigned weight associated with the matching at least one stored document are retrieved. At least one cluster is formed based on the stored query and the assigned weight associated with the matching at least one stored document. The stored query associated with the matching at least one stored document are scored for the at least one cluster relative to at least one other cluster. At least one such scored search query is suggested as a set of query refinements.

20 Citations

7 Claims

1. A method performed by a data processing apparatus, the method comprising:
- accessing, by the data processing apparatus, a database that stores query-resource pairings and a respective weight for each query-resource pairing, each query-resource pairing identifying a single query and a single resource associated with the query;
  
  receiving, by the data processing apparatus, a search query from a user device;
  
  obtaining, by the data processing apparatus and in response to receiving the search query, search results, each search result referencing a resource and having a relevance score indicating a measure of relevance of the search result to the search query;
  
  identifying, by the data processing apparatus, a subset of the resources referenced by the search results, each resource in the subset matching a resource that is identified by one or more of the query-resource pairings;
  
  for each resource in the subset;
  
  generating, by the data processing apparatus, a term vector from queries identified by stored query-resource pairings that correspond to the resource, wherein each dimension of the term vector corresponds to a distinct term of the queries identified by the stored query-resource pairings that correspond to the resource, and each dimension of the term vector has a value that is based on the respective weights of each of the query-resource pairing that identifies (i) the resource and (ii) a query that includes the distinct term that corresponds to the dimension;
  
  multiplying, by a constant factor, the value of each dimension of the term vector that corresponds to a distinct term that matches a term included in the received search query; and
  
  normalizing the term vector;
  
  generating, by the data processing apparatus, one or more clusters of resources based on the term vectors of each resource in the subset;
  
  generating, by the data processing apparatus and for each cluster, a rank score based on (i) the relevance scores of the search results that reference resources that match the resources of the cluster, and (ii) a quantity of the resources of the cluster;
  
  selecting, by the data processing apparatus, as refinement clusters, n clusters with the highest rank score, where n is a positive integer;
  
  generating, by the data processing apparatus, a respective centroid for each refinement cluster, each centroid representing a weighted center of a term vector that corresponds to the refinement cluster;
  
  generating, by the data processing apparatus and for each refinement cluster, a cluster score for each unique query identified by stored query-resource pairings that correspond to the resources of the refinement cluster based upon (i) a quantity of the query-resource pairings that identify the unique query, and (ii) a distance from the unique query to the centroid of the potential refinement cluster;
  
  for each refinement cluster, selecting, by the data processing apparatus, as a representative query for the refinement cluster, a unique query that has a highest cluster score of the cluster scores for the unique queries identified by stored query-resource pairings that correspond to the resources of the refinement cluster;
  
  selecting, by the data processing apparatus, as a set of candidate queries, the representative queries of m refinement clusters with the highest rank score, where m is a positive integer; and
  
  providing, by the data processing apparatus, one or more of the candidate queries to the user device.

2. A method performed by a data processing apparatus, the method comprising:
- accessing, by the data processing apparatus, a database that stores query-resource pairings and a respective weight for each query-resource pairing, each query-resource pairing identifying a single query and a single resource associated with the query;
  
  obtaining, by the data processing apparatus and in response to receiving a search query, search results, each search result referencing a respective resource and having a respective relevance score;
  
  identifying, by the data processing apparatus, a subset of the resources referenced by the search results that match at least one resource that is identified by one or more of the query-resource pairings;
  
  selecting, by the data processing apparatus, a set of the queries identified by query-resource pairings that identify a resource included in the subset based at least on the relevance score of the respective resource and the weight for each respective query-resource pairingproviding, by the data processing apparatus, one or more of the queries of the set to the user device;
  
  generating, for each resource in the subset, a term vector from queries identified by stored query-resource pairings that correspond to the resource, wherein each dimension of the term vector corresponds to a distinct term of the queries identified by the stored query-resource pairings that correspond to the resource, and each dimension of the term vector has a value that is based on the respective weights of each query-resource pairing that identifies (i) the resource and (ii) a query that includes the distinct term that corresponds to the dimension;
  
  generating one or more clusters of resources based on the term vectors of each resource in the subset;
  
  for at least one cluster, selecting, as a representative query for the cluster, a query identified by stored query-resource pairings that correspond to the resources of the cluster based on the term vectors for each resource included in the cluster,wherein selecting a set of the queries identified by query resource pairings that identify a resource included in the subset comprises selecting the set of queries from the representative queries;
  
  generating, for each cluster, a rank score based on (i) the relevance scores of the search results that reference resources that match the resources of the cluster, and (ii) a quantity of the resources of the cluster; and
  
  selecting, as refinement clusters, n clusters with the highest rank score, wherein n is a positive integer, and wherein each representative query is selected from one of the refinement clusters,wherein selecting a set of the queries identified by query resource pairings that identify a resource included in the subset further comprises selecting one or more of the set of queries from the representative queries of m refinement clusters with the highest rank score, where m is a positive integer;
  
  generating a respective centroid for each refinement cluster, each centroid representing a weighted center of a term vector that corresponds to the refinement cluster; and
  
  generating, for each refinement cluster, a cluster score for each unique query identified by stored query-resource pairings that correspond to the resources of the refinement cluster based upon (i) a quantity of the query-resource pairings that identify the unique query, and (ii) a distance from the unique query to the centroid of the potential refinement cluster,wherein each representative query selected for each refinement cluster is a unique query that has a highest cluster score of the cluster scores for the unique queries identified by stored query-resource pairings that correspond to the resources of the refinement cluster.
- View Dependent Claims (3)
- - 3. The method of claim 2, further comprising:
    - for each resource in the subset;
      
      multiplying, by a constant factor, the value of each dimension of the term vector that corresponds to a distinct term that matches a term included in the received search query; and
      
      normalizing the term vector.

4. A system comprising:
- a data processing apparatus; and
  
  a data storage device storing instructions that, when executed, cause the data processing apparatus to perform operations comprising;
  
  accessing a database that stores query-resource pairings and a respective weight for each query-resource pairing, each query-resource pairing identifying a single query and a single resource associated with the query;
  
  obtaining, in response to receiving a search query, search results, each search result referencing a respective resource and having a respective relevance score;
  
  identifying a subset of the resources referenced by the search results that match at least one resource that is identified by one or more of the query-resource pairings;
  
  selecting a set of the queries identified by query-resource pairings that identify a resource included in the subset based at least on the relevance score of the respective resource and the weight for each respective query-resource pairing;
  
  providing one or more of the queries of the set to the user device;
  
  generating, for each resource in the subset, a term vector from queries identified by stored query-resource pairings that correspond to the resource, wherein each dimension of the term vector corresponds to a distinct term of the queries identified by the stored query-resource pairings that correspond to the resource, and each dimension of the term vector has a value that is based on the respective weights of each query-resource pairing that identifies (i) the resource and (ii) a query that includes the distinct term that corresponds to the dimension;
  
  generating one or more clusters of resources based on the term vectors of each resource in the subset;
  
  for at least one cluster, selecting, as a representative query for the cluster, a query identified by stored query-resource pairings that correspond to the resources of the cluster based on the term vectors for each resource included in the cluster,wherein selecting a set of the queries identified by query resource pairings that identify a resource included in the subset comprises selecting the set of queries from the representative queries;
  
  generating, for each cluster, a rank score based on (i) the relevance scores of the search results that reference resources that match the resources of the cluster, and (ii) a quantity of the resources of the cluster; and
  
  selecting, as refinement clusters, n clusters with the highest rank score, wherein n is a positive integer, and wherein each representative query is selected from one of the refinement clusters,wherein selecting a set of the queries identified by query resource pairings that identify a resource included in the subset further comprises selecting one or more of the set of queries from the representative queries of m refinement clusters with the highest rank score, where m is a positive integer;
  
  generating a respective centroid for each refinement cluster, each centroid representing a weighted center of a term vector that corresponds to the refinement cluster; and
  
  generating, for each refinement cluster, a cluster score for each unique query identified by stored query-resource pairings that correspond to the resources of the refinement cluster based upon (i) a quantity of the query-resource pairings that identify the unique query, and (ii) a distance from the unique query to the centroid of the potential refinement cluster,wherein each representative query selected for each refinement cluster is a unique query that has a highest cluster score of the cluster scores for the unique queries identified by stored query-resource pairings that correspond to the resources of the refinement cluster.
- View Dependent Claims (5)
- - 5. The system of claim 4, wherein the operations further comprise:
    - for each resource in the subset;
      
      multiplying, by a constant factor, the value of each dimension of the term vector that corresponds to a distinct term that matches a term included in the received search query; and
      
      normalizing the term vector.

6. A computer-readable device comprising instructions that, upon execution, cause a data processing apparatus to perform operations comprising:
- accessing a database that stores query-resource pairings and a respective weight for each query-resource pairing, each query-resource pairing identifying a single query and a single resource associated with the query;
  
  obtaining, in response to receiving a search query, search results, each search result referencing a respective resource and having a respective relevance score;
  
  identifying a subset of the resources referenced by the search results that match at least one resource that is identified by one or more of the query-resource pairings;
  
  selecting a set of the queries identified by query-resource pairings that identify a resource included in the subset based at least on the relevance score of the respective resource and the weight for each respective query-resource pairing;
  
  providing one or more of the queries of the set to the user device;
  
  generating, for each resource in the subset, a term vector from queries identified by stored query-resource pairings that correspond to the resource, wherein each dimension of the term vector corresponds to a distinct term of the queries identified by the stored query-resource pairings that correspond to the resource, and each dimension of the term vector has a value that is based on the respective weights of each query-resource pairing that identifies (i) the resource and (ii) a query that includes the distinct term that corresponds to the dimension;
  
  generating one or more clusters of resources based on the term vectors of each resource in the subset;
  
  for at least one cluster, selecting, as a representative query for the cluster, a query identified by stored query-resource pairings that correspond to the resources of the cluster based on the term vectors for each resource included in the cluster,wherein selecting a set of the queries identified by query resource pairings that identify a resource included in the subset comprises selecting the set of queries from the representative queries;
  
  generating, for each cluster, a rank score based on (i) the relevance scores of the search results that reference resources that match the resources of the cluster, and (ii) a quantity of the resources of the cluster; and
  
  selecting, as refinement clusters, n clusters with the highest rank score, wherein n is a positive integer, and wherein each representative query is selected from one of the refinement clusters,wherein selecting a set of the queries identified by query resource pairings that identify a resource included in the subset further comprises selecting one or more of the set of queries from the representative queries of m refinement clusters with the highest rank score, where m is a positive integer;
  
  generating a respective centroid for each refinement cluster, each centroid representing a weighted center of a term vector that corresponds to the refinement cluster; and
  
  generating, for each refinement cluster, a cluster score for each unique query identified by stored query-resource pairings that correspond to the resources of the refinement cluster based upon (i) a quantity of the query-resource pairings that identify the unique query, and (ii) a distance from the unique query to the centroid of the potential refinement cluster,wherein each representative query selected for each refinement cluster is a unique query that has a highest cluster score of the cluster scores for the unique queries identified by stored query-resource pairings that correspond to the resources of the refinement cluster.
- View Dependent Claims (7)
- - 7. The computer-readable device of claim 6, wherein the operations further comprise:
    - for each resource in the subset;
      
      multiplying, by a constant factor, the value of each dimension of the term vector that corresponds to a distinct term that matches a term included in the received search query; and
      
      normalizing the term vector.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Google Inc. (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Haahr, Paul, Baker, Steven
Primary Examiner(s)
PYO, MONICA M

Application Number

US13/289,348
Publication Number

US 20120054216A1
Time in Patent Office

823 Days
Field of Search

707/765, 707/766, 707999003-999005
US Class Current

707/765
CPC Class Codes

G06F 16/242   Query formulation

G06F 16/285   Clustering or classification

G06F 16/3326   using relevance feedback fr...

G06F 16/355   Class or cluster creation o...

G06F 16/951   Indexing; Web crawling tech...

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99935   Query augmenting and refini...

Y10S 707/99945   Object-oriented database st...

System and method for providing search query refinements

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

20 Citations

7 Claims

Specification

Solutions

Use Cases

Quick Links

System and method for providing search query refinements

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

20 Citations

7 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links