Substantially similar queries

US 8,156,129 B2
Filed: 01/15/2009
Issued: 04/10/2012
Est. Priority Date: 01/15/2009
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A system described herein includes an analyzer component that analyzes queries submitted by users and corresponding URLs selected by the users, wherein the queries include a first query and a second query, and wherein the analyzer component determines that the first query and the second query are substantially similar queries. The system additionally includes a correlator component that, responsive to the analyzer component determining that the first query and the second query are substantially similar, generates correlation data that indicates that the first and second queries are substantially similar.

56 Citations

20 Claims

1. A method comprising the following computer-executable acts:
- analyzing a relationship between a first query and a second query based at least in part upon search results previously selected by users, wherein the search results previously selected by the users were presented to the users in response to submission of the first query and/or the second query to a search engine, wherein analyzing the relationship between the first query and the second query comprises:
  
  accessing a data repository that comprises a computer-implemented bipartite graph that includes a first set of nodes and a second set of nodes, wherein the first set of nodes represents queries and the second set of nodes represents URLs, wherein the first set of nodes includes a first node that is representative of the first query and a second node that is representative of the second query, and wherein the graph further comprises edges that are weighted to indicate relationships between queries and URLs;
  
  initiating a random walk at the first node; and
  
  determining a number of steps in the random walk until the second node is reached in the random walk, wherein a step is from a node in the first set of nodes to another node in the first set of nodes;
  
  determining whether the first query is substantially similar to the second query based at least in part upon the number of steps in the random walk until the second node is reached in the random walk; and
  
  generating correlation data that correlates the first query and the second query if the first query and second query are determined to be substantially similar.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The method of claim 1, wherein determining whether the first query is substantially similar to the second query comprises:
    - computing a similarity score between the first query and the second query based at least in part upon the number of steps in the random walk until the second node is reached in the random walk;
      
      comparing the similarity score with a threshold score; and
      
      indicating that the first query and the second query are substantially similar based at least in part upon the comparing of the similarity score with the threshold score.
  - 3. The method of claim 1, wherein analyzing the relationship between the first query and second query further comprises:
    - subsequent to determining the number of steps in the random walk until the second node is reached in the random walk, continuing the random walk until the first node is reached; and
      
      determining a number of steps in the random walk from the first node and back to the first node.
  - 4. The method of claim 1, wherein the edges are weighted based at least in part upon a number of user selections of a URL when a particular query is submitted to a search engine.
  - 5. The method of claim 4, wherein an edge selected during the random walk is based at least in part upon a weight of the edge in the bipartite graph.
  - 6. The method of claim 1, wherein analyzing the relationship between the first query and the second query further comprises:
    - initiating a second random walk at the second node; and
      
      determining a number of steps in the first random walk and the second random walk until the first random walk and the second random walk intersect in the bipartite graph.
  - 7. The method of claim 1, further comprising:
    - receiving the first query from a user for execution against contents of a data repository;
      
      determining that the first query is substantially similar to the second query; and
      
      outputting a search result to the user that is based at least in part upon the second query.
  - 8. The method of claim 7, further comprising:
    - outputting search results to the user based at least in part upon the first query and the second query.
  - 9. The method of claim 7, further comprising:
    - automatically replacing a first term in the first query with a second term in the second query to create a modified first query; and
      
      outputting search results based at least in part upon the modified first query.
  - 10. The method of claim 7, further comprising:
    - outputting an advertisement based at least in part upon the second query.
  - 11. The method of claim 1, further comprising:
    - accessing a database that includes query logs; and
      
      constructing the bipartite graph that includes queries and URLs selected by users corresponding to the queries.
  - 12. The method of claim 1, wherein the first query is initially selected and determinations are made regarding whether the first query is substantially similar to a plurality of other queries.
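
The acts recited in claims 1, 2, 4, and 5 can be illustrated with a minimal Python sketch. It is not the patented implementation. The click counts, the example queries and URLs, the averaging over repeated walks, the score `1 / mean_steps`, the threshold of 0.25, and the step cap are all assumptions made for illustration; the claims only require that similarity depend at least in part on the number of steps until the second node is reached.

```python
import random
from collections import defaultdict

# Hypothetical click counts: (query, url) -> number of user selections.
CLICKS = {
    ("nyc weather", "weather.example/nyc"): 8,
    ("new york weather", "weather.example/nyc"): 9,
    ("new york weather", "news.example/ny"): 1,
    ("ny news", "news.example/ny"): 7,
}


def build_graph(clicks):
    """Return both directions of the weighted bipartite graph (claim 4 weights)."""
    q2u, u2q = defaultdict(dict), defaultdict(dict)
    for (query, url), count in clicks.items():
        q2u[query][url] = count
        u2q[url][query] = count
    return q2u, u2q


def weighted_choice(neighbors, rng):
    """Pick a neighbor with probability proportional to edge weight (claim 5)."""
    nodes = list(neighbors)
    weights = [neighbors[node] for node in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def steps_until_hit(q2u, u2q, first, second, rng, max_steps=1000):
    """Count query-to-query steps from `first` until `second` is reached."""
    current = first
    for step in range(1, max_steps + 1):
        url = weighted_choice(q2u[current], rng)   # query -> URL
        current = weighted_choice(u2q[url], rng)   # URL -> query: one step
        if current == second:
            return step
    return max_steps  # assumption: cap walks that never reach the target


def similarity_score(q2u, u2q, first, second, walks=200, seed=0):
    """Assumed score: inverse of the mean step count over repeated walks."""
    rng = random.Random(seed)
    total = sum(steps_until_hit(q2u, u2q, first, second, rng) for _ in range(walks))
    return walks / total


def substantially_similar(score, threshold=0.25):
    """Compare the score with a threshold, as in claim 2 (threshold is assumed)."""
    return score >= threshold


q2u, u2q = build_graph(CLICKS)
close = similarity_score(q2u, u2q, "nyc weather", "new york weather")
far = similarity_score(q2u, u2q, "nyc weather", "ny news")
print(substantially_similar(close), substantially_similar(far))
```

In this toy graph the two weather queries share a heavily clicked URL, so walks between them are short, while reaching "ny news" requires crossing a lightly weighted edge and takes many more steps on average.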

13. A system, comprising:
- a processor;
  
  a data repository that includes a computer-implemented bipartite graph, wherein the bipartite graph includes a first set of nodes that represent queries, a second set of nodes that represent URLs, and weighted edges that represent relationships between queries and URLs, wherein the edges are weighted based at least in part upon a number of user selections of URLs given particular queries, wherein the first set of nodes includes a first node that represents a first query and a second node that represents a second query;
  
  a memory that comprises a plurality of components that are executed by the processor, the plurality of components comprising:
  
  an analyzer component that analyzes queries submitted by users and corresponding URLs selected by the users, wherein the queries include the first query and the second query, and wherein the analyzer component initiates a random walk at the first node in the bipartite graph and counts a number of steps taken during the random walk until the random walk reaches the second node in the bipartite graph, the analyzer component determining that the first query and the second query are substantially similar based at least in part upon the number of steps taken during the random walk; and
  
  a correlator component that, responsive to the analyzer component determining that the first query and the second query are substantially similar, generates correlation data that indicates that the first and second queries are substantially similar.
- Dependent claims (14, 15, 16, 17)
  - 14. The system of claim 13, wherein the analyzer component computes one of hitting time between the first node and the second node, commute time with respect to the first node and the second node, or meeting time between the first node and the second node.
  - 15. The system of claim 13, the plurality of components further comprising a constructor component that accesses query logs and constructs the bipartite graph based at least in part upon the query logs.
  - 16. The system of claim 13, wherein the correlator component causes the first query and the second query to be correlated in a database of substantially similar queries.
  - 17. The system of claim 16, the plurality of components further comprising:
    - a search component that receives the first query from a user, accesses the database of substantially similar queries and determines that the second query is a substantially similar query to the first query, and automatically executes a search using the second query.
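
Claims 14 and 15 name three step-count measures and a constructor that builds the graph from query logs. The Python sketch below estimates each measure by simulation, under stated assumptions. The log records are invented, the averaging over 300 walks and the step cap are arbitrary choices, and "meeting" is interpreted here as two walks occupying the same query node after the same number of steps, which is one possible reading rather than a definition taken from the claims.

```python
import random
from collections import Counter, defaultdict

# Hypothetical query-log records: one (query, clicked_url) pair per click.
LOG = (
    [("cheap flights", "air.example/deals")] * 6
    + [("low cost airfare", "air.example/deals")] * 5
    + [("low cost airfare", "travel.example/tips")] * 2
    + [("packing list", "travel.example/tips")] * 4
)


def construct_graph(log):
    """Constructor sketch: edge weight = number of clicks seen in the log."""
    q2u, u2q = defaultdict(dict), defaultdict(dict)
    for (query, url), count in Counter(log).items():
        q2u[query][url] = count
        u2q[url][query] = count
    return q2u, u2q


def step(q2u, u2q, query, rng):
    """One query-to-query step: weighted hop to a URL, then back to a query."""
    urls = q2u[query]
    url = rng.choices(list(urls), weights=list(urls.values()), k=1)[0]
    queries = u2q[url]
    return rng.choices(list(queries), weights=list(queries.values()), k=1)[0]


def hitting_steps(q2u, u2q, start, target, rng, cap=10_000):
    """Steps from `start` until `target` is first reached (capped)."""
    node, count = start, 0
    while count < cap:
        node = step(q2u, u2q, node, rng)
        count += 1
        if node == target:
            break
    return count


def commute_steps(q2u, u2q, a, b, rng):
    """Steps from `a` to `b` and then back to `a`."""
    return hitting_steps(q2u, u2q, a, b, rng) + hitting_steps(q2u, u2q, b, a, rng)


def meeting_steps(q2u, u2q, a, b, rng, cap=10_000):
    """Steps until two simultaneous walks occupy the same query node."""
    x, y, count = a, b, 0
    while x != y and count < cap:
        x = step(q2u, u2q, x, rng)
        y = step(q2u, u2q, y, rng)
        count += 1
    return count


def mean_steps(measure, q2u, u2q, a, b, walks=300, seed=0):
    """Average a step-count measure over repeated seeded walks."""
    rng = random.Random(seed)
    return sum(measure(q2u, u2q, a, b, rng) for _ in range(walks)) / walks


q2u, u2q = construct_graph(LOG)
for measure in (hitting_steps, commute_steps, meeting_steps):
    print(measure.__name__,
          mean_steps(measure, q2u, u2q, "cheap flights", "low cost airfare"))
```

Hitting time is asymmetric (the count from `a` to `b` generally differs from `b` to `a`), whereas commute time and meeting time are symmetric in the two queries, which may matter when correlation data is stored as an unordered pair.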

18. A computer-readable data storage device comprising instructions that, when executed by a processor, cause the processor to perform acts comprising:
- receiving a first query;
  
  accessing a computer-implemented bipartite graph, wherein the bipartite graph includes a first set of nodes that represent queries and a second set of nodes that represent URLs, wherein the first set of nodes includes a first node that is representative of the first query and wherein the bipartite graph further includes a first weighted edge that couples the first node to at least one node in the second set of nodes and includes weighted edges that couple nodes in the first set of nodes with nodes in the second set of nodes;
  
  initiating a random walk from the first node, wherein edges in the random walk are selected pseudo-randomly while considering weights of the edges; and
  
  outputting a second query that is determined to be substantially similar to the first query based at least in part upon a number of steps taken during the random walk between the first node and a second node that represents the second query.
- Dependent claims (19, 20)
  - 19. The computer-readable data storage device of claim 18, wherein the acts further comprise:
    - continuing the random walk from the second node until the random walk returns to the first node;
      
      counting a total number of steps taken during the random walk from the first node to the second node and back to the first node; and
      
      outputting the second query based at least in part upon the total number of steps taken during the random walk from the first node to the second node and back to the first node.
  - 20. The computer-readable data storage device of claim 18, wherein the acts further comprise:
    - subsequent to outputting the second query, replacing a first term in the first query with a second term in the second query to generate a modified query; and
      
      automatically executing a web search utilizing the modified query.
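
The term replacement recited in claims 9 and 20 could look like the following Python sketch. The claims do not say how the two terms are chosen, so the selection rule here is purely an assumption: queries are split on whitespace, the first term of the first query that is absent from the second query is replaced by the first term of the second query that is absent from the first. The function name `replace_term` and the example queries are hypothetical.

```python
def replace_term(first_query, second_query):
    """Swap one term of `first_query` for a term taken from `second_query`.

    Assumed rule: replace the first term unique to the first query with the
    first term unique to the second query; return the first query unchanged
    when either query has no unique term.
    """
    first_terms = first_query.split()
    second_terms = second_query.split()
    only_in_first = [t for t in first_terms if t not in second_terms]
    only_in_second = [t for t in second_terms if t not in first_terms]
    if not only_in_first or not only_in_second:
        return first_query
    old, new = only_in_first[0], only_in_second[0]
    return " ".join(new if term == old else term for term in first_terms)


modified = replace_term("cheap nyc hotels downtown", "nyc inexpensive hotels")
print(modified)
```

The modified query keeps the remaining terms of the first query, so it can differ from both the first and the second query; a search component would then execute it in place of, or alongside, the original.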

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Zhou, Dengyong; Burges, Christopher J. C.; Rounthwaite, Robert L.
Primary Examiner(s): Perveen, Rehana
Assistant Examiner(s): BUI, THUY T
Application Number: US12/353,976
Publication Number: US 20100185649A1
Time in Patent Office: 1,181 Days
Field of Search: 707/721, 707/723, 707/727, 707/748, 707/749
US Class Current: 707/749
CPC Class Codes:
G06F 16/9024 Graphs; Linked lists G06F16...
G06F 16/951 Indexing; Web crawling tech...
