IDENTIFYING RELEVANT INFORMATION SOURCES FROM USER ACTIVITY

US 20090248661A1
Filed: 03/28/2008
Published: 10/01/2009
Est. Priority Date: 03/28/2008
Status: Abandoned Application

Abstract

A relevant information source identification technique that exploits a combination of searching and browsing activity of many users to identify relevant resources for future queries. The technique relies on such data to identify relevant information sources for new queries. In one embodiment, the technique is term-based: past queries are decomposed into individual (possibly overlapping) terms and phrases, and the most relevant documents are identified for each phrase from the browsing patterns of users that follow the query. Then, for a new query that consists of several terms or phrases, the most relevant destinations for each term/phrase are combined to produce overall predictions of the best or most relevant sources for the new query. This allows for providing predictions for previously unseen queries, which comprise a large proportion of the overall query volume.
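
The term-based embodiment described in the abstract can be illustrated with a minimal sketch. It assumes a toy log of (query, visited sources) pairs, word n-grams of length one and two as the overlapping terms and phrases, and simple visit counts as weights. The function names (`phrases`, `build_model`, `predict`), the `.example` sources, and the additive combination are illustrative assumptions, not details taken from the application.

```python
from collections import defaultdict

def phrases(query, max_len=2):
    """Split a query into overlapping terms and phrases (word n-grams)."""
    words = query.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)]

def build_model(logged):
    """Count, for every term or phrase, how often each source was visited
    after a query containing it. `logged` is a list of (query, sources)."""
    model = defaultdict(lambda: defaultdict(float))
    for query, sources in logged:
        for term in phrases(query):
            for src in sources:
                model[term][src] += 1.0
    return model

def predict(model, query, k=3):
    """Sum the per-term weights of each source (assumed combination rule)
    and return the k best sources; ties are broken alphabetically."""
    scores = defaultdict(float)
    for term in phrases(query):
        for src, weight in model.get(term, {}).items():
            scores[src] += weight
    return sorted(scores, key=lambda s: (-scores[s], s))[:k]

# Hypothetical activity log; none of these queries equals the new query.
logged = [
    ("digital camera reviews", ["reviews.example", "cameras.example"]),
    ("camera lenses", ["lenses.example"]),
    ("laptop reviews", ["reviews.example"]),
]
model = build_model(logged)
top = predict(model, "camera reviews")
print(top)
```

The query "camera reviews" never appears in the log as a whole, yet it still receives predictions because its terms and the phrase "camera reviews" occur inside logged queries.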

20 Claims

1. A computer-implemented process for finding relevant sources of information for a search query, comprising:
- constructing a weighted model that associates every term in multiple search queries with relevant sources from multiple users' searching and browsing activity;
  
  inputting a new query that is represented as a set of terms;
  
  determining relevant sources for all terms in the new query using the weighted model to determine an overall prediction of the most relevant sources for the query; and
  
  displaying the determined relevant sources for the new query.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The computer-implemented process of claim 1 wherein creating the weighted model further comprises computing weights to quantify the degree of relevance of each of the sources to each term of the multiple queries.
  - 3. The computer-implemented process of claim 1 wherein a source document is a web site, a web page, a document, or an image.
  - 4. The computer-implemented process of claim 3 further comprising assigning a higher weight to more rare terms that are more likely to differentiate between relevant and non-relevant sources.
  - 5. The computer-implemented process of claim 2 wherein the weights to quantify the degree of relevance of each of the sources are computed by using the number of user visits to a source for a given term.
  - 6. The computer-implemented process of claim 2 wherein the weights to quantify the degree of relevance of each of the sources are computed by using the dwell time of user visits to a source for a given term.
  - 7. The computer-implemented process of claim 1 further comprising displaying the most relevant sources in order of determined relevance.
  - 8. The computer-implemented process of claim 1 further comprising creating the weighted model using a heuristic method.
  - 9. The computer-implemented process of claim 1 further comprising creating the weighted model using a probabilistic model where every term is associated with a probability distribution over sources that corresponds to the likelihood of a source being relevant following a query that contains a given term.
  - 10. The computer-implemented process of claim 1 further comprising creating the weighted model that is a random walk probabilistic model that gives higher scores to sources that are relevant to more than one term in a query by giving these sources higher weights.
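
As an illustration of claims 4 to 6, the following sketch computes term-to-source weights from either visit counts or dwell time and gives rarer terms more influence. The claims do not give a formula, so the IDF-style `rarity` factor, the per-term normalization, and all names and sample values are assumptions chosen for the example.

```python
import math
from collections import defaultdict

def term_source_weights(observations, use_dwell=False):
    """observations: (term, source, dwell_seconds) tuples, one per visit.
    Accumulates either the visit count or the total dwell time."""
    weights = defaultdict(lambda: defaultdict(float))
    for term, source, dwell in observations:
        weights[term][source] += dwell if use_dwell else 1.0
    return weights

def rarity(term, query_counts, total_queries):
    """IDF-style factor (an assumption): terms that occur in fewer past
    queries receive a larger factor."""
    seen = query_counts.get(term, 0)
    return math.log((1 + total_queries) / (1 + seen)) + 1.0

def score_sources(query_terms, weights, query_counts, total_queries):
    """Add up each source's share of a term's weight, scaled by rarity."""
    scores = defaultdict(float)
    for term in query_terms:
        per_term = weights.get(term, {})
        norm = sum(per_term.values())
        if norm == 0:
            continue  # unseen term, or only zero-length visits
        factor = rarity(term, query_counts, total_queries)
        for source, w in per_term.items():
            scores[source] += factor * w / norm
    return dict(scores)

# Hypothetical visits: "the" is common, "kayak" is rare.
observations = [
    ("the", "portal.example", 5.0),
    ("the", "news.example", 5.0),
    ("kayak", "paddling.example", 40.0),
]
query_counts = {"the": 900, "kayak": 3}
weights = term_source_weights(observations)
scores = score_sources(["the", "kayak"], weights, query_counts, 1000)
best = max(scores, key=scores.get)
print(best)
```

Passing `use_dwell=True` switches the weights from visit counts (claim 5) to dwell time (claim 6) without changing the scoring step.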

11. A computer-implemented process for finding relevant sources of information for a search query on a network, comprising:
- inputting a set of queries and associated search trails from several users;
  
  creating a weighted model that associates every term or phrase in each search query with relevant sources from the several users' search trails;
  
  inputting a new query comprising a set of terms;
  
  determining probability of relevant sources for each search trail for each term in the new query using the weighted model; and
  
  determining the overall relevance of each source document for the entire new query by combining the probability of relevant sources for each term.
- Dependent claims (12, 13, 14, 15)
  - 12. The computer-implemented process of claim 11 further comprising displaying the sources for the new query, ranked in order of their overall relevance.
  - 13. The computer-implemented process of claim 11 wherein each search trail further comprises pages that are search results and pages connected to a search result page via a sequence of hyperlinks.
  - 14. The computer-implemented process of claim 13 wherein the overall relevance of one or more sources is used as one or more features within a learnable ranking system that includes multiple features based on different sources of evidence.
  - 15. The computer-implemented process of claim 11 further comprising using a combination of the number of user visits or user dwell time on one or more sources to compute the contribution of an individual search trail to the weight of a term.
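
The probabilistic combination recited in claim 11 can be sketched as follows. The sketch estimates a distribution over sources for each term from visit counts and then averages the per-term distributions for a new query. The uniform average is an assumed combination rule (the claim only says the probabilities are combined), and the names and counts are hypothetical.

```python
from collections import Counter

def distributions(term_visits):
    """Turn {term: Counter(source -> visits)} into
    {term: {source: estimated P(source | term)}}."""
    dists = {}
    for term, counts in term_visits.items():
        total = sum(counts.values())
        if total > 0:
            dists[term] = {s: c / total for s, c in counts.items()}
    return dists

def combine(dists, query_terms):
    """Average the per-term distributions over the terms the model knows
    (assumed rule) and return (source, probability) pairs, best first."""
    known = [t for t in query_terms if t in dists]
    combined = Counter()
    for term in known:
        for source, p in dists[term].items():
            combined[source] += p / len(known)
    return combined.most_common()

# Hypothetical visit counts gathered from search trails.
term_visits = {
    "hiking": Counter({"trails.example": 6, "gear.example": 2}),
    "boots": Counter({"gear.example": 3, "shoes.example": 1}),
}
ranked = combine(distributions(term_visits), ["hiking", "boots"])
for source, p in ranked:
    print(f"{source}: {p:.3f}")
```

In this toy data `gear.example` ranks first even though it is not the top source for "hiking" alone, because it carries probability mass for both query terms.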

16. A system for finding relevant sources of information on a network in response to a search query, comprising:
- a general purpose computing device;
  
  a computer program comprising program modules executable by the general purpose computing device, wherein the computing device is directed by the program modules of the computer program to,
  
  receive a set of users' search queries and associated search result histories;
  
  create search trails that each include a query, a sequence of URLs accessed by a user including the time spent on each URL and tokenizations of the search query terms;
  
  create a weighted model that associates every term in a query with one or more relevant sources based on users' searching and browsing history;
  
  input a new search query, broken into terms;
  
  use the weighted model to rank the relevance of sources by predicting the most relevant sources for each of the terms of the new query;
  
  output the most relevant sources for the new search query.
- Dependent claims (17, 18, 19, 20)
  - 17. The system of claim 16 further comprising tokenizations of query terms that are overlapping.
  - 18. The system of claim 16 wherein the weight of a term for a source is the sum of the weight contributions from all search trails that start with a query and include the source in the search trail.
  - 19. The system of claim 16 wherein the number of visits to a source and the dwell time on a source are used to compute the contribution of an individual search trail to the weight of a term in a query.
  - 20. The system of claim 16 wherein creating the weighted module further comprises assigning non-zero term weights to all sources that occur in search trails that follow a query.
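
One way to picture the search trails of claim 16 and the summed contributions of claims 18 to 20 is the sketch below. The `SearchTrail` record, the per-visit contribution `1 + log(1 + seconds)`, and the sample URLs are assumptions made for illustration; the claims state only that visits and dwell time feed the contribution and that contributions are summed over trails.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SearchTrail:
    query: str
    visits: List[Tuple[str, float]]  # (URL, seconds spent), in visit order
    terms: List[str]                 # tokenizations of the query, may overlap

def trail_contribution(trail):
    """Contribution of one trail to each source it contains. Assumed rule:
    every visit adds 1 plus the log of its dwell time, so any visited
    source ends up with a non-zero contribution."""
    contrib = defaultdict(float)
    for url, seconds in trail.visits:
        contrib[url] += 1.0 + math.log1p(seconds)
    return contrib

def build_weights(trails):
    """Weight of a term for a source: the sum of the contributions of all
    trails whose query contains the term and that include the source."""
    weights = defaultdict(lambda: defaultdict(float))
    for trail in trails:
        contrib = trail_contribution(trail)
        for term in trail.terms:
            for url, c in contrib.items():
                weights[term][url] += c
    return weights

# Hypothetical trails.
trails = [
    SearchTrail("tax forms",
                [("https://forms.example/a", 30.0), ("https://help.example/b", 0.0)],
                ["tax", "forms", "tax forms"]),
    SearchTrail("tax deadline",
                [("https://forms.example/a", 10.0)],
                ["tax", "deadline", "tax deadline"]),
]
weights = build_weights(trails)
print(round(weights["tax"]["https://forms.example/a"], 3))
```

The term "tax" accumulates weight for `https://forms.example/a` from both trails, while "forms" and "deadline" each receive the contribution of a single trail.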

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Bilenko, Mikhail; White, Ryen W.
Application Number: US12/057,491
Publication Number: US 20090248661A1
US Class Current: 1/1
CPC Class Codes: G06F 16/951 Indexing; Web crawling tech...
