Ranking documents based on user behavior and/or feature data

US 7,716,225 B1
Filed: 06/17/2004
Issued: 05/11/2010
Est. Priority Date: 06/17/2004
Status: Active Grant

Abstract

A system generates a model based on feature data relating to different features of a link from a linking document to a linked document and user behavior data relating to navigational actions associated with the link. The system also assigns a rank to a document based on the model.
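The abstract pairs two kinds of data for each link: feature data describing the link, and user behavior data recording navigational actions on it. A minimal sketch of how those two inputs could be joined into labeled examples follows. The `Link` class, the `label_links` helper, and the example URLs are hypothetical names chosen for illustration; none of them come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One link from a linking (source) document to a linked (target) document."""
    source_url: str
    target_url: str
    anchor_text: str


def label_links(links_on_page, selected_targets):
    """Pair each link with 1 if a user selected it, else 0 (assumed encoding)."""
    return [
        (link, 1 if link.target_url in selected_targets else 0)
        for link in links_on_page
    ]


# Hypothetical page with two links; the user followed only the second one.
page_links = [
    Link("http://example.com/", "http://example.com/about", "About us"),
    Link("http://example.com/", "http://example.org/news", "Latest news"),
]
examples = label_links(page_links, selected_targets={"http://example.org/news"})
labels = [label for _, label in examples]
```

Labeled pairs of this kind are one plausible input to the model-generation step the abstract describes.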

19 Claims

1. A method performed by one or more server devices, comprising:
- storing, in a memory associated with the one or more server devices, feature data associated with a plurality of first links, within a plurality of first source documents, that point to a plurality of first target documents, the feature data, for one of the plurality of first links, including one or more features of one of the plurality of first source documents that contains the one of the plurality of links, one or more features of one of the plurality of first target documents that is pointed to by the one of the plurality of links, and one or more features of the one of the plurality of first links;
  
  storing, in a memory associated with the one or more server devices, user behavior data relating to user navigational activity with regard to the plurality of first source documents accessed by one or more users and the plurality of first links within the plurality of first source documents selected by the one or more users;
  
  training, using one or more processors of the one or more server devices and based on the feature data and the user behavior data, a model that identifies a probability that a particular link, with particular feature data, will be selected by a user, where training the model includes:
  
  analyzing the feature data associated with each of the plurality of first links that was selected by the one or more users and the feature data associated with each of the plurality of first links that was not selected by the one or more users to generate rules for the model;
  
  identifying, by one or more processors associated with the one or more server devices, a plurality of second links, within a plurality of second source documents, that point to a plurality of second target documents;
  
  determining, using one or more processors associated with the one or more server devices, feature data associated with each of the plurality of second links, the feature data, associated with one of the plurality of second links, including one or more features of the one of the plurality of second links, one or more features of one of the plurality of second source documents that contains the one of the plurality of second links, and one or more features of the one of the plurality of second target documents that is pointed to by the one of the plurality of second links;
  
  determining, using the model and based on the feature data, a probability that each of the plurality of second links will be selected by a user, where the determining includes:
  
  inputting, into the model, the feature data associated with the one of the plurality of second links, and outputting, by the model, the probability that the one of the plurality of second links will be selected by a user;
  
  calculating, using one or more processors associated with the one or more server devices, a rank for a particular target document of the plurality of second target documents based on the probability associated with one or more of the plurality of second links that point to the particular target document; and
  
  ordering the particular target document, with regard to at least one other document, based on the rank for the particular target document.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 19)
- - 2. The method of claim 1, further comprising:
    - obtaining data relating to the user navigational activity of the one or more users from client devices used by the one or more users.
  - 3. The method of claim 1, where the user behavior data corresponds to a single user.
  - 4. The method of claim 1, where the user behavior data corresponds to a class of users.
  - 5. The method of claim 1, where the features associated with one of the plurality of first source documents include at least one of an entire address of the one of the plurality of first source documents, a portion of the address of the one of the plurality of first source documents, information regarding a web site associated with the one of the plurality of first source documents, a number of links in the one of the plurality of first source documents, presence of words in the one of the plurality of first source documents, presence of words in a heading of the one of the plurality of first source documents, a topical cluster with which the one of the plurality of first source documents is associated, or a degree to which a topical cluster associated with the one of the plurality of first source documents matches a topical cluster associated with a link.
  - 6. The method of claim 1, where the features associated with one of the plurality of first links include at least one of a font size of anchor text associated with the one of the plurality of first links, a position of the one of the plurality of first links within one of the plurality of first source documents, a position of the one of the plurality of first links in a list, a font color associated with the one of the plurality of first links, attributes of the one of the plurality of first links, a number of words in the anchor text associated with the one of the plurality of first links, actual words in the anchor text associated with the one of the plurality of first links, a determination of commerciality of the anchor text associated with the one of the plurality of first links, a type of the one of the plurality of first links, a context of words before or after the one of the plurality of first links, a topical cluster with which the anchor text of the one of the plurality of first links is associated, whether the one of the plurality of first links leads to a first target document on a same host or domain as one of the plurality of first source documents containing the one of the plurality of first links, or whether an address associated with the one of the plurality of first links embeds another address.
  - 7. The method of claim 1, where the features associated with one of the plurality of first target documents include at least one of an entire address of the one of the plurality of first target documents, a portion of the address of the one of the plurality of first target documents, information regarding a web site associated with the one of the plurality of first target documents, whether the address of the one of the plurality of first target documents is on a same host as an address of a first source document that links to the one of the plurality of first target documents, whether the address of the one of the plurality of first target documents is associated with a same domain as the address of the first source document, words in the address of the one of the plurality of first target documents, or a length of the address of the one of the plurality of first target documents.
  - 8. The method of claim 1, further comprising:
    - generating a feature vector for each one of the plurality of first links based on the feature data associated with the one of the plurality of first links.
  - 9. The method of claim 8, where analyzing the feature data associated with the plurality of first links and the instances where each of the plurality of the first links were selected by the one or more users and the instances where each of the plurality of first links were not selected by the one or more users includes:
    - generating the rules for the model based on the instances where each of the plurality of the first links were selected by the one or more users and the instances where each of the plurality of first links were not selected by the one or more users and the feature vectors.
  - 10. The method of claim 1, where the rules for the model comprise:
    - a general rule applicable to a group of documents, and a specific rule applicable to a particular document.
  - 11. The method of claim 1, further comprising:
    - periodically updating the rules for the model based on changes in the user behavior data.
  - 19. The one or more server devices of claim 6, where the data associated with the features of the one of the target documents includes at least two of:
    - the entire address of the target document, the portion of the address of the target document, the information regarding a web site associated with the target document, whether the address of the target document is on a same host as an address of a source document that links to the target document, whether the address of the target document is associated with a same domain as the address of the source document, the words in the address of the target document, or the length of the address of the target document.
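The steps of claim 1, together with the feature vectors of claims 8 and 9, can be sketched roughly as follows. The claims speak only of generating "rules for the model" and do not name a learning algorithm; logistic regression is swapped in here purely as an assumption. The two binary features, the training data, and all function names are likewise hypothetical. The rank is taken as the sum of the probabilities of the links pointing to a target, which is only one way to base a rank on those probabilities.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(vectors, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights with stochastic gradient ascent."""
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = y - p  # gradient of the log-likelihood w.r.t. the logit
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


def probability(model, x):
    """Probability that a link with feature vector x will be selected."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# "First links": hypothetical vectors [large_font, top_of_page], label 1 = selected.
first_vectors = [[1, 1], [1, 0], [0, 1], [0, 0]]
first_labels = [1, 1, 0, 0]
model = train(first_vectors, first_labels)

# "Second links": (source document, target document, feature vector).
second_links = [("s1", "t1", [1, 1]), ("s2", "t1", [1, 0]), ("s1", "t2", [0, 1])]

# Rank each target by summing the probabilities of the links that point to it.
rank = {}
for _, target, x in second_links:
    rank[target] = rank.get(target, 0.0) + probability(model, x)

ordered = sorted(rank, key=rank.get, reverse=True)
```

In this toy data only the first feature separates selected from unselected links, so `t1`, which is reached by two links carrying that feature, is ordered ahead of `t2`.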

12. A method performed by one or more server devices, comprising:
- storing, in one or more memories associated with the one or more server devices, feature data associated with a plurality of first links within a plurality of first source documents that point to a plurality of first target documents, the feature data including features of the first source documents, features of the first target documents, and features of the first links;
  
  storing, in one or more memories associated with the one or more server devices, user behavior data relating to user navigational activity with regard to the first links within the first source documents selected by one or more users;
  
  training, using one or more processors associated with the one or more server devices and based on the feature data associated with the feature data associated with the first links and the user behavior data relating to the first links, a model that identifies a probability that a particular link will be selected by a user, where training the model includes:
  
  analyzing the feature data associated with the first links that were selected by the one or more users and the feature data associated with the first links that were not selected by the one or more users to generate rules for the model;
  
  identifying a plurality of second links within a plurality of second source documents that point to a plurality of second target documents;
  
  determining feature data associated with the second links, the feature data associated with the second links including features of the second source documents, features of the second target documents, and features of the second links;
  
  determining, using the model, a probability that each of the second links will be selected using only the feature data associated with the second link as input to the model;
  
  assigning a weight to each of the second links based on the probability that the second link will be selected;
  
  assigning a rank to one of the second target documents based on the weights assigned to the second links that point to the one of the second target documents; and
  
  ordering the one of the second target documents, with regard to at least one other document, based on the rank assigned to the one of the second target documents.
- Dependent claims (13, 14, 15)
- - 13. The method of claim 12, further comprising:
    - periodically updating the rules for the model based on changes to the user behavior data.
  - 14. The method of claim 12, where the user behavior data corresponds to a single user.
  - 15. The method of claim 12, where the user behavior data corresponds to a plurality of users.
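Claim 12 adds an explicit weighting step: each link receives a weight based on its selection probability, and a target document's rank is based on the weights of the links pointing to it. The sketch below is one possible reading, not the claimed method itself. It assumes that weights are the probabilities normalized per source document, and that rank is propagated with a damped power iteration in the style of PageRank, which the claim does not require. The graph, the probabilities, and the function names are hypothetical.

```python
from collections import defaultdict


def link_weights(link_probs):
    """Normalize probabilities so each source's outgoing weights sum to 1.

    Assumes every source has at least one link with a positive probability.
    """
    totals = defaultdict(float)
    for (src, _), p in link_probs.items():
        totals[src] += p
    return {(src, dst): p / totals[src] for (src, dst), p in link_probs.items()}


def rank_documents(weights, damping=0.85, iterations=50):
    """Propagate rank along weighted links (assumes no document lacks out-links)."""
    docs = sorted({doc for edge in weights for doc in edge})
    rank = {doc: 1.0 / len(docs) for doc in docs}
    for _ in range(iterations):
        nxt = {doc: (1.0 - damping) / len(docs) for doc in docs}
        for (src, dst), w in weights.items():
            nxt[dst] += damping * rank[src] * w
        rank = nxt
    return rank


# Hypothetical model outputs: probability that each (source, target) link is selected.
link_probs = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "a"): 0.5, ("c", "a"): 0.5}
weights = link_weights(link_probs)
ranks = rank_documents(weights)
ordered = sorted(ranks, key=ranks.get, reverse=True)
```

Documents `b` and `c` are each linked once from `a`, but the link to `b` is far more likely to be selected, so `b` ends up with the higher rank. A uniform weighting would have ranked them equally.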

16. One or more server devices, comprising:
- means for storing, in a memory, feature data associated with a plurality of links within source documents that point to target documents, the feature data including data associated with features of the source documents, data associated with features of the links, and data associated with features of the target documents, the data associated with the features of one of the source documents including at least one of an entire address of the source document, a portion of the address of the source document, information regarding a web site associated with the source document, a number of links in the source document, presence of words in the source document, presence of words in a heading of the source document, a topical cluster with which the source document is associated, or a degree to which a topical cluster associated with the source document matches a topical cluster associated with a link, the data associated with the features of one of the links including at least one of a font size of anchor text associated with the link, a position of the link within a source document, a position of the link in a list, a font color associated with the link, attributes of the link, a number of words in the anchor text associated with the link, actual words in the anchor text associated with the link, a determination of commerciality of the anchor text associated with the link, a type of the link, a context of words before or after the link, a topical cluster with which the anchor text of the link is associated, whether the link leads to a target document on a same host or domain, or whether an address associated with the link embeds another address, and the data associated with the features of one of the target documents including at least one of an entire address of the target document, a portion of the address of the target document, information regarding a web site associated with the target document, whether the address of the target document is on a same host as an address of a source document that links to the target document, whether the address of the target document is associated with a same domain as the address of the source document, words in the address of the target document, or a length of the address of the target document;
  
  means for storing, in a memory, user behavior data relating to user navigational activity with regard to the source documents accessed by one or more users and the links within the source documents selected by the one or more users and the links within the source documents that were not selected by the one or more users;
  
  means for training, based on the feature data and instances where the links were selected by the one or more users and instances where the links were not selected by the one or more users, a model that identifies a probability that a link, with particular feature data, will be selected by a user, where the means for training includes:
  
  means for analyzing the feature data associated with the links that were selected by the one or more users and the feature data associated with the links that were not selected by the one or more users to generate rules for the model;
  
  means for identifying a particular link within a first document that points to a second document;
  
  means for determining the feature data associated with the particular link;
  
  means for determining, based on inputting the feature data into the model, a probability that the particular link will be selected by a user;
  
  means for assigning a weight to the particular link based on the probability that the particular link will be selected;
  
  means for assigning a rank to the second document based on the weight assigned to the particular link; and
  
  means for ordering the second document, with respect to at least one other document, based on the assigned rank.
- Dependent claims (17, 18)
- - 17. The one or more server devices of claim 16, where the data associated with the features of the one of the source documents includes at least two of:
    - the entire address of the source document, the portion of the address of the source document, the information regarding a web site associated with the source document, the number of links in the source document, the presence of words in the source document, the presence of words in a heading of the source document, the topical cluster with which the source document is associated, or the degree to which a topical cluster associated with the source document matches a topical cluster associated with a link.
  - 18. The one or more server devices of claim 16, where the data associated with the features of the one of the links includes at least two of:
    - the font size of anchor text associated with the link, the position of the link within a source document, the position of the link in a list, the font color associated with the link, the attributes of the link, the number of words in the anchor text associated with the link, the actual words in the anchor text associated with the link, the determination of commerciality of the anchor text associated with the link, the type of the link, the context of words before or after the link, the topical cluster with which the anchor text of the link is associated, whether the link leads to a target document on a same host or domain, or whether an address associated with the link embeds another address.
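Several of the features listed in claims 16 to 18 can be derived from the addresses and anchor text alone. The sketch below computes a handful of them. The function names and dictionary keys are hypothetical. The "same domain" test naively compares the last two host labels, which is an assumption and breaks for suffixes such as `co.uk`. The embedded-address test only detects a second literal `://` and would miss percent-encoded addresses.

```python
from urllib.parse import urlparse


def domain_of(host):
    """Naive registered-domain guess: the last two labels of the host name."""
    return ".".join(host.split(".")[-2:])


def link_features(source_url, target_url, anchor_text):
    """Compute a few link and target-document features named in the claims."""
    src_host = urlparse(source_url).hostname or ""
    dst_host = urlparse(target_url).hostname or ""
    first_scheme = target_url.find("://")
    return {
        "same_host": src_host == dst_host,
        "same_domain": domain_of(src_host) == domain_of(dst_host),
        "target_address_length": len(target_url),
        "anchor_word_count": len(anchor_text.split()),
        # A second "://" after the first suggests an embedded address.
        "embeds_address": target_url.find("://", first_scheme + 1) != -1,
    }


# Hypothetical link from a home page to a redirect URL on a sibling host.
features = link_features(
    "http://www.example.com/index.html",
    "http://shop.example.com/redirect?u=http://example.org/item",
    "Buy the item now",
)
```

Here the two hosts differ but share the domain `example.com`, the anchor text has four words, and the target address embeds a second address.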

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Battle, Alexis; Dean, Jeffrey A.; Anderson, Corin
Primary Examiner(s): Alam, Hosain T
Assistant Examiner(s): Lin, Shew Fen
Application Number: US10/869,057
Time in Patent Office: 2,154 Days
Field of Search: 707/2, 707/3, 707/999.002, 707/999.005
US Class Current: 707/748
CPC Class Codes:
- G06F 16/24578: using ranking
- G06F 16/951: Indexing; Web crawling tech...
- G06F 16/9535: Search customisation based ...
- G06F 16/9538: Presentation of query results
- G06F 40/134: Hyperlinking
