CONTEXTUAL RANKING OF KEYWORDS USING CLICK DATA

US 20090265338A1
Filed: 06/03/2008
Published: 10/22/2009
Est. Priority Date: 04/16/2008
Status: Active Grant

9 Assignments

0 Petitions

Abstract

Techniques are provided for ranking the entities that are identified in a document based on an estimated likelihood that a user will actually make use of the annotations. According to one disclosed approach, usage data that indicates how users interact with annotations contained in documents presented to the users is collected. Based on the usage data, weights are generated for features of a feature vector. The weights are then used to modify feature scores of entities, and the modified feature scores are used to determine how to annotate documents. Specifically, a set of entities are identified within a document. A ranking for the identified entities is determined based, at least in part, on (a) feature vector scores for each of the identified entities, and (b) the weights generated for the features of the feature vector. The document is then annotated based, at least in part, on the ranking.
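
The ranking step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the feature scores and weights are combined as a weighted sum, which the abstract does not specify. The feature names (`relevance`, `popularity`), the example values, and the helper `rank_entities` are all hypothetical.

```python
from typing import Dict, List


def rank_entities(
    feature_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[str]:
    """Order entities by a weighted sum of their feature scores.

    The weighted sum is an assumed combination rule for illustration.
    """
    def combined(entity: str) -> float:
        scores = feature_scores[entity]
        return sum(weights.get(name, 0.0) * value for name, value in scores.items())

    return sorted(feature_scores, key=combined, reverse=True)


# Hypothetical feature scores for entities identified in one document.
scores = {
    "Paris": {"relevance": 0.9, "popularity": 0.4},
    "hotel": {"relevance": 0.2, "popularity": 0.9},
    "Louvre": {"relevance": 0.8, "popularity": 0.6},
}
# Hypothetical weights, standing in for weights generated from usage data.
weights = {"relevance": 0.7, "popularity": 0.3}

ranking = rank_entities(scores, weights)
top_two = ranking[:2]  # candidates to annotate, chosen from the ranking
```

With these example numbers, an entity with high popularity but low relevance ("hotel") ends up last, because the assumed weights favor relevance.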

151 Citations

28 Claims

1. A method comprising:
- collecting usage data that indicates how users interact with annotations contained in documents presented to the users, wherein said annotations are associated with entities contained in the documents;
  
  based on the usage data, generating weights for features of a feature vector;
  
  identifying a set of identified entities within a document;
  
  determining a ranking for the identified entities that belong to said set of identified entities based, at least in part, on (a) feature vector scores for each of the identified entities, wherein the feature vector scores correspond to features in the feature vector; and
  
  (b) the weights generated for the features of the feature vector.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The method of claim 1 further comprising annotating the document based, at least in part, on said ranking.
  - 3. The method of claim 1 wherein the annotations are hyperlinks, and collecting usage data is performed by storing click-through information that indicates which hyperlinks, within the documents presented to the users, were activated.
  - 4. The method of claim 2 wherein the step of annotating the document includes:
    - based on the ranking, selecting a subset of the identified entities from the set of identified entities;
      
      annotating, within the document, only those identified entities that belong to the subset.
  - 5. The method of claim 2 wherein the step of annotating includes annotating the document in a manner that visually distinguishes an identified entity that ranks higher in the ranking from an identified entity that ranks lower in the ranking.
  - 6. The method of claim 1 wherein the feature vector includes one or more context-independent features, and one or more context-dependent features.
  - 7. The method of claim 6 wherein a score for at least one context-dependent feature is computed for a given entity/document combination based on a comparison between terms associated with the given entity and terms contained in the given document.
  - 8. The method of claim 7 further comprising the step of determining the terms associated with the given entity based, at least in part, on search engine result snippets associated with the entity.
  - 9. The method of claim 7 further comprising the step of determining the terms associated with the given entity based, at least in part, on results produced by a tool for query refinement based on said given entity.
  - 10. The method of claim 7 further comprising the step of determining the terms associated with the given entity based, at least in part, on related query suggestions generated based on the given entity.
  - 11. The method of claim 2 wherein the step of annotating the document is performed in real-time in response to a request for the document.
  - 12. The method of claim 1 wherein the step of generating weights includes using a machine learning mechanism to generate weights based on correlation between (a) scores, generated for entities, for the features, and (b) click-through-rates, indicated by the usage data, for annotations associated with the entities.
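
Claim 7 describes a context-dependent feature computed by comparing terms associated with an entity against terms in the document. One way to picture such a comparison is a simple overlap ratio, sketched below. The overlap measure, the tokenizer, and the names `tokenize` and `context_score` are assumptions made for illustration; the claims do not state which comparison is used.

```python
import re
from typing import Iterable, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_score(entity_terms: Iterable[str], document_text: str) -> float:
    """Fraction of the entity's associated terms that also occur in the document."""
    terms = {term.lower() for term in entity_terms}
    if not terms:
        return 0.0
    return len(terms & tokenize(document_text)) / len(terms)


# Hypothetical terms associated with the entity "jaguar", for example terms
# drawn from search result snippets as in claim 8.
jaguar_terms = ["car", "engine", "luxury", "dealer"]
doc = "The new Jaguar has a quieter engine and a luxury interior."
score = context_score(jaguar_terms, doc)  # 2 of the 4 terms appear in the document
```

The score depends on both the entity and the document, which is what makes the feature context-dependent: the same entity would score differently in a document about wildlife.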

13. A method for annotating a document, the method comprising:
- generating a weight for a particular feature, wherein the weight indicates how well the particular feature predicts whether annotations associated with entities will be used;
  
  identifying a set of entities within the document;
  
  generating a first set of scores by generating, for each entity in the set, a score for said particular feature;
  
  generating a second set of scores based on said first set of scores and said weight;
  
  establishing a ranking of the entities in the set of entities based, at least in part, on the second set of scores; and
  
  annotating one or more entities in the document based, at least in part, on said ranking.
- Dependent claims (14):
  - 14. The method of claim 13 wherein the weight is generated based on click-through-data collected by monitoring how users interact with documents that include annotated entities.
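
The click-through data mentioned in claim 14 could be summarized per entity before any weight is generated. The sketch below assumes a hypothetical log of `(entity, clicked)` events, one per time an annotation was shown, and computes a click-through rate for each entity. The log format and the function name `click_through_rates` are illustrative assumptions, not details from the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def click_through_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute clicks divided by impressions for each annotated entity."""
    shown: Dict[str, int] = defaultdict(int)
    clicked: Dict[str, int] = defaultdict(int)
    for entity, was_clicked in events:
        shown[entity] += 1
        if was_clicked:
            clicked[entity] += 1
    return {entity: clicked[entity] / shown[entity] for entity in shown}


# Hypothetical usage log: each tuple records one displayed annotation and
# whether the user activated it.
log = [
    ("Paris", True),
    ("Paris", False),
    ("hotel", False),
    ("Paris", True),
    ("hotel", False),
]
ctr = click_through_rates(log)
```

Here "Paris" was clicked on two of three impressions and "hotel" on none of two.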

15. A computer-readable storage medium storing instructions, the instructions including instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
- collecting usage data that indicates how users interact with annotations contained in documents presented to the users, wherein said annotations are associated with entities contained in the documents;
  
  based on the usage data, generating weights for features of a feature vector;
  
  identifying a set of identified entities within a document;
  
  determining a ranking for the identified entities that belong to said set of identified entities based, at least in part, on (a) feature vector scores for each of the identified entities, wherein the feature vector scores correspond to features in the feature vector; and
  
  (b) the weights generated for the features of the feature vector.
- Dependent claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26):
  - 16. The computer-readable storage medium of claim 15 further comprising instructions for annotating the document based, at least in part, on said ranking.
  - 17. The computer-readable storage medium of claim 15 wherein the annotations are hyperlinks, and collecting usage data is performed by storing click-through information that indicates which hyperlinks, within the documents presented to the users, were activated.
  - 18. The computer-readable storage medium of claim 16 wherein the step of annotating the document includes:
    - based on the ranking, selecting a subset of the identified entities from the set of identified entities;
      
      annotating, within the document, only those identified entities that belong to the subset.
  - 19. The computer-readable storage medium of claim 16 wherein the step of annotating includes annotating the document in a manner that visually distinguishes an identified entity that ranks higher in the ranking from an identified entity that ranks lower in the ranking.
  - 20. The computer-readable storage medium of claim 15 wherein the feature vector includes one or more context-independent features, and one or more context-dependent features.
  - 21. The computer-readable storage medium of claim 20 wherein a score for at least one context-dependent feature is computed for a given entity/document combination based on a comparison between terms associated with the given entity and terms contained in the given document.
  - 22. The computer-readable storage medium of claim 21 further comprising instructions for determining the terms associated with the given entity based, at least in part, on search engine result snippets associated with the entity.
  - 23. The computer-readable storage medium of claim 21 further comprising instructions for determining the terms associated with the given entity based, at least in part, on results produced by a tool for query refinement based on said given entity.
  - 24. The computer-readable storage medium of claim 21 further comprising instructions for determining the terms associated with the given entity based, at least in part, on related query suggestions generated based on the given entity.
  - 25. The computer-readable storage medium of claim 16 wherein the step of annotating the document is performed in real-time in response to a request for the document.
  - 26. The computer-readable storage medium of claim 15 wherein the step of generating weights includes using a machine learning mechanism to generate weights based on correlation between (a) scores, generated for entities, for the features, and (b) click-through-rates, indicated by the usage data, for annotations associated with the entities.
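
Claims 12 and 26 refer to a machine learning mechanism that generates weights from the correlation between feature scores and click-through rates, without naming a specific algorithm. As one possible stand-in, the sketch below fits a linear least-squares model by gradient descent. The choice of linear regression, the function name `fit_weights`, the learning rate, and the training data are all assumptions for illustration.

```python
from typing import List, Sequence


def fit_weights(
    feature_rows: Sequence[Sequence[float]],
    ctrs: Sequence[float],
    learning_rate: float = 0.5,
    iterations: int = 500,
) -> List[float]:
    """Fit one weight per feature by gradient descent on mean squared error."""
    n_rows = len(feature_rows)
    n_features = len(feature_rows[0])
    weights = [0.0] * n_features
    for _ in range(iterations):
        gradient = [0.0] * n_features
        for row, target in zip(feature_rows, ctrs):
            # Prediction error for this entity under the current weights.
            error = sum(w * x for w, x in zip(weights, row)) - target
            for j, x in enumerate(row):
                gradient[j] += 2.0 * error * x / n_rows
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


# Hypothetical training data: one row of feature scores per entity, paired
# with the click-through rate observed for that entity's annotations.
feature_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
observed_ctrs = [0.6, 0.2, 0.8, 0.4]

learned = fit_weights(feature_rows, observed_ctrs)
```

The example data was constructed so that the first feature explains click-through rate three times as strongly as the second, and the fitted weights reflect that. The learning rate suits this tiny data set only; real data would need tuning or a library solver.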

27. A computer-readable storage medium storing instructions, the instructions including instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
- generating a weight for a particular feature, wherein the weight indicates how well the particular feature predicts whether annotations associated with entities will be used;
  
  identifying a set of entities within a document;
  
  generating a first set of scores by generating, for each entity in the set, a score for said particular feature;
  
  generating a second set of scores based on said first set of scores and said weight;
  
  establishing a ranking of the entities in the set of entities based, at least in part, on the second set of scores; and
  
  annotating one or more entities in the document based, at least in part, on said ranking.
- Dependent claims (28):
  - 28. The computer-readable storage medium of claim 27 wherein the weight is generated based on click-through-data collected by monitoring how users interact with documents that include annotated entities.
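
The final annotating step of claims 13 and 27 could look roughly like the sketch below, which turns the first occurrence of each top-ranked entity into a hyperlink. The function `annotate`, the `top_k` cutoff, and the `https://example.com/search` target are hypothetical. The sketch also skips HTML escaping and does not guard against a later entity matching text inside an earlier link, both of which a real system would need to handle.

```python
import re
from typing import List
from urllib.parse import quote


def annotate(text: str, ranking: List[str], top_k: int) -> str:
    """Mark the first occurrence of each of the top_k ranked entities with a link."""
    for entity in ranking[:top_k]:
        # Placeholder link target; the URL scheme is an assumption.
        link = f'<a href="https://example.com/search?q={quote(entity)}">{entity}</a>'
        pattern = r"\b" + re.escape(entity) + r"\b"
        # A function replacement avoids backslash interpretation in `link`.
        text = re.sub(pattern, lambda _match: link, text, count=1)
    return text


doc = "Visit the Louvre while staying at a hotel in Paris."
annotated = annotate(doc, ["Paris", "Louvre", "hotel"], top_k=2)
```

Only the two highest-ranked entities are linked, so "hotel" stays as plain text even though it was identified in the document.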

Current Assignee
R2 Solutions LLC (Acacia Research Corporation)
Original Assignee
Yahoo! Inc. (Apollo Global Management, Inc.)
Inventors
Kraft, Reiner; Irmak, Utku; von Brzeski, Vadim

Granted Patent
US 8,051,080 B2
US Class Current
1/1
CPC Class Codes
G06F 16/951 Indexing; Web crawling tech...
