Contextual ranking of keywords using click data
9 Assignments
0 Petitions
Abstract
Techniques are provided for ranking the entities that are identified in a document based on an estimated likelihood that a user will actually make use of the annotations. According to one disclosed approach, usage data that indicates how users interact with annotations contained in documents presented to the users is collected. Based on the usage data, weights are generated for features of a feature vector. The weights are then used to modify feature scores of entities, and the modified feature scores are used to determine how to annotate documents. Specifically, a set of entities are identified within a document. A ranking for the identified entities is determined based, at least in part, on (a) feature vector scores for each of the identified entities, and (b) the weights generated for the features of the feature vector. The document is then annotated based, at least in part, on the ranking.
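The pipeline described in the abstract can be sketched in code. The following is a minimal illustration only; all function names, feature names, and data shapes are hypothetical rather than taken from the patent. It weights each feature by the click-through rate of annotations on entities having that feature, scores entities as a weighted sum over their feature vector, and annotates only the top-ranked entities:

```python
# Hypothetical sketch: click-derived feature weights rank candidate
# entities, and only the top-ranked entities receive annotations.
from collections import defaultdict

def learn_weights(click_log):
    """Weight each feature by the click-through rate of annotations
    on entities that have that feature. Each log entry pairs an
    entity's feature list with whether its annotation was clicked."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for features, was_clicked in click_log:
        for f in features:
            shown[f] += 1
            if was_clicked:
                clicked[f] += 1
    return {f: clicked[f] / shown[f] for f in shown}

def rank_entities(entities, weights):
    """Score each entity as the weighted sum of its feature scores,
    highest first."""
    def score(entity):
        return sum(weights.get(f, 0.0) * s
                   for f, s in entity["features"].items())
    return sorted(entities, key=score, reverse=True)

def annotate(document, entities, weights, k):
    """Build a display control for each of the top-k ranked entities."""
    top = rank_entities(entities, weights)[:k]
    controls = [f'<a class="annotation" data-entity="{e["name"]}">'
                f'{e["name"]}</a>' for e in top]
    return document, controls
```

In this sketch the "control" is represented as an HTML anchor, which is one plausible embodiment of a control for displaying additional information; the patent does not commit to a particular markup.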
101 Citations
32 Claims
1. A method comprising:
collecting usage data that indicates how frequently users interact with annotations for entities that are referenced in documents that are presented to the users;
based at least in part on the usage data, generating weights for features that are associated with the entities that are referenced in the documents;
wherein a particular weight of a particular feature is based at least in part on how frequently users interact with annotations of entities having the particular feature;
identifying a set of identified entities within a document;
determining a ranking for the identified entities that belong to said set of identified entities based, at least in part, on (a) feature scores for each of the identified entities, wherein the feature scores correspond to features associated with the identified entities, wherein the particular feature is associated with at least one of the identified entities; and (b) weights, including the particular weight, for the features that are associated with the identified entities;
based at least in part on the ranking, automatically selecting a subset of the identified entities for annotation, wherein the subset includes fewer than all of the identified entities;
automatically generating an annotated version of the document by, for each entity in the subset, adding to the document a control for displaying additional information about the entity, wherein the additional information about the entity and the control associated with the entity were not in the document before the step of automatically generating the annotated version of the document;
wherein at least the steps of generating the weights, determining the ranking, automatically selecting the subset, and automatically generating the annotated version are performed by one or more computing devices.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 27, 28, 29)
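The annotation step recited in claim 1 — selecting fewer than all identified entities and adding a control that was not previously in the document — can be sketched as follows. This is an illustrative embodiment only; the entity names, the `span` markup, and the `annotate_document` helper are hypothetical:

```python
# Hypothetical sketch of claim 1's annotation step: after ranking,
# fewer than all identified entities are selected, and a new control
# for showing additional information is added for each one.

def annotate_document(text, ranked_entities, k):
    """Wrap the top-k entity mentions in a control element that was
    not present in the original document. ranked_entities is a list
    of (name, additional_info) pairs, best first."""
    assert k < len(ranked_entities)  # subset is fewer than all entities
    for name, info in ranked_entities[:k]:
        # The control carries the additional information (here, as a
        # title attribute shown on hover).
        control = f'<span class="entity" title="{info}">{name}</span>'
        text = text.replace(name, control, 1)
    return text
```

For example, with `k = 1` only the highest-ranked entity is wrapped, and lower-ranked mentions are left untouched.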
11. A method for annotating a document, the method comprising:
generating a weight for a particular feature of entities, wherein the weight indicates how well the particular feature predicts whether annotations associated with the entities will be used;
identifying a set of entities within the document;
generating a first set of scores by generating, for each entity in the set, a score for said particular feature;
generating a second set of scores based at least in part on said first set of scores and said weight;
establishing a ranking of entities in the set of entities based, at least in part, on the second set of scores;
based at least in part on the ranking, automatically selecting a subset of the set of entities for annotation, wherein the subset includes fewer than all of the identified entities;
automatically generating an annotated version of the document by, for each entity in the subset, adding to the document a control for displaying additional information about the entity, wherein the additional information about the entity and the control associated with the entity were not in the document before the step of automatically generating the annotated version of the document;
wherein at least the steps of generating the weight, generating the second set of scores, automatically selecting the subset, and automatically generating the annotated version are performed by one or more computing devices.
- View Dependent Claims (12)
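Claim 11's two score sets — a first set of raw scores for the particular feature, and a second set derived by applying the learned weight to the first — can be sketched as below. The feature names and helper functions are hypothetical, and entities are represented simply as feature-to-score dictionaries:

```python
# Hypothetical sketch of claim 11's scoring: a first set of raw
# per-entity scores for one feature, then a second set produced by
# applying the feature's learned weight, then a ranking on the second.

def first_scores(entities, feature):
    """First set: each entity's raw score for the particular feature."""
    return [e.get(feature, 0.0) for e in entities]

def second_scores(first, weight):
    """Second set: the raw scores combined with the feature's weight
    (here, by simple scaling)."""
    return [weight * s for s in first]

def rank_by_second(entities, feature, weight):
    """Establish a ranking of the entities based on the second set."""
    second = second_scores(first_scores(entities, feature), weight)
    paired = sorted(zip(second, range(len(entities))), reverse=True)
    return [entities[i] for _, i in paired]
```

Scaling is only one way the second set could be "based at least in part on" the first set and the weight; the claim language admits other combinations.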
13. One or more non-transitory computer-readable storage media storing instructions, the instructions including instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
collecting usage data that indicates how frequently users interact with annotations for entities that are referenced in documents presented to the users;
based at least in part on the usage data, generating weights for features that are associated with the entities referenced in the documents;
wherein a particular weight of a particular feature is based at least in part on how frequently users interact with annotations of entities having the particular feature;
identifying a set of identified entities within a document;
determining a ranking for the identified entities that belong to said set of identified entities based, at least in part, on (a) feature scores for each of the identified entities, wherein the feature scores correspond to features associated with the identified entities, wherein the particular feature is associated with at least one of the identified entities; and (b) weights, including the particular weight, for the features that are associated with the identified entities;
based at least in part on the ranking, automatically selecting a subset of the identified entities for annotation, wherein the subset includes fewer than all of the identified entities;
automatically generating an annotated version of the document by, for each entity in the subset, adding to the document a control for displaying additional information about the entity, wherein the additional information about the entity and the control associated with the entity were not in the document before the step of automatically generating the annotated version of the document.
- View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 26, 30, 31, 32)
23. One or more non-transitory computer-readable storage media storing instructions, the instructions including instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
generating a weight for a particular feature of entities, wherein the weight indicates how well the particular feature predicts whether annotations associated with the entities will be used;
identifying a set of entities within a document;
generating a first set of scores by generating, for each entity in the set, a score for said particular feature;
generating a second set of scores based at least in part on said first set of scores and said weight;
establishing a ranking of the entities in the set of entities based, at least in part, on the second set of scores;
based at least in part on the ranking, automatically selecting a subset of the set of entities for annotation, wherein the subset includes fewer than all of the identified entities;
automatically generating an annotated version of the document by, for each entity in the subset, adding to the document a control for displaying additional information about the entity, wherein the additional information about the entity and the control associated with the entity were not in the document before the step of automatically generating the annotated version of the document.
- View Dependent Claims (24)
Specification