Identifying text predicted to be of interest

US 9,852,215 B1
Filed: 09/21/2012
Issued: 12/26/2017
Est. Priority Date: 09/21/2012
Status: Active Grant

Abstract

A body of text may be compared with one or more user-selected text portions to rank a plurality of text portions of the body of text, such as for predicting which of the text portions are likely to be annotated by users. As one example, the text of a content item may be compared with excerpts of other content items that have been highlighted or otherwise annotated by a plurality of users. Based at least in part on the comparison, some implementations identify one or more portions of text of the content item that are likely to be selected or highlighted by users that access the content item. In some examples, a classifier may be trained based on popular highlights determined for a plurality of content items. The classifier may be applied to a body of text to determine portions that users are likely to consider profound or interesting.
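As a rough illustration of the flow the abstract describes (train on passages users did or did not annotate, score the portions of a new text, rank them, select the top one), the sketch below may help. It is a minimal example under stated assumptions, not the patented implementation: the patent does not name a model, so a word-based naive Bayes classifier is swapped in here, and every name (`HighlightClassifier`, `rank_portions`, the sample sentences) is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for richer features."""
    return re.findall(r"[a-z']+", text.lower())


class HighlightClassifier:
    """Two-class multinomial naive Bayes with add-one smoothing (an assumption)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.example_counts = {True: 0, False: 0}

    def train(self, examples):
        """examples: iterable of (text_portion, was_annotated) pairs."""
        for text, was_annotated in examples:
            self.example_counts[was_annotated] += 1
            self.word_counts[was_annotated].update(tokenize(text))

    def score(self, text):
        """Estimated probability that the portion would be annotated."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total = sum(self.example_counts.values())
        log_joint = {}
        for label in (True, False):
            # Smoothed class prior, then smoothed per-word likelihoods.
            log_p = math.log((self.example_counts[label] + 1) / (total + 2))
            denom = sum(self.word_counts[label].values()) + len(vocab) + 1
            for word in tokenize(text):
                log_p += math.log((self.word_counts[label][word] + 1) / denom)
            log_joint[label] = log_p
        # Normalize in log space to avoid underflow.
        peak = max(log_joint.values())
        pos = math.exp(log_joint[True] - peak)
        neg = math.exp(log_joint[False] - peak)
        return pos / (pos + neg)


def rank_portions(classifier, portions):
    """Return (score, portion) pairs, highest score first."""
    return sorted(((classifier.score(p), p) for p in portions),
                  key=lambda pair: pair[0], reverse=True)


# Hypothetical training data: True = annotated by users, False = not.
training_data = [
    ("Love is the only truth that endures.", True),
    ("Truth and love outlast every empire.", True),
    ("He walked to the store and bought bread.", False),
    ("The train left the station at noon.", False),
]
classifier = HighlightClassifier()
classifier.train(training_data)

ranked = rank_portions(classifier, [
    "She bought bread at the station.",
    "Only love endures beyond the grave.",
])
selected = ranked[0][1]  # the top-ranked portion
```

The scores depend only on the training examples, not on any annotation data for the new text, which mirrors the "independent of annotation data" language in the claims.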

57 Citations

28 Claims

1. One or more computer-readable media maintaining instructions, which when executed by one or more processors, cause the one or more processors to perform operations comprising:
- accessing training data, the training data comprising:
  - a first text portion from a first electronic book, the first text portion associated with a positive feedback through a first user interaction received by a first computing device associated with a first user, and
  - a second text portion from a second electronic book, the second text portion associated with a negative feedback through a second user interaction received by a second computing device associated with a second user;
- training a classifier based at least in part on the training data;
- applying the classifier to a text of a third electronic book, wherein the classifier:
  - assigns, to a third text portion of the text of the third electronic book and independent of annotation data associated with the third electronic book, a first score that indicates a probability that the third text portion will be annotated by future users,
  - assigns, to a fourth text portion of the text of the third electronic book and independent of annotation data associated with the third electronic book, a second score indicating a probability that the fourth text portion will be annotated by future users,
  - wherein the first score and the second score are assigned based at least in part on the positive feedback received through the first user interaction, the negative feedback received through the second user interaction, and at least one of:
    - a similarity to a sentence structure of the at least one of the first text portion or the second text portion, or
    - a similarity to at least one of a type of words used in the first text portion or a type of words used in the second text portion; and
  - determines a ranking of at least the third text portion and the fourth text portion of the third electronic book based at least in part on the first score and the second score; and
- selecting at least one of the third text portion or the fourth text portion based at least in part on the ranking.
- Dependent claims (2, 3, 4, 5)
  - 2. The one or more computer-readable media as recited in claim 1, wherein the ranking of the at least the third text portion and the fourth text portion of the third electronic book is based at least in part on whether individual text portions of the third electronic book are associated with at least one of:
    - a literary character mentioned in the third electronic book;
    - a person mentioned in the third electronic book;
    - a topic mentioned in the third electronic book;
    - an organization mentioned in the third electronic book;
    - a place mentioned in the third electronic book;
    - a thing mentioned in the third electronic book; or
    - a period of a setting mentioned in the third electronic book.
  - 3. The one or more computer-readable media as recited in claim 1, wherein at least one text portion of the at least the third text portion and the fourth text portion of the third electronic book is a highest-ranked text portion of the at least the third text portion and the fourth text portion of the third electronic book, the operations further comprising including the highest ranked text portion in an interface offering access to the third electronic book.
  - 4. The one or more computer-readable media as recited in claim 1, wherein at least one text portion of the at least the third text portion and the fourth text portion of the third electronic book is a highest-ranked text portion of the at least the third text portion and the fourth text portion of the third electronic book, the operations further comprising:
    - including an indication of the highest-ranked text portion in metadata associated with the third electronic book; and
    - sending the metadata to a user device to identify the highest-ranked text portion of the third electronic book.
  - 5. The one or more computer-readable media as recited in claim 1, wherein the first text portion has been further identified based at least in part on:
    - receiving annotation information from a plurality of respective electronic devices corresponding to a plurality of users; and
    - for the first electronic book, based at least in part on the annotation information, determining that the first text portion has been annotated by the plurality of users more frequently than one or more other portions of text of the first electronic book.
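
Claim 5's step of finding the text portion that users annotated more frequently than others could be approximated as below. This is only a sketch under assumptions: the event tuple layout `(user_id, book_id, portion_id)` and the function name `popular_portions` are hypothetical, and counting distinct users per portion is one plausible reading of "annotated by the plurality of users more frequently".

```python
from collections import defaultdict


def popular_portions(annotations, book_id, top_n=1):
    """Return the top_n (portion_id, distinct_user_count) pairs for one book.

    annotations: iterable of (user_id, book_id, portion_id) events, as
    might be reported by many reader devices (hypothetical format).
    """
    users_by_portion = defaultdict(set)
    for user_id, event_book, portion_id in annotations:
        if event_book == book_id:
            # A set ignores repeat annotations by the same user.
            users_by_portion[portion_id].add(user_id)
    # Most distinct users first; ties broken by portion id for determinism.
    ranked = sorted(users_by_portion.items(),
                    key=lambda item: (-len(item[1]), item[0]))
    return [(portion_id, len(users)) for portion_id, users in ranked[:top_n]]


# Hypothetical annotation events from several devices.
events = [
    ("u1", "b1", "p1"),
    ("u2", "b1", "p1"),
    ("u1", "b1", "p1"),  # repeat by the same user, counted once
    ("u3", "b1", "p2"),
    ("u4", "b2", "p9"),  # a different book, ignored for b1
]
top = popular_portions(events, "b1")
top_two = popular_portions(events, "b1", top_n=2)
```

Portions returned this way could then serve as positive training examples for the classifier.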

6. A method comprising:
- under control of one or more processors configured with executable instructions, receiving a content item comprising a first body of text, the first body of text comprising at least a first text portion and a second text portion;
- training a classifier based at least in part on an annotated text portion of a second body of text, the annotated text portion having been associated with a first reason through a user interaction received by a computing device associated with a first user, wherein the first body of text is different from the second body of text, and wherein, once trained, the classifier is configured to assign scores indicating a probability that a corresponding portion of the first text portion will be annotated by a second user based on the annotated text portion of the second body of text;
- assigning, using the trained classifier, and to the first text portion, a first score that indicates the probability that the first text portion will be annotated by the second user;
- assigning, using the trained classifier, and to the second text portion, a second score that indicates the probability that the second text portion will be annotated by the second user, wherein the first score and the second score are assigned based at least in part on the annotated text portion;
- ranking, based at least in part on the first score and the second score, the at least the first text portion and the second text portion of the first body of text; and
- selecting at least one of the first text portion or the second text portion based at least in part on the ranking.
- Dependent claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 7. The method as recited in claim 6, wherein the annotated text portion includes a portion of content that has been annotated by a plurality of users.
  - 8. The method as recited in claim 7, wherein the portion of content has been annotated by at least one of:
    - highlighting the text portion;
    - writing a note associated with the text portion;
    - bookmarking the text portion; or
    - commenting on the text portion.
  - 9. The method as recited in claim 6, wherein the annotated text portion is a portion of content that has been selected by the first user for posting to at least one of:
    - a social network site;
    - a microblog site; or
    - an online forum.
  - 10. The method as recited in claim 6, wherein the first body of text is a user review of a plurality of user reviews, and the operations further include presenting the first body of text to the second user based at least in part on the first score.
  - 11. The method as recited in claim 6, wherein the first body of text is a forum comment of a plurality of forum comments, and the operations further include presenting the first body of text to the second user based at least in part on the first score.
  - 12. The method as recited in claim 6, wherein the annotated text portion includes a plurality of user-selected text portions, individual user-selected portions corresponding to an individual content item, the method further comprising:
    - comparing the at least the first text portion and the second text portion of the first body of text with the plurality of the user-selected text portions; and
    - ranking the at least the first text portion and the second text portion of the first body of text based at least in part on the comparing with the plurality of user-selected text portions.
  - 13. The method as recited in claim 6, wherein assigning the first score and the second score is based at least in part on comparing sentence structures of the at least the first text portion and the second text portion of the first body of text with a sentence structure of the annotated text portion.
  - 14. The method as recited in claim 6, wherein assigning the first score and the second score is based at least in part on comparing words used in the at least the first text portion and the second text portion of the first body of text with words used in the annotated text portion.
  - 15. The method as recited in claim 6, wherein the ranking the at least the first text portion and the second text portion of the first body of text is based at least in part on whether each text portion of the first body of text is associated with at least one of:
    - a literary character mentioned in the content item;
    - a person mentioned in the content item;
    - a topic mentioned in the content item;
    - an organization mentioned in the content item;
    - a place mentioned in the content item;
    - a thing mentioned in the content item; or
    - a period of a setting mentioned in the content item.
  - 16. The method as recited in claim 6, wherein the body of text is from a content item, the method further comprising sending information related to the ranking to an author of the content item.
  - 17. The method as recited in claim 6, the method further comprising:
    - receiving, via a user interface, a selection of a term from the content item, the term comprising at least one of:
      - a literary character mentioned in the content item;
      - a person mentioned in the content item;
      - a topic mentioned in the content item;
      - an organization mentioned in the content item;
      - a place mentioned in the content item;
      - a thing mentioned in the content item; or
      - a period of a setting mentioned in the content item; and
    - presenting at least one text portion corresponding to the selected term based at least in part on the ranking.
  - 18. The method of claim 6, further comprising:
    - identifying one or more names or one or more terms within the first body of text that are significant to the first body of text,
    - determining that at least a name or term of the one or more names or one or more terms is associated with the first text portion; and
    - storing an association between the first text portion and the at least one name or term of the one or more names or one or more terms.
  - 19. The method of claim 18, wherein identifying the one or more names or one or more terms comprises accessing a network accessible resource, and identifying the one or more names or the one or more terms based at least in part on the network accessible resource.
  - 20. The method of claim 18, further comprising:
    - receiving a user selection corresponding to the at least one name or term; and
    - presenting the first text portion based at least in part on the user selection and the association.
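
The term-association steps in claims 17 through 20 (store which names or terms each text portion mentions, then present the best-ranked portion when a user selects a term) might look roughly like the following. This is a sketch under assumptions: the list of significant terms is supplied by hand here, whereas claim 19 contemplates deriving it from a network accessible resource; `build_term_index` and `portions_for_term` are hypothetical names, and whole-word matching is a simplification.

```python
import re


def build_term_index(scored_portions, terms):
    """Map each term to the (score, text) portions that mention it.

    scored_portions: list of (text, score) pairs, where score is a
    previously assigned likelihood-of-annotation score.
    terms: names or terms considered significant (assumed given).
    """
    index = {term: [] for term in terms}
    for text, score in scored_portions:
        for term in terms:
            # Case-insensitive whole-word match.
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                index[term].append((score, text))
    for matches in index.values():
        # Keep the highest-scoring portion first for each term.
        matches.sort(key=lambda pair: pair[0], reverse=True)
    return index


def portions_for_term(index, term, limit=1):
    """Return up to `limit` portions for a user-selected term, best first."""
    return [text for _, text in index.get(term, [])[:limit]]


# Hypothetical scored portions and terms.
scored = [
    ("Call me Ishmael.", 0.4),
    ("The whale surfaced at dawn.", 0.6),
    ("Ahab pursued the whale across every sea.", 0.9),
]
index = build_term_index(scored, ["Ahab", "whale", "Ishmael"])
best_for_whale = portions_for_term(index, "whale", limit=2)
```

Storing scores alongside the associations lets the lookup honor the ranking without rescoring at selection time.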

21. A system comprising:
- one or more processors;
- one or more computer-readable media; and
- one or more modules maintained on the one or more computer-readable media that, when executed by the one or more processors, cause the one or more processors to perform operations including:
  - accessing training data, the training data comprising one or more features of a first portion of text from within a first body of text, the first portion of text having been selected through a first user interaction received by a first computing device associated with a first user;
  - training a classifier based at least in part on the accessed training data, wherein, once trained, the classifier is configured to assign scores indicating a probability that a corresponding portion of a second body of text will be annotated by a second user based on the first portion of text having been selected through the first user interaction;
  - identifying a third portion of text and a fourth portion of text from within the second body of text;
  - using the classifier to assign to the third portion of text a first score that indicates a probability that the third portion of text portion will be annotated by future users;
  - using the classifier to assign to the fourth portion of text a second score that indicates a probability that the fourth portion of text portion will be annotated by future users, wherein the first score and the second score are assigned by the classifier based at least in part on the first user interaction;
  - ranking the third portion of text and the fourth portion of text based at least in part on the first score and the second score; and
  - identifying the fourth portion of text, the fourth portion of text being identified based at least partly on the ranking.
- Dependent claims (22, 23, 24, 25, 26, 27, 28)
  - 22. The system as recited in claim 21, wherein the second body of text is from a content item, the operations further comprising:
    - receiving, from an electronic device, an indication of a text portion from within the second body of text which has been selected from within the content item; and
    - using the indication to further train the classifier.
  - 23. The system as recited in claim 21, wherein the one or more features include a sentence structure of the first portion of text.
  - 24. The system as recited in claim 21, wherein the one or more features include at least one of:
    - words used in the first portion of text, and
    - synonyms of the words used in the first portion of text.
  - 25. The system as recited in claim 21, wherein the first portion of text is identified from annotation information received from a plurality of electronic devices.
  - 26. The system as recited in claim 25, wherein the annotation information identifies user annotations including at least one of:
    - highlighting the first portion of text;
    - writing a note associated with the first portion of text;
    - bookmarking the first portion of text; or
    - commenting on the first portion of text.
  - 27. The system as recited in claim 21, wherein the first portion of text is identified from requests, received from a plurality of electronic devices, to post a portion of a content item to at least one of:
    - a social network site;
    - a microblog site; or
    - an online discussion forum.
  - 28. The system as recited in claim 21, wherein a determination of the first portion of text is based at least in part on analysis of at least one user comment related to the first portions of text.
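
Claims 23 and 24 name sentence structure, words, and synonyms as possible features of a selected text portion. A minimal feature extractor along those lines is sketched below. Everything here is an assumption made for illustration: the tiny `SYNONYMS` table is hypothetical (a real system would presumably use a thesaurus resource), and the "structure" features are crude proxies (length bucket, clause count, final punctuation) rather than a real syntactic parse.

```python
import re

# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "love": {"affection"},
    "truth": {"honesty"},
}


def extract_features(portion, synonyms=SYNONYMS):
    """Return a set of string features for one text portion."""
    words = re.findall(r"[a-z']+", portion.lower())
    features = set()
    for word in words:
        features.add("word=" + word)
        for synonym in synonyms.get(word, ()):
            features.add("syn=" + synonym)
    # Crude sentence-structure proxies (not a syntactic parse).
    if len(words) <= 8:
        features.add("len_bucket=short")
    elif len(words) <= 20:
        features.add("len_bucket=medium")
    else:
        features.add("len_bucket=long")
    features.add("clauses=" + str(portion.count(",") + portion.count(";") + 1))
    stripped = portion.strip()
    last = stripped[-1] if stripped else ""
    features.add("ends_with=" + (last if last in (".", "!", "?") else "other"))
    return features


def feature_overlap(a, b):
    """Jaccard similarity between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


selected = extract_features("Love is truth.")
candidate = extract_features("Affection is honesty.")
unrelated = extract_features("The train left the station at noon, on time, as usual")
```

Feature sets like these could be fed to a classifier in place of raw words, or compared directly with `feature_overlap` as a simple similarity measure.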

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Sullivan, Todd H.; Dimson, Thomas F.
Primary Examiner(s): Chbouki, Tarek
Application Number: US13/624,628
Time in Patent Office: 1,922 Days

CPC Class Codes

G06F 16/3334   Selection or weighting of t...

G06F 16/335   Filtering based on addition...

G06F 16/36   Creation of semantic tools,...

G06N 20/00   Machine learning
