Identifying text predicted to be of interest
First Claim
1. One or more computer-readable media maintaining instructions, which when executed by one or more processors, cause the one or more processors to perform operations comprising:
accessing training data, the training data comprising:
  a first text portion from a first electronic book, the first text portion associated with a positive feedback through a first user interaction received by a first computing device associated with a first user, and
  a second text portion from a second electronic book, the second text portion associated with a negative feedback through a second user interaction received by a second computing device associated with a second user;
training a classifier based at least in part on the training data;
applying the classifier to a text of a third electronic book, wherein the classifier:
  assigns, to a third text portion of the text of the third electronic book and independent of annotation data associated with the third electronic book, a first score that indicates a probability that the third text portion will be annotated by future users,
  assigns, to a fourth text portion of the text of the third electronic book and independent of annotation data associated with the third electronic book, a second score indicating a probability that the fourth text portion will be annotated by future users,
  wherein the first score and the second score are assigned based at least in part on the positive feedback received through the first user interaction, the negative feedback received through the second user interaction, and at least one of:
    a similarity to a sentence structure of the at least one of the first text portion or the second text portion, or
    a similarity to at least one of a type of words used in the first text portion or a type of words used in the second text portion; and
  determines a ranking of at least the third text portion and the fourth text portion of the third electronic book based at least in part on the first score and the second score; and
selecting at least one of the third text portion or the fourth text portion based at least in part on the ranking.
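The operations recited in the first claim (train on positively and negatively rated portions, score portions of a different book, rank, select) can be illustrated with a minimal sketch. This is not the patented implementation: it assumes a small word-level naive Bayes classifier, and the names `HighlightClassifier` and `rank_and_select` and the sample sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for 'type of words' features."""
    return re.findall(r"[a-z']+", text.lower())


class HighlightClassifier:
    """Hypothetical multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, examples):
        # examples: iterable of (text_portion, got_positive_feedback)
        for text, positive in examples:
            self.doc_counts[positive] += 1
            self.word_counts[positive].update(tokenize(text))

    def score(self, text):
        """Estimated probability that the portion would be annotated."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_p = {}
        for label in (True, False):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(vocab)
            lp = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for tok in tokenize(text):
                if tok in vocab:  # unseen words carry no evidence
                    lp += math.log((counts[tok] + 1) / denom)
            log_p[label] = lp
        # Normalize the two log-likelihoods into P(positive | text).
        top = max(log_p.values())
        pos = math.exp(log_p[True] - top)
        neg = math.exp(log_p[False] - top)
        return pos / (pos + neg)


def rank_and_select(clf, portions, top_k=1):
    """Rank portions by score (highest first) and keep the top_k."""
    ranked = sorted(portions, key=clf.score, reverse=True)
    return ranked[:top_k]


# Training data from two other books (invented examples).
clf = HighlightClassifier()
clf.train([
    ("The only way out is through.", True),           # positive feedback
    ("He walked to the door and opened it.", False),  # negative feedback
])

# Portions of a third book, scored without any annotation data for that book.
candidates = [
    "She opened the door and walked to the car.",
    "There is no way out but through the fear.",
]
selected = rank_and_select(clf, candidates)
```

With this toy data, `selected` holds the second candidate, since its words overlap with the positively rated portion. A practical system would need far more training data and richer features such as sentence structure.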
1 Assignment
0 Petitions
Abstract
A body of text may be compared with one or more user-selected text portions to rank a plurality of text portions of the body of text, such as for predicting which of the text portions are likely to be annotated by users. As one example, the text of a content item may be compared with excerpts of other content items that have been highlighted or otherwise annotated by a plurality of users. Based at least in part on the comparison, some implementations identify one or more portions of text of the content item that are likely to be selected or highlighted by users that access the content item. In some examples, a classifier may be trained based on popular highlights determined for a plurality of content items. The classifier may be applied to a body of text to determine portions that users are likely to consider profound or interesting.
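The comparison described in the abstract, in which portions of a body of text are ranked against excerpts that users have highlighted elsewhere, might be sketched as follows. The abstract does not name a similarity measure; Jaccard overlap of word sets is assumed here purely for illustration, and `rank_by_similarity` and the sample sentences are hypothetical.

```python
import re


def words(text):
    """Set of lowercase word tokens in a text portion."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(body_portions, highlighted_excerpts):
    """Return (score, portion) pairs, most similar to any excerpt first."""
    excerpt_sets = [words(e) for e in highlighted_excerpts]
    scored = []
    for portion in body_portions:
        w = words(portion)
        best = max((jaccard(w, e) for e in excerpt_sets), default=0.0)
        scored.append((best, portion))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Invented data: one popular highlight, two portions of a new content item.
highlights = ["Not all those who wander are lost."]
portions = [
    "The train left at noon.",
    "Those who wander are not always lost.",
]
ranked = rank_by_similarity(portions, highlights)
```

Here the second portion ranks first with a score of 0.75, and the first portion scores 0.0 because it shares no words with the highlight.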
57 Citations
28 Claims
1. One or more computer-readable media maintaining instructions, which when executed by one or more processors, cause the one or more processors to perform operations comprising:
accessing training data, the training data comprising:
  a first text portion from a first electronic book, the first text portion associated with a positive feedback through a first user interaction received by a first computing device associated with a first user, and
  a second text portion from a second electronic book, the second text portion associated with a negative feedback through a second user interaction received by a second computing device associated with a second user;
training a classifier based at least in part on the training data;
applying the classifier to a text of a third electronic book, wherein the classifier:
  assigns, to a third text portion of the text of the third electronic book and independent of annotation data associated with the third electronic book, a first score that indicates a probability that the third text portion will be annotated by future users,
  assigns, to a fourth text portion of the text of the third electronic book and independent of annotation data associated with the third electronic book, a second score indicating a probability that the fourth text portion will be annotated by future users,
  wherein the first score and the second score are assigned based at least in part on the positive feedback received through the first user interaction, the negative feedback received through the second user interaction, and at least one of:
    a similarity to a sentence structure of the at least one of the first text portion or the second text portion, or
    a similarity to at least one of a type of words used in the first text portion or a type of words used in the second text portion; and
  determines a ranking of at least the third text portion and the fourth text portion of the third electronic book based at least in part on the first score and the second score; and
selecting at least one of the third text portion or the fourth text portion based at least in part on the ranking.
Dependent claims: 2, 3, 4, 5
6. A method comprising:
under control of one or more processors configured with executable instructions,
  receiving a content item comprising a first body of text, the first body of text comprising at least a first text portion and a second text portion;
  training a classifier based at least in part on an annotated text portion of a second body of text, the annotated text portion having been associated with a first reason through a user interaction received by a computing device associated with a first user, wherein the first body of text is different from the second body of text, and wherein, once trained, the classifier is configured to assign scores indicating a probability that a corresponding portion of the first text portion will be annotated by a second user based on the annotated text portion of the second body of text;
  assigning, using the trained classifier, and to the first text portion, a first score that indicates the probability that the first text portion will be annotated by the second user;
  assigning, using the trained classifier, and to the second text portion, a second score that indicates the probability that the second text portion will be annotated by the second user, wherein the first score and the second score are assigned based at least in part on the annotated text portion;
  ranking, based at least in part on the first score and the second score, at least the first text portion and the second text portion of the first body of text; and
  selecting at least one of the first text portion or the second text portion based at least in part on the ranking.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A system comprising:
one or more processors;
one or more computer-readable media; and
one or more modules maintained on the one or more computer-readable media that, when executed by the one or more processors, cause the one or more processors to perform operations including:
  accessing training data, the training data comprising one or more features of a first portion of text from within a first body of text, the first portion of text having been selected through a first user interaction received by a first computing device associated with a first user;
  training a classifier based at least in part on the accessed training data, wherein, once trained, the classifier is configured to assign scores indicating a probability that a corresponding portion of a second body of text will be annotated by a second user based on the first portion of text having been selected through the first user interaction;
  identifying a third portion of text and a fourth portion of text from within the second body of text;
  using the classifier to assign to the third portion of text a first score that indicates a probability that the third portion of text portion will be annotated by future users;
  using the classifier to assign to the fourth portion of text a second score that indicates a probability that the fourth portion of text portion will be annotated by future users, wherein the first score and the second score are assigned by the classifier based at least in part on the first user interaction;
  ranking the third portion of text and the fourth portion of text based at least in part on the first score and the second score; and
  identifying the fourth portion of text, the fourth portion of text being identified based at least partly on the ranking.
Dependent claims: 22, 23, 24, 25, 26, 27, 28
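Claim 21 recites training data comprising "one or more features" of a selected portion of text, and claim 1 refers to sentence structure and type of words. The claims do not enumerate specific features, so the sketch below assumes a few simple surface features for illustration only. The function `extract_features`, the `FUNCTION_WORDS` list, and the feature names are all hypothetical.

```python
import re

# Assumed, deliberately tiny list of function words; not from the source.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that"}


def extract_features(portion):
    """Map a text portion to a dict of simple surface features."""
    tokens = re.findall(r"[A-Za-z']+", portion)
    lowered = [t.lower() for t in tokens]
    n = len(tokens)
    return {
        # Rough proxies for sentence structure.
        "word_count": n,
        "comma_count": portion.count(","),
        "is_question": portion.rstrip().endswith("?"),
        # Rough proxies for the type of words used.
        "avg_word_length": sum(len(t) for t in tokens) / n if n else 0.0,
        "function_word_ratio": (
            sum(t in FUNCTION_WORDS for t in lowered) / n if n else 0.0
        ),
    }


features = extract_features("What is the price of a quiet mind?")
```

For the sample sentence this yields a word count of 8, an average word length of 3.25, and a function-word ratio of 0.5. Feature dictionaries of this kind could serve as the input on which a classifier is trained.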
Specification