Identifying book title sets
Abstract
Techniques are described for identifying book title sets. The techniques may include a first-pass comparison with other books to identify other candidate title sets. A second-pass comparison may then be performed with respect to the candidate title sets. The first-pass comparison may be based on book metadata such as titles and authorship. The second-pass comparison may include a more comprehensive content comparison, such as comparing the body text of the books.
24 Claims
1. A computer-implemented method, comprising:
under control of one or more processors configured with executable instructions, receiving, from a device of an author and via a content ingestion service associated with a network, an electronic book having first body text and first metadata;
normalizing the electronic book by removing illustrations from the electronic book, removing extraneous characters from the electronic book, and converting characters of the electronic book to a single case;
determining, in response to the normalizing of the electronic book, whether the first metadata of the electronic book matches metadata of any existing book title sets;
based at least partly on a first determination that the first metadata of the electronic book matches second metadata of no more than a single existing book title set that includes at least one book, adding the electronic book to the single existing book title set such that the single existing book title set includes the at least one book and the electronic book;
based at least partly on a second determination that the first metadata of the electronic book matches third metadata of multiple existing book title sets, calculating a text matching score corresponding to individual ones of the existing book title sets, the text matching score indicating a comparison of a first frequency of one or more words included in the first body text of the electronic book and a second frequency of the one or more words included in second body text of the corresponding existing book title set; and
adding the electronic book to an existing book title set of the multiple existing book title sets based at least partly on the text matching score corresponding to the existing book title set being greater than a specified threshold, the existing book title set including the electronic book and one or more other books.
Dependent claims: 2, 3, 4, 5, 6
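The normalization and text-matching steps of claim 1 can be illustrated with a minimal sketch. The claim only requires a score that compares word frequencies between the two body texts; the cosine similarity used here, the 0.8 threshold, the ASCII-only character filter, and all function names are assumptions made for illustration, not details taken from the patent. Illustration removal is assumed to happen upstream, since it depends on the e-book file format.

```python
import math
import re
from collections import Counter


def normalize(text: str) -> str:
    """Fold to a single case and drop extraneous characters.

    Assumption: anything other than ASCII letters, digits, and whitespace
    counts as extraneous.
    """
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()


def word_frequencies(text: str) -> Counter:
    """Count how often each word occurs in the normalized text."""
    return Counter(normalize(text).split())


def text_matching_score(book_text: str, title_set_text: str) -> float:
    """Cosine similarity of the two word-frequency vectors (assumed metric)."""
    a = word_frequencies(book_text)
    b = word_frequencies(title_set_text)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def assign_title_set(book_text: str, candidates: dict, threshold: float = 0.8):
    """Return the id of the best-scoring candidate above the threshold, else None.

    `candidates` maps a hypothetical title-set id to representative body text.
    """
    best_id, best_score = None, threshold
    for set_id, set_text in candidates.items():
        score = text_matching_score(book_text, set_text)
        if score > best_score:
            best_id, best_score = set_id, score
    return best_id
```

In this sketch a book is assigned to at most one title set, the highest-scoring one; the claim itself only requires that the chosen set's score exceed the specified threshold.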
7. A computer-implemented method, comprising:
under control of one or more processors configured with executable instructions, receiving, from a device of an author and via a content ingestion service associated with a network, an electronic book having first body text and first metadata;
normalizing the electronic book by at least one of removing illustrations from the electronic book, removing extraneous characters from the electronic book, or converting characters of the electronic book to a single case;
comparing the first metadata of the electronic book with second metadata corresponding to other books to identify one or more candidate title sets of which the electronic book may be a member;
determining that a number of the one or more candidate title sets meets or exceeds a pre-determined number of candidate title sets; and
based at least partly on the determining that the number of the one or more candidate title sets meets or exceeds the pre-determined number of candidate title sets, comparing the first body text of the electronic book with second body text of the one or more candidate title sets to determine that the electronic book is a member of the one or more candidate title sets.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
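The gating step in claim 7, where the body-text comparison runs only once the number of metadata candidates meets or exceeds a pre-determined number, might look roughly like the following. The `TitleSet` record, the exact-match rule on title and author, and the default of 2 are hypothetical choices for this sketch; the claim does not specify how metadata is compared or what the pre-determined number is.

```python
from dataclasses import dataclass, field


@dataclass
class TitleSet:
    """Hypothetical record for an existing book title set."""
    set_id: str
    title: str
    author: str
    body_texts: list = field(default_factory=list)


def _key(value: str) -> str:
    """Case-fold and collapse whitespace so trivial differences do not matter."""
    return " ".join(value.lower().split())


def candidate_title_sets(title: str, author: str, title_sets: list) -> list:
    """Metadata comparison: keep sets whose title and author both match (assumed rule)."""
    return [
        s
        for s in title_sets
        if _key(s.title) == _key(title) and _key(s.author) == _key(author)
    ]


def needs_body_comparison(candidates: list, min_candidates: int = 2) -> bool:
    """True when the candidate count meets or exceeds the pre-determined number."""
    return len(candidates) >= min_candidates
```

Keeping the gate separate from the metadata comparison means the more expensive body-text comparison is skipped whenever metadata alone yields too few candidates to need disambiguation.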
19. An online electronic book service, comprising:
one or more processors; and
one or more non-transitory computer-readable storage media containing instructions that are executable by the one or more processors to perform actions comprising:
receiving, from a device of an author and via a content ingestion service associated with a network, an electronic book;
normalizing the electronic book by at least one of removing illustrations from the electronic book, removing extraneous characters from the electronic book, or converting characters of the electronic book to a single case;
performing a first-pass comparison of metadata of the electronic book with metadata of different book title sets to identify one or more candidate title sets of which the electronic book may be a member; and
based at least partly on a determination that the first-pass comparison identifies a partial match for multiple candidate title sets, performing a second-pass comparison of first body text of the electronic book with second body text of the multiple candidate title sets to determine that the electronic book is a member of any of the multiple candidate title sets.
Dependent claims: 20, 21, 22, 23, 24
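Claim 19 triggers the second-pass comparison when the first pass yields a partial metadata match for multiple candidate title sets. The claim does not define "partial match", so the sketch below assumes one possible reading: the author matches and the title shares at least half of its words with the title set's title, measured by Jaccard overlap. The dictionary keys, the 0.5 cutoff, and the function names are all hypothetical.

```python
import re


def _words(value: str) -> set:
    """Lowercased alphanumeric tokens of a metadata string."""
    return set(re.findall(r"[a-z0-9]+", value.lower()))


def _jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def metadata_match(book: dict, title_set: dict, partial_threshold: float = 0.5) -> str:
    """Classify the first-pass result as 'full', 'partial', or 'none' (assumed rule)."""
    same_author = _words(book["author"]) == _words(title_set["author"])
    title_sim = _jaccard(_words(book["title"]), _words(title_set["title"]))
    if same_author and title_sim == 1.0:
        return "full"
    if same_author and title_sim >= partial_threshold:
        return "partial"
    return "none"


def second_pass_candidates(book: dict, title_sets: list) -> list:
    """Return the partially matching sets if there are several, else an empty list."""
    partial = [s for s in title_sets if metadata_match(book, s) == "partial"]
    return partial if len(partial) > 1 else []
```

A non-empty result is the list of title sets whose body text would then be compared against the book's body text in the second pass.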
Specification