Identifying corresponding regions of content
First Claim
1. A system comprising:
- an electronic data store configured to store:
an audiobook; and
an electronic book that is a companion to the audiobook;
a computing device in communication with the electronic data store, the computing device configured to:
generate a textual transcription of the audiobook;
compare the textual transcription of the audiobook with text of the electronic book to identify an uncertain region in the electronic book, wherein the uncertain region includes text of the electronic book for which corresponding audio in the audiobook has not yet been identified;
identify a region of audio content within the audiobook that is preliminarily aligned to the text included in the uncertain region;
generate a language model using the text included in the uncertain region;
apply the language model to the region of audio content within the audiobook to generate an updated textual transcription of the audiobook;
determine that one or more words of the updated textual transcription substantially correspond to one or more words in the text included within the uncertain region; and
generate content synchronization information, wherein the content synchronization information enables synchronous presentation of the one or more words in the text included within the uncertain region and a portion of audio content within the audiobook corresponding to the one or more words of the updated textual transcription.
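The comparison step of claim 1 can be pictured as a word-level sequence alignment between the transcription and the e-book text. The sketch below is illustrative only. It assumes both sides are already tokenized into words, and it uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner; the claim does not name an alignment algorithm, and `find_uncertain_regions` is a hypothetical helper name. Runs of e-book words with no matching transcript words are reported as uncertain regions, paired with the transcript span that sits between the same matched neighbours (the "preliminarily aligned" audio).

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# (book_start, book_end, transcript_start, transcript_end); end indices are exclusive
Region = Tuple[int, int, int, int]


def find_uncertain_regions(book_words: List[str], transcript_words: List[str]) -> List[Region]:
    """Flag runs of e-book words that have no matching word in the transcription.

    The paired transcript span is the preliminary alignment: the transcribed
    words lying between the same two matched anchors. It is empty when the
    recognizer produced nothing at all for that stretch of text.
    """
    matcher = SequenceMatcher(
        None,
        [w.lower() for w in book_words],
        [w.lower() for w in transcript_words],
        autojunk=False,  # do not discard frequent words as "junk"
    )
    regions: List[Region] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace": book words aligned against different transcript words
        # "delete":  book words with no transcript words at all
        if tag in ("replace", "delete"):
            regions.append((i1, i2, j1, j2))
    return regions


if __name__ == "__main__":
    book = "the quick brown fox jumps over the lazy dog".split()
    heard = "the quick brawn socks jumps over the lazy dog".split()
    print(find_uncertain_regions(book, heard))  # prints [(2, 4, 2, 4)]
```

Here "brown fox" (book words 2 to 3) is uncertain and is preliminarily aligned to the misrecognized "brawn socks".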
1 Assignment
0 Petitions
Abstract
A content alignment service may generate content synchronization information to facilitate the synchronous presentation of audio content and textual content. In some embodiments, a region of the textual content whose correspondence to the audio content is uncertain may be analyzed to determine whether the region of textual content corresponds to one or more words that are audibly presented in the audio content, or whether the region of textual content is a mismatch with respect to the audio content. In some embodiments, words in the textual content that correspond to words in the audio content are synchronously presented, while mismatched words in the textual content may be skipped to maintain synchronous presentation. Accordingly, in one example application, an audiobook is synchronized with an electronic book, so that as the electronic book is displayed, corresponding words of the audiobook are audibly presented.
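A minimal sketch of the skipping behaviour the abstract describes, assuming the recognizer supplies one (start, end) time in seconds per transcribed word. The entry format `(book_word_index, start_s, end_s)` and the name `build_sync_info` are hypothetical, and `difflib.SequenceMatcher` again stands in for whatever aligner a real service uses. Matched e-book words receive audio timings; mismatched words receive no entry, so synchronized highlighting passes over them.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Timing = Tuple[float, float]          # (start_s, end_s) of one transcribed word
SyncEntry = Tuple[int, float, float]  # (book_word_index, start_s, end_s)


def build_sync_info(book_words: List[str], transcript_words: List[str],
                    timings: List[Timing]) -> List[SyncEntry]:
    """Pair each matched e-book word with the audio timing of its transcript twin.

    E-book words with no match get no entry, so a reader that highlights words
    in step with playback skips them instead of stalling.
    """
    if len(timings) != len(transcript_words):
        raise ValueError("need exactly one timing per transcript word")
    matcher = SequenceMatcher(None, [w.lower() for w in book_words],
                              [w.lower() for w in transcript_words], autojunk=False)
    sync: List[SyncEntry] = []
    for block in matcher.get_matching_blocks():  # the last block always has size 0
        for k in range(block.size):
            start, end = timings[block.b + k]
            sync.append((block.a + k, start, end))
    return sync
```

For the book text "the quick brown fox" heard as "the quick brawn fox", word 2 ("brown") is left out of the synchronization information while words 0, 1 and 3 keep their timings.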
109 Citations
26 Claims
1. A system comprising:
an electronic data store configured to store: an audiobook; and an electronic book that is a companion to the audiobook; a computing device in communication with the electronic data store, the computing device configured to: generate a textual transcription of the audiobook; compare the textual transcription of the audiobook with text of the electronic book to identify an uncertain region in the electronic book, wherein the uncertain region includes text of the electronic book for which corresponding audio in the audiobook has not yet been identified; identify a region of audio content within the audiobook that is preliminarily aligned to the text included in the uncertain region; generate a language model using the text included in the uncertain region; apply the language model to the region of audio content within the audiobook to generate an updated textual transcription of the audiobook; determine that one or more words of the updated textual transcription substantially correspond to one or more words in the text included within the uncertain region; and generate content synchronization information, wherein the content synchronization information enables synchronous presentation of the one or more words in the text included within the uncertain region and a portion of audio content within the audiobook corresponding to the one or more words of the updated textual transcription.
4. A computer-implemented method comprising:
as implemented by one or more computing devices configured with specific computer-executable instructions, identifying an uncertain region in an item of textual content, wherein the uncertain region includes text of the item of textual content for which corresponding audio in a companion item of audio content has not yet been identified; identifying a region of the companion item of audio content that is preliminarily aligned to the uncertain region in the item of textual content; applying a language model, including text of the uncertain region, to the region of the companion item of audio content to generate a textual transcription of the region of the companion item of audio content; determining that a portion of the textual transcription substantially corresponds to a portion of the text of the item of textual content included within the uncertain region; and generating content synchronization information for synchronizing presentation of the portion of the text of the item of textual content and a portion of the item of audio content corresponding to the portion of the textual transcription.
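The claims apply a language model built from the uncertain region's text while re-transcribing the aligned audio. Running a speech recognizer is out of scope here, so the sketch below swaps in a simpler, plainly different technique: it assumes a recognizer has already produced a hypothetical n-best list of candidate transcriptions, and it rescores those candidates with an add-one-smoothed bigram model trained on the region text. `BigramLM` and `pick_hypothesis` are illustrative names, not part of the patent.

```python
import math
from collections import Counter
from typing import List


class BigramLM:
    """Add-one smoothed bigram model trained only on the uncertain region's text."""

    def __init__(self, text: str) -> None:
        words = ["<s>"] + text.lower().split() + ["</s>"]
        self.history = Counter(words[:-1])            # counts of words used as a history
        self.bigrams = Counter(zip(words, words[1:]))
        self.vocab_size = len(set(words))

    def log_prob(self, sentence: str) -> float:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(words, words[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.history[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def pick_hypothesis(lm: BigramLM, hypotheses: List[str]) -> str:
    """Return the candidate transcription the region-specific model finds most likely."""
    return max(hypotheses, key=lm.log_prob)
```

Usage: with `lm = BigramLM("the ship sailed to Tarshish")`, the candidate "the ship sailed to tarshish" outscores "the ship sailed to tar she", because every bigram in the former occurs in the region text. Biasing toward the region's own vocabulary is what lets rare words such as proper names be recovered.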
9. A system for synchronizing presentation of an item of audio content to a companion item of textual content, the system comprising:
an electronic data store configured to store content synchronization information; and a computing device in communication with the electronic data store, the computing device being configured to: identify an uncertain region in the companion item of textual content, the uncertain region comprising text of the companion item of textual content for which corresponding audio in the item of audio content has not yet been identified; identify a region of the item of audio content that is preliminarily aligned to the text of the uncertain region; apply a language model including the text of the uncertain region to the region of the item of audio content to generate a textual transcription of the region of the item of audio content; convert at least a portion of the text of the uncertain region to a first phoneme string; convert at least a portion of the textual transcription to a second phoneme string; determine that the first phoneme string substantially corresponds to the second phoneme string; and generate content synchronization information that facilitates the synchronous presentation of the portion of the text of the uncertain region in the companion item of textual content and a portion of the item of audio content corresponding to the portion of the textual transcription.
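Claim 9's phoneme comparison can be sketched as follows, under stated assumptions. `LEXICON` is a hypothetical toy pronunciation dictionary using ARPAbet-style symbols (a real system would use a full dictionary plus grapheme-to-phoneme conversion), the letter-by-letter fallback for unknown words is a crude placeholder, and "substantially corresponds" is modelled as a normalized Levenshtein similarity of at least 0.6, a threshold chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical toy lexicon; real pronunciations would come from a full dictionary.
LEXICON: Dict[str, List[str]] = {
    "tarshish": ["T", "AA", "R", "SH", "IH", "SH"],
    "tar": ["T", "AA", "R"],
    "she": ["SH", "IY"],
    "dog": ["D", "AO", "G"],
}


def to_phonemes(text: str) -> List[str]:
    phones: List[str] = []
    for word in text.lower().split():
        # Out-of-lexicon words fall back to one symbol per letter (illustrative only).
        phones.extend(LEXICON.get(word, list(word.upper())))
    return phones


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over phoneme symbols; insert, delete and substitute cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def phonemes_match(region_text: str, transcript_text: str, threshold: float = 0.6) -> bool:
    """True when the two phoneme strings are close enough under the assumed threshold."""
    first, second = to_phonemes(region_text), to_phonemes(transcript_text)
    longest = max(len(first), len(second))
    if longest == 0:
        return False
    similarity = 1.0 - edit_distance(first, second) / longest
    return similarity >= threshold
```

The book word "Tarshish" (T AA R SH IH SH) and the misrecognized "tar she" (T AA R SH IY) differ by two edits out of six symbols, a similarity of about 0.67, so they are treated as corresponding even though no word matches at the text level.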
14. A non-transitory computer-readable medium having a computer-executable module, the computer-executable module configured to:
identify an uncertain region in an item of textual content, the uncertain region comprising text of the item of textual content for which corresponding audio in an item of audio content has not yet been identified; identify a region of the item of audio content that is preliminarily aligned to text of the uncertain region; apply a language model including text of the uncertain region to the region of the item of audio content to generate a textual transcription of the region of the item of audio content; determine that the text of the uncertain region substantially corresponds to the text of the textual transcription; and generate content synchronization information, wherein the content synchronization information enables synchronous presentation of the uncertain region in the item of textual content and the region of the item of audio content.
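Claim 14 synchronizes the uncertain region as a whole once its text substantially corresponds to the new transcription. A minimal sketch, assuming that correspondence is measured with `difflib.SequenceMatcher.ratio()` over words and that a score of 0.5 or more counts as "substantial"; both the measure and the threshold are illustrative choices, and the record layout returned by the hypothetical `region_sync` is likewise an assumption.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def region_sync(region_text: str, transcript_text: str,
                book_span: Tuple[int, int], audio_span: Tuple[float, float],
                threshold: float = 0.5) -> Optional[Dict[str, object]]:
    """Emit one region-level sync record if the texts correspond closely enough.

    book_span:  (first_word, end_word) indices of the uncertain region in the e-book
    audio_span: (start_s, end_s) of the preliminarily aligned audio region
    """
    score = SequenceMatcher(None, region_text.lower().split(),
                            transcript_text.lower().split(), autojunk=False).ratio()
    if score < threshold:
        return None  # still a mismatch: leave the region unsynchronized
    return {"book_words": book_span, "audio_seconds": audio_span, "score": score}
```

For the region "Call me Ishmael" transcribed as "call me fish male", two of seven words match, giving a ratio of 2 × 2 / 7 ≈ 0.57, so the whole region is linked to its audio span; a region sharing no words with the transcription yields `None`.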
20. A computer-implemented method comprising:
as implemented by one or more computing devices configured with specific computer-executable instructions, identifying an uncertain region in an item of textual content, wherein the uncertain region includes text of the item of textual content for which corresponding audio in an item of audio content has not yet been identified; identifying a region of the item of audio content that is preliminarily aligned to the uncertain region in the item of textual content; applying a language model including the text of the uncertain region to the region of the item of audio content to generate a textual transcription of the region of the item of audio content; identifying a significant corresponding word included in both the textual transcription and the text of the item of textual content included in the uncertain region; and generating content synchronization information enabling the synchronous presentation of the significant corresponding word in both the item of textual content and the item of audio content.
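Claim 20 does not define what makes a corresponding word "significant". The sketch below assumes one plausible reading: a word is significant if it is not in a small hypothetical stop-word list and occurs exactly once in both the region text and the transcription, so its position on each side is unambiguous. Each such word is then tied to its transcript timing (assumed to be one (start, end) pair in seconds per transcribed word). `STOPWORDS` and `significant_word_sync` are illustrative names.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "from"}


def significant_word_sync(region_words: List[str], transcript_words: List[str],
                          timings: List[Tuple[float, float]]) -> List[Tuple[int, str, float, float]]:
    """Return (region_word_index, word, start_s, end_s) for each significant shared word."""
    if len(timings) != len(transcript_words):
        raise ValueError("need exactly one timing per transcript word")
    region = [w.lower() for w in region_words]
    transcript = [w.lower() for w in transcript_words]
    region_counts, transcript_counts = Counter(region), Counter(transcript)
    sync = []
    for i, word in enumerate(region):
        # Skip stop words and any word that is missing or repeated on either side.
        if word in STOPWORDS or region_counts[word] != 1 or transcript_counts[word] != 1:
            continue
        start, end = timings[transcript.index(word)]
        sync.append((i, word, start, end))
    return sync
```

For the region "The whale rose from the deep sea" heard as "a whale rows from the deep see", only "whale" and "deep" qualify; "the" and "from" are stop words, and "rose" and "sea" were misrecognized.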
Specification