IDENTIFYING CORRESPONDING REGIONS OF CONTENT
First Claim
1. A computer-implemented method comprising:
as implemented by one or more computing devices configured with specific computer-executable instructions, identifying audio data of an audio content item that is preliminarily correlated to a region of a textual content item;
determining that text within the region of the textual content item does not correspond to the audio data of the audio content item;
transcribing the audio data with automated speech recognition to generate a textual transcription of the audio data;
determining that the textual transcription corresponds to the text within the region of the textual content; and
generating synchronization information that associates the audio data of the audio content with the text within the region of the textual content.
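The steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `SyncEntry` record, the word-level similarity score, the 0.8 threshold, and the `transcribe` callable (standing in for an automated speech recognition engine) are all assumptions introduced here for illustration. The sketch also assumes the initial mismatch is detected by comparing the region text against a first-pass transcript, which the claim does not specify.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple

_PUNCT = ".,;:!?\"'()"


@dataclass
class SyncEntry:
    """Hypothetical record tying an audio span (seconds) to a text span (offsets)."""
    audio_span: Tuple[float, float]
    text_span: Tuple[int, int]


def normalize(text: str) -> List[str]:
    """Lowercase and strip punctuation so the comparison is word-based."""
    words = (w.strip(_PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def similarity(a: str, b: str) -> float:
    """Word-sequence similarity in [0, 1]; an assumed stand-in for 'corresponds'."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def align_uncertain_region(
    region_text: str,
    text_span: Tuple[int, int],
    audio_span: Tuple[float, float],
    first_pass_transcript: str,
    transcribe: Callable[[Tuple[float, float]], str],  # hypothetical ASR hook
    threshold: float = 0.8,  # assumed cutoff, not taken from the source
) -> Optional[SyncEntry]:
    # The audio is preliminarily correlated with the region; check whether the
    # text appears to correspond to it.
    if similarity(region_text, first_pass_transcript) >= threshold:
        return SyncEntry(audio_span, text_span)

    # The text does not appear to correspond: transcribe the audio again.
    transcript = transcribe(audio_span)

    # If the new transcription corresponds to the text, emit sync information.
    if similarity(region_text, transcript) >= threshold:
        return SyncEntry(audio_span, text_span)

    # Otherwise treat the region as a mismatch; no sync entry is produced.
    return None
```

A caller would supply its own `transcribe` function; a return value of `None` marks the region as a mismatch that a player could skip.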
Abstract
A content alignment service may generate content synchronization information to facilitate the synchronous presentation of audio content and textual content. In some embodiments, a region of the textual content whose correspondence to the audio content is uncertain may be analyzed to determine whether the region of textual content corresponds to one or more words that are audibly presented in the audio content, or whether the region of textual content is a mismatch with respect to the audio content. In some embodiments, words in the textual content that correspond to words in the audio content are synchronously presented, while mismatched words in the textual content may be skipped to maintain synchronous presentation. Accordingly, in one example application, an audiobook is synchronized with an electronic book, so that as the electronic book is displayed, corresponding words of the audiobook are audibly presented.
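As a rough illustration of the skipping behavior described in the abstract, the sketch below builds a presentation schedule from word-level synchronization data. The `WordSync` structure and both function names are hypothetical; the source does not describe how synchronization information is stored. A word whose audio span is `None` is treated here as a mismatch.

```python
from typing import List, NamedTuple, Optional, Tuple


class WordSync(NamedTuple):
    """Hypothetical per-word sync record; audio_span is None for a mismatch."""
    word: str
    audio_span: Optional[Tuple[float, float]]  # (start, end) in seconds


def presentation_schedule(words: List[WordSync]) -> List[Tuple[float, str]]:
    """Return (start_time, word) pairs, skipping words with no matching audio."""
    return [(w.audio_span[0], w.word) for w in words if w.audio_span is not None]


def word_at(words: List[WordSync], t: float) -> Optional[str]:
    """Return the word to highlight at audio time t, or None if no word is spoken."""
    for w in words:
        if w.audio_span is not None and w.audio_span[0] <= t < w.audio_span[1]:
            return w.word
    return None
```

For example, front matter that appears in an electronic book but is not narrated in the audiobook would carry `None` spans and never be highlighted during playback.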
20 Claims
1. A computer-implemented method comprising:
as implemented by one or more computing devices configured with specific computer-executable instructions, identifying audio data of an audio content item that is preliminarily correlated to a region of a textual content item;
determining that text within the region of the textual content item does not correspond to the audio data of the audio content item;
transcribing the audio data with automated speech recognition to generate a textual transcription of the audio data;
determining that the textual transcription corresponds to the text within the region of the textual content; and
generating synchronization information that associates the audio data of the audio content with the text within the region of the textual content.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
a non-transitory data store storing audio data of an audio content item and text of a textual content item; and
a processor in communication with the non-transitory data store and configured with specific computer-executable instructions that, when executed by the processor, cause the processor to at least:
identify a preliminary correlation between the audio data and a region of the textual content item;
determine that text within the region of the textual content item does not correspond to the audio data of the audio content item;
transcribe the audio data with automated speech recognition to generate a textual transcription of the audio data;
determine that the textual transcription corresponds to the text within the region of the textual content item; and
generate synchronization information that associates the audio data of the audio content with the text within the region of the textual content.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. Non-transitory computer-readable storage media comprising computer-executable instructions, that, when executed by a computing system, cause the computing system to at least:
compare text within a region of a textual content item and audio data of an audio content item that is preliminarily correlated to the region of the textual content item;
determine that the text within the region of the textual content item does not correspond to the audio data of the audio content item;
apply automated speech recognition to the audio data to generate a textual transcription of the audio data;
determine that the textual transcription corresponds to the text within the region of the textual content item; and
generate synchronization information that associates the audio data of the audio content with the text within the region of the textual content item.
Dependent claims: 18, 19, 20
Specification