AUTOMATICALLY CREATING A MAPPING BETWEEN TEXT DATA AND AUDIO DATA
First Claim
1. A method comprising:
receiving audio data that reflects an audible version of a work for which a textual version exists;
performing a speech-to-text analysis of the audio data to generate text for portions of the audio data; and
based on the text generated for the portions of the audio data, generating a mapping between a plurality of audio locations in the audio data and a corresponding plurality of text locations in the textual version of the work;
wherein the method is performed by one or more computing devices.
1 Assignment
0 Petitions
Abstract
Techniques are provided for creating a mapping that maps locations in audio data (e.g., an audio book) to corresponding locations in text data (e.g., an e-book). Techniques are provided for using a mapping between audio data and text data, whether the mapping is created automatically or manually. A mapping may be used for bookmark switching, where a bookmark established in one version of a digital work is used to identify a corresponding location within another version of the digital work. Alternatively, the mapping may be used to play audio that corresponds to text selected by a user. Alternatively, the mapping may be used to automatically highlight text in response to audio that corresponds to the text being played. Alternatively, the mapping may be used to determine where an annotation created in one media context (e.g., audio) will be consumed in another media context (e.g., text).
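The bookmark-switching use described in the abstract reduces to a lookup over mapping anchor points. As an illustrative sketch only (the function name, the anchor-point representation, and the nearest-preceding-anchor heuristic are assumptions, not taken from the patent), an audio bookmark could be resolved to a text location like this:

```python
import bisect

def bookmark_to_text(mapping, audio_seconds):
    """Resolve an audio bookmark to a text location.

    mapping: sorted list of (audio_seconds, text_offset) anchor points.
    Returns the text offset of the nearest anchor at or before the bookmark.
    """
    times = [t for t, _ in mapping]
    i = bisect.bisect_right(times, audio_seconds) - 1
    return mapping[max(i, 0)][1]

mapping = [(0.0, 0), (2.5, 17), (5.0, 48)]
print(bookmark_to_text(mapping, 3.2))  # → 17
```

A denser set of anchor points would make the resolved location more precise; interpolating between anchors is another plausible refinement.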
381 Citations
30 Claims
1. A method comprising:
receiving audio data that reflects an audible version of a work for which a textual version exists;
performing a speech-to-text analysis of the audio data to generate text for portions of the audio data; and
based on the text generated for the portions of the audio data, generating a mapping between a plurality of audio locations in the audio data and a corresponding plurality of text locations in the textual version of the work;
wherein the method is performed by one or more computing devices.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
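Claim 1 describes aligning speech-to-text output against a known textual version. As a non-authoritative sketch of that idea (the ASR step is stubbed as pre-computed transcript segments, and the forward-only substring search is an assumed alignment heuristic, not the patent's method):

```python
def build_mapping(segments, textual_version):
    """Align ASR transcripts against the textual version of the work.

    segments: list of (audio_seconds, transcript) pairs from a speech-to-text
    engine run over portions of the audio data.
    Returns (audio_seconds, character_offset) anchor points: the mapping
    between audio locations and text locations.
    """
    mapping = []
    cursor = 0  # search forward only, so repeated phrases map in order
    lowered = textual_version.lower()
    for start, transcript in segments:
        idx = lowered.find(transcript.lower(), cursor)
        if idx != -1:  # skip segments the ASR garbled beyond matching
            mapping.append((start, idx))
            cursor = idx + len(transcript)
    return mapping

segments = [(0.0, "Call me Ishmael."), (2.5, "Some years ago")]
text = "Call me Ishmael. Some years ago, never mind how long precisely..."
print(build_mapping(segments, text))  # → [(0.0, 0), (2.5, 17)]
```

A production system would use fuzzy matching rather than exact substring search, since ASR output rarely matches the textual version verbatim.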
14. A method comprising:
receiving a textual version of a work;
performing a text-to-speech analysis of the textual version to generate first audio data;
based on the first audio data and the textual version, generating a first mapping between a first plurality of audio locations in the first audio data and a corresponding plurality of text locations in the textual version of the work;
receiving second audio data that reflects an audible version of the work for which the textual version exists; and
based on (1) a comparison of the first audio data and the second audio data and (2) the first mapping, generating a second mapping between a second plurality of audio locations in the second audio data and the plurality of text locations in the textual version of the work;
wherein the method is performed by one or more computing devices.
(Dependent claim: 29)
15. A method comprising:
receiving audio input;
performing a speech-to-text analysis of the audio input to generate text for portions of the audio input;
determining whether the text generated for portions of the audio input matches text that is currently displayed; and
in response to determining that the text matches text that is currently displayed, causing the text that is currently displayed to be highlighted;
wherein the method is performed by one or more computing devices.
(Dependent claim: 30)
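The matching step of claim 15 can be sketched as a span lookup over the displayed text; the highlighting itself would be handled by the display layer. This is an illustrative sketch under assumed names (case-insensitive exact matching stands in for whatever matching the claimed method actually uses):

```python
def highlight_span(recognized, displayed):
    """Find where ASR-recognized text appears in the currently displayed text.

    Returns a (start, end) character span to highlight, or None when the
    recognized text does not match anything on screen.
    """
    idx = displayed.lower().find(recognized.lower())
    if idx == -1:
        return None
    return (idx, idx + len(recognized))

page = "Call me Ishmael. Some years ago, never mind how long precisely..."
print(highlight_span("some years ago", page))  # → (17, 31)
print(highlight_span("white whale", page))     # → None
```

Running this per recognized segment as audio arrives would yield the follow-along highlighting behavior the claim describes.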
Specification