AUTOMATICALLY CREATING A MAPPING BETWEEN TEXT DATA AND AUDIO DATA
First Claim
1. A method comprising:
receiving audio data that reflects an audible version of a work for which a textual version exists;
performing a speech-to-text analysis of the audio data to generate text for portions of the audio data; and
based on the text generated for the portions of the audio data, generating a mapping between a plurality of audio locations in the audio data and a corresponding plurality of text locations in the textual version of the work;
wherein the method is performed by one or more computing devices.
1 Assignment
0 Petitions
Abstract
Techniques are provided for creating a mapping that maps locations in audio data (e.g., an audio book) to corresponding locations in text data (e.g., an e-book). Techniques are provided for using a mapping between audio data and text data, whether the mapping is created automatically or manually. A mapping may be used for bookmark switching, where a bookmark established in one version of a digital work is used to identify a corresponding location within another version of the digital work. Alternatively, the mapping may be used to play audio that corresponds to text selected by a user. Alternatively, the mapping may be used to automatically highlight text in response to audio that corresponds to the text being played. Alternatively, the mapping may be used to determine where an annotation created in one media context (e.g., audio) will be consumed in another media context (e.g., text).
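The bookmark-switching use described in the abstract reduces to a lookup over mapping anchor points. As an illustrative sketch only (the function name, the anchor-point representation, and the nearest-preceding-anchor heuristic are assumptions, not taken from the patent), an audio bookmark could be resolved to a text location like this:

```python
import bisect

def bookmark_to_text(mapping, audio_seconds):
    """Resolve an audio bookmark to a text location.

    mapping: sorted list of (audio_seconds, text_offset) anchor points.
    Returns the text offset of the nearest anchor at or before the bookmark.
    """
    times = [t for t, _ in mapping]
    i = bisect.bisect_right(times, audio_seconds) - 1
    return mapping[max(i, 0)][1]

mapping = [(0.0, 0), (2.5, 17), (5.0, 48)]
print(bookmark_to_text(mapping, 3.2))  # → 17
```

A denser set of anchor points would make the resolved location more precise; interpolating between anchors is another plausible refinement.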
381 Citations
30 Claims
1. A method comprising:
receiving audio data that reflects an audible version of a work for which a textual version exists;
performing a speech-to-text analysis of the audio data to generate text for portions of the audio data; and
based on the text generated for the portions of the audio data, generating a mapping between a plurality of audio locations in the audio data and a corresponding plurality of text locations in the textual version of the work;
wherein the method is performed by one or more computing devices.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
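Claim 1 describes aligning speech-to-text output against a known textual version. As a non-authoritative sketch of that idea (the ASR step is stubbed as pre-computed transcript segments, and the forward-only substring search is an assumed alignment heuristic, not the patent's method):

```python
def build_mapping(segments, textual_version):
    """Align ASR transcripts against the textual version of the work.

    segments: list of (audio_seconds, transcript) pairs from a speech-to-text
    engine run over portions of the audio data.
    Returns (audio_seconds, character_offset) anchor points: the mapping
    between audio locations and text locations.
    """
    mapping = []
    cursor = 0  # search forward only, so repeated phrases map in order
    lowered = textual_version.lower()
    for start, transcript in segments:
        idx = lowered.find(transcript.lower(), cursor)
        if idx != -1:  # skip segments the ASR garbled beyond matching
            mapping.append((start, idx))
            cursor = idx + len(transcript)
    return mapping

segments = [(0.0, "Call me Ishmael."), (2.5, "Some years ago")]
text = "Call me Ishmael. Some years ago, never mind how long precisely..."
print(build_mapping(segments, text))  # → [(0.0, 0), (2.5, 17)]
```

A production system would use fuzzy matching rather than exact substring search, since ASR output rarely matches the textual version verbatim.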
14. A method comprising:
receiving a textual version of a work;
performing a text-to-speech analysis of the textual version to generate first audio data;
based on the first audio data and the textual version, generating a first mapping between a first plurality of audio locations in the first audio data and a corresponding plurality of text locations in the textual version of the work;
receiving second audio data that reflects an audible version of the work for which the textual version exists; and
based on (1) a comparison of the first audio data and the second audio data and (2) the first mapping, generating a second mapping between a second plurality of audio locations in the second audio data and the plurality of text locations in the textual version of the work;
wherein the method is performed by one or more computing devices.
(Dependent claim: 29)
15. A method comprising:
receiving audio input;
performing a speech-to-text analysis of the audio input to generate text for portions of the audio input;
determining whether the text generated for portions of the audio input matches text that is currently displayed; and
in response to determining that the text matches text that is currently displayed, causing the text that is currently displayed to be highlighted;
wherein the method is performed by one or more computing devices.
(Dependent claim: 30)
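The matching step of claim 15 can be sketched as a span lookup over the displayed text; the highlighting itself would be handled by the display layer. This is an illustrative sketch under assumed names (case-insensitive exact matching stands in for whatever matching the claimed method actually uses):

```python
def highlight_span(recognized, displayed):
    """Find where ASR-recognized text appears in the currently displayed text.

    Returns a (start, end) character span to highlight, or None when the
    recognized text does not match anything on screen.
    """
    idx = displayed.lower().find(recognized.lower())
    if idx == -1:
        return None
    return (idx, idx + len(recognized))

page = "Call me Ishmael. Some years ago, never mind how long precisely..."
print(highlight_span("some years ago", page))  # → (17, 31)
print(highlight_span("white whale", page))     # → None
```

Running this per recognized segment as audio arrives would yield the follow-along highlighting behavior the claim describes.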
Specification