Identifying corresponding regions of content

US 9,099,089 B2
Filed: 09/05/2012
Issued: 08/04/2015
Est. Priority Date: 08/02/2012
Status: Active Grant


1 Assignment


0 Petitions


Abstract

A content alignment service may generate content synchronization information to facilitate the synchronous presentation of audio content and textual content. In some embodiments, a region of the textual content whose correspondence to the audio content is uncertain may be analyzed to determine whether the region of textual content corresponds to one or more words that are audibly presented in the audio content, or whether the region of textual content is a mismatch with respect to the audio content. In some embodiments, words in the textual content that correspond to words in the audio content are synchronously presented, while mismatched words in the textual content may be skipped to maintain synchronous presentation. Accordingly, in one example application, an audiobook is synchronized with an electronic book, so that as the electronic book is displayed, corresponding words of the audiobook are audibly presented.
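The abstract's content synchronization information can be made concrete with a minimal Python sketch. It assumes a hypothetical format in which each record pairs an e-book word index with a start and end time in the audiobook; the patent text reproduced here does not define a format. A mismatched word is represented by having no record at all, which is one way to get the skipping behavior the abstract describes.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SyncEntry:
    """Hypothetical sync record: an e-book word index and its audio span."""
    word_index: int
    start: float  # seconds into the audiobook
    end: float


def word_at(entries: List[SyncEntry], t: float) -> Optional[int]:
    """Return the e-book word index to highlight at playback time t.

    Mismatched e-book words have no entry, so they are never highlighted
    and synchronous presentation simply skips over them.
    """
    starts = [e.start for e in entries]  # entries must be sorted by start
    i = bisect_right(starts, t) - 1
    if i >= 0 and entries[i].start <= t < entries[i].end:
        return entries[i].word_index
    return None


# Words 0-2 are narrated; word 3 (say, a footnote marker) has no audio.
sync = [
    SyncEntry(0, 0.00, 0.40),
    SyncEntry(1, 0.40, 0.75),
    SyncEntry(2, 0.75, 1.30),
    SyncEntry(4, 1.30, 1.90),
]
print([word_at(sync, t) for t in (0.1, 0.5, 1.0, 1.5, 2.5)])  # [0, 1, 2, 4, None]
```

Binary search over start times keeps the per-lookup cost logarithmic once the start list is cached, which matters for long audiobooks.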

109 Citations


26 Claims

1. A system comprising:
- an electronic data store configured to store:
  
  an audiobook; and
  
  an electronic book that is a companion to the audiobook;
  
  a computing device in communication with the electronic data store, the computing device configured to:
  
  generate a textual transcription of the audiobook;
  
  compare the textual transcription of the audiobook with text of the electronic book to identify an uncertain region in the electronic book, wherein the uncertain region includes text of the electronic book for which corresponding audio in the audiobook has not yet been identified;
  
  identify a region of audio content within the audiobook that is preliminarily aligned to the text included in the uncertain region;
  
  generate a language model using the text included in the uncertain region;
  
  apply the language model to the region of audio content within the audiobook to generate an updated textual transcription of the audiobook;
  
  determine that one or more words of the updated textual transcription substantially correspond to one or more words in the text included within the uncertain region; and
  
  generate content synchronization information, wherein the content synchronization information enables synchronous presentation of the one or more words in the text included within the uncertain region and a portion of audio content within the audiobook corresponding to the one or more words of the updated textual transcription.
- Dependent claims (2, 3):
  - 2. The system of claim 1, wherein the computing device is further configured to provide the content synchronization information to a separate computing device.
  - 3. The system of claim 1, wherein the computing device is further configured to synchronously present the one or more words in the text of the electronic book included within the uncertain region and the portion of audio content within the audiobook corresponding to the one or more words of the updated textual transcription.
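Claim 1 compares a transcription of the audiobook with the e-book text to find uncertain regions. The sketch below shows one way to perform that comparison, assuming the transcription is already available as a word list (speech recognition itself is out of scope) and using Python's standard `difflib.SequenceMatcher` as a stand-in aligner; the claims do not say which alignment method is used. The helpers `normalize` and `uncertain_regions` are hypothetical. Each uncertain e-book span is returned together with the transcript span lying between the same matched anchors, which approximates the "preliminarily aligned" audio region if the recognizer also reports word timestamps.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# (book_start, book_end, transcript_start, transcript_end), end-exclusive
Region = Tuple[int, int, int, int]


def normalize(text: str) -> List[str]:
    """Lowercase and strip punctuation so formatting differences don't count."""
    words = []
    for raw in text.split():
        w = "".join(ch for ch in raw.lower() if ch.isalnum() or ch == "'")
        if w:
            words.append(w)
    return words


def uncertain_regions(book_words: List[str], transcript_words: List[str]) -> List[Region]:
    """Return e-book word spans that have no exact match in the transcript.

    Each span is paired with the transcript span between the same matched
    anchors, i.e. a preliminary alignment for the uncertain text.
    """
    matcher = SequenceMatcher(a=book_words, b=transcript_words, autojunk=False)
    regions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'delete' both leave e-book words without a match;
        # 'insert' covers transcript-only words and is ignored here.
        if tag in ("replace", "delete"):
            regions.append((i1, i2, j1, j2))
    return regions


book = normalize("It was the best of times, it was the worst of times.")
heard = normalize("it was the best of tines it was the worst of times")
for b1, b2, t1, t2 in uncertain_regions(book, heard):
    print(book[b1:b2], "<->", heard[t1:t2])
```

Here the misrecognized "tines" leaves the e-book word "times" uncertain, and its preliminary audio region is the transcript slot between "of" and "it".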

4. A computer-implemented method comprising:
- as implemented by one or more computing devices configured with specific computer-executable instructions, identifying an uncertain region in an item of textual content, wherein the uncertain region includes text of the item of textual content for which corresponding audio in a companion item of audio content has not yet been identified;
  
  identifying a region of the companion item of audio content that is preliminarily aligned to the uncertain region in the item of textual content;
  
  applying a language model, including text of the uncertain region, to the region of the companion item of audio content to generate a textual transcription of the region of the item of audio content;
  
  determining that a portion of the textual transcription substantially corresponds to a portion of the text of the item of textual content included within the uncertain region; and
  
  generating content synchronization information for synchronizing presentation of the portion of the text of the item of textual content and a portion of the item of audio content corresponding to the portion of the textual transcription.
- Dependent claims (5, 6, 7, 8):
  - 5. The computer-implemented method of claim 4, wherein the portion of the textual transcription substantially corresponds to the portion of the text of the item of textual content included within the uncertain region if they have a portion score satisfying a threshold.
  - 6. The computer-implemented method of claim 4, wherein the portion of the textual transcription substantially corresponds to the portion of the text of the item of textual content included within the uncertain region if at least a threshold percentage of words of the portion of the textual transcription correspond to words of the portion of the text of the item of textual content included within the uncertain region.
  - 7. The computer-implemented method of claim 6, wherein a word of the portion of the textual transcription corresponds to a word of the portion of the text of the item of textual content included within the uncertain region if the word of the portion of the textual transcription substantially matches and chronologically corresponds to the word of the portion of the text of the item of textual content included within the uncertain region.
  - 8. The computer-implemented method of claim 4, wherein the uncertain region of the item of textual content is identified based at least in part by comparing the uncertain region to an initial transcription of the item of audio content.
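Claims 6 and 7 test correspondence by requiring that at least a threshold percentage of transcription words match region words that also line up chronologically. A minimal sketch, again assuming `difflib.SequenceMatcher` as the matcher: its matching blocks never cross, so every counted word appears in the same relative order in both sequences. The 0.75 threshold and the function names are illustrative assumptions, not values from the claims.

```python
from difflib import SequenceMatcher
from typing import List


def fraction_corresponding(transcript: List[str], region_text: List[str]) -> float:
    """Fraction of transcript words matching region words in the same order.

    SequenceMatcher's matching blocks are monotonic in both sequences, so a
    word only counts if it also corresponds chronologically (cf. claim 7).
    """
    if not transcript:
        return 0.0
    matcher = SequenceMatcher(a=transcript, b=region_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(transcript)


def substantially_corresponds(transcript: List[str], region_text: List[str],
                              threshold: float = 0.75) -> bool:
    """Claim 6 style test; the default threshold is an assumption."""
    return fraction_corresponding(transcript, region_text) >= threshold


region = "the quick brown fox jumps over the lazy dog".split()
heard = "the quick brown socks jumps over the lazy dog".split()
print(fraction_corresponding(heard, region))  # 8 of 9 words
print(substantially_corresponds(heard, region))  # True
```

Reversing the transcript keeps almost the same vocabulary but fails the test, because most matches would no longer be chronological.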

9. A system for synchronizing presentation of an item of audio content to a companion item of textual content, the system comprising:
- an electronic data store configured to store content synchronization information; and
  
  a computing device in communication with the electronic data store, the computing device being configured to:
  
  identify an uncertain region in the companion item of textual content, the uncertain region comprising text of the companion item of textual content for which corresponding audio in the item of audio content has not yet been identified;
  
  identify a region of the item of audio content that is preliminarily aligned to the text of the uncertain region;
  
  apply a language model including the text of the uncertain region to the region of the item of audio content to generate a textual transcription of the region of the item of audio content;
  
  convert at least a portion of the text of the uncertain region to a first phoneme string;
  
  convert at least a portion of the textual transcription to a second phoneme string;
  
  determine that the first phoneme string substantially corresponds to the second phoneme string;
  
  generate content synchronization information that facilitates the synchronous presentation of the portion of the text of the uncertain region in the companion item of textual content and a portion of the item of audio content corresponding to the portion of the textual transcription.
- Dependent claims (10, 11, 12, 13):
  - 10. The system of claim 9, wherein the first phoneme string substantially corresponds to the second phoneme string if the first phoneme string is within a threshold Levenshtein distance from the second phoneme string.
  - 11. The system of claim 9, wherein:
    - the computing device is further configured to generate an acoustically confusable hypothesis for the first phoneme string; and
      
      the first phoneme string substantially corresponds to the second phoneme string if the acoustically confusable hypothesis for the first phoneme string is at least substantially similar to the second phoneme string.
  - 12. The system of claim 9, wherein:
    - the computing device is further configured to generate an acoustically confusable hypothesis for the second phoneme string; and
      
      the first phoneme string substantially corresponds to the second phoneme string if the acoustically confusable hypothesis for the second phoneme string is at least substantially similar to the first phoneme string.
  - 13. The system of claim 9, wherein:
    - the computing device is further configured to generate a first acoustically confusable hypothesis for the first phoneme string and a second acoustically confusable hypothesis for the second phoneme string; and
      
      the first phoneme string substantially corresponds to the second phoneme string if the first acoustically confusable hypothesis is at least substantially similar to the second acoustically confusable hypothesis.
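Claims 9 to 13 fall back to phoneme strings, a Levenshtein distance threshold, and acoustically confusable hypotheses. The sketch below assumes a tiny hypothetical pronunciation lexicon (ARPAbet-style symbols) and a hypothetical confusion table in place of a real grapheme-to-phoneme system and recognizer confusion statistics. It generates only the text-side hypotheses of claim 11; the claim 12 and 13 variants would apply the same function to the transcription side or to both. Enumerating hypotheses with `itertools.product` grows exponentially with the number of confusable phonemes, so this only suits short strings.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical mini pronunciation lexicon (ARPAbet-style, stress omitted).
LEXICON: Dict[str, List[str]] = {
    "tom": ["T", "AA", "M"],
    "tim": ["T", "IH", "M"],
    "bat": ["B", "AE", "T"],
    "pat": ["P", "AE", "T"],
    "dim": ["D", "IH", "M"],
}

# Hypothetical phoneme pairs a recognizer might confuse.
CONFUSABLE: Dict[str, Set[str]] = {
    "P": {"B"}, "B": {"P"}, "T": {"D"}, "D": {"T"},
    "IH": {"IY"}, "IY": {"IH"},
}


def to_phonemes(words: Sequence[str]) -> Tuple[str, ...]:
    """Concatenate lexicon pronunciations; unknown words raise KeyError."""
    return tuple(p for w in words for p in LEXICON[w.lower()])


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance counting phoneme insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute or match
        prev = cur
    return prev[-1]


def confusable_hypotheses(phonemes: Sequence[str]) -> Set[Tuple[str, ...]]:
    """All strings reachable by swapping phonemes for confusable ones."""
    options = [[p, *sorted(CONFUSABLE.get(p, ()))] for p in phonemes]
    return set(product(*options))


def phonemes_correspond(text_ph: Sequence[str], heard_ph: Sequence[str],
                        max_distance: int = 1) -> bool:
    """Claim 10 style threshold test, widened with claim 11 style hypotheses."""
    if levenshtein(text_ph, heard_ph) <= max_distance:
        return True
    return any(levenshtein(h, heard_ph) <= max_distance
               for h in confusable_hypotheses(text_ph))


text_ph = to_phonemes(["pat", "tim"])   # P AE T T IH M
heard_ph = to_phonemes(["bat", "dim"])  # B AE T D IH M
print(levenshtein(text_ph, heard_ph), phonemes_correspond(text_ph, heard_ph))  # 2 True
```

The raw distance of 2 exceeds the assumed threshold of 1, but one confusable hypothesis of the text (B AE T D IH M) matches the transcription exactly.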

14. A non-transitory computer-readable medium having a computer-executable module, the computer-executable module configured to:
- identify an uncertain region in an item of textual content, the uncertain region comprising text of the item of textual content for which corresponding audio in an item of audio content has not yet been identified;
  
  identify a region of the item of audio content that is preliminarily aligned to text of the uncertain region;
  
  apply a language model including text of the uncertain region to the region of the item of audio content to generate a textual transcription of the region of the item of audio content;
  
  determine that the text of the uncertain region substantially corresponds to the text of the textual transcription; and
  
  generate content synchronization information, wherein the content synchronization information enables synchronous presentation of the uncertain region in the item of textual content and the region of the item of audio content.
- Dependent claims (15, 16, 17, 18, 19):
  - 15. The non-transitory computer-readable medium of claim 14, wherein the computer-executable module is further configured to:
    - identify a second uncertain region in the item of textual content;
      
      identify a second region of the item of audio content that is preliminarily aligned to text of the second uncertain region;
      
      generate a textual transcription of the second region of the item of audio content;
      
      determine that the text of the second uncertain region does not substantially correspond to the text of the textual transcription of the second region of the item of audio content, convert the text of the second uncertain region to a first phoneme string;
      
      convert the text of the textual transcription of the second region of the item of audio content to a second phoneme string;
      
      determine that the first phoneme string substantially corresponds to the second phoneme string; and
      
      generate second content synchronization information, wherein the second content synchronization information facilitates the synchronous presentation of the second uncertain region and the second region of the item of audio content.
  - 16. The non-transitory computer-readable medium of claim 15, wherein the first phoneme string substantially corresponds to the second phoneme string if the first phoneme string is within a threshold Levenshtein distance from the second phoneme string.
  - 17. The non-transitory computer-readable medium of claim 15, wherein:
    - the computer-executable module is further configured to generate an acoustically confusable hypothesis for the first phoneme string; and
      
      the first phoneme string substantially corresponds to the second phoneme string if the acoustically confusable hypothesis for the first phoneme string is at least substantially similar to the second phoneme string.
  - 18. The non-transitory computer-readable medium of claim 15, wherein:
    - the computer-executable module is further configured to generate an acoustically confusable hypothesis for the second phoneme string; and
      
      the first phoneme string substantially corresponds to the second phoneme string if the acoustically confusable hypothesis for the second phoneme string is at least substantially similar to the first phoneme string.
  - 19. The non-transitory computer-readable medium of claim 15, wherein:
    - the computer-executable module is further configured to generate a first acoustically confusable hypothesis for the first phoneme string and a second acoustically confusable hypothesis for the second phoneme string; and
      
      the first phoneme string substantially corresponds to the second phoneme string if the first acoustically confusable hypothesis is at least substantially similar to the second acoustically confusable hypothesis.
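Claims 1, 4, 9, 14, and 20 all apply a language model built from the uncertain region's text to the corresponding audio. Decoding audio does not fit in a short example, so this sketch swaps in a simpler stand-in: a bigram model with add-k smoothing, trained only on the region text (the class `RegionBigramLM` is hypothetical), used to rescore candidate transcriptions a recognizer might have produced. Word sequences from the region become far more probable than alternatives, which is the biasing effect the claims rely on.

```python
import math
from collections import Counter
from typing import List, Sequence

BOS, EOS = "<s>", "</s>"


class RegionBigramLM:
    """Add-k smoothed bigram model trained only on an uncertain region's text."""

    def __init__(self, region_words: Sequence[str], k: float = 0.1) -> None:
        self.k = k
        tokens = [BOS, *region_words, EOS]
        self.vocab = set(tokens)
        self.histories = Counter(tokens[:-1])  # how often each token precedes another
        self.bigrams = Counter(zip(tokens, tokens[1:]))

    def logprob(self, words: Sequence[str]) -> float:
        """Log probability of a word sequence under the region model."""
        v = len(self.vocab) + 1  # +1 reserves mass for out-of-region words
        tokens = [BOS, *words, EOS]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            num = self.bigrams[(prev, cur)] + self.k
            den = self.histories[prev] + self.k * v
            total += math.log(num / den)
        return total


def rescore(lm: RegionBigramLM, hypotheses: List[List[str]]) -> List[str]:
    """Pick the recognizer hypothesis the region language model finds likeliest."""
    return max(hypotheses, key=lm.logprob)


region = "call me ishmael some years ago".split()
lm = RegionBigramLM(region)
candidates = [
    "call me fish meal some years ago".split(),
    "call me ishmael some years ago".split(),
]
print(" ".join(rescore(lm, candidates)))  # call me ishmael some years ago
```

A production system would more likely bias the recognizer's decoder directly; rescoring an n-best list is just the easiest way to show the effect in isolation.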

20. A computer-implemented method comprising:
- as implemented by one or more computing devices configured with specific computer-executable instructions, identifying an uncertain region in an item of textual content, wherein the uncertain region includes text of the item of textual content for which corresponding audio in an item of audio content has not yet been identified;
  
  identifying a region of the item of audio content that is preliminarily aligned to the uncertain region in the item of textual content;
  
  applying a language model including the text of the uncertain region to the region of the item of audio content to generate a textual transcription of the region of the item of audio content;
  
  identifying a significant corresponding word included in both the textual transcription and the text of the item of textual content included in the uncertain region; and
  
  generating content synchronization information enabling the synchronous presentation of the significant corresponding word in both the item of textual content and the item of audio content.
- Dependent claims (21, 22, 23, 24, 25, 26):
  - 21. The computer-implemented method of claim 20, wherein the uncertain region comprises at least a threshold number of words.
  - 22. The computer-implemented method of claim 20, wherein the significant corresponding word has a word score that satisfies a threshold.
  - 23. The computer-implemented method of claim 22, wherein the word score of the significant corresponding word is based at least in part on at least one of:
    - a number of letters included in the significant corresponding word;
      
      a frequency of one or more letters included in the significant corresponding word; and
      
      a number of syllables included in the significant corresponding word.
  - 24. The computer-implemented method of claim 20 further comprising:
    - identifying a substantial acoustic similarity between a first word string of the item of textual content and a second word string of the textual transcription; and
      
      identifying a subregion of the item of audio content that corresponds to the first word string;
      
      wherein the first word string occurs substantially within the uncertain region in the item of textual content; and
      
      wherein the content synchronization information further facilitates the synchronous presentation of the first word string in the item of textual content and the corresponding subregion of the item of audio content.
  - 25. The computer-implemented method of claim 24, wherein neither the first word string nor the second word string includes the significant corresponding word.
  - 26. The computer-implemented method of claim 24, wherein the first word string comprises at least a threshold number of words.
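Claims 20 to 23 anchor the alignment on a "significant corresponding word" whose word score, per claim 23, may depend on letter count, letter frequency, and syllable count. The sketch below combines those three factors. The weights, the rare-letter set, the vowel-run syllable heuristic, and the threshold of 12 are all assumptions for illustration; the claims do not specify a formula, and the function names are hypothetical.

```python
import re
from typing import List, Sequence, Set

# Illustrative set of letters that are uncommon in English text (assumption).
RARE_LETTERS = set("jkqvxz")


def syllables(word: str) -> int:
    """Rough syllable count: runs of vowels (y included), minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def word_score(word: str) -> float:
    """Combine the three claim 23 factors with assumed weights."""
    w = word.lower()
    letters = sum(ch.isalpha() for ch in w)    # number of letters
    rare = sum(ch in RARE_LETTERS for ch in w)  # proxy for letter frequency
    return letters + 2.0 * rare + 1.5 * syllables(w)


def significant_corresponding_words(region_words: Sequence[str],
                                    transcript_words: Sequence[str],
                                    threshold: float = 12.0) -> List[str]:
    """Words present in both sequences whose score meets the threshold."""
    heard: Set[str] = {w.lower() for w in transcript_words}
    found: List[str] = []
    for w in region_words:
        lw = w.lower()
        if lw in heard and word_score(lw) >= threshold and lw not in found:
            found.append(lw)
    return found


region = "the quixotic explorer vanished into the jungle".split()
transcript = "the quick sotic explorer vanished in to the jungle".split()
print(significant_corresponding_words(region, transcript))  # ['explorer', 'vanished']
```

Short, common words such as "the" score low even though they match, which is why a scored anchor is less likely to produce a spurious alignment than any shared word would.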


Current Assignee: Audible Incorporated (Amazon.com, Inc.)
Original Assignee: Audible Incorporated (Amazon.com, Inc.)
Inventors: Dzik, Steven C.; Story, Guy A. Jr.
Primary Examiner(s): Azad, Abul
Application Number: US13/604,482
Publication Number: US 20140039887A1
Time in Patent Office: 1,063 Days
Field of Search: 704/235
US Class Current: 1/1
CPC Class Codes:
G06F 16/4393   Multimedia presentations, e...
G10L 15/183   using context dependencies,...
G10L 15/26   Speech to text systems G10L...
H05K 999/99   dummy group
