Text synchronization with audio

US 9,305,530 B1
Filed: 09/30/2014
Issued: 04/05/2016
Est. Priority Date: 09/30/2014
Status: Active Grant

Abstract

A technology for synchronizing text with audio includes analyzing the audio to identify voice segments in the audio where a human voice is present and to identify non-voice segments in proximity to the voice segments. Segmented text associated with the audio, having text segments, may be identified and synchronized to the voice segments.

33 Citations

18 Claims

1. A computing device that is configured to synchronize lyrics with music, comprising:
- a processor;
  
  a memory in electronic communication with the processor;
  
  instructions stored in the memory, the instructions being executable by the processor to:
  
  identify a marker for singing segments in the music where a person is singing using a machine learning model;
  
  identify a marker for break segments in proximity to the singing segments where the person is not singing using the machine learning model;
  
  identify lyric segments in lyrics associated with the music, the lyric segments being divided by lyric breaks;
  
  synchronize one of the lyric breaks with a marker of one of the break segments; and
  
  synchronize at least one of the lyric segments to a marker of one of the singing segments.
- Dependent claims (2, 3, 4):
  - 2. The computing device of claim 1, further configured to extract features from the music to identify the markers of the singing segments and break segments using the machine learning model.
  - 3. The computing device of claim 1, further configured to:
    - synchronize multiple lyric segments with one of the singing segments by dividing time duration of the singing segment by a number of the multiple lyric segments to derive singing sub-segments; and
      
      synchronize individual multiple lyric segments with individual singing sub-segments;
      
      wherein synchronizing the lyric segments with the singing segments or sub-segments is based on a machine learning synchronization model.
  - 4. The computing device of claim 1, further configured to synchronize an individual lyric segment with multiple singing segments upon identifying the singing segments outnumber the lyric segments.
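
The equal-division step recited in claim 3 can be illustrated with a minimal sketch. It covers only the arithmetic of dividing one singing segment's duration by the number of lyric segments; the machine learning synchronization model that claim 3 also recites is not modeled here. The function names, the use of seconds as the time unit, and the sample lyric lines are assumptions made for illustration and do not come from the patent.

```python
from typing import List, Tuple


def split_into_sub_segments(start: float, end: float, n_lyrics: int) -> List[Tuple[float, float]]:
    """Divide one singing segment into n_lyrics equal-length sub-segments.

    start and end are assumed to be timestamps in seconds (hypothetical unit).
    """
    if n_lyrics <= 0:
        raise ValueError("n_lyrics must be positive")
    if end <= start:
        raise ValueError("end must be later than start")
    step = (end - start) / n_lyrics  # duration divided by number of lyric segments
    return [(start + i * step, start + (i + 1) * step) for i in range(n_lyrics)]


def assign_lyrics(lines: List[str], start: float, end: float) -> List[Tuple[str, Tuple[float, float]]]:
    """Pair each lyric segment with its own sub-segment, in order."""
    return list(zip(lines, split_into_sub_segments(start, end, len(lines))))


if __name__ == "__main__":
    # Hypothetical example: three lyric lines sung within a 6-second singing segment.
    for line, (t0, t1) in assign_lyrics(["line one", "line two", "line three"], 10.0, 16.0):
        print(f"{t0:5.1f} to {t1:5.1f}  {line}")
```

Equal division is the simplest reading of "dividing time duration of the singing segment by a number of the multiple lyric segments"; a real system would presumably refine these boundaries.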

5. A computer-implemented method, comprising:
- analyzing audio, using a processor, to extract features from the audio and identify voice segments in the audio where a human voice is present and to identify non-voice segments in proximity to the voice segments based on the extracted features;
  
  identifying segmented text associated with the audio, the segmented text having text segments;
  
  synchronizing the text segments to the voice segments using the processor; and
  
  soliciting group-sourced corrections to correct the synchronizing of the text segments to the voice segments.
- Dependent claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16):
  - 6. The method of claim 5, further comprising using machine learning to identify the voice segment by analyzing other classified audio of a same genre or including a similar voice.
  - 7. The method of claim 5, further comprising using machine learning to identify the voice segment by analyzing other audio by the human voice.
  - 8. The method of claim 5, further comprising analyzing the audio at predetermined intervals and classifying each interval based on whether the human voice is present.
  - 9. The method of claim 8, wherein the predetermined intervals are less than a second.
  - 10. The method of claim 8, wherein the predetermined intervals are milliseconds.
  - 11. The method of claim 5, wherein the segmented text includes subtitles for a video.
  - 12. The method of claim 5, wherein the segmented text is lyrics for a song.
  - 13. The method of claim 5, wherein the segmented text is text of a book and the audio is an audio narration of the book.
  - 14. The method of claim 5, further comprising identifying a break between multiple voice segments and associating a break between segments of the segmented text with the break between the multiple voice segments.
  - 15. The method of claim 14, wherein the multiple voice segments each include multiple words.
  - 16. The method of claim 14, wherein the multiple voice segments each include a single word and each segment of the segmented text includes a single word.
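
As a rough illustration of claims 8 through 10, the sketch below takes a per-interval classification (voice present or not) and merges consecutive intervals with the same label into voice and non-voice segments. The classifier itself is not shown: the list of booleans stands in for its output, and the function name, the label strings, and the 500 ms interval length are assumptions chosen for the example rather than details from the patent.

```python
from itertools import groupby
from typing import List, Tuple


def frames_to_segments(voiced: List[bool], frame_ms: int) -> List[Tuple[str, int, int]]:
    """Merge fixed-length intervals into labeled segments.

    voiced[i] is True when a (hypothetical) classifier found a human voice in
    interval i. Returns (label, start_ms, end_ms) tuples in time order.
    """
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    segments = []
    position = 0  # index of the first interval in the current run
    for is_voice, run in groupby(voiced):
        length = len(list(run))
        label = "voice" if is_voice else "non-voice"
        segments.append((label, position * frame_ms, (position + length) * frame_ms))
        position += length
    return segments


if __name__ == "__main__":
    # Hypothetical classifier output for four 500 ms intervals.
    for label, t0, t1 in frames_to_segments([False, True, True, False], 500):
        print(f"{t0:5d} ms to {t1:5d} ms  {label}")
```

The boundaries between a voice segment and its neighboring non-voice segments are the points that claim 14 would associate with breaks in the segmented text.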

17. A non-transitory computer-readable medium comprising computer-executable instructions which, when executed by a processor, implement a system, comprising:
- an audio analysis module configured to analyze audio to identify a voice segment in the audio where a human voice is present;
  
  a text analysis module configured to identify segments in text associated with the audio and identify the voice segment as trained using other audio;
  
  a correlation module configured to determine a number of the segments of the text to associate with the voice segment; and
  
  a synchronization module to associate the number of the segments of the text with the voice segment.
- Dependent claims (18):
  - 18. The computer-readable medium of claim 17, wherein machine learning module uses a support vector machine learning algorithm to learn to identify the voice segment based on the other audio.
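
Claim 17 recites a correlation module that determines how many text segments to associate with a voice segment, but the claim does not say how that number is chosen. The sketch below shows one possible heuristic, stated purely as an assumption: allocate text segments to voice segments in proportion to each voice segment's duration, using largest-remainder rounding so the counts add up to the total. The function name and inputs are hypothetical.

```python
from typing import List


def allocate_counts(durations: List[float], n_text: int) -> List[int]:
    """Split n_text text segments across voice segments by relative duration.

    Assumed heuristic (not from the patent): proportional allocation with
    largest-remainder rounding, so that sum(result) == n_text.
    """
    if n_text < 0:
        raise ValueError("n_text must be non-negative")
    total = sum(durations)
    if total <= 0:
        raise ValueError("total duration must be positive")
    raw = [n_text * d / total for d in durations]  # ideal fractional share
    counts = [int(r) for r in raw]                 # round each share down
    leftover = n_text - sum(counts)
    # Give the remaining segments to the largest fractional remainders.
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical example: two voice segments of 3 s and 1 s, three text segments.
    print(allocate_counts([3.0, 1.0], 3))
```

A short voice segment can receive zero text segments under this heuristic, which loosely corresponds to the situation in claim 4 where singing segments outnumber lyric segments.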

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Durham, Brandon Scott; Malek, Darren Levi; Latin-Stoermer, Toby Ray; Hall, Jason Christopher; Mishra, Abhishek
Primary Examiner(s): Donels, Jeffrey
Application Number: US14/503,073
Time in Patent Office: 553 Days
US Class Current: 1/1
CPC Class Codes:
- G10H 1/0008   Associated control or indic...
- G10H 1/361   Recording/reproducing of ac...
- G10H 2210/041   based on mfcc [mel -frequen...
- G10H 2210/056   for extraction or identific...
- G10H 2220/011   Lyrics displays, e.g. for k...
- G10H 2240/325   Synchronizing two or more a...
- G10L 25/27   characterised by the analys...
- G10L 25/45   characterised by the type o...
- G10L 25/51   for comparison or discrimin...
- G10L 25/78   Detection of presence or ab...
- G10L 25/81   for discriminating voice fr...
- G10L 25/87   Detection of discrete point...
