Systems and Methods for Aligning Lyrics using a Neural Network

US 20200135176A1
Filed: 09/12/2019
Published: 04/30/2020
Est. Priority Date: 10/29/2018
Status: Active Grant

Abstract

An electronic device receives audio data for a media item. The electronic device generates, from the audio data, a plurality of samples, each sample having a predefined maximum length. The electronic device, using a neural network trained to predict character probabilities, generates a probability matrix of characters for a first portion of a first sample of the plurality of samples. The probability matrix includes character information, timing information, and respective probabilities of respective characters at respective times. The electronic device identifies, for the first portion of the first sample, a first sequence of characters based on the generated probability matrix.
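
The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the neural network is left out entirely, the probability matrix is taken as a given list of rows (one row per time frame, one column per symbol), and the blank symbol `_`, the greedy repeat-merging decode, and the names `split_into_samples`, `greedy_decode`, and `frame_seconds` are assumptions introduced here for illustration.

```python
from typing import List, Sequence, Tuple

BLANK = "_"  # hypothetical "no character" symbol; an assumption, not from the claims


def split_into_samples(audio: Sequence[float], max_len: int) -> List[Sequence[float]]:
    """Cut the audio into consecutive samples of at most max_len values each."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [audio[i:i + max_len] for i in range(0, len(audio), max_len)]


def greedy_decode(
    prob_matrix: Sequence[Sequence[float]],
    alphabet: Sequence[str],
    frame_seconds: float,
) -> List[Tuple[str, float]]:
    """Pick the most probable symbol per frame, merge repeats, drop blanks.

    Returns (character, start time in seconds) pairs, so the result keeps
    both the character information and the timing information.
    """
    decoded: List[Tuple[str, float]] = []
    previous = None
    for frame_index, row in enumerate(prob_matrix):
        best = max(range(len(row)), key=row.__getitem__)
        symbol = alphabet[best]
        if symbol != previous and symbol != BLANK:
            decoded.append((symbol, frame_index * frame_seconds))
        previous = symbol
    return decoded


if __name__ == "__main__":
    alphabet = ["_", "h", "i"]
    matrix = [
        [0.10, 0.80, 0.10],  # frame 0: "h"
        [0.20, 0.70, 0.10],  # frame 1: "h" again, merged with frame 0
        [0.90, 0.05, 0.05],  # frame 2: blank
        [0.10, 0.10, 0.80],  # frame 3: "i"
    ]
    print(split_into_samples(list(range(10)), 4))
    print(greedy_decode(matrix, alphabet, frame_seconds=0.5))
```

Each row of the matrix is a time frame and each column a character, so the decoded output pairs every recognized character with the time at which it first becomes the most probable symbol.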

13 Claims

1. A method, comprising:
- at an electronic device having one or more processors and memory storing instructions for execution by the one or more processors:
  
  receiving audio data for a media item;
  
  generating, from the audio data, a plurality of samples, each sample having a predefined maximum length;
  
  using a neural network trained to predict character probabilities, generating a probability matrix of characters for a first portion of a first sample of the plurality of samples, wherein the probability matrix includes:
  
  character information, timing information, and respective probabilities of respective characters at respective times;
  
  identifying, for the first portion of the first sample, a first sequence of characters based on the generated probability matrix.
  - 2. The method of claim 1, wherein generating the matrix using the neural network comprises:
    - convolving the first sample;
      
      downsampling the first sample to reduce a dimension of the first sample; and
      
      after downsampling the first sample, upsampling the first sample to increase the dimension of the first sample.
  - 3. The method of claim 1, wherein identifying the first sequence of characters includes:
    - receiving, from an external source, lyrics corresponding to the media item; and
      
      using the received lyrics and the probability matrix, aligning characters in the first sequence of characters with the received lyrics corresponding to the media item.
  - 4. The method of claim 1, further comprising:
    - determining a set of lyrics based on the first sequence of characters; and
      
      storing the set of lyrics in association with the media item.
  - 5. The method of claim 1, further comprising:
    - using a language model and at least a portion of the first sequence of characters, determining a first word in the first portion of the first sample; and
      
      determining, using the timing information that corresponds to the first portion of the first sample, a time that corresponds to the first word.
  - 6. The method of claim 1, further comprising:
    - generating a plurality of probability matrices for a plurality of samples using the neural network; and
      
      concatenating a set of two or more of the generated probability matrices to create a single probability matrix, the single probability matrix including:
      
      character information, timing information, and respective probabilities of respective characters at respective times.
  - 7. The method of claim 1, wherein the received audio data includes an extracted vocal track that has been separated from a media content item.
  - 8. The method of claim 1, wherein the received audio data is a polyphonic media content item.
  - 9. The method of claim 1, further comprising:
    - receiving, from a user, a request to search for a second sequence of characters within the media item;
      
      in response to receiving the request to search for the sequence of characters, performing a search of the first sequence of characters to determine whether at least a portion of the first sequence of characters matches the second sequence of characters; and
      
      in accordance with a determination that at least a portion of the first sequence of characters matches the second sequence of characters, identifying timing information related to the portion that matches.
  - 10. The method of claim 1, further comprising:
    - identifying, from the first sequence of characters, one or more keywords associated with the media item.
  - 11. The method of claim 10, further comprising:
    - determining whether any of the one or more keywords corresponds to a defined set of words; and
      
      in accordance with a determination that a first keyword of the one or more keywords corresponds to the defined set of words, performing an operation on a portion of the sample that corresponds to the first keyword.
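
Claim 3 describes aligning externally received lyrics against the probability matrix. One common way to do this kind of alignment is a Viterbi-style dynamic program over frames and lyric characters; the sketch below shows that technique under simplifying assumptions and is not taken from the patent. It assumes every frame belongs to exactly one lyric character, characters occur in order, each character covers at least one frame, and every lyric character appears in `alphabet`. The function name `align_lyrics` is hypothetical.

```python
import math
from typing import List, Sequence


def align_lyrics(
    prob_matrix: Sequence[Sequence[float]],
    alphabet: Sequence[str],
    lyrics: str,
) -> List[int]:
    """Return the start frame of each lyric character on the best monotonic path."""
    index_of = {ch: i for i, ch in enumerate(alphabet)}
    n_frames, n_chars = len(prob_matrix), len(lyrics)
    if n_chars == 0 or n_frames < n_chars:
        raise ValueError("need at least one frame per lyric character")
    neg_inf = float("-inf")

    def logp(t: int, j: int) -> float:
        p = prob_matrix[t][index_of[lyrics[j]]]
        return math.log(p) if p > 0 else neg_inf

    # score[t][j]: best log-probability of any path that puts frame t on character j
    score = [[neg_inf] * n_chars for _ in range(n_frames)]
    # advanced[t][j]: True if the best path moved from character j-1 to j at frame t
    advanced = [[False] * n_chars for _ in range(n_frames)]
    score[0][0] = logp(0, 0)
    for t in range(1, n_frames):
        for j in range(min(t + 1, n_chars)):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else neg_inf
            if advance > stay:
                advanced[t][j] = True
                best = advance
            else:
                best = stay
            score[t][j] = best + logp(t, j)
    if score[-1][-1] == neg_inf:
        raise ValueError("lyrics cannot be aligned to this matrix")

    # Walk back from the last frame; a character starts where the path advanced.
    starts = [0] * n_chars
    j = n_chars - 1
    for t in range(n_frames - 1, 0, -1):
        if advanced[t][j]:
            starts[j] = t
            j -= 1
    return starts


if __name__ == "__main__":
    alphabet = ["a", "b"]
    matrix = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
    print(align_lyrics(matrix, alphabet, "ab"))
```

Multiplying each returned start frame by the frame duration gives a time per character, which is the kind of timing information claim 5 then uses to assign a time to a word.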

12. A first electronic device comprising:
- one or more processors; and
  
  memory storing instructions for execution by the one or more processors, the instructions including instructions for:
  
  receiving audio data for a media item;
  
  generating, from the audio data, a plurality of samples, each sample having a predefined maximum length;
  
  using a neural network trained to predict character probabilities, generating a probability matrix of characters for a first portion of a first sample of the plurality of samples, wherein the probability matrix includes:
  
  character information, timing information, and respective probabilities of respective characters at respective times;
  
  identifying, for the first portion of the first sample, a first sequence of characters based on the generated probability matrix.

13. A non-transitory computer-readable storage medium storing instructions that, when executed by an electronic device, cause the electronic device to:
- receive audio data for a media item;
  
  generate, from the audio data, a plurality of samples, each sample having a predefined maximum length;
  
  using a neural network trained to predict character probabilities, generate a probability matrix of characters for a first portion of a first sample of the plurality of samples, wherein the probability matrix includes:
  
  character information, timing information, and respective probabilities of respective characters at respective times;
  
  identify, for the first portion of the first sample, a first sequence of characters based on the generated probability matrix.
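
The search and keyword steps of claims 9 through 11 operate on the identified sequence of characters together with its timing information. The following sketch assumes that sequence is available as a list of (character, time) pairs with one character per entry and spaces between words; the names `find_phrase` and `flag_keywords` are hypothetical, and the operation performed on a flagged portion (claim 11) is left to the caller because the claims do not specify it.

```python
from typing import List, Optional, Sequence, Set, Tuple

TimedChar = Tuple[str, float]  # (single character, time in seconds)


def find_phrase(
    timed_chars: Sequence[TimedChar], query: str
) -> Optional[Tuple[float, float]]:
    """Return the times of the first and last character of the first match, or None."""
    text = "".join(ch for ch, _ in timed_chars)
    position = text.find(query)
    if not query or position < 0:
        return None
    return timed_chars[position][1], timed_chars[position + len(query) - 1][1]


def flag_keywords(
    timed_chars: Sequence[TimedChar], defined_words: Set[str]
) -> List[Tuple[str, float]]:
    """Return (word, start time) for each space-separated word found in defined_words."""
    flagged: List[Tuple[str, float]] = []
    word, start = "", None
    # A trailing space acts as a sentinel so the last word is also checked.
    for ch, time in list(timed_chars) + [(" ", 0.0)]:
        if ch == " ":
            if word and word in defined_words:
                flagged.append((word, start))
            word, start = "", None
        else:
            if not word:
                start = time
            word += ch
    return flagged


if __name__ == "__main__":
    timed = [("h", 0.0), ("i", 0.5), (" ", 1.0), ("y", 1.5), ("o", 2.0), ("u", 2.5)]
    print(find_phrase(timed, "you"))
    print(flag_keywords(timed, {"you"}))
```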

Current Assignee: Spotify AB (Spotify Technology SA)
Original Assignee: Spotify AB (Spotify Technology SA)
Inventors: Stoller, Daniel; Durand, Simon Rene Georges; Ewert, Sebastian
Granted Patent: US 11,308,943 B2
CPC Class Codes

G06F 16/63   Querying

G06F 17/16   Matrix or vector computatio...

G06F 17/18   for evaluating statistical ...

G06N 3/08   Learning methods

G10L 15/16   using artificial neural net...

G10L 15/183   using context dependencies,...

G10L 15/26   Speech to text systems G10L...

G10L 2015/088   Word spotting

G10L 2015/226   using non-speech characteri...
