Generating a probability of music using machine learning technology

US 10,296,638 B1
Filed: 12/12/2017
Issued: 05/21/2019
Est. Priority Date: 08/31/2017
Status: Active Grant



Abstract

Systems and methods provide for capturing a plurality of segments of an audio stream and, for each segment of the plurality of segments of the audio stream: performing feature extraction on an audio signal of the segment using a feature extraction machine learning model that analyzes the audio signal to generate a feature vector for the segment and generating a prediction value for the segment for whether there is music in the segment using the extracted feature vector and a music detector machine learning model. The systems and methods further provide for generating a probability value that there is music in the audio stream based on the prediction value for each of the plurality of segments and causing the audio stream to be identified based on determining that the probability value that there is music in the audio stream meets a predetermined threshold.
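
The flow in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the two models are passed in as plain callables, the toy stand-ins (`toy_features`, `toy_detector`) are hypothetical, averaging is only one possible way to aggregate the per-segment prediction values, and the 0.5 threshold is an assumed value.

```python
from statistics import mean
from typing import Callable, Sequence

Segment = Sequence[float]        # raw audio samples for one segment
FeatureVector = Sequence[float]  # placeholder for a model's feature output


def music_probability(
    segments: Sequence[Segment],
    extract_features: Callable[[Segment], FeatureVector],
    detect_music: Callable[[FeatureVector], float],
) -> float:
    """Aggregate per-segment prediction values into one stream-level probability.

    Averaging is an assumption; the abstract only says the probability is
    "based on" the per-segment prediction values.
    """
    if not segments:
        raise ValueError("at least one segment is required")
    predictions = [detect_music(extract_features(seg)) for seg in segments]
    return mean(predictions)


def should_identify(probability: float, threshold: float = 0.5) -> bool:
    """True when the probability meets the (assumed) predetermined threshold."""
    return probability >= threshold


# Hypothetical stand-ins for the two machine learning models:
# mean energy as the "feature", clamped to [0, 1] as the "prediction value".
def toy_features(segment: Segment) -> FeatureVector:
    return [sum(x * x for x in segment) / len(segment)]


def toy_detector(features: FeatureVector) -> float:
    return max(0.0, min(1.0, features[0]))


segments = [[0.0, 0.0], [1.0, 1.0], [1.0, -1.0], [0.5, 0.5]]
p = music_probability(segments, toy_features, toy_detector)
print(p, should_identify(p))
```

Passing the models in as callables keeps the aggregation and threshold logic independent of whichever feature extractor and detector are actually used.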

31 Citations


20 Claims

1. A method comprising:
- capturing, by a computing device, a plurality of segments of an audio stream;
  
  for each segment of the plurality of segments of the audio stream:
  
  performing, by the computing device, feature extraction on an audio signal of the segment using a feature extraction machine learning model that analyzes the audio signal to generate a feature vector for the segment;
  
  generating, by the computing device, a prediction value for the segment for whether there is music in the segment using the extracted feature vector and a music detector machine learning model that analyzes the feature vector for the segment;
  
  generating, by the computing device, a probability value that there is music in the audio stream based on aggregating the prediction values of the plurality of segments;
  
  determining, by the computing device, that the probability value that there is music in the audio stream meets a predetermined threshold; and
  
  causing the audio stream to be identified based on determining that the probability value that there is music in the audio stream meets a predetermined threshold.
- Dependent claims:
  - 2. The method of claim 1, wherein the audio stream has a first sampling rate, and wherein after capturing the plurality of segments of the audio stream, the method further comprises:
    - resampling the plurality of audio segments of the audio stream to a second sampling rate.
  - 3. The method of claim 2, wherein the plurality of segments of the audio stream is down sampled from the first sampling rate to the second sampling rate.
  - 4. The method of claim 1, further comprising:
    - determining a first sampling rate of the audio stream;
      
      determining that the first sampling rate of the audio stream is different than a predetermined sampling rate; and
      
      resampling the audio stream from the first sampling rate to the predetermined sampling rate.
  - 5. The method of claim 1, wherein the feature vector is a two dimensional feature vector, wherein a first dimension of the two dimensional feature vector is a time-domain for the segment, and a second dimension of the two dimensional feature vector is a feature-domain for the segment.
  - 6. The method of claim 5, wherein the feature-domain for the segment is a frequency domain.
  - 7. The method of claim 1, wherein the music detection machine learning model is trained using a plurality of messages comprising media content.
  - 8. The method of claim 1, wherein the plurality of segments of the audio stream are captured from media content of a message in a messaging system.
  - 9. The method of claim 8, further comprising:
    - setting a flag associated with the message indicating that the audio stream has already been processed.
  - 10. The method of claim 1, wherein each segment comprises a slide-window that overlaps in time with another segment.
  - 11. The method of claim 10, wherein each slide-window comprises a predetermined stride size corresponding to an amount of time between the start of a first segment and the start of a next segment following the first segment.
  - 12. The method of claim 1, wherein causing the audio stream to be identified based on determining that the probability value that there is music in the audio stream meeting a predetermined threshold comprises:
    - sending a request to a server computing device to request that the audio stream be identified; and
      
      receiving a response, from the server computing device, that includes identity information for the audio stream.
  - 13. The method of claim 1, further comprising:
    - scanning media content stored on the computing device;
      
      identifying media content that has had an associated audio stream processed; and
      
      setting a flag for each identified media content indicating that an audio stream associated with the media content has been processed.
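
Claims 2 to 4 (resampling) and claims 10 and 11 (overlapping slide-windows with a stride) can be illustrated with the sketch below. It rests on several assumptions: the helper names `downsample` and `sliding_windows` are hypothetical, the 32000 Hz and 16000 Hz rates are example values, and the decimation is deliberately naive (a production resampler would apply an anti-aliasing filter first).

```python
from typing import List, Sequence


def downsample(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Naive down-sampling by an integer factor (no anti-aliasing filter)."""
    if src_rate == dst_rate:
        return list(samples)  # already at the target rate, nothing to do
    if dst_rate <= 0 or src_rate % dst_rate != 0:
        raise ValueError("this sketch only supports integer decimation factors")
    return list(samples[:: src_rate // dst_rate])


def sliding_windows(
    samples: Sequence[float], window_size: int, stride: int
) -> List[List[float]]:
    """Split samples into fixed-size windows whose starts are `stride` apart.

    Windows overlap in time whenever stride < window_size.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    last_start = len(samples) - window_size
    return [
        list(samples[start:start + window_size])
        for start in range(0, last_start + 1, stride)
    ]


# Example: 16 samples at an assumed 32000 Hz, down-sampled to 16000 Hz,
# then cut into windows of 4 samples with a stride of 2 samples.
stream = [float(i) for i in range(16)]
resampled = downsample(stream, src_rate=32000, dst_rate=16000)
windows = sliding_windows(resampled, window_size=4, stride=2)
print(windows)
```

With a stride of half the window size, each window shares half of its samples with the next one, which matches the overlap described in claim 10.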

14. A computing device comprising:
- a processor; and
  
  a computer readable medium coupled with the processor, the computer readable medium comprising instructions stored thereon that are executable by the processor to cause a computing device to perform operations comprising:
  
  capturing a plurality of segments of an audio stream;
  
  for each segment of the plurality of segments of the audio stream:
  
  performing feature extraction on an audio signal of the segment using a feature extraction machine learning model that analyzes the audio signal to generate a feature vector for the segment;
  
  generating a prediction value for the segment for whether there is music in the segment using the extracted feature vector and a music detector machine learning model that analyzes the feature vector for the segment;
  
  generating a probability value that there is music in the audio stream based on aggregating the prediction values of the plurality of segments;
  
  determining that the probability value that there is music in the audio stream meets a predetermined threshold; and
  
  causing the audio stream to be identified based on determining that the probability value that there is music in the audio stream meets a predetermined threshold.
- Dependent claims:
  - 15. The computing device of claim 14, wherein the feature vector is a two dimensional feature vector, wherein a first dimension of the two dimensional feature vector is a time-domain for the segment, and a second dimension of the two dimensional feature vector is a feature-domain for the segment.
  - 16. The computing device of claim 15, wherein the feature-domain for the segment is a frequency domain.
  - 17. The computing device of claim 14, wherein the plurality of segments of the audio stream are captured from media content of a message in a messaging system.
  - 18. The computing device of claim 14, wherein each segment comprises a slide-window that overlaps in time with another segment and wherein each slide-window comprises a predetermined stride size corresponding to an amount of time between the start of a first segment and the start of a next segment following the first segment.
  - 19. The computing device of claim 14, wherein causing the audio stream to be identified based on determining that the probability value that there is music in the audio stream meeting a predetermined threshold comprises:
    - sending a request to a server computing device to request that the audio stream be identified; and
      
      receiving a response, from the server computing device, that includes identity information for the audio stream.
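
To make the two dimensional feature vector of claims 15 and 16 concrete, the sketch below builds a time-by-frequency array for one segment. Note the substitution: the claims call for a feature extraction machine learning model, whereas this example uses a fixed discrete Fourier transform per frame purely as a stand-in. The function names, frame size, and hop are all assumptions.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def time_frequency_features(
    segment: Sequence[float], frame_size: int, hop: int
) -> List[List[float]]:
    """Return a 2D array: rows index time (frames), columns index frequency bins."""
    if frame_size <= 0 or hop <= 0:
        raise ValueError("frame_size and hop must be positive")
    return [
        magnitude_spectrum(segment[start:start + frame_size])
        for start in range(0, len(segment) - frame_size + 1, hop)
    ]


# Example: a sine wave with exactly 2 cycles per 8-sample frame,
# so its energy should land in frequency bin 2 of every frame.
segment = [math.sin(2 * math.pi * 2 * t / 8) for t in range(16)]
features = time_frequency_features(segment, frame_size=8, hop=4)
print(len(features), len(features[0]))
```

Here the first dimension (rows) is the time domain of the segment and the second dimension (columns) is a frequency domain, mirroring the layout the claims describe.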

20. A non-transitory computer readable medium comprising instructions stored thereon that are executable by at least one processor to cause a computing device to perform operations comprising:
- capturing a plurality of segments of an audio stream;
  
  for each segment of the plurality of segments of the audio stream:
  
  performing feature extraction on an audio signal of the segment using a feature extraction machine learning model that analyzes the audio signal to generate a feature vector for the segment;
  
  generating a prediction value for the segment for whether there is music in the segment using the extracted feature vector and a music detector machine learning model that analyzes the feature vector for the segment;
  
  generating a probability value that there is music in the audio stream based on aggregating the prediction values of the plurality of segments;
  
  determining that the probability value that there is music in the audio stream meets a predetermined threshold; and
  
  causing the audio stream to be identified based on determining that the probability value that there is music in the audio stream meets a predetermined threshold.
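
The bookkeeping in claims 9, 12, 13 and 19 (flagging processed media and requesting identification from a server) might look like the sketch below. Everything named here is hypothetical: `MediaItem`, `identify_pending`, the precomputed `music_probability` field, the 0.5 threshold, and the `fake_server` callable that stands in for the server request and response.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class MediaItem:
    media_id: str
    music_probability: float                    # assumed to be computed earlier
    processed: bool = False                     # flag: audio stream already processed
    identity: Optional[Dict[str, str]] = None   # identity information, if any


def identify_pending(
    items: List[MediaItem],
    request_identification: Callable[[str], Dict[str, str]],
    threshold: float = 0.5,
) -> List[str]:
    """Request identification for unprocessed items that meet the threshold.

    Every visited item is flagged as processed so a later scan skips it.
    """
    identified = []
    for item in items:
        if item.processed:
            continue  # flag already set, do not process again
        if item.music_probability >= threshold:
            item.identity = request_identification(item.media_id)
            identified.append(item.media_id)
        item.processed = True
    return identified


# Stand-in for the server round trip; it records which IDs were requested.
requested: List[str] = []


def fake_server(media_id: str) -> Dict[str, str]:
    requested.append(media_id)
    return {"media_id": media_id, "identity": "placeholder"}


items = [
    MediaItem("a", 0.9),
    MediaItem("b", 0.2),
    MediaItem("c", 0.95, processed=True),
]
first_pass = identify_pending(items, fake_server)
second_pass = identify_pending(items, fake_server)
print(first_pass, second_pass)
```

Setting the flag even when the threshold is not met avoids re-running detection on media that was already found to contain no music.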


Current Assignee: Snap, Inc.
Original Assignee: Snap, Inc.
Inventors: Chen, Xin; Chung, Jaewook; Hu, Yu; Jiang, Jinhua; Mei, Xing; Ouimet, Kirk; Xu, Ning
Primary Examiner(s): Ramakrishnaiah, Melur
Application Number: US15/839,454
Time in Patent Office: 525 Days
Field of Search: 700/94, 381/56, 381/94.1, 381/94.2, 381/94.3, 704/200, 704/203, 704/208, 704/219, 704/256
CPC Class Codes

G06F 16/433   using audio data

G06F 16/632   Query formulation

G06F 16/68   Retrieval characterised by ...

G06F 16/683   using metadata automaticall...

G10H 2240/141   Library retrieval matching,...
