Generating a probability of music using machine learning technology
First Claim
1. A method comprising:
- capturing, by a computing device, a plurality of segments of an audio stream;
- for each segment of the plurality of segments of the audio stream:
  - performing, by the computing device, feature extraction on an audio signal of the segment using a feature extraction machine learning model that analyzes the audio signal to generate a feature vector for the segment;
  - generating, by the computing device, a prediction value for the segment for whether there is music in the segment using the extracted feature vector and a music detector machine learning model that analyzes the feature vector for the segment;
- generating, by the computing device, a probability value that there is music in the audio stream based on aggregating the prediction values of the plurality of segments;
- determining, by the computing device, that the probability value that there is music in the audio stream meets a predetermined threshold; and
- causing the audio stream to be identified based on determining that the probability value that there is music in the audio stream meets a predetermined threshold.
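The per-segment prediction, aggregation, and threshold steps recited in the claim can be sketched roughly as follows. This is an illustration only, not the patented implementation: the claim does not say which models or which aggregation function are used, so `extract_features`, `predict_music`, the weights, the mean aggregation, and the 0.5 threshold are all hypothetical stand-ins chosen for readability.

```python
import math
from typing import Callable, List, Sequence

Segment = Sequence[float]      # raw audio samples of one segment
FeatureVector = List[float]


def extract_features(segment: Segment) -> FeatureVector:
    """Hypothetical stand-in for the feature extraction model.

    Returns two hand-made features: mean absolute amplitude and
    zero-crossing rate. A real system would use a learned model.
    """
    n = len(segment)
    if n == 0:
        return [0.0, 0.0]
    energy = sum(abs(x) for x in segment) / n
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0))
    return [energy, crossings / n]


def predict_music(features: FeatureVector) -> float:
    """Hypothetical stand-in for the music detector model.

    A logistic function over made-up weights; the output is a
    prediction value in [0, 1] for a single segment.
    """
    weights, bias = [4.0, -6.0], -1.0  # assumed values, not from the source
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def music_probability(
    segments: Sequence[Segment],
    extractor: Callable[[Segment], FeatureVector] = extract_features,
    detector: Callable[[FeatureVector], float] = predict_music,
) -> float:
    """Aggregate per-segment prediction values into one probability.

    The mean is used here as one possible aggregation (an assumption).
    """
    predictions = [detector(extractor(segment)) for segment in segments]
    return sum(predictions) / len(predictions) if predictions else 0.0


def meets_threshold(probability: float, threshold: float = 0.5) -> bool:
    """True when the stream-level probability meets the predetermined threshold."""
    return probability >= threshold
```

In this sketch, a caller would pass the captured segments to `music_probability` and trigger identification of the audio stream only when `meets_threshold` returns `True`.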
Abstract
Systems and methods provide for capturing a plurality of segments of an audio stream and, for each segment of the plurality of segments of the audio stream: performing feature extraction on an audio signal of the segment using a feature extraction machine learning model that analyzes the audio signal to generate a feature vector for the segment and generating a prediction value for the segment for whether there is music in the segment using the extracted feature vector and a music detector machine learning model. The systems and methods further provide for generating a probability value that there is music in the audio stream based on the prediction value for each of the plurality of segments and causing the audio stream to be identified based on determining that the probability value that there is music in the audio stream meets a predetermined threshold.
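As a rough illustration of the capturing step, the sketch below splits a buffer of audio samples into fixed-length segments. The abstract does not state how segments are delimited, so the non-overlapping windows, the one-second default length, and the name `capture_segments` are assumptions made for this example.

```python
from typing import Iterator, List, Sequence


def capture_segments(
    samples: Sequence[float],
    sample_rate: int,
    segment_seconds: float = 1.0,  # assumed length, not given in the source
) -> Iterator[List[float]]:
    """Yield consecutive, non-overlapping segments of an audio buffer.

    A trailing remainder shorter than one full segment is dropped.
    """
    segment_len = int(sample_rate * segment_seconds)
    if segment_len <= 0:
        raise ValueError("segment length must be at least one sample")
    for start in range(0, len(samples) - segment_len + 1, segment_len):
        yield list(samples[start:start + segment_len])
```

Each yielded segment would then be passed through feature extraction and the music detector individually before the per-segment prediction values are combined.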
31 Citations
20 Claims
1. A method comprising:
- capturing, by a computing device, a plurality of segments of an audio stream;
- for each segment of the plurality of segments of the audio stream:
  - performing, by the computing device, feature extraction on an audio signal of the segment using a feature extraction machine learning model that analyzes the audio signal to generate a feature vector for the segment;
  - generating, by the computing device, a prediction value for the segment for whether there is music in the segment using the extracted feature vector and a music detector machine learning model that analyzes the feature vector for the segment;
- generating, by the computing device, a probability value that there is music in the audio stream based on aggregating the prediction values of the plurality of segments;
- determining, by the computing device, that the probability value that there is music in the audio stream meets a predetermined threshold; and
- causing the audio stream to be identified based on determining that the probability value that there is music in the audio stream meets a predetermined threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A computing device comprising:
- a processor; and
- a computer readable medium coupled with the processor, the computer readable medium comprising instructions stored thereon that are executable by the processor to cause a computing device to perform operations comprising:
  - capturing a plurality of segments of an audio stream;
  - for each segment of the plurality of segments of the audio stream:
    - performing feature extraction on an audio signal of the segment using a feature extraction machine learning model that analyzes the audio signal to generate a feature vector for the segment;
    - generating a prediction value for the segment for whether there is music in the segment using the extracted feature vector and a music detector machine learning model that analyzes the feature vector for the segment;
  - generating a probability value that there is music in the audio stream based on aggregating the prediction values of the plurality of segments;
  - determining that the probability value that there is music in the audio stream meets a predetermined threshold; and
  - causing the audio stream to be identified based on determining that the probability value that there is music in the audio stream meets a predetermined threshold.
Dependent claims: 15, 16, 17, 18, 19
20. A non-transitory computer readable medium comprising instructions stored thereon that are executable by at least one processor to cause a computing device to perform operations comprising:
- capturing a plurality of segments of an audio stream;
- for each segment of the plurality of segments of the audio stream:
  - performing feature extraction on an audio signal of the segment using a feature extraction machine learning model that analyzes the audio signal to generate a feature vector for the segment;
  - generating a prediction value for the segment for whether there is music in the segment using the extracted feature vector and a music detector machine learning model that analyzes the feature vector for the segment;
- generating a probability value that there is music in the audio stream based on aggregating the prediction values of the plurality of segments;
- determining that the probability value that there is music in the audio stream meets a predetermined threshold; and
- causing the audio stream to be identified based on determining that the probability value that there is music in the audio stream meets a predetermined threshold.
Specification