Unified recognition of speech and music
First Claim
1. A method for providing information to a user, the method comprising:
detecting entry in an audio recognition mode by a computing device, the detecting including receiving an audio stream;
analyzing, by a processor of the computing device, one or more segments of the audio stream received by the computing device before a complete audio stream is received, wherein analyzing includes:
first checking the one or more segments to determine if the audio stream includes speech; and
second checking the one or more segments to determine if the audio stream is from a song, wherein at least part of the first checking is performed while the second checking is being performed;
determining a first confidence score from the first checking and determining a second confidence score from the second checking;
displaying a possible candidate on a display based on a partial identification of the audio stream using the first and second confidence scores while continuing checking additional segments as the audio stream is received until an end of the audio stream or until the first and second confidence scores determine that the audio stream has been identified as speech or music; and
presenting results on the display based on the completed identification of the audio stream.
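As an illustration of the two overlapping checks recited in the claim, the following is a minimal sketch, not the patented implementation. It assumes two hypothetical scorers, `check_speech` and `check_music`, that each return a confidence between 0 and 1 for a segment; here they are stand-ins that count characters in a toy string. The thread pool, the function names, and the rule for picking a partial candidate are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def check_speech(segment):
    """Hypothetical speech checker: confidence in [0, 1] (toy stand-in)."""
    # Assumption: a segment is a string where "s" marks a speech-like frame.
    return segment.count("s") / len(segment) if segment else 0.0


def check_music(segment):
    """Hypothetical music checker: confidence in [0, 1] (toy stand-in)."""
    # Assumption: "m" marks a music-like frame.
    return segment.count("m") / len(segment) if segment else 0.0


def check_segment(segment, pool):
    """Submit both checks so they can run while the other is in progress."""
    speech_future = pool.submit(check_speech, segment)
    music_future = pool.submit(check_music, segment)
    return speech_future.result(), music_future.result()


def partial_candidate(speech_score, music_score):
    """Assumed rule: show whichever kind currently has the higher score."""
    return "speech" if speech_score >= music_score else "music"


with ThreadPoolExecutor(max_workers=2) as pool:
    speech_score, music_score = check_segment("ssms", pool)
    print(partial_candidate(speech_score, music_score), speech_score, music_score)
```

In a real system the two scorers would be a speech recognizer and a music identifier; the point of the sketch is only that both are started on the same segment before either result is awaited.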
Abstract
Methods, systems, and computer programs are presented for unified recognition of speech and music. One method includes an operation for starting an audio recognition mode by a computing device while receiving an audio stream. Segments of the audio stream are analyzed as the audio stream is received, where the analysis includes simultaneous checking for speech and music. Further, the method includes an operation for determining a first confidence score for speech and a second confidence score for music. As the audio stream is received, additional segments are analyzed until the end of the audio stream or until the first and second confidence scores indicate that the audio stream has been identified as speech or music. Further, results are presented on a display based on the identification of the audio stream, including text entered if the audio stream was speech or song information if the audio stream was music.
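The stopping behaviour described in the abstract (keep analyzing segments until the stream ends or the two confidence scores settle the question) can be sketched as follows. This is a hedged illustration only: the source does not say how per-segment scores are combined or what threshold is used, so the running average, the `threshold` value of 0.8, and the names `identify_stream`, `score_speech`, and `score_music` are all assumptions.

```python
def identify_stream(segments, score_speech, score_music, threshold=0.8):
    """Consume segments until one confidence clears the threshold.

    Returns (label, segments_used), where label is "speech", "music",
    or "undetermined" if the stream ended without a clear winner.
    """
    speech_total = 0.0
    music_total = 0.0
    seen = 0
    for segment in segments:
        seen += 1
        speech_total += score_speech(segment)
        music_total += score_music(segment)
        # Assumption: confidence is the running average of segment scores.
        speech_conf = speech_total / seen
        music_conf = music_total / seen
        if speech_conf >= threshold and speech_conf > music_conf:
            return "speech", seen  # identified early; stop analyzing
        if music_conf >= threshold and music_conf > speech_conf:
            return "music", seen
    return "undetermined", seen  # end of the audio stream reached


# Toy usage: each "segment" is already a speech-likeness value in [0, 1].
label, used = identify_stream(
    [0.5, 1.0, 1.0, 1.0],
    score_speech=lambda seg: seg,
    score_music=lambda seg: 1.0 - seg,
)
print(label, used)
```

With these toy scores the loop stops after the third segment, so the fourth is never analyzed, which mirrors the early exit the abstract describes.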
20 Claims
1. A method for providing information to a user (independent claim; the full text appears under First Claim above). Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A device for providing information to a user, the device comprising:
a microphone;
a display;
a processor; and
a memory including a computer program for audio recognition, wherein instructions of the computer program when executed by the processor perform operations for:
detecting entry in an audio recognition mode, the detecting including receiving an audio stream via the microphone;
analyzing one or more segments of the audio stream before a complete audio stream is received, wherein analyzing includes:
sending the one or more segments to a first server for determining if the audio stream includes speech; and
sending the one or more segments to a second server for determining if the audio stream is from a song;
receiving a first confidence score from the first server and receiving a second confidence score from the second server;
displaying a possible candidate on the display based on a partial identification of the audio stream using the first and second confidence scores while continuing analyzing additional segments as the audio stream is received until an end of the audio stream or until the first and second confidence scores determine that the audio stream has been identified as speech or music; and
presenting results on the display based on the completed identification of the audio stream.
Dependent claims: 12, 13, 14, 15.
16. A computer program embedded in a non-transitory computer-readable storage medium, when executed by one or more processors, for providing information to a user, the computer program comprising:
program instructions for detecting entry in an audio recognition mode by a computing device, the detecting including receiving an audio stream;
program instructions for analyzing one or more segments of the audio stream received by the computing device before a complete audio stream is received, wherein analyzing includes:
first checking the one or more segments to determine if the audio stream includes speech; and
second checking the one or more segments to determine if the audio stream is from a song, wherein at least part of the first checking is performed while the second checking is being performed;
program instructions for determining a first confidence score from the first checking and determining a second confidence score from the second checking;
program instructions for displaying a possible candidate on a display based on a partial identification of the audio stream using the first and second confidence scores while continuing checking additional segments as the audio stream is received until an end of the audio stream or until the first and second confidence scores determine that the audio stream has been identified as speech or music; and
program instructions for presenting results on a display based on the completed identification of the audio stream.
Dependent claims: 17, 18, 19, 20.
Specification