Unified recognition of speech and music

US 9,224,385 B1
Filed: 06/17/2013
Issued: 12/29/2015
Est. Priority Date: 06/17/2013
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Methods, systems, and computer programs are presented for unified recognition of speech and music. One method includes an operation for starting an audio recognition mode by a computing device while receiving an audio stream. Segments of the audio stream are analyzed as the audio stream is received, where the analysis includes simultaneous checking for speech and music. Further, the method includes an operation for determining a first confidence score for speech and a second confidence score for music. As the audio stream is received, additional segments are analyzed until the end of the audio stream or until the first and second confidence scores indicate that the audio stream has been identified as speech or music. Further, results are presented on a display based on the identification of the audio stream, including text entered if the audio stream was speech or song information if the audio stream was music.
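The incremental analysis described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the scorer callables, the threshold values, and the tie-breaking rule are all assumptions introduced here.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical scorer type: takes the audio received so far and
# returns a confidence value in [0, 1].
Scorer = Callable[[bytes], float]


def identify_stream(
    segments: Iterable[bytes],
    score_speech: Scorer,
    score_music: Scorer,
    speech_threshold: float = 0.8,  # assumed value, not from the source
    music_threshold: float = 0.8,   # assumed value, not from the source
) -> Tuple[Optional[str], int]:
    """Return (label, segments_consumed).

    label is "speech" or "music" once a score exceeds its threshold,
    or None if the stream ends before either does.
    """
    received = b""
    consumed = 0
    for segment in segments:
        received += segment
        consumed += 1
        # Both checks run on every partial stream.
        speech_conf = score_speech(received)
        music_conf = score_music(received)
        # Assumed tie-break: prefer speech when both would qualify
        # and speech is at least as confident.
        if speech_conf > speech_threshold and speech_conf >= music_conf:
            return "speech", consumed
        if music_conf > music_threshold:
            return "music", consumed
    return None, consumed


# Usage with toy scorers: music confidence grows as more audio arrives.
label, used = identify_stream(
    [b"a"] * 10,
    score_speech=lambda audio: 0.1,
    score_music=lambda audio: min(1.0, len(audio) / 4),
)
print(label, used)
```

The loop stops as soon as one score is decisive, so the remaining segments are never analyzed; if neither score becomes decisive, it runs to the end of the stream.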

16 Citations

20 Claims

1. A method for providing information to a user, the method comprising:
- detecting entry in an audio recognition mode by a computing device, the detecting including receiving an audio stream;
  
  analyzing, by a processor of the computing device, one or more segments of the audio stream received by the computing device before a complete audio stream is received, wherein analyzing includes:
  
  first checking the one or more segments to determine if the audio stream includes speech; and
  
  second checking the one or more segments to determine if the audio stream is from a song, wherein at least part of the first checking is performed while the second checking is being performed;
  
  determining a first confidence score from the first checking and determining a second confidence score from the second checking;
  
  displaying a possible candidate on a display based on a partial identification of the audio stream using the first and second confidence scores while continuing checking additional segments as the audio stream is received until an end of the audio stream or until the first and second confidence scores determine that the audio stream has been identified as speech or music; and
  
  presenting results on the display based on the completed identification of the audio stream.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method as recited in claim 1, wherein first checking further includes sending the one or more segments to a speech recognition server, and wherein second checking further includes sending the one or more segments to a music recognition server.
  - 3. The method as recited in claim 1, wherein determining the first confidence score further includes receiving the first confidence score from a speech recognition server, and wherein determining the second confidence score further includes receiving the second confidence score from a music recognition server.
  - 4. The method as recited in claim 1, further including:
    - presenting intermediate results, on the display, while the audio stream is being received based on one or more first confidence scores and one or more second confidence scores, the intermediate results including one or more possible results.
  - 5. The method as recited in claim 4, wherein the audio stream is identified as speech, wherein the intermediate results include recognized words received in the audio stream.
  - 6. The method as recited in claim 4, wherein the audio stream is identified as music, wherein the intermediate results include candidate result songs for the audio stream.
  - 7. The method as recited in claim 1, further including:
    - after the detecting the entry in the audio recognition mode, providing, on the display, a unified interface for recognizing speech and music; and
      
      providing an indicator in the unified interface on the display that the audio stream is being received.
  - 8. The method as recited in claim 1, further including:
    - performing, by the computing device, a search when the audio stream has been identified as speech, the search being performed on the identified speech.
  - 9. The method as recited in claim 1, further including:
    - displaying song information on the display when the audio stream has been identified as music, the song information corresponding to an identified song.
  - 10. The method as recited in claim 9, further including:
    - displaying, on the display, lyrics in synchronism with the identified song.
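Claims 4 to 6 describe intermediate results that depend on which kind of audio currently looks more likely. One way this selection might look is sketched below; the `PartialResult` structure, the `max_items` limit, and the rule of comparing the two scores directly are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartialResult:
    """Hypothetical snapshot of recognition state mid-stream."""
    speech_conf: float
    music_conf: float
    words: List[str] = field(default_factory=list)  # recognized words so far
    songs: List[str] = field(default_factory=list)  # candidate songs so far


def intermediate_display(partial: PartialResult, max_items: int = 3) -> List[str]:
    """Choose what to show while audio is still being received."""
    if partial.speech_conf >= partial.music_conf:
        # Speech is leading: show the words recognized so far as one line.
        return [" ".join(partial.words)] if partial.words else []
    # Music is leading: show the top candidate songs.
    return partial.songs[:max_items]


print(intermediate_display(PartialResult(0.7, 0.2, words=["play", "some"])))
print(intermediate_display(PartialResult(0.1, 0.6, songs=["A", "B", "C", "D"])))
```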

11. A device for providing information to a user, the device comprising:
- a microphone;
  
  a display;
  
  a processor; and
  
  a memory including a computer program for audio recognition, wherein instructions of the computer program when executed by the processor perform operations for:
  
  detecting entry in an audio recognition mode, the detecting including receiving an audio stream via the microphone;
  
  analyzing one or more segments of the audio stream before a complete audio stream is received, wherein analyzing includes:
  
  sending the one or more segments to a first server for determining if the audio stream includes speech; and
  
  sending the one or more segments to a second server for determining if the audio stream is from a song;
  
  receiving a first confidence score from the first server and receiving a second confidence score from the second server;
  
  displaying a possible candidate on the display based on a partial identification of the audio stream using the first and second confidence scores while continuing analyzing additional segments as the audio stream is received until an end of the audio stream or until the first and second confidence scores determine that the audio stream has been identified as speech or music; and
  
  presenting results on the display based on the completed identification of the audio stream.
- Dependent claims (12, 13, 14, 15)
  - 12. The device as recited in claim 11, wherein the device is one of a phone, a tablet, a personal computer, a book reader, or a laptop.
  - 13. The device as recited in claim 11, wherein sending the one or more segments to a first server and sending the one or more segments to a second server are performed in parallel and independently from each other.
  - 14. The device as recited in claim 11, wherein the processor further performs operations of:
    - providing a unified interface in the display for recognizing speech and music; and
      
      providing an indicator in the unified interface that the audio stream is being received.
  - 15. The device as recited in claim 11, wherein the processor further performs an operation for presenting intermediate results while the audio stream is being received based on one or more first confidence scores and one or more second confidence scores.
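Claim 13 recites sending each segment to the two servers in parallel and independently. A minimal sketch of that pattern using Python's standard thread pool is shown below. The two sender callables are hypothetical stand-ins for real network clients, which the source does not specify.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

# Hypothetical client type: sends one segment, returns a confidence score.
SegmentSender = Callable[[bytes], float]


def send_in_parallel(
    segment: bytes,
    send_to_speech_server: SegmentSender,
    send_to_music_server: SegmentSender,
) -> Tuple[float, float]:
    """Submit the same segment to both servers concurrently.

    Returns (speech_confidence, music_confidence).
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Neither request waits for the other to start or finish.
        speech_future = pool.submit(send_to_speech_server, segment)
        music_future = pool.submit(send_to_music_server, segment)
        return speech_future.result(), music_future.result()


# Usage with stub senders in place of real servers.
scores = send_in_parallel(b"\x00\x01", lambda s: 0.9, lambda s: 0.1)
print(scores)
```

In this sketch an exception raised by either sender propagates from `result()`; a real client would likely need per-request timeouts and error handling, which are omitted here.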

16. A computer program embedded in a non-transitory computer-readable storage medium, when executed by one or more processors, for providing information to a user, the computer program comprising:
- program instructions for detecting entry in an audio recognition mode by a computing device, the detecting including receiving an audio stream;
  
  program instructions for analyzing one or more segments of the audio stream received by the computing device before a complete audio stream is received, wherein analyzing includes:
  
  first checking the one or more segments to determine if the audio stream includes speech; and
  
  second checking the one or more segments to determine if the audio stream is from a song, wherein at least part of the first checking is performed while the second checking is being performed;
  
  program instructions for determining a first confidence score from the first checking and determining a second confidence score from the second checking;
  
  program instructions for displaying a possible candidate on a display based on a partial identification of the audio stream using the first and second confidence scores while continuing checking additional segments as the audio stream is received until an end of the audio stream or until the first and second confidence scores determine that the audio stream has been identified as speech or music; and
  
  program instructions for presenting results on a display based on the completed identification of the audio stream.
- Dependent claims (17, 18, 19, 20)
  - 17. The computer program as recited in claim 16, further including:
    - program instructions for compressing the one or more segments to produce a compressed audio segment to be sent to one or more servers for audio recognition.
  - 18. The computer program as recited in claim 16, wherein first checking further includes sending the one or more segments to a speech recognition server, and wherein second checking further includes sending the one or more segments to a music recognition server.
  - 19. The computer program as recited in claim 16, further including:
    - program instructions for comparing the first confidence score against a first threshold; and
      
      program instructions for determining that the audio segment is speech when the first confidence score is higher than the first threshold.
  - 20. The computer program as recited in claim 16, further including:
    - program instructions for comparing the second confidence score against a second threshold; and
      
      program instructions for determining that the audio segment is music when the second confidence score is higher than the second threshold.
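Claims 17, 19, and 20 add segment compression and per-type threshold comparison. The sketch below illustrates both under stated assumptions: `zlib` is used only as a generic lossless stand-in (the source does not name a codec), the threshold values are invented, and the rule for the case where both scores exceed their thresholds is an assumption, since the claims do not address it.

```python
import zlib
from typing import Optional


def compress_segment(segment: bytes) -> bytes:
    """Compress a segment before sending it for recognition.

    zlib is a placeholder here; an audio-specific codec would be
    the more likely choice in practice.
    """
    return zlib.compress(segment)


def classify(
    speech_conf: float,
    music_conf: float,
    speech_threshold: float = 0.8,  # assumed value
    music_threshold: float = 0.8,   # assumed value
) -> Optional[str]:
    """Apply the threshold tests of claims 19 and 20.

    A score must be strictly higher than its threshold to count.
    """
    is_speech = speech_conf > speech_threshold
    is_music = music_conf > music_threshold
    if is_speech and is_music:
        # Assumed tie-break: pick the more confident result.
        return "speech" if speech_conf >= music_conf else "music"
    if is_speech:
        return "speech"
    if is_music:
        return "music"
    return None  # undecided; keep analyzing further segments


print(classify(0.9, 0.3))
print(classify(0.5, 0.5))
```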

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Sharifi, Matthew; Shahshahani, Ben; Roblek, Dominik
Primary Examiner(s)
Vo, Huyen

Application Number
US13/919,170
Time in Patent Office
925 Days
Field of Search
704/236, 704/231, 704/255, 704/257, 704/270, 704/270.1, 704/1-10, 704/251, 704/272, 704/208, 704/210, 704/214, 704/215, 700/94, 709/238
US Class Current
1/1
CPC Class Codes

G10H 2210/046   for differentiation between...

G10H 2220/011   Lyrics displays, e.g. for k...

G10H 2240/141   Library retrieval matching,...

G10L 15/26   Speech to text systems G10L...

G10L 21/10   Transforming into visible i...

G10L 25/51   for comparison or discrimin...
