INTEGRATION OF EMBEDDED AND NETWORK SPEECH RECOGNIZERS

US 20120310645A1
Filed: 08/14/2012
Published: 12/06/2012
Est. Priority Date: 01/26/2010
Status: Active Grant

Abstract

A method, computer program product, and system are provided for performing a voice command on a client device. The method can include translating, using a first speech recognizer located on the client device, an audio stream of a voice command to a first machine-readable voice command and generating a first query result using the first machine-readable voice command to query a client database. In addition, the audio stream can be transmitted to a remote server device that translates the audio stream to a second machine-readable voice command using a second speech recognizer. Further, the method can include receiving a second query result from the remote server device, where the second query result is generated by the remote server device using the second machine-readable voice command and displaying the first query result and the second query result on the client device.
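
The flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `QueryResult`, `perform_voice_command`, `local_recognize`, `query_client_db`, and `remote_recognize_and_query` are hypothetical, the recognizers and databases are passed in as plain callables, and the two paths run one after the other for simplicity (the abstract does not say whether they run concurrently).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QueryResult:
    source: str        # "client" or "server" (labels assumed for illustration)
    command: str       # machine-readable voice command used for the query
    items: List[str]   # whatever the queried database returned


def perform_voice_command(
    audio: bytes,
    local_recognize: Callable[[bytes], str],
    query_client_db: Callable[[str], List[str]],
    remote_recognize_and_query: Callable[[bytes], QueryResult],
) -> List[QueryResult]:
    """Return the client-side and server-side query results for one audio stream."""
    # Client path: the embedded recognizer translates the audio, and the
    # resulting command is used to query the client database.
    first_command = local_recognize(audio)
    first_result = QueryResult("client", first_command, query_client_db(first_command))

    # Server path: the same audio goes to the remote device, which runs its
    # own recognizer and query and sends back the second query result.
    second_result = remote_recognize_and_query(audio)

    # Both results are handed back together so the client can display them.
    return [first_result, second_result]


# Stand-in recognizers and data, purely for demonstration.
contacts = {"call bob": ["Bob Smith (mobile)"]}
results = perform_voice_command(
    b"\x00\x01",
    local_recognize=lambda audio: "call bob",
    query_client_db=lambda command: contacts.get(command, []),
    remote_recognize_and_query=lambda audio: QueryResult(
        "server", "call bob", ["web result for call bob"]
    ),
)
```

Passing the recognizers in as callables keeps the sketch independent of any particular speech engine or network transport.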

18 Claims

1. A computer-implemented method comprising:
- receiving a first audio data corresponding to a first user utterance;
  
  obtaining, by a first speech recognizer, a transcription of the first user utterance and a speech recognition confidence value associated with the transcription of the first user utterance;
  
  based on determining that the speech recognition confidence value fails to meet a threshold value, transmitting the first audio data to a server-based speech recognizer;
  
  receiving, from a server, several search results associated with a second transcription of the first audio data, the second transcription of the first audio data being generated by the server-based speech recognizer;
  
  presenting one or more of the search results to a user;
  
  receiving a user selection of a particular search result from among the several search results; and
  
  storing the transcription of the first user utterance in association with the data identifying the particular search result.
  - 2. The method of claim 1, further comprising:
    - receiving, from the second speech recognizer, several candidate transcriptions of the first audio data;
      
      presenting the several candidate transcriptions to the user; and
      
      receiving a user selection of a particular transcription from among the one or more candidate transcriptions, wherein the several search results received from the server are associated with the particular transcription.
  - 3. The method of claim 1, further comprising:
    - receiving a second audio data corresponding to a second user utterance;
      
      generating, by the first speech recognizer, a transcription of the second user utterance;
      
      determining that the transcription of the second user utterance matches the stored transcription of the first user utterance; and
      
      presenting, to the user, the particular search result based on the data identifying the particular search result and stored in association with the transcription of the first user utterance.
  - 4. The method of claim 3, further comprising:
    - transmitting the second audio data to the second speech recognizer;
      
      receiving, from a server, several additional search results associated with a second transcription of the second audio data, the second transcription of the second audio data generated by the second speech recognizer; and
      
      presenting, to the user, the several additional search results along with the particular search result.
  - 5. The method of claim 1, wherein the data identifying the particular search result includes a universal resource locator (URL).
  - 6. The method of claim 1, wherein the data identifying the particular search result includes a web page.
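
The steps of claims 1 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: the class `VoiceSearchClient`, its callables, and the threshold value 0.8 are all hypothetical, the stored data identifying a result is taken to be a URL string (as in claim 5), and the sketch assumes the stored-transcription lookup happens before the confidence check. The claims do not say what happens when the confidence value meets the threshold, so the sketch simply returns `None` in that case.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.8  # assumed value; the claims do not specify one


@dataclass
class VoiceSearchClient:
    # First (embedded) recognizer: audio -> (transcription, confidence value).
    local_recognize: Callable[[bytes], Tuple[str, float]]
    # Server round trip: audio -> search results for the server's transcription.
    server_search: Callable[[bytes], List[str]]
    threshold: float = CONFIDENCE_THRESHOLD
    # Local transcription -> URL of the search result the user selected.
    stored: Dict[str, str] = field(default_factory=dict)

    def handle_utterance(
        self, audio: bytes, choose: Callable[[List[str]], str]
    ) -> Optional[str]:
        transcription, confidence = self.local_recognize(audio)

        # Claim 3: a later utterance whose local transcription matches a
        # stored one is answered from the stored association.
        if transcription in self.stored:
            return self.stored[transcription]

        # Claim 1: only a confidence value that fails to meet the threshold
        # triggers transmission to the server-based recognizer.
        if confidence >= self.threshold:
            return None  # assumption: the caller handles this case locally

        results = self.server_search(audio)
        if not results:
            return None

        # Present the results, take the user's selection, and store the local
        # transcription in association with the selected result.
        selected = choose(results)
        self.stored[transcription] = selected
        return selected
```

In use, `choose` stands in for the user interface: it receives the presented search results and returns the one the user picked. On a repeat of the same utterance, the stored association answers without another server request.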

7. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving a first audio data corresponding to a first user utterance;
  
  obtaining, by a first speech recognizer, a transcription of the first user utterance and a speech recognition confidence value associated with the transcription of the first user utterance;
  
  based on determining that the speech recognition confidence value fails to meet a threshold value, transmitting the first audio data to a server-based speech recognizer;
  
  receiving, from a server, several search results associated with a second transcription of the first audio data, the second transcription of the first audio data being generated by the server-based speech recognizer;
  
  presenting the several search results to a user;
  
  receiving a user selection of a particular search result from among the several search results; and
  
  storing the transcription of the first user utterance in association with the data identifying the particular search result.
  - 8. The system of claim 7, wherein the operations further comprise:
    - receiving, from the second speech recognizer, several candidate transcriptions of the first audio data;
      
      presenting the several candidate transcriptions to the user; and
      
      receiving a user selection of a particular transcription from among the one or more candidate transcriptions, wherein the several search results received from the server are associated with the particular transcription.
  - 9. The system of claim 7, wherein the operations further comprise:
    - receiving a second audio data corresponding to a second user utterance;
      
      generating, by the first speech recognizer, a transcription of the second user utterance;
      
      determining that the transcription of the second user utterance matches the stored transcription of the first user utterance; and
      
      presenting, to the user, the particular search result based on the data identifying the particular search result and stored in association with the transcription of the first user utterance.
  - 10. The system of claim 9, wherein the operations further comprise:
    - transmitting the second audio data to the second speech recognizer;
      
      receiving, from a server, several additional search results associated with a second transcription of the second audio data, the second transcription of the second audio data generated by the second speech recognizer; and
      
      presenting, to the user, the several additional search results along with the particular search result.
  - 11. The system of claim 7, wherein the data identifying the particular search result includes a universal resource locator (URL).
  - 12. The system of claim 7, wherein the data identifying the particular search result includes a web page.

13. A computer-readable storage device storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving a first audio data corresponding to a first user utterance;
  
  obtaining, by a first speech recognizer, a transcription of the first user utterance and a speech recognition confidence value associated with the transcription of the first user utterance;
  
  based on determining that the speech recognition confidence value fails to meet a threshold value, transmitting the first audio data to a server-based speech recognizer;
  
  receiving, from a server, several search results associated with a second transcription of the first audio data, the second transcription of the first audio data being generated by the server-based speech recognizer;
  
  presenting the several search results to a user;
  
  receiving a user selection of a particular search result from among the several search results; and
  
  storing the transcription of the first user utterance in association with the data identifying the particular search result.
  - 14. The device of claim 13, where the operations further comprise:
    - receiving, from the second speech recognizer, several candidate transcriptions of the first audio data;
      
      presenting the several candidate transcriptions to the user; and
      
      receiving a user selection of a particular transcription from among the one or more candidate transcriptions, wherein the several search results received from the server are associated with the particular transcription.
  - 15. The device of claim 13, where the operations further comprise:
    - receiving a second audio data corresponding to a second user utterance;
      
      generating, by the first speech recognizer, a transcription of the second user utterance;
      
      determining that the transcription of the second user utterance matches the stored transcription of the first user utterance; and
      
      presenting, to the user, the particular search result based on the data identifying the particular search result and stored in association with the transcription of the first user utterance.
  - 16. The device of claim 15, where the operations further comprise:
    - transmitting the second audio data to the second speech recognizer;
      
      receiving, from a server, several additional search results associated with a second transcription of the second audio data, the second transcription of the second audio data generated by the second speech recognizer; and
      
      presenting, to the user, the several additional search results along with the particular search result.
  - 17. The device of claim 13, wherein the data identifying the particular search result includes a universal resource locator (URL).
  - 18. The device of claim 13, wherein the data identifying the particular search result includes a web page.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Gruenstein, Alexander; Byrne, William J.
Granted Patent: US 8,868,428 B2
US Class Current: 704/235
CPC Class Codes:
G10L 15/30 Distributed recognition, e....
G10L 15/32 Multiple recognisers used i...
