Integration of embedded and network speech recognizers

US 8,868,428 B2
Filed: 08/14/2012
Issued: 10/21/2014
Est. Priority Date: 01/26/2010
Status: Active Grant

Abstract

A method, computer program product, and system are provided for performing a voice command on a client device. The method can include translating, using a first speech recognizer located on the client device, an audio stream of a voice command to a first machine-readable voice command and generating a first query result using the first machine-readable voice command to query a client database. In addition, the audio stream can be transmitted to a remote server device that translates the audio stream to a second machine-readable voice command using a second speech recognizer. Further, the method can include receiving a second query result from the remote server device, where the second query result is generated by the remote server device using the second machine-readable voice command and displaying the first query result and the second query result on the client device.
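The flow in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: `perform_voice_command`, `query_client_db`, `remote_server`, and the toy stand-ins below are hypothetical names, the remote server is modeled as a plain function, and running the remote request concurrently with the local work is an assumption the abstract does not state.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def perform_voice_command(
    audio: bytes,
    local_recognize: Callable[[bytes], str],
    query_client_db: Callable[[str], List[str]],
    remote_server: Callable[[bytes], List[str]],
) -> List[str]:
    """Return client-side results followed by server-side results."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Send the audio stream to the server while the client works locally
        # (concurrency is an assumption of this sketch).
        remote_future = pool.submit(remote_server, audio)

        # First recognizer (on the client) -> first machine-readable command,
        # which is then used to query the client database.
        first_command = local_recognize(audio)
        first_result = query_client_db(first_command)

        # Second result: recognized and queried entirely on the server side.
        second_result = remote_future.result()

    # "Displaying" both results is modeled as returning one combined list.
    return first_result + second_result


# Toy stand-ins so the sketch runs without a real recognizer or network.
CONTACTS = {"call bob": ["Bob (mobile)"]}


def fake_local_recognize(audio: bytes) -> str:
    return audio.decode("utf-8").lower()


def fake_remote_server(audio: bytes) -> List[str]:
    return [f"web result for '{audio.decode('utf-8').lower()}'"]


combined = perform_voice_command(
    b"Call Bob",
    fake_local_recognize,
    lambda command: CONTACTS.get(command, []),
    fake_remote_server,
)
print(combined)
```

Passing the recognizers and databases in as callables keeps the sketch independent of any particular speech engine or transport.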

12 Claims

1. A computer-implemented method comprising:
- receiving audio data corresponding to a user utterance;
  
  providing the audio data to a remote speech recognizer and, in response, obtaining several search results that are identified as a result of a search of a remote database using at least a portion of a remotely-generated transcription of the user utterance as a query, the remotely-generated transcription of the user utterance being generated by the remote speech recognizer;
  
  obtaining a locally-generated transcription of the user utterance and a speech recognition confidence value associated with the locally-generated transcription of the user utterance, the locally-generated transcription of the user utterance and the confidence value being generated by a local speech recognizer;
  
  in response to determining that the speech recognition confidence value generated by the local speech recognizer fails to meet a threshold value:
  
    bypassing performing a search of a local database using at least a portion of the locally-generated transcription as a query,
  
    providing one or more of the search results that are identified as a result of the search of the remote database for output,
  
    receiving data indicative of a selection of a particular search result from among the provided search results that are identified as a result of the search of the remote database, and
  
    storing the locally-generated transcription of the user utterance, generated by the local speech recognizer, in association with data identifying the particular search result;
  
  receiving, after storing the locally-generated transcription of the user utterance, second audio data corresponding to a second user utterance;
  
  obtaining a locally-generated transcription of the second user utterance, the locally-generated transcription of the second user utterance being generated by the local speech recognizer;
  
  determining that the locally-generated transcription of the second user utterance matches the stored locally-generated transcription of the user utterance;
  
  providing the second audio data to the remote speech recognizer after determining that the locally-generated transcription of the second user utterance matches the stored locally-generated transcription of the user utterance and, in response, obtaining several additional search results that are identified as a result of a search of the remote database using at least a portion of a remotely-generated transcription of the second user utterance as a query, the remotely-generated transcription of the second user utterance being generated by the remote speech recognizer; and
  
  providing the particular search result and the several additional search results that are identified as a result of the search of the remote database for output, based on the data identifying the particular search result that is stored in association with the locally-generated transcription of the user utterance.
  - 2. The method of claim 1, further comprising:
    - obtaining, from the remote speech recognizer, several candidate transcriptions of the user utterance;
      
      providing one or more of the several candidate transcriptions for output; and
      
      receiving data indicative of a selection of a particular transcription from among the one or more candidate transcriptions, wherein the several search results that are identified as a result of the search of the remote database are identified as a result of a search of the remote database using at least a portion of the particular transcription.
  - 3. The method of claim 1, wherein the data identifying the particular search result includes a universal resource locator (URL).
  - 4. The method of claim 1, wherein the data identifying the particular search result includes a web page.
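
The steps of claim 1 can be illustrated with a short Python sketch. It is a reading aid under stated assumptions, not the claimed implementation: `VoiceSearchClient`, `Outcome`, `handle`, `record_selection`, and the 0.8 threshold are hypothetical; "matches" is taken to mean exact string equality; results are identified by URL as in claim 3; duplicates of the stored result are dropped from the new remote results; and the high-confidence branch (local search plus remote results) follows the abstract rather than the claim.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Outcome:
    transcript: str       # locally-generated transcription
    results: List[str]    # results provided for output
    low_confidence: bool  # True when confidence failed to meet the threshold


@dataclass
class VoiceSearchClient:
    local_recognize: Callable[[bytes], Tuple[str, float]]  # (transcript, confidence)
    remote_search: Callable[[bytes], List[str]]            # remote recognizer + search
    local_search: Callable[[str], List[str]]
    threshold: float = 0.8                                 # assumed value
    selections: Dict[str, str] = field(default_factory=dict)

    def handle(self, audio: bytes) -> Outcome:
        transcript, confidence = self.local_recognize(audio)
        remote_results = self.remote_search(audio)
        low = confidence < self.threshold

        stored = self.selections.get(transcript)
        if stored is not None:
            # Later utterance matching a stored transcription: output the
            # previously selected result together with the new remote results.
            others = [r for r in remote_results if r != stored]
            return Outcome(transcript, [stored] + others, low)
        if low:
            # Low confidence: bypass the local search, show remote results only.
            return Outcome(transcript, remote_results, low)
        # Not covered by claim 1; assumed from the abstract.
        return Outcome(transcript, self.local_search(transcript) + remote_results, low)

    def record_selection(self, outcome: Outcome, chosen: str) -> None:
        # Store the local transcription with the selected result, but only
        # for the low-confidence case described in the claim.
        if outcome.low_confidence and chosen in outcome.results:
            self.selections[outcome.transcript] = chosen


client = VoiceSearchClient(
    local_recognize=lambda audio: ("pizza my heart", 0.3),
    remote_search=lambda audio: ["http://a.example", "http://b.example"],
    local_search=lambda text: ["local contact"],
)
first = client.handle(b"first utterance")
client.record_selection(first, "http://b.example")
second = client.handle(b"second utterance")
print(second.results)
```

Keying the stored selection on the local transcription, even a wrong one, is what lets a repeated low-confidence utterance surface the result the user picked earlier.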

5. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving audio data corresponding to a user utterance;
  
  providing the audio data to a remote speech recognizer and, in response, obtaining several search results that are identified as a result of a search of a remote database using at least a portion of a remotely-generated transcription of the user utterance as a query, the remotely-generated transcription of the user utterance being generated by the remote speech recognizer;
  
  obtaining a locally-generated transcription of the user utterance and a speech recognition confidence value associated with the locally-generated transcription of the user utterance, the locally-generated transcription of the user utterance and the confidence value being generated by a local speech recognizer;
  
  in response to determining that the speech recognition confidence value generated by the local speech recognizer fails to meet a threshold value:
  
    bypassing performing a search of a local database using at least a portion of the locally-generated transcription as a query,
  
    providing one or more of the search results that are identified as a result of the search of the remote database for output,
  
    receiving data indicative of a selection of a particular search result from among the provided search results that are identified as a result of the search of the remote database, and
  
    storing the locally-generated transcription of the user utterance, generated by the local speech recognizer, in association with data identifying the particular search result;
  
  receiving, after storing the locally-generated transcription of the user utterance, second audio data corresponding to a second user utterance;
  
  obtaining a locally-generated transcription of the second user utterance, the locally-generated transcription of the second user utterance being generated by the local speech recognizer;
  
  determining that the locally-generated transcription of the second user utterance matches the stored locally-generated transcription of the user utterance;
  
  providing the second audio data to the remote speech recognizer after determining that the locally-generated transcription of the second user utterance matches the stored locally-generated transcription of the user utterance and, in response, obtaining several additional search results that are identified as a result of a search of the remote database using at least a portion of a remotely-generated transcription of the second user utterance as a query, the remotely-generated transcription of the second user utterance being generated by the remote speech recognizer; and
  
  providing the particular search result and the several additional search results that are identified as a result of the search of the remote database for output, based on the data identifying the particular search result that is stored in association with the locally-generated transcription of the user utterance.
  - 6. The system of claim 5, wherein the operations further comprise:
    - obtaining, from the remote speech recognizer, several candidate transcriptions of the user utterance;
      
      providing one or more of the several candidate transcriptions for output; and
      
      receiving data indicative of a selection of a particular transcription from among the one or more candidate transcriptions, wherein the several search results that are identified as a result of the search of the remote database are identified as a result of a search of the remote database using at least a portion of the particular transcription.
  - 7. The system of claim 5, wherein the data identifying the particular search result includes a universal resource locator (URL).
  - 8. The system of claim 5, wherein the data identifying the particular search result includes a web page.

9. A computer-readable storage device storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving audio data corresponding to a user utterance;
  
  providing the audio data to a remote speech recognizer and, in response, obtaining several search results that are identified as a result of a search of a remote database using at least a portion of a remotely-generated transcription of the user utterance as a query, the remotely-generated transcription of the user utterance being generated by the remote speech recognizer;
  
  obtaining a locally-generated transcription of the user utterance and a speech recognition confidence value associated with the locally-generated transcription of the user utterance, the locally-generated transcription of the user utterance and the confidence value being generated by a local speech recognizer;
  
  in response to determining that the speech recognition confidence value generated by the local speech recognizer fails to meet a threshold value:
  
    bypassing performing a search of a local database using at least a portion of the locally-generated transcription as a query,
  
    providing one or more of the search results that are identified as a result of the search of the remote database for output,
  
    receiving data indicative of a selection of a particular search result from among the provided search results that are identified as a result of the search of the remote database, and
  
    storing the locally-generated transcription of the user utterance, generated by the local speech recognizer, in association with data identifying the particular search result;
  
  receiving, after storing the locally-generated transcription of the user utterance, second audio data corresponding to a second user utterance;
  
  obtaining a locally-generated transcription of the second user utterance, the locally-generated transcription of the second user utterance being generated by the local speech recognizer;
  
  determining that the locally-generated transcription of the second user utterance matches the stored locally-generated transcription of the user utterance;
  
  providing the second audio data to the remote speech recognizer after determining that the locally-generated transcription of the second user utterance matches the stored locally-generated transcription of the user utterance and, in response, obtaining several additional search results that are identified as a result of a search of the remote database using at least a portion of a remotely-generated transcription of the second user utterance as a query, the remotely-generated transcription of the second user utterance being generated by the remote speech recognizer; and
  
  providing the particular search result and the several additional search results that are identified as a result of the search of the remote database for output, based on the data identifying the particular search result that is stored in association with the locally-generated transcription of the user utterance.
  - 10. The device of claim 9, wherein the operations further comprise:
    - obtaining, from the remote speech recognizer, several candidate transcriptions of the user utterance;
      
      providing one or more of the several candidate transcriptions for output; and
      
      receiving data indicative of a selection of a particular transcription from among the one or more candidate transcriptions, wherein the several search results that are identified as a result of the search of the remote database are identified as a result of a search of the remote database using at least a portion of the particular transcription.
  - 11. The device of claim 9, wherein the data identifying the particular search result includes a universal resource locator (URL).
  - 12. The device of claim 9, wherein the data identifying the particular search result includes a web page.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Gruenstein, Alexander; Byrne, William J.
Primary Examiner(s): Shah, Paras D
Assistant Examiner(s): Sharma, Neeraj
Application Number: US13/585,280
Publication Number: US 20120310645A1
Time in Patent Office: 798 Days
Field of Search: 704/275, 704/257, 704/254, 704/231, 704/270, 704/251, 704/236, 704/260, 704/235, 704/10, 725/39, 386/231, 715/800, 709/223-229
US Class Current: 704/275
CPC Class Codes:
G10L 15/30 Distributed recognition, e....
G10L 15/32 Multiple recognisers used i...
