Integration of embedded and network speech recognizers

US 8,412,532 B2
Filed: 11/02/2011
Issued: 04/02/2013
Est. Priority Date: 01/26/2010
Status: Active Grant

Abstract

A method, computer program product, and system are provided for performing a voice command on a client device. The method can include translating, using a first speech recognizer located on the client device, an audio stream of a voice command to a first machine-readable voice command and generating a first query result using the first machine-readable voice command to query a client database. In addition, the audio stream can be transmitted to a remote server device that translates the audio stream to a second machine-readable voice command using a second speech recognizer. Further, the method can include receiving a second query result from the remote server device, where the second query result is generated by the remote server device using the second machine-readable voice command and displaying the first query result and the second query result on the client device.
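
The flow summarized in the abstract can be sketched as two concurrent paths whose results are shown independently. This is a minimal illustration, not the patented implementation: the function names, the in-memory `CLIENT_DB`, the simulated latencies, and the trick of treating the audio bytes as already-transcribed text are all assumptions made so the example runs on its own.

```python
import asyncio

# Hypothetical stand-in for the client database (not described in the source).
CLIENT_DB = ["Barry Cage: 555-0101", "Pizza My Heart: 555-0102"]


async def local_path(audio: bytes) -> list:
    # Stand-in for the first (on-device) recognizer: the bytes are treated
    # as text so the sketch needs no real speech engine.
    command = audio.decode("utf-8").lower()
    await asyncio.sleep(0.01)  # simulated on-device latency
    # First query: look the command words up in the client database.
    words = command.split()
    return [row for row in CLIENT_DB if any(w in row.lower() for w in words)]


async def remote_path(audio: bytes) -> list:
    # Stand-in for the server, which runs its own recognizer on the raw audio
    # and answers with a second query result.
    await asyncio.sleep(0.05)  # simulated network round trip
    command = audio.decode("utf-8").lower()
    return [f"web result for '{command}'"]


async def run_and_show(label: str, pending, display) -> None:
    # Each path displays its own result as soon as it is ready.
    display(label, await pending)


async def handle_voice_command(audio: bytes, display) -> None:
    # Both paths start together, so their time intervals overlap, and
    # neither display call waits for the other path to finish.
    await asyncio.gather(
        run_and_show("local", local_path(audio), display),
        run_and_show("remote", remote_path(audio), display),
    )


if __name__ == "__main__":
    asyncio.run(handle_voice_command(b"barry", lambda label, items: print(label, items)))
```

Having each path call `display` itself, rather than collecting both results and rendering once, is one simple way to keep the two displays independent of each other.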

46 Citations

14 Claims

1. A method, comprising:
- receiving, at a client device, an audio stream that defines a voice command;
  
  defining, using a first speech recognizer module stored at the client device, a first machine-readable voice command based at least in part on the audio stream;
  
  receiving a first query result responsive to a first query sent to a client database, the first query including the first machine-readable voice command;
  
  transmitting the audio stream to a remote server device such that the remote server device defines a second machine-readable voice command using a second speech recognizer module, the second machine-readable voice command being based at least in part on the audio stream;
  
  receiving a second query result from the remote server device, the second query result being responsive to the transmitted audio stream;
  
  displaying the first query result at a display of the client device, the displayed first query result including at least a first selectable result item;
  
  displaying the second query result at the display of the client device, the displayed second query result including at least a second selectable result item, wherein the display of the first query result is not dependent upon the display of the second query result, and the display of the second query result is not dependent upon the display of the first query result;
  
  storing at least a portion of the first query result and the second query result at a memory of the client device;
  
  receiving, at the client device, a second audio stream that defines a subsequent voice command;
  
  defining, using the first speech recognizer module, a third machine-readable voice command based at least in part on the subsequent voice command;
  
  determining that the third machine-readable voice command is substantially similar to the first machine-readable voice command;
  
  retrieving, from the memory of the client device, the stored first query result and the stored second query result when the third machine-readable voice command is determined to be substantially similar to the first machine-readable voice command;
  
  transmitting the second audio stream associated with the subsequent voice command to the remote server device such that the remote server device defines a fourth machine-readable voice command using the second speech recognizer module, the fourth machine-readable voice command being based at least in part on the second audio stream;
  
  receiving a third query result from the remote server device, the third query result being responsive to the transmitted second audio stream, and wherein the third query result is an updated version of the second query result; and
  
  displaying the retrieved first query result, the retrieved second query result, and the third query result at the display of the client device.
- Dependent claims (2, 3, 4, 5, 12)
- - 2. The method of claim 1, wherein the storing at least a portion of the first query result and the second query result comprises storing at least a portion of the second query result at the memory, the method further comprising:
    - identifying which portion of the first query result and the second query result to store at the memory, the identifying comprising receiving a user selection of the second selectable result item included in the second query result.
  - 3. The method of claim 1, further comprising:
    - transmitting the audio stream to the remote server device during a first time interval; and
      
      sending the first query to the client database during a second time interval, wherein at least a portion of the first time interval and a portion of the second time interval overlap in time.
  - 4. The method of claim 1, wherein transmitting the audio stream comprises transmitting a compressed audio stream of the voice command from the client device to the remote server device.
  - 5. The method of claim 1, wherein displaying the first query result and the second query result comprises displaying the first query result and a first subset of the second query result at a first time instance and displaying the first query result, the first subset of the second query result, and a second subset of the second query result at a second time instance.
  - 12. The method of claim 1, wherein the displaying comprises displaying the first query result in a first field of the display view and the second query result in a second field of the display view.
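
The repeat-command steps of claim 1 (store results, detect a substantially similar later command, retrieve the stored results, and still ask the server for an updated result) can be sketched as follows. The claims do not define "substantially similar", so the `difflib` ratio and the 0.8 threshold are assumptions chosen for illustration, as are the `ResultCache` class and the `fetch_remote` callable.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # assumed cutoff; the claims give no number


def substantially_similar(a: str, b: str) -> bool:
    # One possible similarity test: character-level ratio between commands.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= SIMILARITY_THRESHOLD


class ResultCache:
    """Hypothetical client-side memory for earlier query results."""

    def __init__(self):
        self._entries = []  # (command, first_result, second_result) tuples

    def store(self, command, first_result, second_result):
        self._entries.append((command, first_result, second_result))

    def lookup(self, command):
        # Return the stored results of the first similar command, if any.
        for cached_command, first, second in self._entries:
            if substantially_similar(command, cached_command):
                return first, second
        return None


def handle_subsequent_command(command, audio, cache, fetch_remote):
    """Return (first, second, third) results to display for a later command."""
    cached = cache.lookup(command)
    # The second audio stream is still sent to the server, so the third
    # result can serve as an updated version of the stored second result.
    third = fetch_remote(audio)
    if cached is None:
        return None, None, third
    first, second = cached
    return first, second, third


if __name__ == "__main__":
    cache = ResultCache()
    cache.store("call barry cage", ["Barry Cage"], ["barry cage - web"])
    print(handle_subsequent_command(
        "call bary cage", b"raw-audio", cache,
        lambda audio: ["barry cage - web (updated)"],
    ))
```

A real client would likely key the cache on something more robust than raw transcript text, but the control flow stays the same: the cache lookup and the server request are separate steps, and both feed the display.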

6. A non-transitory processor-readable medium storing code representing instructions that when executed cause a processor of a client device to:
- receive an audio stream that defines a voice command;
  
  define, using a first speech recognizer module stored at the client device, a first machine-readable voice command based at least in part on the audio stream;
  
  send a first query to a client database, the first query being based at least in part on the first machine-readable voice command;
  
  receive a first query result responsive to the first query sent to the client database, the first query result including a list of M selectable result items, where M is a whole number;
  
  transmit the audio stream to a remote server device such that the remote server device defines a second machine-readable voice command using a second speech recognizer module, the second machine-readable voice command being based at least in part on the audio stream;
  
  receive a second query result from the remote server device, the second query result being responsive to the transmitted audio stream and including a list of N selectable result items, where N is a whole number;
  
  output the first query result including the list of M selectable result items for display on the client device;
  
  output the second query result including the list of N selectable result items for display on the client device, wherein the output of the first query result is not dependent upon the output of the second query result, and the output of the second query result is not dependent upon the output of the first query result;
  
  initiate at least a portion of the first query result and the second query result to be stored at a memory of the client device;
  
  receive a second audio stream that defines a subsequent voice command;
  
  define, using the first speech recognizer module, a third machine-readable voice command based at least in part on the subsequent voice command;
  
  determine that the third machine-readable voice command is substantially similar to the first machine-readable voice command;
  
  retrieve, from the memory of the client device, the stored first query result and the stored second query result when the third machine-readable voice command is determined to be substantially similar to the first machine-readable voice command;
  
  transmit the second audio stream associated with the subsequent voice command to the remote server device such that the remote server device defines a fourth machine-readable voice command using the second speech recognizer module, the fourth machine-readable voice command being based at least in part on the second audio stream;
  
  receive a third query result from the remote server device, the third query result being responsive to the transmitted second audio stream and wherein the third query result is an updated version of the second query result and including a list of P selectable result items, where P is a whole number; and
  
  output the third query result for display at the client device.
- Dependent claims (7, 8, 13)
- - 7. The processor-readable medium of claim 6, wherein the code representing instructions that when executed cause the processor to store at least a portion of the first query result and the second query result at the memory of the client device further causes the processor to:
    - identify which portion of the first query result and the second query result to store at the memory of the client device, the identification comprising receiving a user selection of a selectable result item in the list of N selectable result items included in the second query result.
  - 8. The processor-readable medium of claim 6, wherein the code representing instructions that when executed further causes the processor to:
    - transmit the audio stream to the remote server device during a first time interval; and
      
      send the first query to the client database during a second time interval, wherein at least a portion of the first time interval and a portion of the second time interval overlap in time.
  - 13. The non-transitory processor-readable medium of claim 6, wherein the first query result is displayed in a first field of the display view and the second query result is displayed in a second field of the display view.
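
Claim 7 ties what gets stored to a user selection from the list of N selectable result items. One way to picture that, assuming hypothetical `ResultItem` and `ClientMemory` types that do not appear in the source, is to store only the item the user picks:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResultItem:
    title: str
    source: str  # "client" or "server"; labels assumed for this sketch


@dataclass
class ClientMemory:
    # Maps a recognized command to the items the user selected for it.
    stored: dict = field(default_factory=dict)

    def remember_selection(self, command: str, item: ResultItem) -> None:
        items = self.stored.setdefault(command, [])
        if item not in items:  # avoid storing the same selection twice
            items.append(item)


def select(command: str, results: list, index: int, memory: ClientMemory) -> ResultItem:
    """Simulate the user picking results[index]; only that item is stored."""
    chosen = results[index]
    memory.remember_selection(command, chosen)
    return chosen


if __name__ == "__main__":
    # A second query result with N = 3 selectable items.
    second_result = [
        ResultItem("Pizza My Heart homepage", "server"),
        ResultItem("Pizza My Heart menu", "server"),
        ResultItem("Pizza My Heart reviews", "server"),
    ]
    memory = ClientMemory()
    select("pizza my heart", second_result, 1, memory)
    print(memory.stored)
```

Storing only selected items keeps the client memory small; storing whole result lists would also satisfy "at least a portion", and the claim language leaves that choice open.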

9. A system, comprising:
- a first speech recognizer module, stored at a client device, configured to:
  - receive an audio stream that defines a voice command;
  - define a first machine-readable voice command based at least in part on the audio stream;
  - receive a second audio stream that defines a subsequent voice command;
  - define a third machine-readable voice command based at least in part on the subsequent voice command;
- a client query manager configured to:
  - receive a first query result including a first number of selectable result items responsive to sending a first query to a client database, the first query being based at least in part on the first machine-readable voice command;
  - transmit the audio stream to a remote server device such that the remote server device defines a second machine-readable voice command using a second speech recognizer module, the second machine-readable voice command being based at least in part on the audio stream;
  - receive a second query result including a second number of selectable result items from the remote server device, the second query result being responsive to the transmitted audio stream;
  - determine that the third machine-readable voice command is substantially similar to the first machine-readable voice command;
  - retrieve, from the memory of the client device, the stored first query result and the stored second query result from the storage device when the third machine-readable voice command is substantially similar to the first machine-readable voice command;
  - transmit the second audio stream associated with the subsequent voice command to the remote server device such that the remote server device defines a fourth machine-readable voice command using the second speech recognizer module, the fourth machine-readable voice command being based at least in part on the second audio stream; and
  - receive a third query result including a third number of selectable result items from the remote server device, the third query result being responsive to the transmitted second audio stream and wherein the third query result is an updated version of the second query result;
- a display device configured to:
  - display on the client device the first number of selectable result items corresponding to the first query result and display on the client device the second number of selectable result items corresponding to the second query result, wherein the display of the first number of selectable result items is not dependent upon the display of the second number of selectable result items, and the display of the second number of selectable result items is not dependent upon the display of the first number of selectable result items; and
  - display on the client device the first number of selectable result items corresponding to the retrieved first query result, the second number of selectable result items corresponding to the retrieved second query result, and the third number of selectable result items corresponding to the third query result;
- a microphone configured to receive the audio stream of the voice command and to provide the audio stream to the first speech recognizer module; and
- a storage device configured to store at least a portion of the first query result and the second query result at a memory of the client device.
- Dependent claims (10, 11, 14)
- - 10. The system of claim 9, wherein the client query manager is configured to transmit the audio stream to the remote server device during a first time interval and to send the first query to the client database during a second time interval, wherein at least a portion of the first time interval and a portion of the second time interval overlap in time.
  - 11. The system of claim 9, wherein the display device is configured to display the first query result and a first subset of the second query result at a first time instance and display the first query result, the first subset of the second query result, and a second subset of the second query result at a second time instance.
  - 14. The system of claim 9, wherein the first query result is displayed in a first field of the display view and the second query result is displayed in a second field of the display view.
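
Claims 11 and 14 describe a display view with two fields, where the second field can fill in over time. The sketch below models that with a hypothetical `DisplayView` class; the class, its method names, and the sample items are illustrative assumptions, and no real UI toolkit is involved.

```python
class DisplayView:
    """Two independent fields: one for client results, one for server results."""

    def __init__(self):
        self.first_field = []   # items from the first (client) query result
        self.second_field = []  # items from the second (server) query result

    def show_first(self, items):
        # Replaces the first field only; the second field is untouched.
        self.first_field = list(items)

    def add_to_second(self, subset):
        # Later subsets are appended, so earlier ones stay on screen.
        self.second_field.extend(subset)

    def snapshot(self):
        # Copy of what is visible at this time instance.
        return {"first": list(self.first_field), "second": list(self.second_field)}


view = DisplayView()
view.show_first(["Barry Cage (contact)"])
view.add_to_second(["barry cage - web"])      # first subset of the second result
at_first_instance = view.snapshot()
view.add_to_second(["barry cage - news"])     # second subset arrives later
at_second_instance = view.snapshot()
```

At the second time instance the view still holds the first query result and the first subset, with the second subset added after them, which mirrors the wording of claim 11.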

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Gruenstein, Alexander; Byrne, William J.
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Sharma, Neeraj
Application Number: US13/287,913
Publication Number: US 20120084079A1
Time in Patent Office: 517 Days
Field of Search: 704/254, 704/270.1, 704/270, 704/275, 704/231, 704/235, 704/219, 704/251
US Class Current: 704/275
CPC Class Codes:
- G10L 15/30 Distributed recognition, e....
- G10L 15/32 Multiple recognisers used i...
