Multiple recognizer speech recognition

US 9,058,805 B2
Filed: 05/13/2013
Issued: 06/16/2015
Est. Priority Date: 05/13/2013
Status: Active Grant

Abstract

The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance and obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.
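
To make the two-recognizer idea in the abstract concrete, the following is a minimal sketch under stated assumptions. It uses a toy stand-in that works on already-tokenized words rather than audio; `toy_transcribe`, `PLACEHOLDER`, and both vocabularies are hypothetical and do not come from the patent.

```python
# Hypothetical stand-in: a real recognizer decodes audio with acoustic and
# language models. This toy version only filters words against a vocabulary.
PLACEHOLDER = "<placeholder>"

# Assumed example vocabularies (not taken from the patent).
LIMITED_VOCAB = {"call", "text", "open", "arnie"}  # command terms plus one contact name
EXPANDED_VOCAB = {"call", "text", "open", "the", "plumber", "pizza", "near", "me"}


def toy_transcribe(spoken_words, vocabulary):
    """Keep in-vocabulary words; replace every other word with a placeholder."""
    return [word if word in vocabulary else PLACEHOLDER for word in spoken_words]


utterance = ["call", "arnie", "the", "plumber"]
first_transcription = toy_transcribe(utterance, LIMITED_VOCAB)
second_transcription = toy_transcribe(utterance, EXPANDED_VOCAB)
print(first_transcription)
print(second_transcription)
```

In this toy setup each recognizer covers words the other misses, which is why a later step would need both transcriptions.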

17 Claims

1. A computer-implemented method performed by a data processing apparatus, the method comprising:
- receiving audio data that corresponds to an utterance;
  
  obtaining a first transcription of the utterance that was generated using a limited speech recognizer, wherein the limited speech recognizer comprises a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar;
  
  obtaining a second transcription of the utterance that was generated using an expanded speech recognizer, wherein the expanded speech recognizer comprises a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar;
  
  aligning the first and second transcriptions of the utterance to generate an aligned transcription; and
  
  classifying the utterance, based at least on a portion of the aligned transcription, as a voice command or a voice query.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein the operations of at least one of the limited speech recognizer and the expanded speech recognizer are performed at a server computer device.
  - 3. The method of claim 1, further comprising:
    - in response to classifying the utterance, based at least on a portion of the aligned transcription, as the voice command;
      
      generating the voice command using at least a portion of the first transcription and at least part of the second transcription; and
      
      initiating the voice command; and
      
      in response to classifying the utterance as the voice query;
      
      generating the voice query using at least a portion of the first transcription and at least part of the second transcription; and
      
      initiating the voice query.
  - 4. The method of claim 1, wherein the limited speech recognizer is configured to recognize one or more of a collection of placeholder terms, a collection of voice command terms, and a collection of contact names from a contact list.
  - 5. The method of claim 1, wherein the expanded speech recognizer is configured to recognize one or more of a collection of general grammar terms, a collection of placeholder terms, a collection of proper names, and a collection of voice command terms.
  - 6. The method of claim 5, wherein the expanded speech recognizer is not configured to recognize a collection of contact names from a contact list.
  - 7. The method of claim 1, wherein the operations of at least one of the limited speech recognizer and the expanded speech recognizer are performed at a mobile device.
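
The aligning and classifying steps of claim 1 can be sketched as follows. This is an illustration only: the claims do not say how alignment or classification is performed, so the use of Python's `difflib.SequenceMatcher` for token alignment and the rule "a command term in the first slot means a voice command" are both assumptions. The names `align`, `classify`, `COMMAND_TERMS`, and `PLACEHOLDER`, and the example transcriptions, are hypothetical.

```python
from difflib import SequenceMatcher

PLACEHOLDER = "<placeholder>"              # hypothetical out-of-vocabulary marker
COMMAND_TERMS = {"call", "text", "open"}   # hypothetical voice command grammar


def align(first, second):
    """Pair tokens of two transcriptions; a side with no counterpart gets None."""
    pairs = []
    matcher = SequenceMatcher(a=first, b=second, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        a_seg, b_seg = first[i1:i2], second[j1:j2]
        for k in range(max(len(a_seg), len(b_seg))):
            a_tok = a_seg[k] if k < len(a_seg) else None
            b_tok = b_seg[k] if k < len(b_seg) else None
            pairs.append((a_tok, b_tok))
    return pairs


def classify(aligned):
    """Assumed rule: a command term from the limited recognizer in slot 0."""
    if aligned and aligned[0][0] in COMMAND_TERMS:
        return "voice command"
    return "voice query"


# Assumed outputs of the limited and expanded recognizers for one utterance.
first = ["call", "arnie", PLACEHOLDER, PLACEHOLDER]
second = ["call", "army", "the", "plumber"]
aligned = align(first, second)
print(aligned)
print(classify(aligned))
```

Only the first aligned slot is inspected here, which matches the claim wording "based at least on a portion of the aligned transcription" but is one choice among many.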

8. A system comprising:
- a data processing apparatus; and
  
  a non-transitory memory storage storing instructions executable by the data processing apparatus and that upon such execution cause the data processing apparatus to perform operations comprising:
  
  receiving audio data that corresponds to an utterance;
  
  obtaining a first transcription of the utterance that was generated using a limited speech recognizer, wherein the limited speech recognizer comprises a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar;
  
  obtaining a second transcription of the utterance that was generated using an expanded speech recognizer, wherein the expanded speech recognizer comprises a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar;
  
  aligning the first and second transcriptions of the utterance to generate an aligned transcription; and
  
  classifying the utterance based at least on a portion of the aligned transcription, as a voice command or a voice query.
- Dependent claims (9, 10, 11, 12, 13)
  - 9. The system of claim 8, wherein the operations of at least one of the limited speech recognizer and the expanded speech recognizer are performed at a mobile device.
  - 10. The system of claim 8, the operations further comprising:
    - in response to classifying the utterance, based at least on a portion of the aligned transcription, as the voice command;
      
      generating the voice command using at least a portion of the first transcription and at least part of the second transcription; and
      
      initiating the voice command; and
      
      in response to classifying the utterance as the voice query;
      
      generating the voice query using at least a portion of the first transcription and at least part of the second transcription; and
      
      initiating the voice query.
  - 11. The system of claim 8, wherein the limited speech recognizer is configured to recognize one or more of a collection of placeholder terms, a collection of voice command terms, and a collection of contact names from a contact list.
  - 12. The system of claim 8, wherein the expanded speech recognizer is configured to recognize one or more of a collection of general grammar terms, a collection of placeholder terms, a collection of proper names, and a collection of voice command terms.
  - 13. The system of claim 12, wherein the expanded speech recognizer is not configured to recognize a collection of contact names from a contact list.
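
Claim 10 recites generating the command or query from portions of both transcriptions. One way this could look is sketched below, assuming the aligned transcription is a list of (limited token, expanded token) pairs. The merge rule (prefer the limited recognizer's token unless it is a placeholder) is an assumption, and `merge_aligned`, `generate_request`, and `PLACEHOLDER` are hypothetical names.

```python
PLACEHOLDER = "<placeholder>"  # hypothetical out-of-vocabulary marker


def merge_aligned(aligned):
    """Take the limited token when it is a real word, else the expanded token."""
    merged = []
    for limited_tok, expanded_tok in aligned:
        if limited_tok is not None and limited_tok != PLACEHOLDER:
            merged.append(limited_tok)
        elif expanded_tok is not None and expanded_tok != PLACEHOLDER:
            merged.append(expanded_tok)
    return merged


def generate_request(label, aligned):
    """Build a simple request record for a classified utterance."""
    return {"type": label, "text": " ".join(merge_aligned(aligned))}


# Assumed aligned transcription: the limited recognizer knows the contact
# name, the expanded recognizer knows the general words.
aligned = [
    ("call", "call"),
    ("arnie", "army"),
    (PLACEHOLDER, "the"),
    (PLACEHOLDER, "plumber"),
]
request = generate_request("voice command", aligned)
print(request)
```

The resulting text draws "arnie" from the first transcription and "the plumber" from the second, so both transcriptions contribute.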

14. A non-transitory computer readable medium storing instructions executable by a data processing apparatus and that upon such execution cause the data processing apparatus to perform operations comprising:
- receiving audio data that corresponds to an utterance;
  
  obtaining a first transcription of the utterance that was generated using a limited speech recognizer, wherein the limited speech recognizer comprises a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar;
  
  obtaining a second transcription of the utterance that was generated using an expanded speech recognizer, wherein the expanded speech recognizer comprises a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar;
  
  aligning the first and second transcriptions of the utterance to generate an aligned transcription; and
  
  classifying the utterance based at least on a portion of the aligned transcription, as a voice command or a voice query.
- Dependent claims (15, 16, 17)
  - 15. The computer readable medium of claim 14, the operations further comprising:
    - in response to classifying the utterance, based at least on a portion of the aligned transcription, as the voice command;
      
      generating the voice command using at least a portion of the first transcription and at least part of the second transcription; and
      
      initiating the voice command; and
      
      in response to classifying the utterance as the voice query;
      
      generating the voice query using at least a portion of the first transcription and at least part of the second transcription; and
      
      initiating the voice query.
  - 16. The computer readable medium of claim 14, wherein the limited speech recognizer is configured to recognize one or more of a collection of placeholder terms, a collection of voice command terms, and a collection of contact names from a contact list.
  - 17. The computer readable medium of claim 14, wherein the expanded speech recognizer is configured to recognize one or more of a collection of general grammar terms, a collection of placeholder terms, a collection of proper names, and a collection of voice command terms.
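
The vocabulary collections named in claims 16 and 17, together with the relation between the two vocabularies stated in the independent claims, can be illustrated with plain sets. This is a sketch under assumptions: every term below is an invented example, the expanded grammar is taken to equal the expanded vocabulary, and the function names are hypothetical.

```python
# All example terms are invented for illustration.
COMMAND_TERMS = {"call", "text", "open"}
PLACEHOLDER_TERMS = {"<contact>", "<query>"}
CONTACT_NAMES = {"arnie", "beatrice"}
GENERAL_TERMS = {"the", "plumber", "pizza", "near", "me"}
PROPER_NAMES = {"london", "mozart"}


def build_limited_vocabulary():
    """Placeholder terms, voice command terms, and contact names."""
    return PLACEHOLDER_TERMS | COMMAND_TERMS | CONTACT_NAMES


def build_expanded_vocabulary():
    """General terms, placeholder terms, proper names, and command terms.

    Contact names are left out, as in the variant of claims 6 and 13.
    """
    return GENERAL_TERMS | PLACEHOLDER_TERMS | PROPER_NAMES | COMMAND_TERMS


def satisfies_claimed_relation(limited, expanded, command_grammar):
    """Limited set shares a command term but lacks some expanded term."""
    shares_command_term = bool(limited & command_grammar)
    fewer_than_all = not expanded <= limited
    return shares_command_term and fewer_than_all


limited = build_limited_vocabulary()
expanded = build_expanded_vocabulary()
print(satisfies_claimed_relation(limited, expanded, COMMAND_TERMS))
```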

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Aleksic, Petar; Mengibar, Pedro J.; Biadsy, Fadi
Primary Examiner(s): Baker, Charlotte M
Application Number: US13/892,590
Publication Number: US 20140337032A1
Time in Patent Office: 764 Days
Field of Search: 704/235, 704/9, 704/236, 704/E15.014, 704/243, 704/10, 704/257, 704/270, 704/5, 382/224
US Class Current: 1/1

CPC Class Codes:
G10L 15/01   Assessment or evaluation of...
G10L 15/197   Probabilistic grammars, e.g...
G10L 15/26   Speech to text systems G10L...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
H04M 2250/74   with voice recognition mean...
