MIXED MODEL SPEECH RECOGNITION

US 20130346078A1
Filed: 03/15/2013
Published: 12/26/2013
Est. Priority Date: 06/26/2012
Status: Active Grant


Abstract

In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data. The method further comprises generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer that employs a language model independent of user-specific data. The method further comprises determining that the second transcription of the utterances includes a term from a predefined set of one or more terms. The method further comprises, based on determining that the second transcription of the utterance includes the term, providing an output of the first transcription of the utterance.
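The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_transcription`, the `Recognizer` type, the stub recognizers, and the example terms are all hypothetical. The sketch also assumes single-word terms matched on whitespace-separated words, and it returns `None` when no term is found, because the text here does not say what is output in that case.

```python
from typing import Callable, Iterable, Optional

# Hypothetical type: a recognizer maps raw audio bytes to a transcription.
Recognizer = Callable[[bytes], str]


def select_transcription(
    audio: bytes,
    personalized: Recognizer,   # language model based on user-specific data
    general: Recognizer,        # language model independent of user-specific data
    trigger_terms: Iterable[str],
) -> Optional[str]:
    """Return the first transcription if the second one contains a trigger term."""
    first = personalized(audio)
    second = general(audio)

    triggers = {term.lower() for term in trigger_terms}
    words = second.lower().split()

    if any(word in triggers for word in words):
        return first
    # Assumption: behavior without a trigger term is not specified here.
    return None


# Stub recognizers standing in for real speech recognition (illustrative only).
def stub_personalized(audio: bytes) -> str:
    return "call Anneke Jansen"


def stub_general(audio: bytes) -> str:
    return "call a neck a Johnson"


result = select_transcription(b"", stub_personalized, stub_general, {"call"})
```

In this example the general recognizer gets the contact name wrong but still hears the term "call", so the personalized transcription is the one that is output.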

20 Claims

1. A computer-implemented method comprising:
- accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances;
- generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer, wherein the first speech recognizer employs a language model that is based on user-specific data;
- generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer, wherein the second speech recognizer employs a language model independent of user-specific data;
- determining that the second transcription of the utterances includes a term from a predefined set of one or more terms; and
- based on determining that the second transcription of the utterance includes the term from the predefined set of one or more terms, providing an output of the first transcription of the utterance.

Dependent claims (2, 3, 4, 5, 6, 7):
- 2. The method of claim 1 wherein the set of one or more terms is associated with one or more actions to be performed by the computing device.
- 3. The method of claim 1 wherein the first speech recognizer employs a grammar-based language model.
- 4. The method of claim 3 wherein the grammar-based language model includes a context free grammar.
- 5. The method of claim 1 wherein the second speech recognizer employs a statistics-based language model.
- 6. The method of claim 1 wherein the user-specific data includes a contact list for the user, an applications list of applications installed on the computing device, or a media list of media stored on the computing device.
- 7. The method of claim 1 wherein the first speech recognizer is implemented on the computing device and the second speech recognizer is implemented on one or more server devices.
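Claims 3, 4, and 6 together suggest a context free grammar populated from user-specific lists. The sketch below shows one way such a grammar might be represented and checked against text. It is an assumption-laden illustration: the rule names (`COMMAND`, `CONTACT`, `APP`, `MEDIA`), the command words, and the helpers `build_grammar` and `accepts` are hypothetical, and a real grammar-based recognizer would use the grammar to constrain decoding of audio rather than to test finished text.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical representation: nonterminal -> list of productions,
# where each production is a list of terminals and nonterminals.
Grammar = Dict[str, List[List[str]]]


def build_grammar(
    contacts: Sequence[str], apps: Sequence[str], media: Sequence[str]
) -> Grammar:
    """Build a small context free grammar from user-specific lists."""
    return {
        # Nonterminals are upper case; terminals are lower-cased words.
        "COMMAND": [["call", "CONTACT"], ["open", "APP"], ["play", "MEDIA"]],
        "CONTACT": [name.lower().split() for name in contacts],
        "APP": [name.lower().split() for name in apps],
        "MEDIA": [title.lower().split() for title in media],
    }


def _match(grammar: Grammar, symbol: str, tokens: List[str], pos: int) -> Set[int]:
    """Return every token index at which `symbol` can end when started at `pos`."""
    if symbol not in grammar:  # terminal: must equal the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            return {pos + 1}
        return set()
    ends: Set[int] = set()
    for production in grammar[symbol]:
        positions = {pos}
        for part in production:
            positions = {e for p in positions for e in _match(grammar, part, tokens, p)}
            if not positions:
                break
        ends |= positions
    return ends


def accepts(grammar: Grammar, text: str) -> bool:
    """True if the whole text is derivable from the COMMAND rule."""
    tokens = text.lower().split()
    return len(tokens) in _match(grammar, "COMMAND", tokens, 0)


grammar = build_grammar(["Anneke Jansen"], ["Maps"], ["Abbey Road"])
```

With this grammar, `accepts(grammar, "call Anneke Jansen")` holds, while a name that is not in the contact list is rejected.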

8. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
- accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances;
- generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer, wherein the first speech recognizer employs a language model that is developed based on user-specific data;
- generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer, wherein the second speech recognizer employs a language model developed independent of user-specific data;
- determining that the second transcription of the utterances includes a term from a predefined set of one or more terms; and
- based on determining that the second transcription of the utterance includes the term from the predefined set of one or more terms, providing an output of the first transcription of the utterance.

Dependent claims (9, 10, 11, 12, 13, 14):
- 9. The system of claim 8 wherein the set of one or more terms is associated with one or more actions to be performed by the computing device.
- 10. The system of claim 8 wherein the first speech recognizer employs a grammar-based language model.
- 11. The system of claim 10 wherein the grammar-based language model includes a context free grammar.
- 12. The system of claim 8 wherein the second speech recognizer employs a statistics-based language model.
- 13. The system of claim 8 wherein the user-specific data includes a contact list for the user, an applications list of applications installed on the computing device, or a media list of media stored on the computing device.
- 14. The system of claim 8 wherein the first speech recognizer is implemented on the computing device and the second speech recognizer is implemented on one or more server devices.
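Claim 14 places the first recognizer on the device and the second on server devices. One plausible arrangement, sketched below, submits the same audio to both and waits for both results. Running the two concurrently is an assumption of this sketch and is not stated in the claims; `transcribe_both`, `on_device`, and `on_server` are hypothetical names, and the server call is represented by an ordinary function rather than a real network request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

# Hypothetical type: a recognizer maps raw audio bytes to a transcription.
Recognizer = Callable[[bytes], str]


def transcribe_both(
    audio: bytes, on_device: Recognizer, on_server: Recognizer
) -> Tuple[str, str]:
    """Run both recognizers on the same audio and return (first, second)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(on_device, audio)    # user-specific language model
        second = pool.submit(on_server, audio)   # user-independent language model
        # result() blocks until each recognizer has finished.
        return first.result(), second.result()


# Stub recognizers standing in for the local and remote engines (illustrative only).
first_text, second_text = transcribe_both(
    b"", lambda audio: "open Maps", lambda audio: "open maps please"
)
```

A thread pool is used here only because a remote request spends most of its time waiting; any mechanism that delivers both transcriptions before the term check would serve the same purpose.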

15. A computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances;
- determining a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer, wherein the first speech recognizer employs a language model that is developed based on user-specific data;
- determining a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer, wherein the second speech recognizer employs a language model developed independent of user-specific data;
- determining that the second transcription of the utterances includes a term from a predefined set of one or more terms; and
- based on determining that the second transcription of the utterance includes the term from the predefined set of one or more terms, providing an output of the first transcription of the utterance.

Dependent claims (16, 17, 18, 19, 20):
- 16. The medium of claim 15 wherein the set of one or more terms is associated with one or more actions to be performed by the computing device.
- 17. The medium of claim 15 wherein the first speech recognizer employs a grammar-based language model.
- 18. The medium of claim 15 wherein the second speech recognizer employs a statistics-based language model.
- 19. The medium of claim 15 wherein the user-specific data includes a contact list for the user, an applications list of applications installed on the computing device, or a media list of media stored on the computing device.
- 20. The medium of claim 15 wherein the first speech recognizer is implemented on the computing device and the second speech recognizer is implemented on one or more server devices.
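Claims 2, 9, and 16 associate the predefined terms with actions to be performed by the computing device. A minimal sketch of such an association is a lookup table from term to action. The table contents, the action names, and the helper `action_for` are hypothetical and are not taken from the patent text.

```python
from typing import Dict, Optional

# Hypothetical mapping from predefined terms to device actions.
TERM_ACTIONS: Dict[str, str] = {
    "call": "dial_contact",
    "open": "launch_app",
    "play": "play_media",
}


def action_for(
    second_transcription: str, term_actions: Dict[str, str] = TERM_ACTIONS
) -> Optional[str]:
    """Return the action tied to the first predefined term found, if any."""
    for word in second_transcription.lower().split():
        if word in term_actions:
            return term_actions[word]
    # No predefined term was heard in the second transcription.
    return None


action = action_for("Call a neck a Johnson")
```

Here the term is found in the second transcription, so the device would know which action the first transcription's arguments (for example, the contact name) belong to.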

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Gruenstein, Alexander H.; Aleksic, Petar

Granted Patent: US 10,354,650 B2
US Class Current: 704/235
CPC Class Codes:
- G10L 15/18 using natural language mode...
- G10L 15/193 Formal grammars, e.g. finit...
- G10L 15/197 Probabilistic grammars, e.g...
- G10L 15/22 Procedures used during a sp...
- G10L 15/26 Speech to text systems G10L...
- G10L 15/30 Distributed recognition, e....
- G10L 15/32 Multiple recognisers used i...
