Multiple recognizer speech recognition

US 9,293,136 B2
Filed: 06/01/2015
Issued: 03/22/2016
Est. Priority Date: 05/13/2013
Status: Active Grant

Abstract

The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance and obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.
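
As an illustration only, the classification step described in the abstract might be sketched as below. The command terms, the `<unk>` placeholder for out-of-vocabulary words, the label names, and the first-position rule are all assumptions made for this sketch; the abstract does not specify any of them.

```python
# Hypothetical command vocabulary; the patent text lists no concrete terms.
COMMAND_TERMS = {"call", "text", "navigate", "open"}


def classify_utterance(limited_words, expanded_words):
    """Classify an utterance from two word-level transcriptions.

    limited_words:  output of the limited recognizer, where words outside
                    its small vocabulary are assumed to appear as "<unk>".
    expanded_words: output of the expanded recognizer.
    """
    # Assumed rule: a command term in first position signals a voice action.
    if limited_words and limited_words[0] in COMMAND_TERMS:
        return "voice_action"
    # Otherwise fall back to the expanded transcription, if there is one.
    if expanded_words:
        return "voice_search"
    return "unknown"


label = classify_utterance(["call", "<unk>", "<unk>"], ["call", "bob", "smith"])
```

Here the limited recognizer only needs to recognize the command word reliably; the remaining words can come from the expanded recognizer.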

19 Claims

1. A computer-implemented method comprising:
- receiving (i) a first transcription of a particular utterance from a first computing device and (ii) a second transcription of the particular utterance from a second computing device;
  
  determining a grammatical alignment between the first transcription and the second transcription based on a comparison between the first transcription and the second transcription;
  
  associating each word or phrase within the first transcription and the second transcription with a measure respectively calculated for each word or phrase within the first transcription and the second transcription, the measure corresponding to a likelihood of relevance for each word or phrase within the first transcription and the second transcription;
  
  comparing the measure associated with each word or phrase within the first transcription and the second transcription;
  
  generating a combined transcription from the first transcription and the second transcription that represents the particular utterance based on the comparison of the measure associated with each word or phrase within the first transcription and the second transcription; and
  
  providing the combined transcription as a speech recognizer output of the particular utterance.
- 2. The computer-implemented method of claim 1, wherein aligning the first transcription with the second transcription comprises at least one of a pairwise alignment, a sequence alignment, and inexact matching.
- 3. The computer-implemented method of claim 1, wherein generating the combined transcription comprises using words or phrases from the first transcription for which the measure is greater than a particular threshold and using words or phrases from the second transcription for which the measure satisfies a certain threshold to obtain the combined transcription that represents the particular utterance.
- 4. The computer-implemented method of claim 1, wherein the first transcription was generated using a limited speech recognizer, the limited speech recognizer comprising a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar;
  - and wherein the second transcription was generated using an expanded speech recognizer, the expanded speech recognizer comprising a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar.
- 5. The computer-implemented method of claim 4, further comprising:
  - determining that a particular word or phrase was generated by the limited speech recognizer at a grammatical position within the particular utterance that indicates the particular utterance comprises a voice action command.
- 6. The computer-implemented method of claim 4, further comprising:
  - determining that a particular word or phrase was generated by the expanded speech recognizer that indicates the particular utterance comprises a voice search command.
- 7. The computer-implemented method of claim 1, further comprising:
  - analyzing the first transcription and the second transcription, which have been aligned, to determine a type of the particular utterance, wherein the measure is based on the type of the particular utterance that is determined.
- 8. The computer-implemented method of claim 7, wherein the type of the particular utterance comprises at least one of a voice action command and a voice search command.
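
The align, compare, and combine steps recited in claim 1 can be pictured with a minimal sketch. It assumes each transcription arrives as a list of `(word, measure)` pairs and uses Python's `difflib.SequenceMatcher` as a stand-in for the alignment; the claims do not name a specific alignment algorithm or say how the measure is computed, so the function name, the data layout, and the tie-breaking rule are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import zip_longest


def combine_transcriptions(first, second):
    """Merge two transcriptions given as lists of (word, measure) pairs.

    Assumption: a higher measure means the word is more likely relevant.
    """
    words_a = [word for word, _ in first]
    words_b = [word for word, _ in second]
    combined = []
    # Stand-in alignment: difflib finds the word runs both sides agree on.
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Both transcriptions agree, so keep the shared words.
            combined.extend(words_a[i1:i2])
            continue
        # Where they differ, pair words by position and keep the one with
        # the higher measure. The fill value makes an unpaired word win.
        filler = (None, float("-inf"))
        pairs = zip_longest(first[i1:i2], second[j1:j2], fillvalue=filler)
        for (word_a, measure_a), (word_b, measure_b) in pairs:
            combined.append(word_a if measure_a >= measure_b else word_b)
    return combined


first = [("navigate", 0.9), ("to", 0.8), ("<unk>", 0.1)]
second = [("navy", 0.3), ("to", 0.9), ("paris", 0.8)]
result = combine_transcriptions(first, second)
```

In this sketch, ties go to the first transcription and unpaired words are always kept; both choices are arbitrary and are not taken from the claims.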

9. A system comprising:
- one or more processors and one or more storage devices storing instructions that are operable, when executed by the one or more processors, to cause the one or more processors to perform operations comprising;
  
  receiving (i) a first transcription of a particular utterance from a first computing device and (ii) a second transcription of the particular utterance from a second computing device;
  
  determining a grammatical alignment between the first transcription and the second transcription based on a comparison between the first transcription and the second transcription;
  
  associating each word or phrase within the first transcription and the second transcription with a measure respectively calculated for each word or phrase within the first transcription and the second transcription, the measure corresponding to a likelihood of relevance for each word or phrase within the first transcription and the second transcription;
  
  comparing the measure associated with each word or phrase within the first transcription and the second transcription;
  
  generating a combined transcription from the first transcription and the second transcription that represents the particular utterance based on the comparison of the measure associated with each word or phrase within the first transcription and the second transcription; and
  
  providing the combined transcription as a speech recognizer output of the particular utterance.
- 10. The computer-implemented method of claim 9, wherein aligning the first transcription with the second transcription comprises at least one of a pairwise alignment, a sequence alignment, and inexact matching.
- 11. The computer-implemented method of claim 9, wherein generating the combined transcription comprises using words or phrases from the first transcription for which the measure is greater than a particular threshold and using words or phrases from the second transcription for which the measure satisfies a certain threshold to obtain the combined transcription that represents the particular utterance.
- 12. The computer-implemented method of claim 9, wherein the first transcription was generated using a limited speech recognizer, the limited speech recognizer comprising a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar;
  - and wherein the second transcription was generated using an expanded speech recognizer, the expanded speech recognizer comprising a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar.
- 13. The computer-implemented method of claim 12, further comprising:
  - determining that a particular word or phrase was generated by the limited speech recognizer at a grammatical position within the particular utterance that indicates the particular utterance comprises a voice action command.
- 14. The computer-implemented method of claim 12, further comprising:
  - determining that a particular word or phrase was generated by the expanded speech recognizer that indicates the particular utterance comprises a voice search command.
- 15. The computer-implemented method of claim 9, further comprising:
  - analyzing the first transcription and the second transcription, which have been aligned, to determine a type of the particular utterance, wherein the measure is based on the type of the particular utterance that is determined.
- 16. The computer-implemented method of claim 15, wherein the type of the particular utterance comprises at least one of a voice action command and a voice search command.

17. A non-transitory computer-readable medium storing instructions executable by one or more computers that, upon such execution, cause the one or more computers to perform operations comprising:
- receiving (i) a first transcription of a particular utterance from a first computing device and (ii) a second transcription of the particular utterance from a second computing device;
  
  determining a grammatical alignment between the first transcription and the second transcription based on a comparison between the first transcription and the second transcription;
  
  associating each word or phrase within the first transcription and the second transcription with a measure respectively calculated for each word or phrase within the first transcription and the second transcription, the measure corresponding to a likelihood of relevance for each word or phrase within the first transcription and the second transcription;
  
  comparing the measure associated with each word or phrase within the first transcription and the second transcription;
  
  generating a combined transcription from the first transcription and the second transcription that represents the particular utterance based on the comparison of the measure associated with each word or phrase within the first transcription and the second transcription; and
  
  providing the combined transcription as a speech recognizer output of the particular utterance.
- 18. The non-transitory computer-readable medium of claim 17, wherein the first transcription was generated using a limited speech recognizer, the limited speech recognizer comprising a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar;
  - and wherein the second transcription was generated using an expanded speech recognizer, the expanded speech recognizer comprising a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar.
- 19. The non-transitory computer-readable medium of claim 17, wherein generating the combined transcription comprises using words or phrases from the first transcription for which the measure is greater than a particular threshold and using words or phrases from the second transcription for which the measure satisfies a certain threshold to obtain the combined transcription that represents the particular utterance.
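
The threshold-based selection recited in claims 3, 11, and 19 could look roughly like the sketch below. It assumes the two transcriptions have already been aligned into position-wise pairs of `(word, measure)` tuples. The threshold values, the reading of "satisfies" as greater-than-or-equal, and the choice to drop a position when neither word qualifies are assumptions of this sketch, not details taken from the claims.

```python
def combine_with_thresholds(aligned_pairs, first_threshold=0.5,
                            second_threshold=0.5):
    """Build a combined transcription from already-aligned word pairs.

    aligned_pairs: list of ((word1, measure1), (word2, measure2)) tuples,
    one per aligned position (hypothetical layout).
    """
    combined = []
    for (word1, measure1), (word2, measure2) in aligned_pairs:
        if measure1 > first_threshold:
            # Word from the first transcription exceeds its threshold.
            combined.append(word1)
        elif measure2 >= second_threshold:
            # Word from the second transcription satisfies its threshold.
            combined.append(word2)
        # Assumption: if neither qualifies, the position is left out.
    return combined


pairs = [
    (("call", 0.9), ("cold", 0.4)),
    (("<unk>", 0.1), ("bob", 0.8)),
    (("<unk>", 0.1), ("smith", 0.3)),
]
selected = combine_with_thresholds(pairs)
```

Using separate thresholds for the two transcriptions lets a caller be stricter with one recognizer than the other.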

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Aleksic, Petar; Moreno Mengibar, Pedro J.; Biadsy, Fadi
Primary Examiner(s): Baker, Charlotte M
Application Number: US14/726,943
Publication Number: US 20150262581A1
Time in Patent Office: 295 Days
Field of Search: 704/235, 704/5, 704/9, 704/257, 382/224
US Class Current: 1/1
CPC Class Codes:
G10L 15/01   Assessment or evaluation of...
G10L 15/197   Probabilistic grammars, e.g...
G10L 15/26   Speech to text systems G10L...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
H04M 2250/74   with voice recognition means
