Word-level correction of speech input

US 9,087,517 B2
Filed: 07/22/2013
Issued: 07/21/2015
Est. Priority Date: 01/05/2010
Status: Active Grant

Abstract

The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
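
The correction flow described in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the `alternates` mapping, the function names, and the example words are all hypothetical, and the word lattice is reduced here to a per-position list of alternate words.

```python
from typing import Dict, List


def alternates_for(alternates: Dict[int, List[str]], index: int) -> List[str]:
    """Return the alternate words offered for the word at `index` (may be empty)."""
    return alternates.get(index, [])


def replace_word(words: List[str], index: int, alternate: str) -> List[str]:
    """Return a copy of `words` with the selected word replaced by `alternate`."""
    corrected = list(words)
    corrected[index] = alternate
    return corrected


# Hypothetical recognition result: the presented words plus, per position,
# the alternates a lattice might carry (simplified to a dict for this sketch).
presented = ["we're", "coming", "about", "11:30"]
alternates = {0: ["we are", "were"], 2: ["at", "around"]}

selected_index = 2                                    # user selects "about"
choices = alternates_for(alternates, selected_index)  # alternates shown to the user
corrected = replace_word(presented, selected_index, choices[0])  # user picks "at"
print(" ".join(corrected))
```

The sketch returns a new list rather than editing `presented` in place, so the original transcription stays available if the user wants to undo the correction.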

19 Claims

1. A computer-implemented method comprising:
- obtaining first and second transcriptions of an utterance from an automated speech recognizer, wherein the second transcription of the utterance represents an alternate recognition result to the first transcription of the utterance, and wherein a portion of the first transcription of the utterance is different than a corresponding portion of the second transcription of the utterance;
  
  providing the first transcription of the utterance for output;
  
  receiving data indicating a single selection of the portion of the first transcription of the utterance; and
  
  in response to receiving the data indicating the single selection, providing the second transcription of the utterance for output.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The computer-implemented method of claim 1, wherein the first transcription of the utterance includes one or more words from a word lattice and the second transcription of the utterance includes one or more alternate words from the word lattice that correspond to the portion of the first transcription of the utterance.
  - 3. The computer-implemented method of claim 2, wherein the word lattice comprises nodes corresponding to words of the first transcription of the utterance and words of the second transcription of the utterance, and edges between the nodes that identify possible paths through the word lattice, wherein each path has an associated probability of being correct.
  - 4. The computer-implemented method of claim 1, wherein the first transcription of the utterance corresponds to a recognition result from the automated speech recognizer that has a highest speech recognition confidence score.
  - 5. The computer-implemented method of claim 1, wherein the second transcription of the utterance corresponds to a recognition result from the automated speech recognizer that includes one or more alternate words corresponding to the portion of the first transcription of the utterance and that has the greatest probability of being correct.
  - 6. The computer-implemented method of claim 1, wherein obtaining the second transcription of the utterance comprises:
    - identifying the portion of the first transcription of the utterance;
      
      determining that an alternate portion corresponding to the portion of the first transcription of the utterance is the alternate portion that is most likely to be the correct alternate portion; and
      
      obtaining the second transcription of the utterance that includes the alternate portion that is most likely to be the correct alternate portion.
  - 7. The computer-implemented method of claim 1, wherein:
    - the first transcription of the utterance and the second transcription of the utterance are provided for output at a touchscreen display of a computing device; and
      
      the data indicating the single selection of the portion of the first transcription of the utterance is received in response to user input at the touchscreen display of the computing device.
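
One way to read claims 1, 5 and 6 together is as a single-selection swap over a ranked list of recognition hypotheses. The sketch below is an assumption-laden illustration, not the claimed method itself: the `Hypothesis` tuple, the function names, the probabilities, and the requirement that hypotheses have equal length (so word positions align) are all hypothetical simplifications.

```python
from typing import List, Optional, Tuple

# (words, probability of being correct); both fields are illustrative.
Hypothesis = Tuple[List[str], float]


def best_hypothesis(hypotheses: List[Hypothesis]) -> Hypothesis:
    """First transcription: the hypothesis with the highest probability."""
    return max(hypotheses, key=lambda h: h[1])


def alternate_for_selection(
    hypotheses: List[Hypothesis], first: Hypothesis, index: int
) -> Optional[Hypothesis]:
    """Second transcription: the most probable hypothesis whose word at
    `index` differs from the first transcription, or None if there is none.

    Assumes hypotheses of equal length so that word positions align.
    """
    first_words = first[0]
    candidates = [
        h for h in hypotheses
        if len(h[0]) == len(first_words) and h[0][index] != first_words[index]
    ]
    return max(candidates, key=lambda h: h[1], default=None)


# Hypothetical recognizer output for one utterance.
hypotheses = [
    (["we're", "coming", "about", "11:30"], 0.5),
    (["we're", "coming", "at", "11:30"], 0.3),
    (["we're", "coming", "around", "11:30"], 0.1),
    (["deer", "coming", "about", "11:30"], 0.1),
]

first = best_hypothesis(hypotheses)
second = alternate_for_selection(hypotheses, first, index=2)  # one tap on "about"
shown = second if second is not None else first
print(" ".join(shown[0]))
```

In this sketch a single selection is enough to trigger the swap; no second step of choosing from a list is modeled, which mirrors the "single selection" wording of claim 1.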

8. A computer-implemented system for correcting words in transcribed text, the system comprising:
- an automated speech recognizer operable to receive speech audio data and in response transcribe the speech audio data in a word lattice; and
  
  a computing device comprising:

  - a microphone operable to receive speech audio and generate the speech audio data,
  - a network interface operable to send the speech audio data to the automated speech recognizer and in response receive the word lattice from the automated speech recognizer,
  - a display screen operable to present one or more transcribed words from the word lattice,
  - a user interface operable to receive a user selection of at least one of the transcribed words, and
  - one or more processors and a memory storing instructions that when executed by the processors cause the computing device to perform operations to:
    - provide the user interface that includes (i) an output area for outputting a first transcription of an utterance, and (ii) a control associated with a second transcription of the utterance, wherein the second transcription of the utterance represents an alternate recognition result to the first transcription of the utterance, and wherein a portion of the first transcription of the utterance is different than a corresponding portion of the second transcription of the utterance;
    - present, at the output area, the first transcription of the utterance, wherein the first transcription of the utterance includes one or more words from the word lattice;
    - receive data indicating a selection of the control associated with the second transcription of the utterance; and
    - update the output area to replace the first transcription of the utterance with the second transcription of the utterance.
- Dependent claims (9, 10, 11, 12, 13)
  - 9. The system of claim 8, wherein the word lattice comprises nodes corresponding to words of the first transcription of the utterance and the second transcription of the utterance, and edges between the nodes that identify possible paths through the word lattice, wherein each path has an associated probability of being correct.
  - 10. The system of claim 9, wherein the first transcription of the utterance corresponds to a path through the word lattice that has the greatest probability of being correct.
  - 11. The system of claim 9, wherein the second transcription of the utterance corresponds to a path through the word lattice that has the second greatest probability of being correct.
  - 12. The system of claim 9, wherein the second transcription of the utterance corresponds to a path through the word lattice that is the only path through the word lattice other than the path through the word lattice that corresponds to the first transcription of the utterance.
  - 13. The system of claim 8, wherein the user interface includes a second output area for outputting the second transcription of the utterance.
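
The word lattice of claims 9 to 12 (nodes for words, edges for possible paths, a probability per path) can be illustrated with a small graph. The following is a sketch under stated assumptions: the node ids, the `<s>`/`</s>` sentinel nodes, the edge probabilities, and the choice to score a path as the product of its edge probabilities are all hypothetical, and the lattice is assumed to be acyclic.

```python
from typing import Dict, List, Tuple

Nodes = Dict[int, str]                      # node id -> word
Edges = Dict[int, List[Tuple[int, float]]]  # node id -> [(next node id, edge probability)]


def ranked_paths(
    nodes: Nodes, edges: Edges, start: int, end: int
) -> List[Tuple[List[str], float]]:
    """Enumerate every start-to-end path and sort by probability, highest first.

    A path's probability is taken to be the product of its edge
    probabilities (an assumption made for this sketch).
    """
    results: List[Tuple[List[str], float]] = []

    def walk(node: int, words: List[str], prob: float) -> None:
        if node == end:
            results.append((words, prob))
            return
        for dest, p in edges.get(node, []):
            # The end sentinel contributes no word to the transcription.
            next_words = words if dest == end else words + [nodes[dest]]
            walk(dest, next_words, prob * p)

    walk(start, [], 1.0)
    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Hypothetical lattice with two branch points.
nodes = {0: "<s>", 1: "we're", 2: "deer", 3: "coming",
         4: "about", 5: "at", 6: "11:30", 7: "</s>"}
edges = {
    0: [(1, 0.75), (2, 0.25)],
    1: [(3, 1.0)],
    2: [(3, 1.0)],
    3: [(4, 0.875), (5, 0.125)],
    4: [(6, 1.0)],
    5: [(6, 1.0)],
    6: [(7, 1.0)],
}

paths = ranked_paths(nodes, edges, start=0, end=7)
first, second = paths[0], paths[1]  # greatest and second greatest probability
print(" ".join(first[0]), "|", " ".join(second[0]))
```

Exhaustive enumeration is only practical for small lattices like this one; a real recognizer would more likely use an n-best search, which is outside what this sketch shows.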

14. A computer program product, encoded on a non-transitory computer-readable medium, operable to cause one or more processors to perform operations for correcting words in transcribed text, the operations comprising:
- providing, from a word lattice obtained by an automated speech recognizer, a first transcription of an utterance, the first transcription of the utterance including one or more words;
  
  receiving data indicating a single selection of a single word from the one or more words of the first transcription of the utterance;
  
  in response to receiving the data indicating the single selection, identifying an alternate word from the word lattice that corresponds to the single word;
  
  determining that the alternate word has a highest speech recognizer confidence measure value among all alternate words for the single word that are in the word lattice;
  
  selecting, from the word lattice, a second transcription of the utterance that includes the alternate word; and
  
  replacing the first transcription of the utterance with the second transcription of the utterance.
- Dependent claims (15, 16, 17, 18, 19)
  - 15. The computer program product of claim 14, further comprising:
    - before receiving the data indicating the single selection of the single word from the one or more words of the first transcription of the utterance, providing the first transcription of the utterance for output; and
      
      after replacing the first transcription of the utterance with the second transcription of the utterance, providing the second transcription of the utterance for output.
  - 16. The computer program product of claim 15, wherein:
    - the first transcription of the utterance and the second transcription of the utterance are provided for output at a touchscreen display of a computing device; and
      
      the data indicating the single selection of the single word from the one or more words of the first transcription of the utterance is received in response to user input at the touchscreen display of the computing device.
  - 17. The computer program product of claim 14, wherein determining that the alternate word has the highest speech recognizer confidence measure value among all alternate words for the single word that are in the word lattice comprises:
    - determining that the alternate word is the only alternate word from the word lattice that corresponds to the single word.
  - 18. The computer program product of claim 14, wherein determining that the alternate word has the highest speech recognizer confidence measure value among all alternate words for the single word that are in the word lattice comprises:
    - determining a speech recognizer confidence measure value that is associated with a different alternate word from the word lattice that corresponds to the single word;
      
      determining that the speech recognizer confidence measure value associated with the alternate word exceeds the speech recognizer confidence measure value associated with the different alternate word by a threshold amount; and
      
      identifying the alternate word as the correct alternate word for the single word, based on determining that the speech recognizer confidence measure value associated with the alternate word exceeds the speech recognizer confidence measure value associated with the different alternate word by the threshold amount.
  - 19. The computer program product of claim 14, wherein selecting the second transcription of the utterance that includes the alternate word comprises:
    - selecting, from the word lattice, a transcription of the utterance that includes the alternate word; and
      
      determining that the transcription of the utterance that includes the alternate word lattice has a highest speech recognizer confidence measure value among all transcriptions of the utterance that include the alternate word.
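
The alternate-word selection in claims 14, 17 and 18 can be sketched as a ranking with an optional margin test. This is an illustration only: the function name, the confidence values, and in particular the decision to return `None` when the margin is not met are assumptions, since the claims do not state what happens in that case.

```python
from typing import Dict, Optional


def pick_alternate(confidences: Dict[str, float], threshold: float = 0.0) -> Optional[str]:
    """Pick the alternate word with the highest confidence value.

    - If there is exactly one alternate, it is returned (compare claim 17).
    - Otherwise the best alternate must beat the runner-up by at least
      `threshold` (compare claim 18); if it does not, None is returned,
      which is a choice made for this sketch.
    """
    if not confidences:
        return None
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    best_word, best_score = ranked[0]
    if len(ranked) == 1:
        return best_word
    runner_up_score = ranked[1][1]
    if best_score - runner_up_score >= threshold:
        return best_word
    return None


# Hypothetical confidence values for alternates of the selected word "about".
print(pick_alternate({"at": 0.75, "around": 0.25}, threshold=0.25))
print(pick_alternate({"at": 0.5, "around": 0.375}, threshold=0.25))
print(pick_alternate({"at": 0.5}))
```

A caller that receives `None` could fall back to showing the user a list of alternates instead of replacing the word automatically; that fallback is a design suggestion, not something the claims describe.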


Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: LeBeau, Michael J.; Byrne, William J.; Jitkoff, John Nicholas; Ballinger, Brandon M.; Kristjansson, Trausti T.
Primary Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US13/947,284
Publication Number: US 20130304467A1
Time in Patent Office: 729 Days
Field of Search: 704/235, 704/246, 704/240, 704/276, 704/270, 704/275, 704/9, 704/10, 704/251, 704/257
US Class Current: 1/1
CPC Class Codes:
- G06F 3/0482: Interaction with lists of s...
- G06F 3/04842: Selection of displayed obje...
- G06F 3/04886: by partitioning the display...
- G06F 40/137: Hierarchical processing, e....
- G06F 40/166: Editing, e.g. inserting or ...
- G06F 40/232: Orthographic correction, e....
- G06F 40/284: Lexical analysis, e.g. toke...
- G10L 15/01: Assessment or evaluation of...
- G10L 15/22: Procedures used during a sp...
- G10L 15/26: Speech to text systems G10L...
- G10L 15/30: Distributed recognition, e....
