Word-level correction of speech input

US 9,542,932 B2
Filed: 02/17/2016
Issued: 01/10/2017
Est. Priority Date: 01/05/2010
Status: Active Grant

Abstract

The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
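The flow described in the abstract can be sketched roughly in Python as follows. This is a minimal illustration, not the patented implementation: the lattice is simplified to a flat list of word slots (a real word lattice is a graph), and the `Candidate` type, the function names, and the sample probabilities are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothesized word with its recognizer probability (hypothetical type)."""
    word: str
    prob: float


def best_transcription(lattice):
    """Pick the most probable candidate in each slot."""
    return [max(slot, key=lambda c: c.prob).word for slot in lattice]


def alternates(lattice, index, shown):
    """List the other candidates for the selected word, most probable first."""
    ranked = sorted(lattice[index], key=lambda c: c.prob, reverse=True)
    return [c.word for c in ranked if c.word != shown[index]]


def replace_word(shown, index, alternate):
    """Return a copy of the displayed words with one word replaced."""
    updated = list(shown)
    updated[index] = alternate
    return updated


# Illustrative data only: each inner list is one slot of the simplified lattice.
lattice = [
    [Candidate("we're", 0.6), Candidate("were", 0.4)],
    [Candidate("coming", 0.7), Candidate("come", 0.3)],
    [Candidate("about", 0.9), Candidate("a bout", 0.1)],
    [Candidate("11:30", 0.8), Candidate("7:30", 0.2)],
]

shown = best_transcription(lattice)      # words presented to the user
choices = alternates(lattice, 3, shown)  # user selects the fourth word
shown = replace_word(shown, 3, choices[0])
```

Returning a new list from `replace_word` rather than editing in place keeps the previously displayed transcription available, which is convenient if the user wants to undo a correction.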

20 Claims

1. A computer-implemented method comprising:
- receiving speech audio data generated by a microphone of a computing device, wherein the speech audio data corresponds to an utterance received by the microphone;
  
  providing a transcription of the utterance for output in an output region of a display of the computing device, wherein the transcription of the utterance is obtained from an automated speech recognizer operable to transcribe the speech audio data corresponding to the utterance;
  
  receiving a user selection of a portion of the transcription of the utterance, the user-selected portion of the transcription of the utterance comprising one or more words;
  
  in response to receiving the user selection of the portion of the transcription of the utterance, presenting one or more controls at the display of the computing device that each correspond to (i) one or more alternate words for the user-selected portion of the transcription of the utterance or (ii) a remove command to remove the user-selected portion of the transcription of the utterance from the transcription of the utterance;
  
  receiving a user selection of a particular control from among the one or more controls; and
  
  updating the transcription of the utterance output in the output region of the display of the computing device based at least on the user selection of the particular control.

2. The computer-implemented method of claim 1, wherein a control corresponding to one or more alternate words for the user-selected portion of the transcription of the utterance corresponds to an alternate transcription for the user-selected portion of the transcription of the utterance, wherein the alternate transcription for the user-selected portion of the transcription of the utterance is obtained from the automated speech recognizer.

3. The computer-implemented method of claim 1, wherein the transcription of the utterance is a transcription of the utterance that has a highest probability of being correct.

4. The computer-implemented method of claim 1, wherein updating the transcription of the utterance output in the output region of the display of the computing device comprises:
- determining an updated transcription of the utterance based at least on the user selection of the particular control, the updated transcription having a highest probability of being correct of all transcriptions of the utterance that are based at least on the user selection of the particular control, wherein the transcriptions of the utterance that are based at least on the user selection of the particular control are obtained from the automated speech recognizer; and
  
  updating the transcription of the utterance output in the output region of the display of the computing device to include the updated transcription.

5. The computer-implemented method of claim 1, wherein the one or more controls presented at the display of the computing device overlays at least a portion of the user-selected portion of the transcription of the utterance that is provided for output in the output region of the display of the computing device.

6. The computer-implemented method of claim 1, wherein receiving the user selection of the particular control comprises receiving a user selection of a control that corresponds to the remove command; and wherein updating the transcription of the utterance output in the output region of the display of the computing device based at least on the user selection of the particular control comprises:
- selecting an updated transcription of the utterance that does not include the user-selected portion of the transcription of the utterance, wherein the updated transcription of the utterance that does not include the user-selected portion of the transcription of the utterance is obtained from the automated speech recognizer, and
  
  providing the updated transcription of the utterance for output in the output region of the display of the computing device.

7. The computer-implemented method of claim 1, wherein the transcription of the utterance and the updated transcription of the utterance are each selected from a hierarchical word lattice generated by the automated speech recognizer based at least on the speech audio data corresponding to the utterance.
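Claims 4, 6, and 7 describe re-selecting the most probable transcription from a word lattice once the user has made a choice. One way this could work is sketched below, assuming the lattice is a directed acyclic graph whose nodes are numbered so that every edge runs from a lower to a higher node. The `Edge` type, both function names, and the sample lattice are hypothetical; the claims do not prescribe this algorithm.

```python
import math
from typing import NamedTuple


class Edge(NamedTuple):
    """One word hypothesis spanning two lattice nodes (hypothetical type)."""
    src: int
    dst: int
    word: str
    prob: float


def best_path(edges, start, end, banned=frozenset()):
    """Return (log_prob, words) for the most probable start-to-end path, or None.

    Assumes src < dst on every edge, so visiting edges in order of src
    guarantees each node's best score is final before it is extended.
    """
    best = {start: (0.0, [])}
    for edge in sorted(edges, key=lambda e: e.src):
        if edge in banned or edge.src not in best:
            continue
        score = best[edge.src][0] + math.log(edge.prob)
        if edge.dst not in best or score > best[edge.dst][0]:
            best[edge.dst] = (score, best[edge.src][1] + [edge.word])
    return best.get(end)


def best_path_through(edges, start, end, required):
    """Most probable path that is forced to use the user-selected edge."""
    head = best_path(edges, start, required.src)
    tail = best_path(edges, required.dst, end)
    if head is None or tail is None:
        return None
    score = head[0] + math.log(required.prob) + tail[0]
    return score, head[1] + [required.word] + tail[1]


# Illustrative lattice: choosing "deer" forces "hunting" as the next word.
edges = [
    Edge(0, 1, "we're", 0.6),
    Edge(0, 2, "deer", 0.4),
    Edge(1, 3, "coming", 1.0),
    Edge(2, 3, "hunting", 1.0),
    Edge(3, 4, "about", 0.7),
    Edge(3, 4, "scouts", 0.3),
]

initial = best_path(edges, 0, 4)[1]
after_alternate = best_path_through(edges, 0, 4, edges[1])[1]
after_remove = best_path(edges, 0, 4, banned={edges[4]})[1]
```

In this sketch a remove command is modeled by banning the selected edge and searching again, so the result is the best remaining path rather than a simple deletion. That is one possible reading of claim 6, chosen here because it keeps the updated transcription inside the lattice.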

8. A system for correcting words in transcribed text, the system comprising:
- an automated speech recognizer operable to receive speech audio data and in response transcribe the speech audio data in a word lattice; and
  
  a computing device comprising:
  
  a microphone operable to receive speech audio and generate the speech audio data,
  
  a network interface operable to send the speech audio data to the automated speech recognizer and in response receive the word lattice from the automated speech recognizer,
  
  a display screen operable to present one or more transcribed words from the word lattice,
  
  a user interface operable to receive a user selection of at least one of the transcribed words, and
  
  one or more processors and a memory storing instructions that when executed by the processors cause the computing device to perform operations to:
  
  provide a transcription of an utterance for output in an output region of a display of a computing device;
  
  receive a user selection of a portion of the transcription of the utterance, the user-selected portion of the transcription of the utterance comprising one or more words;
  
  in response to receiving the user selection of the portion of the transcription of the utterance, present one or more controls at the display of the computing device that each correspond to (i) one or more alternate words for the user-selected portion of the transcription of the utterance or (ii) a remove command to remove the user-selected portion of the transcription of the utterance from the transcription of the utterance;
  
  receive a user selection of a particular control from among the one or more controls; and
  
  update the transcription of the utterance output in the output region of the display of the computing device based at least on the user selection of the particular control.

9. The system of claim 8, wherein a control corresponding to one or more alternate words for the user-selected portion of the transcription of the utterance corresponds to an alternate transcription for the user-selected portion of the transcription of the utterance.

10. The system of claim 8, wherein the transcription of the utterance is a transcription of the utterance that has a highest probability of being correct.

11. The system of claim 8, wherein updating the transcription of the utterance output in the output region of the display of the computing device further comprises performing operations to:
- determine an updated transcription of the utterance based at least on the user selection of the particular control, the updated transcription having a highest probability of being correct of all transcriptions of the utterance that are based at least on the user selection of the particular control; and
  
  update the transcription of the utterance output in the output region of the display of the computing device to include the updated transcription.

12. The system of claim 8, wherein the one or more controls presented at the display of the computing device overlays at least a portion of the user-selected portion of the transcription of the utterance that is provided for output in the output region of the display of the computing device.

13. The system of claim 8, wherein receiving the user selection of the particular control comprises performing operations to receive a user selection of a control that corresponds to the remove command; and wherein updating the transcription of the utterance output in the output region of the display of the computing device based at least on the user selection of the particular control comprises performing operations to:
- select an updated transcription of the utterance that does not include the user-selected portion of the transcription of the utterance, and
  
  provide the updated transcription of the utterance for output in the output region of the display of the computing device.

14. The system of claim 8, wherein the transcription of the utterance and the updated transcription of the utterance are each selected from a hierarchical word lattice.
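Claim 8 has the computing device send audio over a network interface and receive a word lattice in response. The claims do not specify a wire format, so the following is only a sketch of one plausible exchange, assuming the recognizer returns the lattice as JSON. The field names (`edges`, `src`, `dst`, `word`, `prob`) and both function names are invented for this example.

```python
import json


def encode_lattice(edges):
    """Recognizer side: serialize (src, dst, word, prob) edges as a JSON string."""
    return json.dumps({
        "edges": [
            {"src": src, "dst": dst, "word": word, "prob": prob}
            for (src, dst, word, prob) in edges
        ]
    })


def decode_lattice(payload):
    """Device side: parse the response and reject malformed edges."""
    edges = []
    for item in json.loads(payload)["edges"]:
        if not 0.0 < item["prob"] <= 1.0:
            raise ValueError(f"invalid probability for {item['word']!r}")
        if item["src"] >= item["dst"]:
            raise ValueError(f"edge for {item['word']!r} does not move forward")
        edges.append((item["src"], item["dst"], item["word"], item["prob"]))
    return edges


# Illustrative round trip: what the recognizer sends is what the device rebuilds.
sent = [(0, 1, "we're", 0.6), (0, 1, "were", 0.4), (1, 2, "coming", 1.0)]
received = decode_lattice(encode_lattice(sent))
```

Validating on the device side means a malformed response fails loudly before any words are shown, instead of surfacing later as a broken list of alternates.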

15. A computer program product, encoded on a non-transitory computer-readable medium, operable to cause one or more processors to perform operations for correcting words in transcribed text, the operations comprising:
- receiving speech audio data generated by a microphone of a computing device, wherein the speech audio data corresponds to an utterance received by the microphone;
  
  providing a transcription of the utterance for output in an output region of a display of the computing device, wherein the transcription of the utterance is obtained from an automated speech recognizer operable to transcribe the speech audio data corresponding to the utterance;
  
  receiving a user selection of a portion of the transcription of the utterance, the user-selected portion of the transcription of the utterance comprising one or more words;
  
  in response to receiving the user selection of the portion of the transcription of the utterance, presenting one or more controls at the display of the computing device that each correspond to (i) one or more alternate words for the user-selected portion of the transcription of the utterance or (ii) a remove command to remove the user-selected portion of the transcription of the utterance from the transcription of the utterance;
  
  receiving a user selection of a particular control from among the one or more controls; and
  
  updating the transcription of the utterance output in the output region of the display of the computing device based at least on the user selection of the particular control.

16. The computer program product of claim 15, wherein a control corresponding to one or more alternate words for the user-selected portion of the transcription of the utterance corresponds to an alternate transcription for the user-selected portion of the transcription of the utterance, wherein the alternate transcription for the user-selected portion of the transcription of the utterance is obtained from the automated speech recognizer.

17. The computer program product of claim 15, wherein the transcription of the utterance is a transcription of the utterance that has a highest probability of being correct.

18. The computer program product of claim 15, wherein updating the transcription of the utterance output in the output region of the display of the computing device comprises performing operations to:
- determine an updated transcription of the utterance based at least on the user selection of the particular control, the updated transcription having a highest probability of being correct of all transcriptions of the utterance that are based at least on the user selection of the particular control, wherein the transcriptions of the utterance that are based at least on the user selection of the particular control are obtained from the automated speech recognizer; and
  
  update the transcription of the utterance output in the output region of the display of the computing device to include the updated transcription.

19. The computer program product of claim 15, wherein the one or more controls presented at the display of the computing device overlays at least a portion of the user-selected portion of the transcription of the utterance that is provided for output in the output region of the display of the computing device.

20. The computer program product of claim 15, wherein receiving the user selection of the particular control comprises performing operations to receive a user selection of a control that corresponds to the remove command; and wherein updating the transcription of the utterance output in the output region of the display of the computing device based at least on the user selection of the particular control comprises performing operations to:
- select an updated transcription of the utterance that does not include the user-selected portion of the transcription of the utterance, wherein the updated transcription of the utterance that does not include the user-selected portion of the transcription of the utterance is obtained from the automated speech recognizer, and
  
  provide the updated transcription of the utterance for output in the output region of the display of the computing device.
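The control-handling operations recited in claims 15 and 20 (present one control per alternate plus a remove control, then update the transcription from whichever control the user picks) might be modeled as in the sketch below. The `Control` type, the "Remove" label, and the function names are assumptions for illustration. For brevity the update edits the displayed words directly instead of asking the recognizer for a new transcription, which is a simplification of what the claims recite.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Control:
    """One on-screen control (hypothetical type)."""
    label: str
    replacement: Optional[str]  # None means "remove the selected portion"


def build_controls(alternates):
    """One control per alternate, followed by a single remove control."""
    controls = [Control(label=alt, replacement=alt) for alt in alternates]
    controls.append(Control(label="Remove", replacement=None))
    return controls


def apply_control(words, start, end, control):
    """Return a new word list with words[start:end] replaced or removed."""
    if control.replacement is None:
        new_span = []
    else:
        new_span = control.replacement.split()
    return words[:start] + new_span + words[end:]


# Illustrative data: the user selects the last word of the transcription.
words = ["we're", "coming", "about", "11:30"]
controls = build_controls(["7:30", "eleven thirty"])

replaced = apply_control(words, 3, 4, controls[0])
expanded = apply_control(words, 3, 4, controls[1])
removed = apply_control(words, 3, 4, controls[-1])
```

Treating the selection as a `start:end` span rather than a single index covers the "one or more words" language in the claims, and lets an alternate contain a different number of words than the portion it replaces.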

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: LeBeau, Michael J.; Byrne, William J.; Jitkoff, John Nicholas; Ballinger, Brandon M.; Kristjansson, Trausti T.
Primary Examiner(s): Guerra-Erazo, Edgar
Application Number: US15/045,571
Publication Number: US 20160163308A1
Time in Patent Office: 328 Days
Field of Search: 704/235, 704/246, 704/240, 704/276, 704/270, 704/275
US Class Current: 1/1

CPC Class Codes
G06F 3/0482   Interaction with lists of s...
G06F 3/04842   Selection of displayed obje...
G06F 3/04886   by partitioning the display...
G06F 40/137   Hierarchical processing, e....
G06F 40/166   Editing, e.g. inserting or ...
G06F 40/232   Orthographic correction, e....
G06F 40/284   Lexical analysis, e.g. toke...
G10L 15/01   Assessment or evaluation of...
G10L 15/22   Procedures used during a sp...
G10L 15/26   Speech to text systems G10L...
G10L 15/30   Distributed recognition, e....
