Word-level correction of speech input

US 9,711,145 B2
Filed: 11/14/2016
Issued: 07/18/2017
Est. Priority Date: 01/05/2010
Status: Active Grant

Abstract

The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
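
As a rough illustration of the correction flow the abstract describes, the sketch below models the word lattice as a list of word positions, each holding scored alternates. That layout is a simplifying assumption, since the abstract does not say how the lattice is structured. The names `Slot`, `Transcript`, `alternates_for`, and `replace`, along with the sample words and scores, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One word position with scored alternates (assumed simplification)."""
    alternates: dict  # word -> recognizer score; higher is assumed better

    def ranked(self):
        return sorted(self.alternates, key=self.alternates.get, reverse=True)


@dataclass
class Transcript:
    slots: list
    chosen: list = field(default_factory=list)

    def __post_init__(self):
        # Present the top-scoring word at each position by default.
        if not self.chosen:
            self.chosen = [slot.ranked()[0] for slot in self.slots]

    def alternates_for(self, index):
        """Alternate words to offer when the user selects word `index`."""
        return [w for w in self.slots[index].ranked() if w != self.chosen[index]]

    def replace(self, index, word):
        """Swap the selected word for an alternate the user picked."""
        if word not in self.slots[index].alternates:
            raise ValueError(f"{word!r} is not in the lattice at position {index}")
        self.chosen[index] = word

    def text(self):
        return " ".join(self.chosen)


t = Transcript([
    Slot({"we're": 0.6, "deer": 0.3}),
    Slot({"coming": 0.7, "hunting": 0.2}),
    Slot({"about": 0.8, "scouts": 0.1}),
])
print(t.text())             # we're coming about
print(t.alternates_for(1))  # ['hunting']
t.replace(1, "hunting")
print(t.text())             # we're hunting about
```

Keeping the alternates alongside the displayed words means a correction needs no second round trip to the transcription system; the user only picks among words the recognizer already proposed.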

112 Citations

20 Claims

1. A computer-implemented method comprising:
- providing, for output, a first user interface that includes a virtual keyboard including a control for initiating speech-to-text input;
  
  receiving (i) data indicating a selection of the control included in the virtual keyboard that is included in the user interface, and (ii) audio data comprising an utterance that was spoken after the control included in the virtual keyboard was selected;
  
  generating, by an automated speech recognizer, a speech recognition lattice corresponding to the utterance that was spoken after the control included in the virtual keyboard was selected; and
  
  providing, for output, a second user interface that includes a representation of the speech recognition lattice corresponding to the utterance that was spoken after the control included in the virtual keyboard was selected.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The computer-implemented method of claim 1, wherein the representation of the speech recognition lattice is a transcription corresponding to the utterance that was spoken after the control included in the keyboard was selected that is identified as a best hypothesis from among one or more transcriptions corresponding to the utterance that was spoken after the control included in the keyboard was selected.
  - 3. The computer-implemented method of claim 1, comprising:
    - receiving data indicating a selection of a portion of the representation of the speech recognition lattice;
      
      providing, for output, a third user interface that includes one or more alternates for the selected portion of the representation of the speech recognition lattice;
      
      receiving data indicating a selection of a particular alternate from among the one or more alternates for the selected portion of the representation of the speech recognition lattice; and
      
      providing, for output, a fourth user interface that includes a second representation of the speech recognition lattice that includes the particular alternate selected from among the one or more alternates for the selected portion of the representation of the speech recognition lattice.
  - 4. The computer-implemented method of claim 1, comprising:
    - receiving data indicating a selection of a portion of the representation of the speech recognition lattice;
      
      providing, for output, a third user interface that includes a control for removing the selected portion of the representation of the speech recognition lattice;
      
      receiving data indicating a selection of the control for removing the selected portion of the representation of the speech recognition lattice; and
      
      providing, for output, a fourth user interface that includes a second representation of the speech recognition lattice that does not include the selected portion of the representation of the speech recognition lattice.
  - 5. The computer-implemented method of claim 1, wherein the second user interface includes a second representation of the speech recognition lattice corresponding to the utterance that was spoken after the control included in the virtual keyboard was selected.
  - 6. The computer-implemented method of claim 5, comprising:
    - providing, for output in the second user interface, a control for replacing the representation of the speech recognition lattice with the second representation of the speech recognition lattice;
      
      receiving data indicating a selection of the control for replacing the representation of the speech recognition lattice with the second representation of the speech recognition lattice; and
      
      providing, for output, a third user interface that includes the second representation of the speech recognition lattice in place of the representation of the speech recognition lattice.
  - 7. The computer-implemented method of claim 1, comprising:
    - receiving data indicating a selection of a portion of the representation of the speech recognition lattice; and
      
      providing, for output, a third user interface that includes a second representation of the speech recognition lattice corresponding to the utterance that was spoken after the control included in the virtual keyboard was selected, wherein a portion of the second representation of the speech recognition lattice that corresponds to the selected portion of the representation of the speech recognition lattice is different from the selected portion of the representation of the speech recognition lattice.
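
One way to picture claims 2 through 4 is to treat the speech recognition lattice as a directed acyclic graph of scored word edges. The sketch below is only an illustration under that assumption: the edge layout, the log scores, the sample words, and the helper names `best_path`, `alternates_for`, and `words` are all hypothetical and do not come from the claims. It picks a best hypothesis (claim 2), lists alternates for a selected portion (claim 3), and builds second representations with the portion replaced or removed (claims 3 and 4).

```python
from collections import defaultdict

# Hypothetical lattice: (start_node, end_node, word, log_score) edges.
# Node ids are assumed to be topologically ordered, with every edge
# pointing from a lower id to a higher one.
EDGES = [
    (0, 1, "we're", -0.4), (0, 1, "deer", -1.6), (0, 2, "welcoming", -1.0),
    (1, 2, "coming", -0.3), (1, 2, "hunting", -1.9),
    (2, 3, "about", -0.2), (2, 3, "scouts", -2.3),
]


def best_path(edges, start, end):
    """Highest-scoring edge sequence from start to end (dynamic programming)."""
    outgoing = defaultdict(list)
    for edge in edges:
        outgoing[edge[0]].append(edge)
    best = {start: (0.0, [])}  # node -> (score, edges used to reach it)
    for node in range(start, end):
        if node not in best:
            continue
        score, path = best[node]
        for edge in outgoing[node]:
            candidate = (score + edge[3], path + [edge])
            if edge[1] not in best or candidate[0] > best[edge[1]][0]:
                best[edge[1]] = candidate
    return best[end][1]


def alternates_for(edges, selected):
    """Other single edges spanning the same nodes as the selected edge."""
    same_span = [e for e in edges if e[:2] == selected[:2] and e != selected]
    return sorted(same_span, key=lambda e: e[3], reverse=True)


def words(path):
    return [edge[2] for edge in path]


path = best_path(EDGES, 0, 3)
print(words(path))                              # ["we're", 'coming', 'about']
selected = path[1]                              # user selects "coming"
print(words(alternates_for(EDGES, selected)))   # ['hunting']

# Second representations: one with the alternate, one with the word removed.
replaced = [alternates_for(EDGES, selected)[0] if e == selected else e for e in path]
removed = [e for e in path if e != selected]
print(words(replaced))  # ["we're", 'hunting', 'about']
print(words(removed))   # ["we're", 'about']
```

This version only offers alternates that span exactly the same pair of nodes as the selected word. A fuller treatment would also consider multi-edge paths between those nodes, such as the single edge "welcoming" standing in for "we're coming".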

8. A system for correcting words in transcribed text, the system comprising:
- an automated speech recognizer operable to receive speech audio data and in response transcribe the speech audio data in a word lattice; and
  
  a computing device comprising:
  
  a microphone operable to receive speech audio and generate the speech audio data,
  
  a network interface operable to send the speech audio data to the automated speech recognizer and in response receive the word lattice from the automated speech recognizer,
  
  a display screen operable to present one or more transcribed words from the word lattice,
  
  a user interface operable to receive a user selection of at least one of the transcribed words, and
  
  one or more processors and a memory storing instructions that when executed by the processors cause the computing device to perform operations to:
  
  provide, for output, a first user interface that includes a virtual keyboard including a control for initiating speech-to-text input;
  
  receive (i) data indicating a selection of the control included in the virtual keyboard that is included in the user interface, and (ii) audio data comprising an utterance that was spoken after the control included in the virtual keyboard was selected;
  
  generate, by the automated speech recognizer, a speech recognition lattice corresponding to the utterance that was spoken after the control included in the virtual keyboard was selected; and
  
  provide, for output, a second user interface that includes a representation of the speech recognition lattice corresponding to the utterance that was spoken after the control included in the virtual keyboard was selected.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The system of claim 8, wherein the representation of the speech recognition lattice is a transcription corresponding to the utterance that was spoken after the control included in the keyboard was selected that is identified as a best hypothesis from among one or more transcriptions corresponding to the utterance that was spoken after the control included in the keyboard was selected.
  - 10. The system of claim 8, wherein the operations comprise:
    - receiving data indicating a selection of a portion of the representation of the speech recognition lattice;
      
      providing, for output, a third user interface that includes one or more alternates for the selected portion of the representation of the speech recognition lattice;
      
      receiving data indicating a selection of a particular alternate from among the one or more alternates for the selected portion of the representation of the speech recognition lattice; and
      
      providing, for output, a fourth user interface that includes a second representation of the speech recognition lattice that includes the particular alternate selected from among the one or more alternates for the selected portion of the representation of the speech recognition lattice.
  - 11. The system of claim 8, wherein the operations comprise:
    - receiving data indicating a selection of a portion of the representation of the speech recognition lattice;
      
      providing, for output, a third user interface that includes a control for removing the selected portion of the representation of the speech recognition lattice;
      
      receiving data indicating a selection of the control for removing the selected portion of the representation of the speech recognition lattice; and
      
      providing, for output, a fourth user interface that includes a second representation of the speech recognition lattice that does not include the selected portion of the representation of the speech recognition lattice.
  - 12. The system of claim 8, wherein the second user interface includes a second representation of the speech recognition lattice corresponding to the utterance that was spoken after the control included in the virtual keyboard was selected.
  - 13. The system of claim 8, wherein the operations comprise:
    - providing, for output in the second user interface, a control for replacing the representation of the speech recognition lattice with the second representation of the speech recognition lattice;
      
      receiving data indicating a selection of the control for replacing the representation of the speech recognition lattice with the second representation of the speech recognition lattice; and
      
      providing, for output, a third user interface that includes the second representation of the speech recognition lattice in place of the representation of the speech recognition lattice.
  - 14. The system of claim 8, wherein the operations comprise:
    - receiving data indicating a selection of a portion of the representation of the speech recognition lattice; and
      
      providing, for output, a third user interface that includes a second representation of the speech recognition lattice corresponding to the utterance that was spoken after the control included in the virtual keyboard was selected, wherein a portion of the second representation of the speech recognition lattice that corresponds to the selected portion of the representation of the speech recognition lattice is different from the selected portion of the representation of the speech recognition lattice.
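
The split in claim 8 between a computing device and an automated speech recognizer can be sketched as a small client that sends audio and receives a lattice back. Everything here is assumed for illustration: the JSON wire format, the `fake_recognizer` stand-in (which ignores its input), the `Device` class, and the screen names are hypothetical, and the lattice is simplified to one list of scored words per position.

```python
import json


def fake_recognizer(audio: bytes) -> str:
    """Hypothetical stand-in for the remote recognizer.

    A real recognizer would decode the audio; this one returns a canned
    lattice serialized as JSON.
    """
    slots = [
        [["we're", 0.6], ["deer", 0.3]],
        [["coming", 0.7], ["hunting", 0.2]],
    ]
    return json.dumps({"slots": slots})


class Device:
    """Hypothetical client that tracks which user interface is on screen."""

    def __init__(self, send_audio):
        self.send_audio = send_audio  # network call, injected so it can be faked
        self.screen = "keyboard"      # first user interface, with a speech control
        self.slots = []

    def select_speech_control(self):
        if self.screen != "keyboard":
            raise RuntimeError("speech control is only on the keyboard screen")
        self.screen = "listening"

    def utterance_finished(self, audio: bytes) -> str:
        """Send the utterance, store the lattice, and show the top words."""
        if self.screen != "listening":
            raise RuntimeError("speech control was not selected first")
        self.slots = json.loads(self.send_audio(audio))["slots"]
        self.screen = "transcript"    # second user interface
        return " ".join(max(slot, key=lambda ws: ws[1])[0] for slot in self.slots)


device = Device(fake_recognizer)
device.select_speech_control()
print(device.utterance_finished(b"\x00\x01"))  # we're coming
print(device.screen)                           # transcript
```

Passing the recognizer in as a callable keeps the device logic testable without a network, which is the only reason for that design choice here.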

15. A computer program product, encoded on a non-transitory computer-readable medium, operable to cause one or more processors to perform operations for correcting words in transcribed text, the operations comprising:
- providing, for output, a first user interface that includes a virtual keyboard including a control for initiating speech-to-text input;
  
  receiving (i) data indicating a selection of the control included in the virtual keyboard that is included in the user interface, and (ii) audio data comprising an utterance that was spoken after the control included in the virtual keyboard was selected;
  
  generating, by an automated speech recognizer, a speech recognition lattice corresponding to the utterance that was spoken after the control included in the virtual keyboard was selected; and
  
  providing, for output, a second user interface that includes a representation of the speech recognition lattice corresponding to the utterance that was spoken after the control included in the virtual keyboard was selected.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The computer program product of claim 15, wherein the representation of the speech recognition lattice is a transcription corresponding to the utterance that was spoken after the control included in the keyboard was selected that is identified as a best hypothesis from among one or more transcriptions corresponding to the utterance that was spoken after the control included in the keyboard was selected.
  - 17. The computer program product of claim 15, wherein the operations comprise:
    - receiving data indicating a selection of a portion of the representation of the speech recognition lattice;
      
      providing, for output, a third user interface that includes one or more alternates for the selected portion of the representation of the speech recognition lattice;
      
      receiving data indicating a selection of a particular alternate from among the one or more alternates for the selected portion of the representation of the speech recognition lattice; and
      
      providing, for output, a fourth user interface that includes a second representation of the speech recognition lattice that includes the particular alternate selected from among the one or more alternates for the selected portion of the representation of the speech recognition lattice.
  - 18. The computer program product of claim 15, wherein the operations comprise:
    - receiving data indicating a selection of a portion of the representation of the speech recognition lattice;
      
      providing, for output, a third user interface that includes a control for removing the selected portion of the representation of the speech recognition lattice;
      
      receiving data indicating a selection of the control for removing the selected portion of the representation of the speech recognition lattice; and
      
      providing, for output, a fourth user interface that includes a second representation of the speech recognition lattice that does not include the selected portion of the representation of the speech recognition lattice.
  - 19. The computer program product of claim 15, wherein the second user interface includes a second representation of the speech recognition lattice corresponding to the utterance that was spoken after the control included in the virtual keyboard was selected.
  - 20. The computer program product of claim 19, wherein the operations comprise:
    - providing, for output in the second user interface, a control for replacing the representation of the speech recognition lattice with the second representation of the speech recognition lattice;
      
      receiving data indicating a selection of the control for replacing the representation of the speech recognition lattice with the second representation of the speech recognition lattice; and
      
      providing, for output, a third user interface that includes the second representation of the speech recognition lattice in place of the representation of the speech recognition lattice.
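
Claims 5 and 6 (and their counterparts 12, 13, 19, and 20) describe showing a second representation of the lattice together with a control that swaps it in. One possible reading is an n-best list, sketched below under stated assumptions: the lattice is simplified to one dictionary of word scores per position, scores are treated as probabilities multiplied along a path, and the function name `n_best` and the sample data are hypothetical.

```python
from itertools import product


def n_best(slots, n=2):
    """Return the n highest-scoring word sequences.

    Brute force over every combination, so only suitable for tiny inputs.
    """
    scored = []
    for combo in product(*(slot.items() for slot in slots)):
        score = 1.0
        for _, word_score in combo:
            score *= word_score
        scored.append((score, [word for word, _ in combo]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [sequence for _, sequence in scored[:n]]


slots = [{"we're": 0.6, "deer": 0.3}, {"coming": 0.7, "hunting": 0.2}]
shown, second = n_best(slots)
print(shown, second)  # ["we're", 'coming'] ['deer', 'coming']

# Selecting the (hypothetical) swap control exchanges the two representations.
shown, second = second, shown
print(shown)          # ['deer', 'coming']
```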

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Ballinger, Brandon M., Kristjansson, Trausti T., LeBeau, Michael J., Byrne, William J., Jitkoff, John Nicholas
Primary Examiner(s)
GUERRA-ERAZO, EDGAR X

Application Number
US15/350,309
Publication Number
US 20170069322A1
Time in Patent Office
246 Days
Field of Search
704/235, 704/246, 704/240, 704/276, 704/270, 704/275, 704/9, 704/10, 704/251, 704/257
US Class Current
CPC Class Codes

G06F 3/0482   Interaction with lists of s...

G06F 3/04842   Selection of displayed obje...

G06F 3/04886   by partitioning the display...

G06F 40/137   Hierarchical processing, e....

G06F 40/166   Editing, e.g. inserting or ...

G06F 40/232   Orthographic correction, e....

G06F 40/284   Lexical analysis, e.g. toke...

G10L 15/01   Assessment or evaluation of...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....
