Word-level correction of speech input

US 9,881,608 B2
Filed: 05/30/2017
Issued: 01/30/2018
Est. Priority Date: 01/05/2010
Status: Active Grant


2 Assignments

0 Petitions

Abstract

The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
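The abstract's flow (receive a word lattice, show the top transcription, offer alternates for a selected word, splice in the user's choice) can be sketched as follows. This is a minimal Python sketch under simplifying assumptions, not the patentee's implementation: the `Edge`, `WordLattice`, and `replace_word` names are hypothetical; lattice node ids are assumed to be topologically ordered integers; edge probabilities are assumed to be supplied by the recognizer; and "alternate words" are taken to be only those words whose edges span exactly the same two lattice nodes as the selected word.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    start: int    # lattice node where the word begins
    end: int      # lattice node where the word ends
    word: str
    prob: float   # recognizer-assigned probability (assumed to be supplied)


class WordLattice:
    """Hypothetical, simplified word lattice: a DAG whose edges carry words."""

    def __init__(self, edges):
        self.edges = list(edges)
        self._by_span = defaultdict(list)   # (start, end) -> competing edges
        for e in self.edges:
            self._by_span[(e.start, e.end)].append(e)

    def best_path(self, source, sink):
        """Most probable source-to-sink word sequence (max-product DP).

        Assumes start < end on every edge, so visiting edges in order of
        their start node finalizes each node before its outgoing edges
        are relaxed.
        """
        best = {source: (1.0, [])}
        for e in sorted(self.edges, key=lambda e: e.start):
            if e.start not in best:
                continue                     # node unreachable from source
            score = best[e.start][0] * e.prob
            if e.end not in best or score > best[e.end][0]:
                best[e.end] = (score, best[e.start][1] + [e])
        return best[sink][1]                 # KeyError if sink is unreachable

    def alternates(self, edge):
        """Other words spanning the same two nodes, most probable first."""
        return sorted(
            (e for e in self._by_span[(edge.start, edge.end)] if e != edge),
            key=lambda e: e.prob,
            reverse=True,
        )


def replace_word(path, index, alternate):
    """Splice a user-chosen alternate into the displayed word sequence."""
    if (path[index].start, path[index].end) != (alternate.start, alternate.end):
        raise ValueError("alternate must span the same lattice nodes")
    return path[:index] + [alternate] + path[index + 1:]


# Invented example lattice (not taken from the patent).
lattice = WordLattice([
    Edge(0, 1, "meet", 0.7), Edge(0, 1, "meat", 0.3),
    Edge(1, 2, "me", 0.9), Edge(1, 2, "be", 0.1),
    Edge(2, 3, "at", 0.8), Edge(2, 3, "that", 0.2),
    Edge(3, 4, "noon", 0.6), Edge(3, 4, "new", 0.4),
])
path = lattice.best_path(0, 4)
shown = [e.word for e in path]                  # ["meet", "me", "at", "noon"]
options = lattice.alternates(path[0])           # user selects "meet"
corrected = [e.word for e in replace_word(path, 0, options[0])]
```

Note that this sketch follows the abstract, in which the user picks from presented alternate words; claim 1 as granted instead replaces the whole transcription without presenting such an option.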

146 Citations

19 Claims

1. A computer-implemented method comprising:
- receiving, by a client device, audio data corresponding to a user utterance;
- providing, by the client device, the audio data to a server-based, automated speech recognizer;
- receiving, by the client device, multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
- providing, for output by the client device, a user interface that includes a representation of a first transcription of the user utterance that is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
- receiving, by the client device, data indicating a selection, through the user interface, of at least a portion of the representation of the first transcription of the user utterance, wherein the selection identifies the at least the portion of the representation of the first transcription of the user utterance as including at least one incorrect word; and
- in response to receiving the data indicating the selection, through the user interface, of the at least the portion of the representation of the first transcription of the user utterance, replacing, by the client device and without (i) presenting an option to select one or more alternate words for the at least one incorrect word or (ii) initiating additional communication with the server-based automated speech recognizer, the representation of the first transcription of the user utterance in the user interface with a representation of a second transcription of the user utterance that (i) is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer, and (ii) includes one or more alternate words substituted for the at least one incorrect word.

2. The computer-implemented method of claim 1, wherein each of the multiple candidate transcriptions of the user utterance is associated with a speech recognition lattice that is generated by the server-based, automated speech recognizer based on the audio data.

3. The computer-implemented method of claim 2, wherein the second transcription of the user utterance is an only alternate transcription of the user utterance associated with the speech recognition lattice that includes one or more alternate words substituted for the at least one incorrect word.

4. The computer-implemented method of claim 2, wherein the second transcription of the user utterance is an alternate transcription of the user utterance associated with the speech recognition lattice that has a highest calculated probability of being a correct transcription of the user utterance.

5. The computer-implemented method of claim 4, wherein the calculated probability of the second transcription of the user utterance is greater than a calculated probability of another transcription of the user utterance associated with the speech recognition lattice that has a second-highest calculated probability of being a correct transcription of the user utterance by at least a predetermined amount.

6. The computer-implemented method of claim 1, wherein receiving the data indicating the selection, through the user interface, of the at least the portion of the representation of the first transcription of the user utterance comprises receiving data indicating a long press input, through the user interface, selecting the at least the portion of the representation of the first transcription of the user utterance, wherein the long press input indicates a request to replace the at least the portion of the representation of the first transcription of the user utterance with one or more alternate words substituted for the at least one incorrect word.

7. The computer-implemented method of claim 1, wherein receiving the data indicating the selection, through the user interface, of the at least the portion of the representation of the first transcription of the user utterance comprises receiving data indicating a selection, through the user interface, of an alternate phrase control, wherein the selection of the alternate phrase control identifies the entirety of the representation of the first transcription of the user utterance.
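Claims 1 and 3 to 5 describe a purely client-side swap: the client already holds the recognizer's candidate transcriptions, so a flagged word can be corrected by switching to another candidate without presenting alternates and without a second server request. The Python sketch below shows one way such a choice could be made. It is a hypothetical illustration, not the patentee's implementation: the `Candidate` type, the `pick_replacement` function, and the `min_margin` parameter are invented names; candidates are assumed to arrive as tokenized word tuples, each with a probability; and a candidate is assumed to "include alternate words for the incorrect word" whenever its words inside the flagged span differ from the displayed ones. The margin test reflects one reading of claim 5, comparing the most probable alternate against the next-best alternate.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    words: Tuple[str, ...]   # one candidate transcription, tokenized
    prob: float              # recognizer's calculated probability (assumed)


def pick_replacement(
    candidates: Sequence[Candidate],
    shown: Candidate,
    span: Tuple[int, int],
    min_margin: float = 0.0,
) -> Optional[Candidate]:
    """Pick a second transcription using only candidates already on the client.

    A candidate qualifies if its words inside the flagged span [start, end)
    differ from the displayed transcription's words there. No alternates
    menu is shown and no server request is made.
    """
    start, end = span
    flagged = shown.words[start:end]
    alternates = sorted(
        (c for c in candidates if c.words[start:end] != flagged),
        key=lambda c: c.prob,
        reverse=True,
    )
    if not alternates:
        return None                  # nothing to swap in
    if len(alternates) == 1:
        return alternates[0]         # the only qualifying alternate (cf. claim 3)
    best, runner_up = alternates[0], alternates[1]
    # One reading of claim 5: swap only if the most probable alternate
    # (cf. claim 4) beats the next-best alternate by at least min_margin.
    if best.prob - runner_up.prob >= min_margin:
        return best
    return None


# Invented N-best list for illustration.
candidates = [
    Candidate(("call", "mom", "at", "noon"), 0.55),
    Candidate(("call", "tom", "at", "noon"), 0.30),
    Candidate(("call", "mom", "at", "new"), 0.10),
    Candidate(("call", "tom", "at", "new"), 0.05),
]
first = max(candidates, key=lambda c: c.prob)   # transcription shown first
# The user long-presses "mom" (word index 1).
second = pick_replacement(candidates, first, (1, 2), min_margin=0.1)
```

Passing the whole transcription as the span, e.g. `(0, len(first.words))`, models the alternate phrase control of claim 7. When no alternate clears the margin, the sketch returns `None`; what a client should do in that case is not specified by these claims.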

8. A computer program product, encoded in a non-transitory computer-readable medium, operable to cause one or more processors to perform operations for correcting words in transcribed text, the operations comprising:
- receiving, by a client device, audio data corresponding to a user utterance;
- providing, by the client device, the audio data to a server-based, automated speech recognizer;
- receiving, by the client device, multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
- providing, for output by the client device, a user interface that includes a representation of a first transcription of the user utterance that is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
- receiving, by the client device, data indicating a selection, through the user interface, of at least a portion of the representation of the first transcription of the user utterance, wherein the selection identifies the at least the portion of the representation of the first transcription of the user utterance as including at least one incorrect word; and
- in response to receiving the data indicating the selection, through the user interface, of the at least the portion of the representation of the first transcription of the user utterance, replacing, by the client device and without (i) presenting an option to select one or more alternate words for the at least one incorrect word or (ii) initiating additional communication with the server-based automated speech recognizer, the representation of the first transcription of the user utterance in the user interface with a representation of a second transcription of the user utterance that (i) is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer, and (ii) includes one or more alternate words substituted for the at least one incorrect word.

9. The computer program product of claim 8, wherein each of the multiple candidate transcriptions of the user utterance is associated with a speech recognition lattice that is generated by the server-based, automated speech recognizer based on the audio data.

10. The computer program product of claim 9, wherein the second transcription of the user utterance is an only alternate transcription of the user utterance associated with the speech recognition lattice that includes one or more alternate words substituted for the at least one incorrect word.

11. The computer program product of claim 9, wherein the second transcription of the user utterance is an alternate transcription of the user utterance associated with the speech recognition lattice that has a highest calculated probability of being a correct transcription of the user utterance.

12. The computer program product of claim 8, wherein receiving the data indicating the selection, through the user interface, of the at least the portion of the representation of the first transcription of the user utterance comprises receiving data indicating a long press input, through the user interface, selecting the at least the portion of the representation of the first transcription of the user utterance, wherein the long press input indicates a request to replace the at least the portion of the representation of the first transcription of the user utterance with one or more alternate words substituted for the at least one incorrect word.

13. The computer program product of claim 8, wherein receiving the data indicating the selection, through the user interface, of the at least the portion of the representation of the first transcription of the user utterance comprises receiving data indicating a selection, through the user interface, of an alternate phrase control, wherein the selection of the alternate phrase control identifies the entirety of the representation of the first transcription of the user utterance.
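Claims 12 and 13 (like claims 6 and 7) differ only in how the flagged portion is identified: a long press on part of the displayed transcription, or an alternate phrase control that flags the whole transcription. A minimal Python sketch of turning either input into a flagged word span follows. The `LongPress` and `AlternatePhraseControl` event types and the `selection_to_span` function are hypothetical, and the sketch assumes the transcription is displayed as its words joined by single spaces, with a press mapped to the word under it.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class LongPress:
    char_offset: int   # position of the press within the displayed text


@dataclass(frozen=True)
class AlternatePhraseControl:
    pass               # a control that flags the entire transcription


Selection = Union[LongPress, AlternatePhraseControl]


def selection_to_span(words: List[str], selection: Selection) -> Tuple[int, int]:
    """Map a UI selection to the [start, end) word span flagged as incorrect.

    Assumes the displayed text is " ".join(words); a press on a separating
    space is attributed to the preceding word.
    """
    if isinstance(selection, AlternatePhraseControl):
        return (0, len(words))            # entire transcription (cf. claims 7, 13)
    pos = 0
    for i, word in enumerate(words):
        end = pos + len(word)             # index just past this word
        if pos <= selection.char_offset <= end:
            return (i, i + 1)             # single pressed word (cf. claims 6, 12)
        pos = end + 1                     # skip the separating space
    raise ValueError("long press fell outside the transcription")


words = ["call", "mom", "at", "noon"]     # invented example transcription
pressed = selection_to_span(words, LongPress(char_offset=6))   # the "o" in "mom"
whole = selection_to_span(words, AlternatePhraseControl())
```

The returned span is the portion a client would compare against the other candidate transcriptions when choosing a replacement.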

14. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
- receiving, by a client device, audio data corresponding to a user utterance;
- providing, by the client device, the audio data to a server-based, automated speech recognizer;
- receiving, by the client device, multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
- providing, for output by the client device, a user interface that includes a representation of a first transcription of the user utterance that is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer;
- receiving, by the client device, data indicating a selection, through the user interface, of at least a portion of the representation of the first transcription of the user utterance, wherein the selection identifies the at least the portion of the representation of the first transcription of the user utterance as including at least one incorrect word; and
- in response to receiving the data indicating the selection, through the user interface, of the at least the portion of the representation of the first transcription of the user utterance, replacing, by the client device and without (i) presenting an option to select one or more alternate words for the at least one incorrect word or (ii) initiating additional communication with the server-based automated speech recognizer, the representation of the first transcription of the user utterance in the user interface with a representation of a second transcription of the user utterance that (i) is selected from among the multiple candidate transcriptions of the user utterance that are generated by the server-based, automated speech recognizer, and (ii) includes one or more alternate words substituted for the at least one incorrect word.

15. The system of claim 14, wherein each of the multiple candidate transcriptions of the user utterance is associated with a speech recognition lattice that is generated by the server-based, automated speech recognizer based on the audio data.

16. The system of claim 15, wherein the second transcription of the user utterance is an only alternate transcription of the user utterance associated with the speech recognition lattice that includes one or more alternate words substituted for the at least one incorrect word.

17. The system of claim 15, wherein the second transcription of the user utterance is an alternate transcription of the user utterance associated with the speech recognition lattice that has a highest calculated probability of being a correct transcription of the user utterance.

18. The system of claim 17, wherein the calculated probability of the second transcription of the user utterance is greater than a calculated probability of another transcription of the user utterance associated with the speech recognition lattice that has a second-highest calculated probability of being a correct transcription of the user utterance by at least a predetermined amount.

19. The system of claim 14, wherein receiving the data indicating the selection, through the user interface, of the at least the portion of the representation of the first transcription of the user utterance comprises receiving data indicating a long press input, through the user interface, selecting the at least the portion of the representation of the first transcription of the user utterance, wherein the long press input indicates a request to replace the at least the portion of the representation of the first transcription of the user utterance with one or more alternate words substituted for the at least one incorrect word.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google LLC (Alphabet Inc.)
Inventors
LeBeau, Michael J., Byrne, William J., Jitkoff, John Nicholas, Ballinger, Brandon M., Kristjansson, Trausti T.
Primary Examiner(s)
GUERRA-ERAZO, EDGAR X

Application Number

US15/608,110
Publication Number

US 20170270926A1
Time in Patent Office

245 Days
Field of Search

704/235, 704/246, 704/240, 704/276, 704/270, 704/275, 704/9, 704/10, 704/251, 704/257
CPC Class Codes

G06F 3/0482   Interaction with lists of s...

G06F 3/04842   Selection of displayed obje...

G06F 3/04886   by partitioning the display...

G06F 40/137   Hierarchical processing, e....

G06F 40/166   Editing, e.g. inserting or ...

G06F 40/232   Orthographic correction, e....

G06F 40/284   Lexical analysis, e.g. toke...

G10L 15/01   Assessment or evaluation of...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....
