Speech recognition using repeated utterances
First Claim
1. A computer-implemented method, comprising:
- receiving, by a computing system and at a first time, a first spoken input from a user of an electronic device, the first spoken input comprising an original utterance by the user;
- based on the original utterance, determining, by the computing system, a first set of character string candidates wherein each character string candidate represents the first spoken input converted to textual characters, and wherein determining the first set of character string candidates comprises using a speech recognizer to determine a first word lattice that represents the first set of character string candidates and a first set of probabilities, each probability corresponding to a character string candidate;
- providing, for display to the user, a selection of one or more of the character string candidates in response to receiving the first spoken input;
- receiving, by the computing system and at a second time, a second spoken input from the user;
- determining, by the computing system, that the second spoken input is a repeat utterance of the original utterance;
- based on determining that the second spoken input is a repeat utterance of the original utterance, and using the original utterance and the repeat utterance, determining, by the computing system, a second set of character string candidates, wherein determining the second set of character string candidates using the original utterance and the repeat utterance comprises:
  - using the speech recognizer and the first word lattice as a language model to determine a second word lattice that represents the second set of character string candidates and a second set of probabilities, each probability corresponding to a character string candidate of the second set of character string candidates;
  - determining an intersection or union of the first word lattice and the second word lattice and, for each character string candidate included in the intersection or union, determining a combined probability based on the probabilities from the first set of probabilities and the second set of probabilities that correspond to the character string candidate; and
  - determining a third set of character string candidates based on the intersection or union and the determined combined probabilities.
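
The final two steps of the claim (intersection or union of the two lattices, then a combined probability per candidate) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the claim: each word lattice is flattened to a dictionary mapping a candidate string to its probability, the combined probability is the renormalized product of the two per-pass probabilities, and a candidate missing from one pass in the union case receives a small hypothetical `floor` value. The function name `combine_candidates` and the example strings are illustrative only.

```python
from typing import Dict, List, Tuple

# Simplifying assumption: a "lattice" is flattened to {candidate text: probability}.
Candidates = Dict[str, float]


def combine_candidates(
    first: Candidates,
    second: Candidates,
    mode: str = "intersection",
    floor: float = 1e-3,  # hypothetical probability for a candidate absent from one pass
) -> List[Tuple[str, float]]:
    """Return the third candidate set, ranked by combined probability."""
    if mode == "intersection":
        keys = first.keys() & second.keys()
    elif mode == "union":
        keys = first.keys() | second.keys()
    else:
        raise ValueError("mode must be 'intersection' or 'union'")

    # Assumed combination rule: product of the two probabilities.
    scores = {k: first.get(k, floor) * second.get(k, floor) for k in keys}
    total = sum(scores.values())
    if total == 0:
        return []

    # Renormalize so the combined probabilities sum to 1, then rank.
    ranked = [(k, s / total) for k, s in scores.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


first_pass = {"recognize speech": 0.5, "wreck a nice beach": 0.4, "recognise peach": 0.1}
second_pass = {"recognize speech": 0.6, "wreck a nice beach": 0.3, "reckon eyes speech": 0.1}

third_set_intersection = combine_candidates(first_pass, second_pass, "intersection")
third_set_union = combine_candidates(first_pass, second_pass, "union")
```

With intersection, only candidates that both passes produced survive; with union, a candidate seen in a single pass is kept but penalized by the floor value. Other combination rules, such as a weighted average, would fit the claim language equally well.
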
2 Assignments
0 Petitions
Abstract
Subject matter described in this specification can be embodied in methods, computer program products and systems relating to speech-to-text conversion. A first spoken input (an “original utterance”) is received from a user of an electronic device. Based on the original utterance, a first set of character string candidates is determined, each representing the original utterance converted to textual characters, and a selection of one or more of the character string candidates is provided in a format for display to the user. A second spoken input is received from the user and a determination is made that the second spoken input is a repeat utterance of the original utterance. Based on this determination and using the original utterance and the repeat utterance, a second set of character string candidates is determined.
310 Citations
23 Claims
1. A computer-implemented method, comprising:
- receiving, by a computing system and at a first time, a first spoken input from a user of an electronic device, the first spoken input comprising an original utterance by the user;
- based on the original utterance, determining, by the computing system, a first set of character string candidates wherein each character string candidate represents the first spoken input converted to textual characters, and wherein determining the first set of character string candidates comprises using a speech recognizer to determine a first word lattice that represents the first set of character string candidates and a first set of probabilities, each probability corresponding to a character string candidate;
- providing, for display to the user, a selection of one or more of the character string candidates in response to receiving the first spoken input;
- receiving, by the computing system and at a second time, a second spoken input from the user;
- determining, by the computing system, that the second spoken input is a repeat utterance of the original utterance;
- based on determining that the second spoken input is a repeat utterance of the original utterance, and using the original utterance and the repeat utterance, determining, by the computing system, a second set of character string candidates, wherein determining the second set of character string candidates using the original utterance and the repeat utterance comprises:
  - using the speech recognizer and the first word lattice as a language model to determine a second word lattice that represents the second set of character string candidates and a second set of probabilities, each probability corresponding to a character string candidate of the second set of character string candidates;
  - determining an intersection or union of the first word lattice and the second word lattice and, for each character string candidate included in the intersection or union, determining a combined probability based on the probabilities from the first set of probabilities and the second set of probabilities that correspond to the character string candidate; and
  - determining a third set of character string candidates based on the intersection or union and the determined combined probabilities.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
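
The claim does not say how the first word lattice serves as a language model for the second recognition pass. One possible reading, sketched below in Python, treats the first-pass probability of a hypothesis as its language-model score and adds it in the log domain to an acoustic score from the repeat utterance. Everything here is an assumption made for illustration: the function name `rescore_with_first_pass`, the `lm_weight` and `backoff` parameters, the flattening of a lattice to a dictionary of whole-utterance hypotheses, and the example scores.

```python
import math
from typing import Dict


def rescore_with_first_pass(
    acoustic_logprobs: Dict[str, float],  # hypothetical acoustic log-scores for the repeat utterance
    first_pass: Dict[str, float],         # first-pass candidates and probabilities (flattened lattice)
    lm_weight: float = 1.0,               # assumed language-model scaling factor
    backoff: float = 1e-4,                # assumed probability for hypotheses unseen in the first pass
) -> Dict[str, float]:
    """Return normalized second-pass probabilities biased by the first pass."""
    if not acoustic_logprobs:
        return {}

    # Score = acoustic log-score + weighted log of the first-pass probability.
    log_scores = {
        hyp: am + lm_weight * math.log(first_pass.get(hyp, backoff))
        for hyp, am in acoustic_logprobs.items()
    }

    # Normalize with log-sum-exp to avoid underflow on very negative scores.
    peak = max(log_scores.values())
    log_z = peak + math.log(sum(math.exp(s - peak) for s in log_scores.values()))
    return {hyp: math.exp(s - log_z) for hyp, s in log_scores.items()}


acoustic = {"call mom": -10.0, "call tom": -9.5, "cold mom": -12.0}
first_pass = {"call mom": 0.6, "call tom": 0.2}
second_pass = rescore_with_first_pass(acoustic, first_pass)
best = max(second_pass, key=second_pass.get)
```

In this example the acoustic evidence alone slightly prefers "call tom", but the first-pass probabilities shift the second-pass result toward "call mom". A production recognizer would apply the lattice during decoding rather than to a fixed hypothesis list.
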
11. A computer-implemented method, comprising:
- receiving, by a computing system and at a first time, a first spoken input from a user of an electronic device, the spoken input comprising an original utterance;
- based on the original utterance, determining, by the computing system, a first set of character string candidates and a confidence level corresponding to each character string candidate in the set, wherein each character string candidate represents the first spoken input converted to text;
- determining, by the computing system, that less than a threshold number of character string candidates in the set have a corresponding confidence level that meets or exceeds a predetermined threshold level and in response to the determination, requesting the user to provide a second spoken input;
- determining, using a speech recognizer, a first word lattice that represents the first set of character string candidates and a first set of probabilities, each probability corresponding to a character string candidate in the first word lattice;
- receiving, by the computing system and at a second time, the second spoken input from the user;
- determining, by the computing system and using the speech recognizer and the first word lattice as a language model, a second word lattice that represents a second set of character string candidates and a second set of probabilities, each probability corresponding to a character string candidate in the first word lattice;
- determining an intersection or union of the first word lattice and the second word lattice and, for each character string candidate included in the intersection or union, determining a combined probability based on the probabilities from the first set of probabilities and the second set of probabilities that correspond to the character string candidate;
- determining a third set of character string candidates based on the intersection or union and the determined combined probabilities;
- determining, by the computing system, a selection of one or more character string candidates from the third set of character string candidates; and
- transmitting, by the computing system, the selection of one or more character string candidates to the electronic device for display to the user.
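
The trigger condition in claim 11 (fewer than a threshold number of candidates meeting a predetermined confidence level) can be written as a short check. The Python sketch below is illustrative only: the function name `needs_repeat` and the default values for `min_confident` and `threshold` are assumptions, since the claim does not give concrete numbers.

```python
from typing import Dict


def needs_repeat(
    confidences: Dict[str, float],  # candidate text -> confidence level
    min_confident: int = 1,         # assumed "threshold number" of confident candidates
    threshold: float = 0.8,         # assumed "predetermined threshold level"
) -> bool:
    """Return True when the user should be asked for a second spoken input."""
    # Count candidates whose confidence meets or exceeds the threshold level.
    confident = sum(1 for c in confidences.values() if c >= threshold)
    # Request a repeat when fewer than the threshold number qualify.
    return confident < min_confident


ask_again = needs_repeat({"call mom": 0.5, "call tom": 0.3})
accept = needs_repeat({"call mom": 0.9, "call tom": 0.05})
```

Here `ask_again` is `True` because no candidate reaches 0.8, while `accept` is `False` because one candidate does.
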
12. A non-transitory computer-readable medium having instructions encoded thereon, which, when executed by a processor, cause the processor to perform operations comprising:
- receiving, at a first time, a first spoken input from a user of an electronic device, the first spoken input comprising an original utterance by the user;
- based on the original utterance, determining a first set of character string candidates wherein each character string candidate represents the first spoken input converted to textual characters, and wherein determining the first set of character string candidates comprises using a speech recognizer to determine a first word lattice that represents the first set of character string candidates and a first set of probabilities, each probability corresponding to a character string candidate;
- providing for display to the user a selection of one or more of the character string candidates in response to receiving the first spoken input;
- receiving, at a second time, a second spoken input from the user;
- determining that the second spoken input is a repeat utterance of the original utterance;
- based on determining that the second spoken input is a repeat utterance of the original utterance, and using the original utterance and the repeat utterance, determining a second set of character string candidates, wherein determining the second set of character string candidates using the original utterance and the repeat utterance comprises:
  - using the speech recognizer and the first word lattice as a language model to determine a second word lattice that represents the second set of character string candidates and a second set of probabilities, each probability corresponding to a character string candidate of the second set of character string candidates;
  - determining an intersection or union of the first word lattice and the second word lattice and, for each character string candidate included in the intersection or union, determining a combined probability based on the probabilities from the first set of probabilities and the second set of probabilities that correspond to the character string candidate; and
  - determining a third set of character string candidates based on the intersection or union and the determined combined probabilities.
Dependent claims: 13, 14, 15, 16, 17
18. A system comprising:
- one or more computers;
- one or more data storage devices coupled to the one or more computers and storing instructions, which, when executed by the processor cause the one or more computers to perform operations comprising:
  - receiving, at a first time, a first spoken input from a user of an electronic device, the first spoken input comprising an original utterance by the user;
  - based on the original utterance, determining a first set of character string candidates wherein each character string candidate represents the first spoken input converted to textual characters, and wherein determining the first set of character string candidates comprises using a speech recognizer to determine a first word lattice that represents the first set of character string candidates and a first set of probabilities, each probability corresponding to a character string candidate;
  - providing for display to the user a selection of one or more of the character string candidates in response to receiving the first spoken input;
  - receiving, at a second time, a second spoken input from the user;
  - determining that the second spoken input is a repeat utterance of the original utterance;
  - based on determining that the second spoken input is a repeat utterance of the original utterance, and using the original utterance and the repeat utterance, determining a second set of character string candidates, wherein determining the second set of character string candidates using the original utterance and the repeat utterance comprises:
    - using the speech recognizer and the first word lattice as a language model to determine a second word lattice that represents the second set of character string candidates and a second set of probabilities, each probability corresponding to a character string candidate of the second set of character string candidates;
    - determining an intersection or union of the first word lattice and the second word lattice and, for each character string candidate included in the intersection or union, determining a combined probability based on the probabilities from the first set of probabilities and the second set of probabilities that correspond to the character string candidate; and
    - determining a third set of character string candidates based on the intersection or union and the determined combined probabilities.
Dependent claims: 19, 20, 21, 22, 23
Specification