Automatic speech recognition based on user feedback
Abstract
Systems and processes for processing speech in a digital assistant are provided. In one example process, a first speech input can be received from a user. The first speech input can be processed using a first automatic speech recognition system to produce a first recognition result. An input indicative of a potential error in the first recognition result can be received. The input can be used to improve the first recognition result. For example, the input can include a second speech input that is a repetition of the first speech input. The second speech input can be processed using a second automatic speech recognition system to produce a second recognition result.
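The flow described in the abstract can be pictured with a minimal sketch. Everything named here (`Recognizer`, `Session`, `on_first_speech`, `on_error_feedback`) is hypothetical and chosen for illustration only; the source does not specify any programming interface, and the recognizers are stand-in callables rather than real speech recognition systems.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical type: a recognizer maps raw audio bytes to a transcript.
Recognizer = Callable[[bytes], str]


@dataclass
class Session:
    """Tracks one exchange: a first attempt and an optional repeated attempt."""
    first_asr: Recognizer
    second_asr: Recognizer
    first_result: Optional[str] = None
    second_result: Optional[str] = None

    def on_first_speech(self, audio: bytes) -> str:
        # The first speech input always goes to the first recognizer.
        self.first_result = self.first_asr(audio)
        return self.first_result

    def on_error_feedback(self, repeated_audio: bytes) -> str:
        # Assumption: the input indicating a potential error is a repetition
        # of the first speech input, which is one example given in the abstract.
        if self.first_result is None:
            raise RuntimeError("no first recognition result to correct")
        # The repetition goes to the second recognizer, not the first.
        self.second_result = self.second_asr(repeated_audio)
        return self.second_result
```

The sketch keeps both results on the session so that a later step could compare or merge them.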
50 Claims
1. A method for processing speech in a digital assistant, the method comprising:
at an electronic device with a processor and memory storing one or more programs for execution by the processor:
receiving, from a network interface, a first speech input;
processing the first speech input using a first automatic speech recognition system to produce a first speech recognition result;
performing a first task corresponding to a first user intent determined from the first speech recognition result;
upon performing the first task, receiving, from the network interface, an input representing a rejection of the first task;
in response to receiving the input, providing a prompt seeking a repetition of at least a portion of the first speech input;
receiving, from the network interface, a second speech input;
in accordance with the received input representing a rejection of the first task, processing the second speech input using a second automatic speech recognition system to produce a second speech recognition result, wherein the first automatic speech recognition system includes one or more speech recognition models, and the second automatic speech recognition system includes one or more speech recognition models that are different from the one or more speech recognition models of the first automatic speech recognition system;
determining a combined speech recognition result based on the first speech recognition result and the second speech recognition result; and
performing a second task corresponding to a second user intent determined from the combined speech recognition result.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method for processing speech in a digital assistant, the method comprising:
at an electronic device with a processor and memory storing one or more programs for execution by the processor:
receiving an input containing user speech;
processing the input using a first automatic speech recognition system to produce a first speech recognition result;
performing a first task corresponding to a first user intent determined from the first speech recognition result;
upon performing the first task, receiving a second input representing a rejection of the first task;
in response to receiving the second input, processing at least a portion of the audio signal using a second automatic speech recognition system to produce a second speech recognition result, wherein the first automatic speech recognition system includes one or more speech recognition models, and the second automatic speech recognition system includes one or more speech recognition models that are different from the one or more speech recognition models of the first automatic speech recognition system;
determining a combined speech recognition result based on the first speech recognition result and the second speech recognition result; and
performing a second task corresponding to a second user intent determined from the combined speech recognition result.
Dependent claims: 11, 12, 13, 14
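Claim 10 requires a combined speech recognition result but does not say how the two results are merged. The sketch below shows one possible approach under stated assumptions: each recognizer is assumed to return a list of (transcript, confidence) hypotheses, the transcript that led to the rejected task is discarded, and confidences for matching transcripts are summed. The names `Hypothesis` and `combine_results` and the scoring rule are hypothetical and not taken from the claim.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (transcript, confidence in [0, 1]).
Hypothesis = Tuple[str, float]


def combine_results(
    first: List[Hypothesis],
    second: List[Hypothesis],
    rejected: str,
) -> Optional[str]:
    """Pool hypotheses from both recognizers and return the best transcript.

    Assumption: the user rejected the task derived from `rejected`, so that
    transcript is excluded from the pool before scoring.
    """
    scores: Dict[str, float] = defaultdict(float)
    for text, confidence in first + second:
        if text != rejected:
            # Transcripts proposed by both recognizers accumulate confidence.
            scores[text] += confidence
    if not scores:
        return None  # neither recognizer offered an alternative
    return max(scores, key=lambda text: scores[text])


first_pass = [("call mum", 0.6), ("call tom", 0.3)]
second_pass = [("call tom", 0.5), ("call tim", 0.4)]
best = combine_results(first_pass, second_pass, rejected="call mum")
```

Summing confidences favours transcripts on which the two different model sets agree; other merging rules would satisfy the claim language equally well.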
15. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device, the one or more programs comprising instructions for:
receiving, from a network interface, a first speech input;
processing the first speech input using a first automatic speech recognition system to produce a first speech recognition result;
performing a first task corresponding to a first user intent determined from the first speech recognition result;
receiving, from the network interface, a second speech input;
determining whether a phonemic transcription of the second speech input has an error rate that is less than a predetermined value when compared against a phonemic transcription of a corresponding portion of the first speech input;
in response to determining that the phonemic transcription of the second speech input has an error rate that is less than the predetermined value when compared against the phonemic transcription of a corresponding portion of the first speech input, processing the second speech input using a second automatic speech recognition system to produce a second speech recognition result; and
performing a second task corresponding to a second user intent determined based on the second speech recognition result.
Dependent claims: 16, 17, 18, 19
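The phonemic comparison in claim 15 can be illustrated with a short sketch. It assumes the phonemic transcriptions are already available as sequences of phoneme symbols and that the error rate is an edit distance normalised by the length of the first transcription; the claim does not define the metric, so this choice, the function names, and the 0.3 threshold are all hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(reference: Sequence[str], candidate: Sequence[str]) -> float:
    """Edit distance normalised by the reference length (an assumed metric)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, candidate) / len(reference)


def is_repetition(first: Sequence[str], second: Sequence[str],
                  threshold: float = 0.3) -> bool:
    """True when the second input is close enough to count as a repeat.

    The threshold stands in for the claim's "predetermined value".
    """
    return phoneme_error_rate(first, second) < threshold


first_phonemes = ["K", "AO", "L", "T", "AA", "M"]
second_phonemes = ["K", "AO", "L", "T", "AA", "N"]
route_to_second_asr = is_repetition(first_phonemes, second_phonemes)
```

When `is_repetition` returns true, the second speech input would be passed to the second automatic speech recognition system, mirroring the conditional step in the claim.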
20. An electronic device comprising:
one or more processors;
memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving, from a network interface, a first speech input;
processing the first speech input using a first automatic speech recognition system to produce a first speech recognition result;
performing a first task corresponding to a first user intent determined from the first speech recognition result;
receiving, from the network interface, a second speech input;
determining whether a phonemic transcription of the second speech input has an error rate that is less than a predetermined value when compared against a phonemic transcription of a corresponding portion of the first speech input;
in response to determining that the phonemic transcription of the second speech input has an error rate that is less than the predetermined value when compared against the phonemic transcription of a corresponding portion of the first speech input, processing the second speech input using a second automatic speech recognition system to produce a second speech recognition result; and
performing a second task corresponding to a second user intent determined based on the second speech recognition result.
Dependent claims: 21, 22, 23, 24
25. A method for processing speech in a digital assistant, the method comprising:
at an electronic device with a processor and memory storing one or more programs for execution by the processor:
receiving, from a network interface, a first speech input;
processing the first speech input using a first automatic speech recognition system to produce a first speech recognition result;
performing a first task corresponding to a first user intent determined from the first speech recognition result;
receiving, from the network interface, a second speech input;
determining whether a phonemic transcription of the second speech input has an error rate that is less than a predetermined value when compared against a phonemic transcription of a corresponding portion of the first speech input;
in response to determining that the phonemic transcription of the second speech input has an error rate that is less than the predetermined value when compared against the phonemic transcription of a corresponding portion of the first speech input, processing the second speech input using a second automatic speech recognition system to produce a second speech recognition result; and
performing a second task corresponding to a second user intent determined based on the second speech recognition result.
Dependent claims: 26, 27, 28, 29
30. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving, from a network interface, a first speech input;
processing the first speech input using a first automatic speech recognition system to produce a first speech recognition result;
performing a first task corresponding to a first user intent determined from the first speech recognition result;
upon performing the first task, receiving, from the network interface, an input representing a rejection of the first task;
in response to receiving the input, providing a prompt seeking a repetition of at least a portion of the first speech input;
receiving, from the network interface, a second speech input;
in accordance with the received input representing a rejection of the first task, processing the second speech input using a second automatic speech recognition system to produce a second speech recognition result, wherein the first automatic speech recognition system includes one or more speech recognition models, and the second automatic speech recognition system includes one or more speech recognition models that are different from the one or more speech recognition models of the first automatic speech recognition system;
determining a combined speech recognition result based on the first speech recognition result and the second speech recognition result; and
performing a second task corresponding to a second user intent determined from the combined speech recognition result.
Dependent claims: 31, 32, 33, 34, 35
36. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device, the one or more programs comprising instructions for:
receiving, from a network interface, a first speech input;
processing the first speech input using a first automatic speech recognition system to produce a first speech recognition result;
performing a first task corresponding to a first user intent determined from the first speech recognition result;
upon performing the first task, receiving, from the network interface, an input representing a rejection of the first task;
in response to receiving the input, providing a prompt seeking a repetition of at least a portion of the first speech input;
receiving, from the network interface, a second speech input;
in accordance with the received input representing a rejection of the first task, processing the second speech input using a second automatic speech recognition system to produce a second speech recognition result, wherein the first automatic speech recognition system includes one or more speech recognition models, and the second automatic speech recognition system includes one or more speech recognition models that are different from the one or more speech recognition models of the first automatic speech recognition system;
determining a combined speech recognition result based on the first speech recognition result and the second speech recognition result; and
performing a second task corresponding to a second user intent determined from the combined speech recognition result.
Dependent claims: 37, 38, 39, 40
41. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving an input containing user speech;
processing the input using a first automatic speech recognition system to produce a first speech recognition result;
performing a first task corresponding to a first user intent determined from the first speech recognition result;
upon performing the first task, receiving a second input representing a rejection of the first task;
in response to receiving the second input, processing at least a portion of the audio signal using a second automatic speech recognition system to produce a second speech recognition result, wherein the first automatic speech recognition system includes one or more speech recognition models, and the second automatic speech recognition system includes one or more speech recognition models that are different from the one or more speech recognition models of the first automatic speech recognition system;
determining a combined speech recognition result based on the first speech recognition result and the second speech recognition result; and
performing a second task corresponding to a second user intent determined from the combined speech recognition result.
Dependent claims: 42, 43, 44, 45
46. A non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device, the one or more programs comprising instructions for:
receiving an input containing user speech;
processing the input using a first automatic speech recognition system to produce a first speech recognition result;
performing a first task corresponding to a first user intent determined from the first speech recognition result;
upon performing the first task, receiving a second input representing a rejection of the first task;
in response to receiving the second input, processing at least a portion of the audio signal using a second automatic speech recognition system to produce a second speech recognition result, wherein the first automatic speech recognition system includes one or more speech recognition models, and the second automatic speech recognition system includes one or more speech recognition models that are different from the one or more speech recognition models of the first automatic speech recognition system;
determining a combined speech recognition result based on the first speech recognition result and the second speech recognition result; and
performing a second task corresponding to a second user intent determined from the combined speech recognition result.
Dependent claims: 47, 48, 49, 50
Specification