Methods and apparatus for reducing latency in speech recognition applications
Abstract
Methods and apparatus for reducing latency in speech recognition applications. The method comprises receiving first audio comprising speech from a user of a computing device, detecting an end of speech in the first audio, generating an automatic speech recognition (ASR) result based, at least in part, on a portion of the first audio prior to the detected end of speech, determining whether a valid action can be performed by a speech-enabled application installed on the computing device using the ASR result, and processing second audio when it is determined that a valid action cannot be performed by the speech-enabled application using the ASR result.
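The abstract does not say how the end of speech is detected. Claim 1 recites detection "based, at least in part, on a threshold time for endpointing"; one common way to realize this is to declare an endpoint once trailing silence lasts at least that long. The sketch below assumes a simple per-frame energy test in place of whatever voice-activity detector the ASR engine actually uses. The function name `detect_end_of_speech` and the parameters `frame_ms`, `energy_threshold`, and `endpoint_threshold_ms` are hypothetical, and their defaults are illustrative only.

```python
from typing import Optional, Sequence


def detect_end_of_speech(
    frame_energies: Sequence[float],
    frame_ms: int = 10,                # assumed duration of one audio frame
    energy_threshold: float = 0.01,    # assumed energy above which a frame counts as speech
    endpoint_threshold_ms: int = 500,  # assumed "threshold time for endpointing"
) -> Optional[int]:
    """Return the frame index just past the last speech frame once trailing
    silence reaches endpoint_threshold_ms, or None if no endpoint is found.

    The returned index marks the portion of the audio "prior to the detected
    end of speech" that would be passed on for recognition.
    """
    speech_seen = False
    silence_ms = 0
    last_speech_end = 0
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            # Speech frame: reset the silence counter.
            speech_seen = True
            silence_ms = 0
            last_speech_end = i + 1
        elif speech_seen:
            # Silence after speech: accumulate until the threshold is met.
            silence_ms += frame_ms
            if silence_ms >= endpoint_threshold_ms:
                return last_speech_end
    return None
```

With 10 ms frames and a 500 ms threshold, a 300 ms pause inside an utterance does not end it, while 500 ms of continuous silence does. A shorter threshold produces an ASR result sooner, but it is more likely to cut off a user who pauses mid-request; processing second audio, as recited in the claims, is one way to recover the remainder.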
19 Claims
1. A computing device including a speech-enabled application installed thereon, the computing device comprising:
an input interface configured to receive first audio comprising speech from a user of the computing device;
an automatic speech recognition (ASR) engine configured to:
detect based, at least in part, on a threshold time for endpointing, an end of speech in the first audio; and
generate a first ASR result based, at least in part, on a portion of the first audio prior to the detected end of speech; and
at least one processor programmed to:
determine whether a valid action can be performed by the speech-enabled application using the first ASR result;
instruct the ASR engine to process second audio when it is determined that a valid action cannot be performed by the speech-enabled application using the first ASR result;
create a first hint based, at least in part, on the first ASR result, wherein the first hint prompts the user for speech input corresponding to a valid action that can be performed by the speech-enabled application; and
present the first hint via a user interface of the computing device.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
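Claim 1 does not specify how the hint is derived from the first ASR result. A minimal sketch, assuming the application exposes its valid actions as a list of example command phrases, is to suggest the commands whose leading words match the partial result and to fall back to general suggestions when nothing matches. The names `create_hint`, `valid_commands`, and `max_suggestions` are hypothetical.

```python
from typing import List, Sequence


def create_hint(
    first_asr_result: str,
    valid_commands: Sequence[str],  # assumed: example phrases for valid actions
    max_suggestions: int = 2,
) -> str:
    """Build a prompt steering the user toward speech input that maps to a
    valid action, using the (possibly incomplete) first ASR result."""
    words = first_asr_result.lower().split()
    candidates: List[str] = []
    for command in valid_commands:
        command_words = command.lower().split()
        # Keep commands that begin with the words recognized so far.
        if words and command_words[: len(words)] == words:
            candidates.append(command)
    if not candidates:
        # Nothing matched the partial result: offer general suggestions.
        candidates = list(valid_commands)
    suggestions = candidates[:max_suggestions]
    return "Try saying: " + " or ".join(f'"{s}"' for s in suggestions)
```

For a first ASR result of "call" and valid commands "call Mom", "call work", and "play music", this sketch returns `Try saying: "call Mom" or "call work"`.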
14. A computing device including a speech-enabled application installed thereon, the computing device comprising:
at least one storage device configured to store at least one data structure including information describing a plurality of natural language understanding (NLU) results and corresponding ASR output used to generate the plurality of NLU results;
an input interface configured to receive first audio comprising speech from a user of the computing device;
an automatic speech recognition (ASR) engine configured to:
detect based, at least in part, on a threshold time for endpointing, an end of speech in the first audio; and
generate a first ASR result based, at least in part, on a portion of the first audio prior to the detected end of speech; and
at least one processor programmed to:
determine whether a valid action can be performed by the speech-enabled application using the first ASR result;
instruct the ASR engine to process second audio when it is determined that a valid action cannot be performed by the speech-enabled application using the first ASR result;
determine whether to add the first ASR result and a corresponding NLU result generated using the first ASR result to the at least one data structure stored on the at least one storage device; and
add the first ASR result and the corresponding NLU result generated using the first ASR result to the at least one data structure stored on the at least one storage device in response to determining that the first ASR result and the corresponding NLU result should be added.
(Dependent claims: 15, 16, 17)
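Claim 14 adds a stored data structure that pairs ASR output with the NLU results generated from it, together with a decision about whether to add a new pair. The sketch below assumes a dictionary keyed by normalized ASR text and a hypothetical admission rule (store only results that map to a valid action, skip duplicates, cap the size); the claim itself does not state the criteria. `NluResult`, `NluCache`, and their fields are illustrative names, not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class NluResult:
    intent: str                                  # e.g. "call_contact"
    entities: Tuple[Tuple[str, str], ...] = ()   # e.g. (("contact", "Mom"),)
    valid_action: bool = True                    # whether the app can act on it


@dataclass
class NluCache:
    """Hypothetical store mapping ASR output to the NLU result it produced."""
    entries: Dict[str, NluResult] = field(default_factory=dict)
    max_entries: int = 1000

    @staticmethod
    def _normalize(asr_text: str) -> str:
        # Case- and whitespace-insensitive key.
        return " ".join(asr_text.lower().split())

    def lookup(self, asr_text: str) -> Optional[NluResult]:
        return self.entries.get(self._normalize(asr_text))

    def should_add(self, asr_text: str, nlu: NluResult) -> bool:
        # Assumed admission rule: valid actions only, no duplicates, bounded size.
        key = self._normalize(asr_text)
        return (
            nlu.valid_action
            and key not in self.entries
            and len(self.entries) < self.max_entries
        )

    def maybe_add(self, asr_text: str, nlu: NluResult) -> bool:
        """Add the pair if should_add approves it; return whether it was added."""
        if self.should_add(asr_text, nlu):
            self.entries[self._normalize(asr_text)] = nlu
            return True
        return False
```

One plausible use, not stated in the claim, is that a later utterance whose ASR output matches a stored key can reuse the stored NLU result instead of running NLU again.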
18. A method, comprising:
receiving, by an input interface of a computing device, first audio comprising speech from a user of the computing device;
detecting, by an automatic speech recognition (ASR) engine of the computing device, an end of speech in the first audio;
generating, by the ASR engine, an ASR result based, at least in part, on a portion of the first audio prior to the detected end of speech;
determining whether a valid action can be performed by a speech-enabled application installed on the computing device using the ASR result;
instructing the ASR engine to process second audio when it is determined that a valid action cannot be performed by the speech-enabled application using the ASR result;
creating a first hint based, at least in part, on the first ASR result, wherein the first hint prompts the user for speech input corresponding to a valid action that can be performed by the speech-enabled application; and
presenting the first hint via a user interface of the computing device.
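The central step of the method in claim 18 is the branch taken when the first ASR result does not yield a valid action: instead of finishing, the ASR engine goes on to process second audio. The sketch below models each endpointed chunk of audio by the text the recognizer would return for it and appends results until the combined text maps to a valid action. The function `recognize_until_valid`, the `is_valid_action` callback, and the `max_segments` cap are assumptions for illustration; presenting a hint between chunks is omitted.

```python
from typing import Callable, Iterable, Optional


def recognize_until_valid(
    segments: Iterable[str],                 # stand-in: ASR result per endpointed chunk
    is_valid_action: Callable[[str], bool],  # assumed check against the application
    max_segments: int = 3,                   # assumed cap on how much audio to process
) -> Optional[str]:
    """Combine per-chunk ASR results until they form a valid action.

    Returns the combined text for the first valid action, or None if no valid
    action is found within max_segments chunks.
    """
    combined = ""
    for count, segment_result in enumerate(segments, start=1):
        combined = f"{combined} {segment_result}".strip()
        if is_valid_action(combined):
            return combined
        if count >= max_segments:
            break
        # Not yet valid: continue with the next chunk ("second audio").
    return None
```

If the user says "call", pauses long enough to trigger the endpoint, and then says "mom", the first result "call" is not a valid action, so the second chunk is processed and "call mom" is returned.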
19. A non-transitory computer-readable storage medium encoded with a plurality of instructions that, when executed by a computing device, performs a method, the method comprising:
receiving first audio comprising speech from a user of the computing device;
detecting an end of speech in the first audio;
generating an ASR result based, at least in part, on a portion of the first audio prior to the detected end of speech;
determining whether a valid action can be performed by a speech-enabled application installed on the computing device using the ASR result;
processing second audio when it is determined that a valid action cannot be performed by the speech-enabled application using the ASR result;
creating a first hint based, at least in part, on the first ASR result, wherein the first hint prompts the user for speech input corresponding to a valid action that can be performed by the speech-enabled application; and
presenting the first hint via a user interface of the computing device.
Specification