PRE-WAKEWORD SPEECH PROCESSING
First Claim
1. A computer-implemented method for processing a spoken command when the wakeword does not begin the command, the method comprising:
receiving audio comprising speech;
buffering audio data representing the speech;
determining that the audio data includes a number of consecutive audio frames with an energy level below a threshold;
determining that a new utterance has occurred in the speech using a tone quality of the speech and the number of consecutive audio frames;
determining a first location in the speech corresponding to the new utterance;
storing the first location;
detecting a wakeword in the speech, wherein the wakeword corresponds to a second location in the speech, the second location being after the first location;
sending a portion of audio data to a remote server for speech processing, where a beginning of the portion of audio data corresponds to speech at the first location;
determining an end of the new utterance at a third location in the speech, the third location being after the second location;
concluding the sending of the portion of audio data to the remote server so that an end of the portion of audio data corresponds to speech at the third location;
receiving command data from the remote server; and
executing the command data.
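The pause-detection step in the claim (a number of consecutive audio frames with an energy level below a threshold) can be illustrated with a minimal sketch. This is not the patented implementation: the function names, the mean-square energy measure, and the parameters `threshold` and `min_quiet_frames` are assumptions made for illustration, and the "tone quality" cue the claim also relies on is left out.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (an assumed energy measure)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def find_pause_ends(
    frames: Sequence[Sequence[float]],
    threshold: float,
    min_quiet_frames: int,
) -> List[int]:
    """Return the index of the first non-quiet frame after each long pause.

    A long pause is a run of at least `min_quiet_frames` consecutive
    frames whose energy is below `threshold`. Each returned index is a
    candidate "first location" where a new utterance may begin.
    """
    boundaries: List[int] = []
    quiet_run = 0
    for i, frame in enumerate(frames):
        if frame_energy(frame) < threshold:
            quiet_run += 1
        else:
            if quiet_run >= min_quiet_frames:
                boundaries.append(i)
            quiet_run = 0
    return boundaries


# Hypothetical two-sample frames: speech, a three-frame pause, speech,
# then a one-frame dip that is too short to count as a pause.
loud = [0.5, -0.5]
quiet = [0.0, 0.01]
frames = [loud, loud, quiet, quiet, quiet, loud, quiet, loud]
pause_ends = find_pause_ends(frames, threshold=0.01, min_quiet_frames=3)
```

Here only the three-frame pause qualifies, so the single candidate boundary is frame index 5, the frame where speech resumes.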
Abstract
A system for capturing and processing portions of a spoken utterance command that may occur before a wakeword. The system buffers incoming audio and indicates locations in the audio where the utterance changes, for example when a long pause is detected. When the system detects a wakeword within a particular utterance, the system determines the most recent utterance change location prior to the wakeword and sends the audio from that location to the end of the command utterance to a server for further speech processing.
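The selection step the abstract describes (find the most recent utterance change location prior to the wakeword, then take audio from there to the end of the command) can be sketched as follows. The function name `select_command_span`, the use of frame indices as locations, and the fallback to the start of the buffer when no marker precedes the wakeword are all assumptions for illustration, not details taken from the patent.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def select_command_span(
    markers: Sequence[int],
    wakeword_at: int,
    endpoint: int,
    buffer_start: int = 0,
) -> Tuple[int, int]:
    """Return (start, end) frame indices of the audio to send.

    `markers` holds the stored utterance-change locations in ascending
    order. The span starts at the latest marker at or before the
    wakeword and ends at the detected speech endpoint.
    """
    if endpoint < wakeword_at:
        raise ValueError("endpoint must not precede the wakeword")
    idx = bisect_right(markers, wakeword_at)
    # Assumed fallback: with no earlier marker, start at the buffer start.
    start = markers[idx - 1] if idx > 0 else buffer_start
    return start, endpoint


# Hypothetical locations: utterance changes at frames 0, 40 and 95,
# wakeword detected at frame 120, command ends at frame 180.
start, end = select_command_span([0, 40, 95], wakeword_at=120, endpoint=180)
buffered = list(range(200))      # stand-in for buffered audio frames
portion = buffered[start:end]    # frames that would be sent onward
```

With these values the span runs from frame 95 to frame 180, so the speech between the last pause and the wakeword is included in the portion selected for processing.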
20 Claims
1. A computer-implemented method for processing a spoken command when the wakeword does not begin the command, the method comprising:
receiving audio comprising speech;
buffering audio data representing the speech;
determining that the audio data includes a number of consecutive audio frames with an energy level below a threshold;
determining that a new utterance has occurred in the speech using a tone quality of the speech and the number of consecutive audio frames;
determining a first location in the speech corresponding to the new utterance;
storing the first location;
detecting a wakeword in the speech, wherein the wakeword corresponds to a second location in the speech, the second location being after the first location;
sending a portion of audio data to a remote server for speech processing, where a beginning of the portion of audio data corresponds to speech at the first location;
determining an end of the new utterance at a third location in the speech, the third location being after the second location;
concluding the sending of the portion of audio data to the remote server so that an end of the portion of audio data corresponds to speech at the third location;
receiving command data from the remote server; and
executing the command data.
Dependent claims: 2, 3
4. A computer-implemented method comprising:
receiving audio comprising speech;
storing audio data representing the speech in a non-transitory memory;
determining a first location in the audio data associated with a change in a characteristic of the speech;
determining a wakeword at a second location in the audio data;
determining a speech endpoint at a third location in the audio data;
determining a first portion of audio data, wherein the first portion of audio data begins proximate to the first location and ends proximate to the third location; and
selecting the first portion of audio data for speech processing.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12
13. A computing system comprising:
at least one processor;
a memory including instructions operable to be executed by the at least one processor to cause the system to perform a set of actions comprising:
receiving audio comprising speech;
storing audio data representing the speech in a non-transitory memory;
determining a first location in the audio data associated with a change in a characteristic of the speech;
determining a wakeword at a second location in the audio data;
determining a speech endpoint at a third location in the audio data;
determining a first portion of audio data, wherein the first portion of audio data begins proximate to the first location and ends proximate to the third location; and
selecting the first portion of audio data for speech processing.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
Specification