Method and system of neural network keyphrase detection
First Claim
1. A keyphrase detection device comprising:
memory storing received audio input; and
at least one neural network accelerator communicatively connected to the memory to receive the audio input and to operate by;
generating a multiple element acoustic score vector for a current time instance based on the received audio input, wherein the multiple element acoustic score vector comprises at least one acoustic score for at least one rejection model and acoustic scores for at least one multiple state keyphrase model, and wherein the multiple state keyphrase model corresponds to a predetermined keyphrase,
recursively aligning elements of a previous multiple element state score vector for a previous time instance with the elements of the multiple element acoustic score vector, wherein the previous multiple element state score vector comprises at least one previous state score for the single state rejection model and previous state scores for the multiple state keyphrase model,
generating an intermediate score vector by using scores from both the multiple element acoustic score vector and the previous multiple element state score vector,
generating a current multiple element state score vector of a current time instance comprising performing a propagating operation with the intermediate score vector, and
determining whether the received audio input is associated with the predetermined keyphrase by using scores from the current multiple element state score vector; and
a controller to provide at least one command to perform an action when the received audio input is associated with the predetermined keyphrase.
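The recursive scoring steps recited in the claim can be illustrated with a minimal Python sketch. This is one plausible reading, not the patented implementation: the claim does not say how scores are combined or propagated, so the sketch assumes log-domain scores that are added, a propagating operation that takes the maximum over candidate transitions, and a decision rule that compares the last keyphrase state against the rejection state. The function names (`align_and_score`, `propagate`, `detect`), the index layout (element 0 is the rejection state, elements 1..N are keyphrase states) and the `threshold` parameter are all hypothetical.

```python
from typing import List, Sequence, Tuple


def align_and_score(prev_state: Sequence[float],
                    acoustic: Sequence[float]) -> List[Tuple[float, ...]]:
    """Pair previous state scores with current acoustic scores.

    Assumption: element 0 is the single-state rejection model (self-loop only);
    keyphrase state i can be reached from state i-1 or from itself.
    """
    if not acoustic or len(prev_state) != len(acoustic):
        raise ValueError("score vectors must be non-empty and of equal length")
    # Rejection state: only the self-loop candidate.
    intermediate: List[Tuple[float, ...]] = [(prev_state[0] + acoustic[0],)]
    for i in range(1, len(acoustic)):
        # Two candidates per keyphrase state: advance from i-1, or stay in i.
        intermediate.append((prev_state[i - 1] + acoustic[i],
                             prev_state[i] + acoustic[i]))
    return intermediate


def propagate(intermediate: Sequence[Tuple[float, ...]]) -> List[float]:
    """Assumed propagating operation: keep the best candidate per state."""
    return [max(candidates) for candidates in intermediate]


def detect(frames: Sequence[Sequence[float]], threshold: float) -> bool:
    """Return True once the last keyphrase state beats rejection by threshold."""
    if not frames:
        return False
    state = [0.0] * len(frames[0])  # hypothetical initial state scores
    for acoustic in frames:
        state = propagate(align_and_score(state, acoustic))
        if state[-1] - state[0] > threshold:
            return True
    return False
```

Under these assumptions, a call such as `detect(frames, threshold=0.5)` takes one acoustic score vector per time instance and reports whether the keyphrase was found; the controller recited in the claim would issue its command on a `True` result.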
Abstract
A method and system are directed to autonomous neural network keyphrase detection and include generating and using a multiple element state score vector by using neural network operations and without substantial use of a digital signal processor (DSP) to perform the keyphrase detection.
25 Claims
1. A keyphrase detection device comprising:
memory storing received audio input; and
at least one neural network accelerator communicatively connected to the memory to receive the audio input and to operate by;
generating a multiple element acoustic score vector for a current time instance based on the received audio input, wherein the multiple element acoustic score vector comprises at least one acoustic score for at least one rejection model and acoustic scores for at least one multiple state keyphrase model, and wherein the multiple state keyphrase model corresponds to a predetermined keyphrase,
recursively aligning elements of a previous multiple element state score vector for a previous time instance with the elements of the multiple element acoustic score vector, wherein the previous multiple element state score vector comprises at least one previous state score for the single state rejection model and previous state scores for the multiple state keyphrase model,
generating an intermediate score vector by using scores from both the multiple element acoustic score vector and the previous multiple element state score vector,
generating a current multiple element state score vector of a current time instance comprising performing a propagating operation with the intermediate score vector, and
determining whether the received audio input is associated with the predetermined keyphrase by using scores from the current multiple element state score vector; and
a controller to provide at least one command to perform an action when the received audio input is associated with the predetermined keyphrase.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer-implemented system for performing keyphrase detection comprising:
at least one microphone to capture audio input;
memory to store the audio input;
at least one processor communicatively coupled to the at least one microphone and at least one memory, and comprising at least one neural network accelerator to receive the audio input and operate by;
generating a multiple element acoustic score vector for a current time instance based on received audio input, wherein the multiple element acoustic score vector comprises at least one acoustic score for at least one rejection model and acoustic scores for at least one multiple state keyphrase model, and wherein the multiple state keyphrase model corresponds to a predetermined keyphrase;
recursively aligning elements of a previous multiple element state score vector for a previous time instance with the elements of the multiple element acoustic score vector, wherein the previous multiple element state score vector comprises at least one previous state score for the single state rejection model and previous state scores for the multiple state keyphrase model;
generating an intermediate score vector by using scores from both the multiple element acoustic score vector and the previous multiple element state score vector,
generating a current multiple element state score vector of a current time instance comprising performing a propagating operation with the intermediate score vector, and
determining whether current state scores of the current multiple element state score vector indicate that the received audio input is associated with the predetermined keyphrase; and
providing at least one command to perform an action when the received audio input is associated with the predetermined keyphrase.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. At least one non-transitory machine readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing device to perform keyphrase detection by:
generating, by a neural network accelerator, a multiple element acoustic score vector for a current time instance based on received audio input, wherein the multiple element acoustic score vector comprises at least one acoustic score for at least one rejection model and acoustic scores for at least one multiple state keyphrase model, and wherein the multiple state keyphrase model corresponds to a predetermined keyphrase;
recursively aligning elements of a previous multiple element state score vector for a previous time instance with the elements of the multiple element acoustic score vector, wherein the previous multiple element state score vector comprises at least one previous state score for the single state rejection model and previous state scores for the multiple state keyphrase model;
generating, by a neural network accelerator, an intermediate score vector by using scores from both the multiple element acoustic score vector and the previous multiple element state score vector;
generating, by a neural network accelerator, a current multiple element state score vector of a current time instance comprising performing a propagating operation with the intermediate score vector;
determining, by a neural network accelerator, whether current state scores of the current multiple element state score vector indicate that the received audio input is associated with the predetermined keyphrase; and
providing at least one command to perform an action when the received audio input is associated with the predetermined keyphrase.
Dependent claims: 20, 21, 22, 23, 24, 25
Specification