Sub-matrix input for neural network layers
First Claim
1. A method comprising:
storing, by a computing device, a spoken keyword detection model comprising a neural network;
generating, by the computing device, a set of input values for the neural network of the spoken keyword detection model, wherein the set of input values includes values from a predetermined quantity of vectors that indicate acoustic characteristics of an utterance;
providing, by the computing device, each of the values in the set of input values to an input layer of the neural network;
providing, by the computing device, non-overlapping proper subsets of the set of input values as input to nodes in a particular layer of the neural network, wherein each of the nodes receives a different non-overlapping proper subset of the set of input values, each of the non-overlapping proper subsets comprising multiple values from the set of input values, and wherein, for each of the nodes, the only values from the set of input values used to provide input to the node are the values in the non-overlapping proper subset for the node;
generating, by the computing device, one or more outputs of the spoken keyword detection model, wherein the one or more outputs of the spoken keyword detection model are based on output of the neural network that was generated in response to providing the non-overlapping proper subsets of the set of input values as input to the nodes in the particular layer of the neural network;
based on the one or more outputs of the spoken keyword detection model, determining, by the computing device, that the utterance comprises a predetermined keyword; and
in response to determining that the utterance comprises the predetermined keyword, changing, by the computing device, an operating state of the computing device.
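The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the equal-size contiguous partition, the ReLU and sigmoid activations, the single output unit, the 0.5 threshold, and the `Device` class with its "low_power" and "active" states are all assumptions made for illustration. The claim itself only requires that each node in the particular layer receives its own non-overlapping proper subset of multiple input values.

```python
import math


def partition(values, num_nodes):
    """Split values into num_nodes contiguous, non-overlapping subsets.

    Equal-size contiguous slices are an assumption; the claim only requires
    non-overlapping proper subsets with multiple values each.
    """
    if num_nodes < 2 or len(values) % num_nodes != 0:
        raise ValueError("need at least 2 nodes and an even split")
    size = len(values) // num_nodes
    if size < 2:
        raise ValueError("each subset must contain multiple values")
    return [values[i * size:(i + 1) * size] for i in range(num_nodes)]


def layer_forward(values, weights, biases):
    """Each node sees only its own subset of the input values."""
    subsets = partition(values, len(weights))
    outputs = []
    for subset, w, b in zip(subsets, weights, biases):
        if len(w) != len(subset):
            raise ValueError("one weight per value in the node's subset")
        z = sum(wi * xi for wi, xi in zip(w, subset)) + b
        outputs.append(max(0.0, z))  # ReLU activation (assumed)
    return outputs


def keyword_score(hidden, out_weights, out_bias):
    """Hypothetical single sigmoid output unit giving a keyword score."""
    z = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))


class Device:
    """Hypothetical device with two operating states."""

    def __init__(self):
        self.state = "low_power"


def run_detector(values, weights, biases, out_weights, out_bias,
                 device, threshold=0.5):
    """Run the model and change the device state if the keyword is detected."""
    hidden = layer_forward(values, weights, biases)
    score = keyword_score(hidden, out_weights, out_bias)
    if score >= threshold:
        device.state = "active"
    return score
```

With six input values and three nodes, each node receives two values, so no input value reaches more than one node in that layer. A fully connected layer of the same width would need 18 weights here; this sketch uses 6.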
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network. One of the methods includes generating, by a speech recognition system, a matrix from a predetermined quantity of vectors that each represent input for a layer of a neural network, generating a plurality of sub-matrices from the matrix, and using, for each of the sub-matrices, the respective sub-matrix as input to a node in the layer of the neural network to determine whether an utterance encoded in an audio signal comprises a keyword for which the neural network is trained.
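The matrix and sub-matrix construction described in the abstract might look like the following Python sketch. The abstract does not say how the sub-matrices are cut, so the rectangular, non-overlapping blocks used here are an assumption, as are the helper names `stack_frames` and `split_into_submatrices` and the choice to take the most recent vectors as matrix rows.

```python
def stack_frames(frames, quantity):
    """Build a matrix whose rows are the last `quantity` feature vectors."""
    if len(frames) < quantity:
        raise ValueError("not enough feature vectors")
    return [list(f) for f in frames[-quantity:]]


def split_into_submatrices(matrix, block_rows, block_cols):
    """Cut the matrix into non-overlapping block_rows x block_cols blocks.

    Blocks are returned in row-major order; each would feed one node.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows % block_rows != 0 or cols % block_cols != 0:
        raise ValueError("block size must divide the matrix dimensions")
    blocks = []
    for r in range(0, rows, block_rows):
        for c in range(0, cols, block_cols):
            block = [row[c:c + block_cols] for row in matrix[r:r + block_rows]]
            blocks.append(block)
    return blocks
```

For example, a matrix built from 4 vectors of 4 features each and cut into 2 x 2 blocks yields 4 sub-matrices, and every matrix entry appears in exactly one of them.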
19 Claims
1. A method comprising the steps recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. One or more non-transitory computer-readable media storing software comprising instructions executable by one or more computing devices which, upon such execution, cause the one or more computing devices to perform operations comprising:
storing, by the one or more computing devices, a spoken keyword detection model comprising a neural network;
generating, by the one or more computing devices, a set of input values for the neural network of the spoken keyword detection model, wherein the set of input values includes values from a predetermined quantity of vectors that indicate acoustic characteristics of an utterance;
providing, by the one or more computing devices, each of the values in the set of input values to an input layer of the neural network;
providing, by the one or more computing devices, non-overlapping proper subsets of the set of input values as input to nodes in a particular layer of the neural network, wherein each of the nodes receives a different non-overlapping proper subset of the set of input values, each of the non-overlapping proper subsets comprising multiple values from the set of input values, and wherein, for each of the nodes, the only values from the set of input values used to provide input to the node are the values in the non-overlapping proper subset for the node;
generating, by the one or more computing devices, one or more outputs of the spoken keyword detection model, wherein the one or more outputs of the spoken keyword detection model are based on output of the neural network that was generated in response to providing the non-overlapping proper subsets of the set of input values as input to the nodes in the particular layer of the neural network;
based on the one or more outputs of the spoken keyword detection model, determining, by the one or more computing devices, that the utterance comprises a predetermined keyword; and
in response to determining that the utterance comprises the predetermined keyword, changing, by the one or more computing devices, an operating state of at least one of the one or more computing devices.
Dependent claims: 14, 15, 16
17. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computing devices, to cause the one or more computing devices to perform operations comprising:
storing, by the one or more computing devices, a spoken keyword detection model comprising a neural network;
generating, by the one or more computing devices, a set of input values for the neural network of the spoken keyword detection model, wherein the set of input values includes values from a predetermined quantity of vectors that indicate acoustic characteristics of an utterance;
providing, by the one or more computing devices, each of the values in the set of input values to an input layer of the neural network;
providing, by the one or more computing devices, non-overlapping proper subsets of the set of input values as input to nodes in a particular layer of the neural network, wherein each of the nodes receives a different non-overlapping proper subset of the set of input values, each of the non-overlapping proper subsets comprising multiple values from the set of input values, and wherein, for each of the nodes, the only values from the set of input values used to provide input to the node are the values in the non-overlapping proper subset for the node;
generating, by the one or more computing devices, one or more outputs of the spoken keyword detection model, wherein the one or more outputs of the spoken keyword detection model are based on output of the neural network that was generated in response to providing the non-overlapping proper subsets of the set of input values as input to the nodes in the particular layer of the neural network;
based on the one or more outputs of the spoken keyword detection model, determining, by the one or more computing devices, that the utterance comprises a predetermined keyword; and
in response to determining that the utterance comprises the predetermined keyword, changing, by the one or more computing devices, an operating state of at least one of the one or more computing devices.
Dependent claims: 18, 19
Specification