Sub-matrix input for neural network layers

US 10,580,401 B2
Filed: 02/04/2015
Issued: 03/03/2020
Est. Priority Date: 01/27/2015
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network. One of the methods includes generating, by a speech recognition system, a matrix from a predetermined quantity of vectors that each represent input for a layer of a neural network, generating a plurality of sub-matrices from the matrix, and using, for each of the sub-matrices, the respective sub-matrix as input to a node in the layer of the neural network to determine whether an utterance encoded in an audio signal comprises a keyword for which the neural network is trained.
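
The matrix and sub-matrix steps in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the patented implementation: the function names (`stack_frames`, `split_into_submatrices`, `node_output`), the row-wise split, the all-ones weights, and the ReLU activation are all hypothetical choices made for the example.

```python
from typing import List

Matrix = List[List[float]]


def stack_frames(frames: List[List[float]], quantity: int) -> Matrix:
    """Build a matrix from the last `quantity` feature vectors, one vector per row."""
    if len(frames) < quantity:
        raise ValueError("not enough feature vectors to fill the matrix")
    return [list(frame) for frame in frames[-quantity:]]


def split_into_submatrices(matrix: Matrix, rows_per_block: int) -> List[Matrix]:
    """Split the matrix into consecutive, non-overlapping blocks of rows."""
    if len(matrix) % rows_per_block != 0:
        raise ValueError("rows_per_block must divide the number of rows")
    return [matrix[i:i + rows_per_block]
            for i in range(0, len(matrix), rows_per_block)]


def node_output(sub: Matrix, weights: Matrix, bias: float) -> float:
    """Weighted sum over a single sub-matrix, followed by ReLU (assumed activation)."""
    total = bias + sum(w * x
                       for w_row, x_row in zip(weights, sub)
                       for w, x in zip(w_row, x_row))
    return max(0.0, total)


# Four two-dimensional feature vectors (made-up values).
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
matrix = stack_frames(frames, quantity=4)
blocks = split_into_submatrices(matrix, rows_per_block=2)

# One node per sub-matrix; each node sees only its own block.
ones = [[1.0, 1.0], [1.0, 1.0]]
outputs = [node_output(block, ones, bias=0.0) for block in blocks]
```

Because each node reads a single block, the layer needs far fewer weights than a fully connected layer over the whole matrix, which is presumably the motivation for the sub-matrix input.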

19 Claims

1. A method comprising:
- storing, by a computing device, a spoken keyword detection model comprising a neural network;
  
  generating, by the computing device, a set of input values for the neural network of the spoken keyword detection model, wherein the set of input values includes values from a predetermined quantity of vectors that indicate acoustic characteristics of an utterance;
  
  providing, by the computing device, each of the values in the set of input values to an input layer of the neural network;
  
  providing, by the computing device, non-overlapping proper subsets of the set of input values as input to nodes in a particular layer of the neural network, wherein each of the nodes receives a different non-overlapping proper subset of the set of input values, each of the non-overlapping proper subsets comprising multiple values from the set of input values, and wherein, for each of the nodes, the only values from the set of input values used to provide input to the node are the values in the non-overlapping proper subset for the node;
  
  generating, by the computing device, one or more outputs of the spoken keyword detection model, wherein the one or more outputs of the spoken keyword detection model are based on output of the neural network that was generated in response to providing the non-overlapping proper subsets of the set of input values as input to the nodes in the particular layer of the neural network;
  
  based on the one or more outputs of the spoken keyword detection model, determining, by the computing device, that the utterance comprises a predetermined keyword; and
  
  in response to determining that the utterance comprises the predetermined keyword, changing, by the computing device, an operating state of the computing device.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The method of claim 1, wherein the set of input values includes values from a predetermined quantity of feature vectors that each model a portion of audio data encoding the utterance.
  - 3. The method of claim 1, wherein a size of each of the non-overlapping proper subsets is the same.
  - 4. The method of claim 1, wherein the set of input values includes values from a predetermined quantity of sequential vectors from an ordered series of vectors representing the utterance.
  - 5. The method of claim 1, wherein providing the non-overlapping proper subsets of the set of input values as input to the nodes in the particular layer of the neural network comprises providing the non-overlapping proper subsets of the set of input values as input to a predetermined quantity of nodes in the particular layer of the neural network.
  - 6. The method of claim 1, wherein providing the non-overlapping proper subsets of the set of input values as input to the nodes in the particular layer of the neural network comprises providing a non-overlapping proper subset of the set of input values as input to a plurality of adjacent nodes in the particular layer of the neural network.
  - 7. The method of claim 1, comprising:
    - wherein determining that the utterance comprises the predetermined keyword is performed using the output from the nodes in the neural network; and
      
      wherein the neural network has been trained to recognize the predetermined keyword.
  - 8. The method of claim 1, wherein changing the operating state of the computing device comprises exiting, by the computing device, a standby state.
  - 9. The method of claim 7, comprising, in response to determining that the utterance comprises the predetermined keyword, presenting, by the computing device, content to a user of the device.
  - 10. The method of claim 7, comprising, in response to determining that the utterance comprises the predetermined keyword, performing, by the computing device, an action for a particular application.
  - 11. The method of claim 10, wherein performing the action for the particular application comprises launching, by the computing device, the particular application.
  - 12. The method of claim 1, wherein the vectors each model a portion of audio data encoding the utterance;
    - wherein the set of input values comprises values that respectively correspond to different frequencies or frequency ranges of content of the audio data;
      
      wherein each of the non-overlapping proper subsets includes only values that correspond to a proper subset of the different frequencies or frequency ranges.
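
Claims 1 through 12 can be read as a pipeline: a window of sequential vectors becomes a set of input values, equal-sized non-overlapping subsets feed the nodes of one layer, and a detection changes the device state. The sketch below illustrates that reading in plain Python. It is a hedged example, not the claimed implementation: the `Device` class, the pass-through input layer, the ReLU and sigmoid activations, the weights, and the 0.5 threshold are all assumptions introduced here.

```python
import math
from typing import List, Sequence


def partition(values: Sequence[float], subset_size: int) -> List[List[float]]:
    """Equal-sized, non-overlapping proper subsets (claims 1 and 3)."""
    n = len(values)
    if not 1 < subset_size < n:
        raise ValueError("a subset needs multiple values and must be proper")
    if n % subset_size != 0:
        raise ValueError("subset_size must divide the number of input values")
    return [list(values[i:i + subset_size]) for i in range(0, n, subset_size)]


def hidden_layer(values: Sequence[float],
                 node_weights: List[List[float]],
                 node_biases: List[float]) -> List[float]:
    """Each node reads only its own subset; ReLU is an assumed activation."""
    subsets = partition(values, len(node_weights[0]))
    if len(subsets) != len(node_weights):
        raise ValueError("need exactly one node per subset")
    return [max(0.0, b + sum(w * x for w, x in zip(ws, sub)))
            for sub, ws, b in zip(subsets, node_weights, node_biases)]


def keyword_score(hidden: List[float], out_weights: List[float],
                  out_bias: float) -> float:
    """Sigmoid score for the keyword (assumed output layer)."""
    z = out_bias + sum(w * h for w, h in zip(out_weights, hidden))
    return 1.0 / (1.0 + math.exp(-z))


class Device:
    """Hypothetical device with a standby state (claim 8)."""

    def __init__(self) -> None:
        self.state = "standby"

    def on_keyword(self) -> None:
        if self.state == "standby":
            self.state = "active"


def process_window(device: Device, window: List[List[float]],
                   node_weights, node_biases, out_weights, out_bias,
                   threshold: float = 0.5) -> bool:
    # Input layer is treated as a pass-through of the stacked window values.
    values = [v for frame in window for v in frame]
    hidden = hidden_layer(values, node_weights, node_biases)
    detected = keyword_score(hidden, out_weights, out_bias) >= threshold
    if detected:
        device.on_keyword()
    return detected


# Made-up weights: two nodes, each reading two of the four input values.
params = dict(node_weights=[[1.0, 1.0], [1.0, 1.0]], node_biases=[0.0, 0.0],
              out_weights=[2.0, 2.0], out_bias=-1.0)
device = Device()
silent = process_window(device, [[0.0, 0.0], [0.0, 0.0]], **params)
state_after_silence = device.state
spoken = process_window(device, [[1.0, 0.0], [0.0, 1.0]], **params)
state_after_keyword = device.state
```

In a trained model the weights would come from training on the predetermined keyword (claim 7); here they are chosen by hand only so that the two windows fall on opposite sides of the threshold.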

13. One or more non-transitory computer-readable media storing software comprising instructions executable by one or more computing devices which, upon such execution, cause the one or more computing devices to perform operations comprising:
- storing, by the one or more computing devices, a spoken keyword detection model comprising a neural network;
  
  generating, by the one or more computing devices, a set of input values for the neural network of the spoken keyword detection model, wherein the set of input values includes values from a predetermined quantity of vectors that indicate acoustic characteristics of an utterance;
  
  providing, by the one or more computing devices, each of the values in the set of input values to an input layer of the neural network;
  
  providing, by the one or more computing devices, non-overlapping proper subsets of the set of input values as input to nodes in a particular layer of the neural network, wherein each of the nodes receives a different non-overlapping proper subset of the set of input values, each of the non-overlapping proper subsets comprising multiple values from the set of input values, and wherein, for each of the nodes, the only values from the set of input values used to provide input to the node are the values in the non-overlapping proper subset for the node;
  
  generating, by the one or more computing devices, one or more outputs of the spoken keyword detection model, wherein the one or more outputs of the spoken keyword detection model are based on output of the neural network that was generated in response to providing the non-overlapping proper subsets of the set of input values as input to the nodes in the particular layer of the neural network;
  
  based on the one or more outputs of the spoken keyword detection model, determining, by the one or more computing devices, that the utterance comprises a predetermined keyword; and
  
  in response to determining that the utterance comprises the predetermined keyword, changing, by the one or more computing devices, an operating state of at least one of the one or more computing devices.
- Dependent Claims (14, 15, 16)
  - 14. The one or more non-transitory computer-readable media of claim 13, wherein a size of each of the non-overlapping proper subsets is the same.
  - 15. The one or more non-transitory computer-readable media of claim 13, wherein the set of input values includes values from a predetermined quantity of sequential vectors.
  - 16. The one or more non-transitory computer-readable media of claim 13, wherein the set of input values comprises values that respectively correspond to different frequencies;
    - and wherein each of the non-overlapping proper subsets includes values corresponding to only a proper subset of the different frequencies.
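
One way to picture the frequency-based subsets of claim 16 is to group the input values by frequency band, so that each subset holds the values for a few adjacent frequency bins across all of the sequential vectors. The sketch below assumes each vector is a list of per-bin values in a fixed frequency order; the function name `band_subsets` and the parameter `bins_per_band` are hypothetical, and the claims do not require this particular grouping.

```python
from typing import List


def band_subsets(frames: List[List[float]], bins_per_band: int) -> List[List[float]]:
    """Group values by frequency band across all frames.

    Each returned subset covers only `bins_per_band` adjacent bins, so it
    corresponds to a proper subset of the frequencies, and all subsets have
    the same size (claim 14).
    """
    num_bins = len(frames[0])
    if any(len(frame) != num_bins for frame in frames):
        raise ValueError("all frames must have the same number of bins")
    if not 0 < bins_per_band < num_bins:
        raise ValueError("a band must cover a proper subset of the bins")
    if num_bins % bins_per_band != 0:
        raise ValueError("bins_per_band must divide the number of bins")
    subsets = []
    for start in range(0, num_bins, bins_per_band):
        band = [frame[b]
                for frame in frames
                for b in range(start, start + bins_per_band)]
        subsets.append(band)
    return subsets


# Three sequential frames with four frequency bins each (made-up values:
# the tens digit marks the frame, the units digit marks the bin).
frames = [[10.0, 11.0, 12.0, 13.0],
          [20.0, 21.0, 22.0, 23.0],
          [30.0, 31.0, 32.0, 33.0]]
low_band, high_band = band_subsets(frames, bins_per_band=2)
```

With this grouping a node connected to `low_band` never sees bins 2 and 3, which matches the requirement that each subset correspond to only a proper subset of the frequencies.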

17. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computing devices, to cause the one or more computing devices to perform operations comprising:
  
  storing, by the one or more computing devices, a spoken keyword detection model comprising a neural network;
  
  generating, by the one or more computing devices, a set of input values for the neural network of the spoken keyword detection model, wherein the set of input values includes values from a predetermined quantity of vectors that indicate acoustic characteristics of an utterance;
  
  providing, by the one or more computing devices, each of the values in the set of input values to an input layer of the neural network;
  
  providing, by the one or more computing devices, non-overlapping proper subsets of the set of input values as input to nodes in a particular layer of the neural network, wherein each of the nodes receives a different non-overlapping proper subset of the set of input values, each of the non-overlapping proper subsets comprising multiple values from the set of input values, and wherein, for each of the nodes, the only values from the set of input values used to provide input to the node are the values in the non-overlapping proper subset for the node;
  
  generating, by the one or more computing devices, one or more outputs of the spoken keyword detection model, wherein the one or more outputs of the spoken keyword detection model are based on output of the neural network that was generated in response to providing the non-overlapping proper subsets of the set of input values as input to the nodes in the particular layer of the neural network;
  
  based on the one or more outputs of the spoken keyword detection model, determining, by the one or more computing devices, that the utterance comprises a predetermined keyword; and
  
  in response to determining that the utterance comprises the predetermined keyword, changing, by the one or more computing devices, an operating state of at least one of the one or more computing devices.
- Dependent Claims (18, 19)
  - 18. The system of claim 17, wherein generating the set of input values comprises stacking the predetermined quantity of vectors to generate the set of input values.
  - 19. The system of claim 17, wherein each value in the set of input values is included in at least one of the non-overlapping proper subsets.
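
The conditions in claims 17 through 19 (stacked input values, subsets that are non-overlapping, proper, and multi-valued, and every value covered by some subset) can be expressed as a small check over index sets. The sketch below is an illustrative reading only; `stack_vectors` and `subsets_satisfy_claims` are hypothetical names, and representing each subset as a list of positions in the stacked input is an assumption.

```python
from typing import List, Sequence


def stack_vectors(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Stack the vectors end to end into one set of input values (claim 18)."""
    return [value for vector in vectors for value in vector]


def subsets_satisfy_claims(num_values: int, subsets: List[List[int]]) -> bool:
    """Check index subsets against the conditions recited in claims 17 and 19."""
    seen = set()
    for subset in subsets:
        indices = set(subset)
        if len(indices) != len(subset):
            return False  # an index is repeated inside one subset
        if len(indices) < 2 or len(indices) >= num_values:
            return False  # must hold multiple values and be a proper subset
        if any(i < 0 or i >= num_values for i in indices):
            return False  # index outside the stacked input
        if seen & indices:
            return False  # overlaps with an earlier subset
        seen |= indices
    return len(seen) == num_values  # claim 19: every value is in some subset


values = stack_vectors([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
ok = subsets_satisfy_claims(len(values), [[0, 1, 2], [3, 4, 5]])
overlapping = subsets_satisfy_claims(len(values), [[0, 1, 2], [2, 3, 4, 5]])
incomplete = subsets_satisfy_claims(len(values), [[0, 1], [2, 3]])
not_proper = subsets_satisfy_claims(len(values), [[0, 1, 2, 3, 4, 5]])
```

Only the first call passes: the second shares index 2 between subsets, the third leaves two values uncovered, and the fourth uses the whole input as a single subset.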

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Lopez Moreno, Ignacio; Chen, Yu-hsin Joyce
Primary Examiner(s): Zhen, Li B.
Assistant Examiner(s): Chubb, Mikayla
Application Number: US14/613,493
Publication Number: US 20160217367A1
Time in Patent Office: 1,854 Days
Field of Search: None

CPC Class Codes
G06N 3/045   Combinations of networks
G10L 15/16   using artificial neural net...
G10L 17/18   Artificial neural networks;...
G10L 2015/088   Word spotting
