Frame skipping with extrapolation and outputs on demand neural network for automatic speech recognition
Abstract
Techniques related to implementing neural networks for speech recognition systems are discussed. Such techniques may include implementing frame skipping with approximated skip frames and/or distances on demand such that only those outputs needed by a speech decoder are provided via the neural network or approximation techniques.
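The "outputs on demand" idea can be illustrated with a small sketch. It assumes the decoder asks for the score of one output node at one frame, and that only the final layer is computed selectively while the shared hidden layers run once per frame. The class name `OnDemandOutputs` and the parameters `hidden_fn`, `out_weights`, and `out_bias` are hypothetical and not taken from the patent. The raw output-node score stands in for the distance value; any log or scaling step is left out.

```python
from typing import Callable, Dict, Sequence, Tuple


class OnDemandOutputs:
    """Hypothetical sketch: evaluate final-layer outputs only for the
    (frame, output node) pairs that a decoder actually requests."""

    def __init__(
        self,
        hidden_fn: Callable[[int], Sequence[float]],
        out_weights: Sequence[Sequence[float]],
        out_bias: Sequence[float],
    ) -> None:
        self.hidden_fn = hidden_fn      # frame index -> last hidden-layer activations
        self.out_weights = out_weights  # one weight row per output node
        self.out_bias = out_bias        # one bias per output node
        self._hidden: Dict[int, Sequence[float]] = {}
        self._scores: Dict[Tuple[int, int], float] = {}
        self.node_evaluations = 0       # number of output-node dot products computed

    def distance(self, frame: int, node: int) -> float:
        key = (frame, node)
        if key not in self._scores:
            # Shared hidden layers are computed at most once per frame.
            if frame not in self._hidden:
                self._hidden[frame] = self.hidden_fn(frame)
            h = self._hidden[frame]
            # Only the requested output row is evaluated, not the whole output layer.
            row = self.out_weights[node]
            self._scores[key] = sum(w * x for w, x in zip(row, h)) + self.out_bias[node]
            self.node_evaluations += 1
        return self._scores[key]
```

A decoder that keeps only a few active hypotheses per frame would call `distance` for those nodes alone. With thousands of output nodes, most rows of the output layer are then never evaluated. Repeated requests are served from the cache.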
21 Claims
1. A computer-implemented method for providing automatic speech recognition comprising:
receiving, by a microphone, speech for evaluation;
converting the received speech to a speech recording;
extracting features from the speech recording;
evaluating, for a first time instance, a neural network based on the extracted features to determine a first distance value associated with the first time instance, wherein the first distance value corresponds to an output node from the neural network;
evaluating, for a second time instance, the neural network based on the extracted features to determine a second distance value associated with the second time instance, wherein the second distance value corresponds to the output node from the neural network;
approximating, for a third time instance, a third distance value based on at least one of an extrapolation or an interpolation of the first and second distance values, wherein the neural network is not evaluated for the third time instance;
converting the speech recording to a recognized word sequence based on a plurality of distance values comprising the first, the second, and the third distance values; and
storing the recognized word sequence in a system memory.
Dependent claims: 2-10.
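The approximating step can be sketched as a linear estimate from two evaluated frames of the same output node. The estimate is interpolation when the skipped frame lies between them and extrapolation when it lies beyond them. The code below is a minimal sketch under stated assumptions: linear approximation, a fixed skip rate, and one output node. The names `approximate_distance` and `fill_skipped_frames`, the `skip` and `mode` parameters, and the hold-last-value fallback when only one frame has been evaluated are all hypothetical and are not taken from the claim.

```python
from typing import Callable, Dict, List


def approximate_distance(t1: int, d1: float, t2: int, d2: float, t3: int) -> float:
    """Linear estimate of one output node's distance at t3 from frames t1 and t2.

    Interpolation if t3 lies between t1 and t2, extrapolation otherwise.
    """
    if t1 == t2:
        raise ValueError("two distinct evaluated time instances are required")
    slope = (d2 - d1) / (t2 - t1)
    return d1 + slope * (t3 - t1)


def fill_skipped_frames(
    evaluate: Callable[[int], float],
    num_frames: int,
    skip: int = 2,
    mode: str = "interpolate",
) -> List[float]:
    """Evaluate the network on every `skip`-th frame and approximate the others.

    "interpolate" uses the evaluated frames on either side when a later one exists
    (needs look-ahead) and otherwise extrapolates; "extrapolate" always uses the
    two most recent earlier evaluated frames.
    """
    if skip < 1:
        raise ValueError("skip must be >= 1")
    if mode not in ("interpolate", "extrapolate"):
        raise ValueError("mode must be 'interpolate' or 'extrapolate'")
    # The network (evaluate) is called only for these frames.
    evaluated: Dict[int, float] = {t: evaluate(t) for t in range(0, num_frames, skip)}
    anchors = sorted(evaluated)
    distances: List[float] = []
    for t in range(num_frames):
        if t in evaluated:
            distances.append(evaluated[t])
            continue
        past = [a for a in anchors if a < t]
        future = [a for a in anchors if a > t]
        if mode == "interpolate" and future:
            t1, t2 = past[-1], future[0]
        elif len(past) >= 2:
            t1, t2 = past[-2], past[-1]
        else:
            # Only one earlier evaluated frame: hold its value (assumption, not from the claim).
            distances.append(evaluated[past[-1]])
            continue
        distances.append(approximate_distance(t1, evaluated[t1], t2, evaluated[t2], t))
    return distances
```

With `skip=3`, the network runs on frames 0, 3, 6, ..., roughly a third of the frames. The remaining distances are filled in by `approximate_distance`. The extrapolation mode avoids waiting for a future evaluated frame, at the cost of a cruder estimate right after speech changes.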
11. A system for providing automatic speech recognition comprising:
a microphone to receive speech and convert the received speech to a digital signal;
a system memory configured to store a speech recording corresponding to the digital signal; and
a central processing unit coupled to the memory, the central processing unit to extract features from the speech recording,
to implement, for a first time instance, a neural network based on the extracted features to determine a first distance value associated with the first time instance, wherein the first distance value corresponds to an output node from the neural network,
to implement, for a second time instance, the neural network based on the extracted features to determine a second distance value associated with the second time instance, wherein the second distance value corresponds to the output node from the neural network,
to approximate, for a third time instance, a third distance value based on at least one of an extrapolation or an interpolation of the first and second distance values,
to convert the speech recording to a recognized word sequence based on a plurality of distance values comprising the first, the second, and the third distance values, and
to store the recognized word sequence in the system memory.
Dependent claims: 12-16.
17. At least one non-transitory machine readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing device to perform a method comprising:
receiving, by a microphone, speech for evaluation;
converting the received speech to a speech recording;
extracting features from the speech recording;
evaluating, for a first time instance, a neural network based on the extracted features to determine a first distance value associated with the first time instance, wherein the first distance value corresponds to an output node from the neural network;
evaluating, for a second time instance, the neural network based on the extracted features to determine a second distance value associated with the second time instance, wherein the second distance value corresponds to the output node from the neural network;
approximating, for a third time instance, a third distance value based on at least one of an extrapolation or an interpolation of the first and second distance values, wherein the neural network is not evaluated for the third time instance;
converting the speech recording to a recognized word sequence based on a plurality of distance values comprising the first, the second, and the third distance values; and
storing the recognized word sequence in a system memory.
Dependent claims: 18-21.
Specification