Method of optimising the execution of a neural network in a speech recognition system through conditionally skipping a variable number of frames
Abstract
A method of optimizing the execution of a neural network in a speech recognition system provides for conditionally skipping a variable number of frames, depending on a distance computed between output probabilities, or likelihoods, of a neural network. The distance is initially evaluated between two frames at times 1 and 1+k, where k is a predetermined maximum distance between frames, and if such distance is sufficiently small, the frames between times 1 and 1+k are calculated by interpolation, avoiding further executions of the neural network. If, on the contrary, such distance is not small enough, it means that the outputs of the network are changing quickly, and it is not possible to skip too many frames. In that case, the method attempts to skip remaining frames, calculating and evaluating a new distance.
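The procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the Euclidean distance, the linear interpolation, and the fallback of halving the span when the distance is too large are all choices made for this example, and `net`, `run_with_skipping`, `k`, and `threshold` are hypothetical names.

```python
import math
from typing import Callable, List, Optional, Sequence

Likelihoods = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two likelihood vectors (metric is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def interpolate(a: Sequence[float], b: Sequence[float], alpha: float) -> Likelihoods:
    """Linear interpolation between two likelihood vectors (assumption)."""
    return [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]


def run_with_skipping(
    frames: Sequence[object],
    net: Callable[[object], Likelihoods],
    k: int,
    threshold: float,
) -> List[Optional[Likelihoods]]:
    """Return one likelihood vector per frame, skipping network runs
    whenever the outputs at both ends of a span are close enough."""
    n = len(frames)
    out: List[Optional[Likelihoods]] = [None] * n
    if n == 0:
        return out
    out[0] = net(frames[0])
    t = 0
    while t < n - 1:
        span = min(k, n - 1 - t)  # never look past the last frame
        while True:
            if out[t + span] is None:
                out[t + span] = net(frames[t + span])
            if span == 1 or euclidean(out[t], out[t + span]) < threshold:
                break
            span = max(1, span // 2)  # assumption: retry with a shorter span
        # Fill the frames strictly between t and t + span without running net.
        for i in range(1, span):
            if out[t + i] is None:  # keep outputs already computed by net
                out[t + i] = interpolate(out[t], out[t + span], i / span)
        t += span
    return out
```

With slowly varying input the network runs only about once every `k` frames; with rapidly varying input the span shrinks to 1 and every frame is evaluated. Outputs computed during a failed attempt are cached and reused, so no frame is passed through the network twice.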
20 Claims
1. A method of executing a neural network in a speech recognition system for recognizing speech of an input speech signal organized into a series of frames, comprising:
calculating, by means of said neural network, a first and a second likelihood corresponding to a first and a second non-consecutive frame;
calculating a distance between said first and second non-consecutive frames;
comparing said distance with a predetermined threshold value to evaluate a possibility of skipping at least one run of the neural network;
selectively skipping the at least one run of the neural network in correspondence to each frame between said first and said second non-consecutive frames to optimize the neural network when said distance is lower than said threshold value;
calculating a likelihood or likelihoods corresponding to each frame between said first and second non-consecutive frames;
calculating said distance as a distance between output likelihoods of said neural network; and
providing an optimized neural network by outputting the likelihood or likelihoods corresponding to each frame between said first and second non-consecutive frames to a computer readable medium.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A speech recognition system for recognizing speech of an input speech signal, comprising:
a neural network for calculating likelihoods corresponding to frames of said input speech signal, comprising;
a buffer for storing a plurality of input frames;
a distance evaluation unit for calculating a distance between a first and a second likelihood, said first and second likelihoods being obtained by means of said neural network and corresponding to a first and a second non-consecutive buffered frames;
a comparing unit for comparing said distance with a predetermined threshold value to evaluate a possibility of skipping at least one run of the neural network; and
an interpolation unit for, after the comparing, in case said distance is lower than said threshold value, skipping a run of the neural network corresponding to each of the frame or frames between said first and second non-consecutive frames to optimize the neural network, and calculating the likelihood or likelihoods corresponding to the frame or frames between said first and second non-consecutive buffered frames, and
a computer readable medium for storing at least one output of the neural network, the at least one output comprising the likelihood or likelihoods corresponding to the frame or frames between said first and second non-consecutive buffered frames.
Dependent claims: 11
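The units recited in claim 10 (buffer, distance evaluation unit, comparing unit, interpolation unit) might map onto code along the following lines. This is a hedged sketch rather than the claimed system: every name is hypothetical, the L1 distance and linear interpolation are assumptions, and the behaviour when the distance is not below the threshold (running the network on every buffered frame) is likewise assumed, since the claim does not specify it.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Likelihoods = List[float]


def distance_unit(first: Sequence[float], second: Sequence[float]) -> float:
    """Distance evaluation unit: L1 distance between likelihoods (assumption)."""
    return sum(abs(x - y) for x, y in zip(first, second))


def comparing_unit(distance: float, threshold: float) -> bool:
    """Comparing unit: True when the intermediate network runs may be skipped."""
    return distance < threshold


def interpolation_unit(
    first: Sequence[float], second: Sequence[float], n_between: int
) -> List[Likelihoods]:
    """Interpolation unit: estimate likelihoods for the frames in between."""
    result = []
    for i in range(1, n_between + 1):
        alpha = i / (n_between + 1)
        result.append([(1 - alpha) * x + alpha * y for x, y in zip(first, second)])
    return result


def process_window(
    buffer: Deque[float],
    net: Callable[[float], Likelihoods],
    threshold: float,
) -> List[Likelihoods]:
    """Produce likelihoods for every buffered frame, skipping runs if allowed."""
    frames = list(buffer)
    if len(frames) < 2:
        return [net(f) for f in frames]
    first, second = net(frames[0]), net(frames[-1])
    if comparing_unit(distance_unit(first, second), threshold):
        middle = interpolation_unit(first, second, len(frames) - 2)
    else:
        # Assumption: without permission to skip, evaluate every frame.
        middle = [net(f) for f in frames[1:-1]]
    return [first, *middle, second]
```

A caller would push incoming frames into a `deque(maxlen=...)` acting as the buffer and call `process_window` once it is full; the returned list plays the role of the stored network output.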
12. A non-transitory computer-readable medium for use on a computing system, the non-transitory computer-readable medium including computer-executable instructions for performing a method of executing a neural network in a speech recognition system for recognizing speech of an input speech signal organized into a series of frames, the method comprising:
calculating, by means of said neural network, a first and a second likelihood corresponding to a first and a second non-consecutive frame;
calculating a distance between said first and second non-consecutive frames;
comparing said distance with a predetermined threshold value to evaluate a possibility of skipping at least one run of the neural network;
selectively skipping the at least one run of the neural network in correspondence to each frame between said first and said second non-consecutive frames to optimize the neural network when said distance is lower than said threshold value;
calculating a likelihood or likelihoods corresponding to each frame between said first and second non-consecutive frames;
calculating said distance as a distance between output likelihoods of said neural network; and
providing an optimized neural network by outputting the likelihood or likelihoods corresponding to each frame between said first and second non-consecutive frames to the computer readable medium.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
Specification