Method and apparatus for word counting in continuous speech recognition useful for reliable barge-in and early end of speech detection
Abstract
Speech recognition technology has matured to the point that the most likely recognition result is available before an energy-based end-of-speech determination can be made. The present invention uses these rapidly available speech recognition results to provide intelligent barge-in for voice-response systems, to count words so that sub-sequences can be output for parallel and/or pipelined processing of tasks related to the entire word sequence, and to count words to provide rapid, speech-recognition-based termination of speech processing and output of the recognized word sequence.
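The pipelining idea in the abstract, outputting each n-word sub-sequence as soon as it is recognized rather than waiting for the whole utterance, can be illustrated with a minimal, hypothetical sketch. The function name `pipeline_subsequences` and the callback interface are assumptions for illustration, not from the patent.

```python
# Hypothetical sketch: hand each completed n-word sub-sequence to a
# downstream task immediately, instead of waiting for the full utterance.

def pipeline_subsequences(recognized_words, n, handle):
    """Call `handle` on each n-word sub-sequence as soon as it is complete."""
    chunk = []
    for word in recognized_words:   # words arrive incrementally from the decoder
        chunk.append(word)
        if len(chunk) == n:
            handle(list(chunk))     # downstream task can start immediately
            chunk = []
    if chunk:
        handle(list(chunk))         # flush any trailing partial sub-sequence
```

With n = 3, a seven-word utterance would be delivered as two complete three-word chunks plus a final one-word remainder, so downstream processing overlaps with continued recognition.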
12 Claims
1. A method comprising the steps of:
a. determining if a speech utterance has started; if an utterance has not started, then obtaining the next frame and re-running step a, otherwise continuing to step b;
b. obtaining a speech frame of the speech utterance that represents a frame period that is next in time;
c. extracting features from the speech frame;
d. performing dynamic programming to build a speech recognition network;
e. performing a beam search using the speech recognition network;
f. updating a decoding tree of the speech utterance after the beam search;
g. determining if a first word of the speech utterance has been received and, if it has been received, disabling any aural prompt and continuing to step h; otherwise, if the first word has not been determined, continuing to step h;
h. determining if n words have been received and if n words have not been received then returning to step b, otherwise continuing to step i;
i. backtracking through the beam search path having the greatest likelihood score to obtain a string having a greatest likelihood of corresponding to the received utterance when speech recognition of the word sequence has completed; and
j. outputting the string.

2. The method of claim 1, further comprising the step of:

in parallel with step h, determining if a low energy gap time has been reached in a sequence of frames, and if such a gap time has not been reached returning to step b, and if such a gap time has been reached continuing to step i.
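The step a-j control flow above can be sketched in a minimal, hypothetical form. To keep the sketch self-contained, each frame is assumed to arrive pre-decoded as either a completed word or None (silence), standing in for the feature-extraction, beam-search, and decoding-tree steps (c-f); the function name `recognize` and the `prompt` dictionary are illustrative assumptions, not from the patent.

```python
def recognize(frames, n, prompt):
    """Steps a-j: frame loop with barge-in and n-word early termination.

    `frames` is a sequence where each element is a word label (str)
    completed at that frame, or None for a silence/no-new-word frame.
    """
    words = []                    # stands in for the best beam-search path
    started = False
    for frame in frames:          # steps a/b: wait for speech, then consume frames
        if not started:
            if frame is None:     # step a: utterance has not started yet
                continue
            started = True
        # Steps c-f would extract features, score models, run the beam
        # search, and update the decoding tree; here the toy frame already
        # carries the decoded word (or None).
        if frame is not None:
            words.append(frame)
            if len(words) == 1:   # step g: barge-in on the first word
                prompt["playing"] = False
        if len(words) >= n:       # step h: n words received -> stop early
            break
    # Steps i/j: backtrack the most likely path and output the string.
    return " ".join(words)
```

Note how termination is driven by the word count rather than by waiting for a low-energy gap, which is the early end-of-speech behavior the claim describes.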
4. A method for speech recognition of a speech utterance, comprising the steps of:
a. determining if a speech utterance has started, if an utterance has not started then returning to the beginning of step a, otherwise continuing to step b;
b. getting a speech frame that represents a frame period that is next in time;
c. extracting features from the speech frame;
d. using the features extracted from the present speech frame to score word models of a speech recognition grammar;
e. dynamically programming an active network of word sequences using a Viterbi algorithm;
f. pruning unlikely words and extending likely words to update the active network;
g. updating a decoding tree;
h. determining a word count n for this speech frame of the speech utterance;
i. examining the word count n: if the word count n is equal to one, disabling any aural prompt and continuing with step b; if the word count n is greater than one but less than a termination count N, continuing with step j; and if the word count n is at least equal to the termination count N, continuing with step k;
j. determining if n words have been determined as recognized by each of the word counts; if n words have not been determined as recognized, then returning to step b; and if n words have been recognized, outputting the n words and returning to step b;
k. determining if the end of the utterance has been reached by determining if the word count of each of the presently active word sequences is equal to the same termination count N; if each of the word counts of the presently active word sequences is equal to N, then declaring the utterance ended and continuing to step m, otherwise continuing to step l;
l. determining if there has not been any speech energy for a pre-specified gap time and if there has not been any then declaring the utterance ended and continuing to step m, otherwise returning to step b;
m. backtracking through the various active word sequences to obtain the word sequence with the greatest likelihood of matching the utterance; and
n. outputting the word sequence corresponding to the greatest likelihood.

5. The method of claim 4, wherein the step of determining the word count n comprises the steps of:

examining all viable word sequences contained in the decoding tree for the present speech frame;
traversing through pointers that are associated with non-silence nodes of the decoding tree; and
counting a number of words of all the viable word sequences.
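The decoding-tree traversal described above can be sketched as follows: each active hypothesis is a leaf node holding a back-pointer toward the root, and the word count of that hypothesis is the number of non-silence nodes on its path. The `Node` class and its field names are illustrative assumptions, not from the patent.

```python
class Node:
    """One node of a toy decoding tree, with a back-pointer to its parent."""

    def __init__(self, parent=None, silence=False):
        self.parent = parent      # pointer toward the root of the decoding tree
        self.silence = silence    # silence nodes do not count as words


def word_count(leaf):
    """Count words along one viable sequence by walking the back-pointers."""
    n = 0
    node = leaf
    while node is not None:
        if not node.silence:      # only non-silence nodes contribute a word
            n += 1
        node = node.parent
    return n


def word_counts(active_leaves):
    """Word count for every viable word sequence in the decoding tree."""
    return [word_count(leaf) for leaf in active_leaves]
```

Comparing `word_counts` against the termination count N then gives the claim-4 end-of-utterance test: the utterance is declared ended once every active hypothesis reports the same count N.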
6. The method of claim 4, wherein said word sequence must be found in a pre-specified grammar.
7. The method of claim 4, further comprising the step of:
after step j, determining if the partial word sequence corresponds to a word sequence requiring a different maximum word count, and if a different maximum word count is required adjusting the maximum word count N to the different maximum word count.
8. The method of claim 7, wherein the partial word sequence requiring a different maximum word count is a telephone number prefix.
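Claims 7 and 8 describe adjusting the termination count N when the partial word sequence implies a different expected length, e.g. a telephone number prefix. A minimal sketch follows; the prefix table and its digit counts are invented for illustration and are not from the patent.

```python
# Hypothetical prefix table: a recognized prefix implies a different total
# number of words (digits) before recognition may terminate.
PREFIX_LENGTHS = {
    ("0",): 11,        # e.g. a national long-distance prefix -> 11 digits
    ("0", "0"): 13,    # e.g. an international prefix -> 13 digits
}


def adjust_termination_count(partial_words, default_n):
    """Return the termination count N implied by the recognized prefix."""
    n = default_n
    for prefix, required_n in PREFIX_LENGTHS.items():
        if tuple(partial_words[: len(prefix)]) == prefix:
            n = max(n, required_n)  # keep the longest applicable requirement
    return n
```

Called after step j on each pass through the loop, this lets the recognizer keep listening for a longer number as soon as the decoded prefix reveals that more digits are coming.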
9. An apparatus for speech recognition of a speech utterance comprising:
means for determining if the speech utterance has started;
means responsive to said speech utterance start determining means for obtaining a speech frame of the speech utterance that represents a frame period that is next in time;
means for extracting features from said speech frame;
means for building a speech recognition network using dynamic programming;
means for performing a beam search using the speech recognition network;
means for updating a decoding tree of the speech utterance after the beam search;
means for determining if a first word of the speech utterance has been received and if it has been received disabling any aural prompt;
means for determining if N words have been received to quickly end further speech recognition processing of the speech utterance;
means responsive to said N word determining means for backtracking through the beam search path having the greatest likelihood score to obtain a word sequence having a greatest likelihood of corresponding to the received speech utterance; and
means for outputting said word sequence.
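The backtracking means can be sketched as follows: at termination, pick the active hypothesis with the greatest likelihood score and follow its back-pointers to recover the word sequence. The `Hypothesis` structure and field names are illustrative assumptions, not from the patent.

```python
class Hypothesis:
    """One entry of a toy beam: a word, its path score, and a back-pointer."""

    def __init__(self, word, score, parent=None):
        self.word = word        # word ending at this node ("" for the root)
        self.score = score      # accumulated log-likelihood of the path
        self.parent = parent    # back-pointer used for the backtrace


def backtrack(active):
    """Pick the best-scoring hypothesis and walk its back-pointers."""
    best = max(active, key=lambda h: h.score)
    words = []
    node = best
    while node is not None:
        if node.word:           # skip the empty root marker
            words.append(node.word)
        node = node.parent
    words.reverse()             # back-pointers run end -> start
    return " ".join(words)
```

Because only back-pointers are stored during decoding, the full string is materialized once, at the end, which is what makes the early, count-based termination cheap.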
11. A method for use with an interactive speech recognition system having an aural prompt, comprising the steps of:

a. determining if a speech utterance has started; if an utterance has not started, then obtaining the next frame and re-running step a, otherwise continuing to step b;

12. A method for use with an interactive speech recognition system having an aural prompt, comprising the steps of:

a. determining if a speech utterance has started; if an utterance has not started, then obtaining the next frame and re-running step a, otherwise continuing to step b;
Specification