Dynamic time warping using a digital signal processor
Abstract
A digital signal processor implementation of dynamic time warping for automatic speech recognition uses a single memory array that stores only one sequence of accumulated correspondence values, and detects whether the constraint on horizontal path compression has been exceeded by storing the negatives of calculated accumulated correspondence values to indicate horizontal path movement. The accumulated correspondence values are stored in the array in descending order, and memory locations of that array are reused as accumulated correspondence values are calculated representing the correspondence between a plurality of reference time frames and a plurality of unknown time frames representing an unknown word or utterance. When the path moves from an adjacent horizontal correspondence node to the present node, the negative of the calculated accumulated correspondence is written back into the memory location associated with the present node. When the accumulated correspondence associated with that memory location is calculated for the next sequence of accumulated correspondence values, the method detects the previously stored negative value and does not allow the path to move again in the horizontal direction.
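The scheme described in the abstract can be sketched as follows. This is a loose reconstruction of the general idea, not the patented method itself; the function names, the scalar features, and the sentinel value `big` are illustrative assumptions.

```python
def local_cost(a, b):
    """Illustrative per-frame distance; real systems compare feature vectors."""
    return abs(a - b)

def dtw_single_array(ref, unk, big=1e9):
    """Single-array dynamic time warping in the spirit of the abstract.

    One array holds the accumulated correspondence values for the current
    unknown frame only and is overwritten in place for each new frame.
    A stored negative value marks a node whose best path arrived by a
    horizontal move, so the path may not move horizontally again.
    """
    R = len(ref)
    # Initial sequence: only (ref[0], unk[0]) is a legal starting node.
    acc = [big] * R
    acc[0] = local_cost(ref[0], unk[0])
    for frame in unk[1:]:                    # remaining unknown frames
        for r in range(R - 1, -1, -1):       # descending order, so acc[r-1]
                                             # and acc[r-2] still hold the
                                             # previous frame's values
            d = local_cost(ref[r], frame)
            # A negative acc[r] means the previous move into this node was
            # horizontal; exclude the horizontal predecessor in that case.
            horiz = acc[r] if acc[r] >= 0 else big
            diag1 = abs(acc[r - 1]) if r >= 1 else big
            diag2 = abs(acc[r - 2]) if r >= 2 else big
            best = min(horiz, diag1, diag2)
            # Negate when the horizontal predecessor supplied the minimum.
            acc[r] = -(d + best) if best == horiz and horiz < big else d + best
    return abs(acc[R - 1])                   # final accumulated correspondence
```

Because each location is rewritten in place in descending order, the memory footprint is one column of the warping grid rather than the whole grid, which is the point of the single-array formulation.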
4 Claims
1. A method for performing automatic voice recognition comprising the steps of:
storing in a reference memory a plurality of reference speech pattern templates, each comprising a plurality of reference time frames having sets of acoustic feature signals and each representative of a prescribed spoken reference speech pattern;
analyzing by a feature extraction unit a speech utterance to determine a plurality of unknown time frames to obtain sets of acoustic feature signals;
initially storing in an unknown word memory said sets of acoustic feature signals of said plurality of unknown time frames;
obtaining a final accumulated correspondence signal for one of said templates by a pattern matcher performing the substeps of:
(1) forming an initial sequence of accumulated correspondence signals in response to the sets of acoustic feature signals of one of said templates and one set of acoustic feature signals of a first one of said plurality of unknown time frames;
(2) storing in said unknown word memory said initial sequence in a sequential group of memory locations in descending chronological order;
(3) determining a local correspondence signal corresponding to an individual memory location in said sequential group of memory locations from said sets of acoustic feature signals for said plurality of unknown time frames and said one of said templates;
(4) calculating an accumulated correspondence signal of the sequence of accumulated correspondence signals corresponding to said individual memory location in said group of memory locations for a second one of said plurality of unknown time frames by adding said local correspondence signal to the minimum of the contents of said individual memory location and the absolute values of the first and second memory locations following said individual memory location;
(5) storing the negative of the calculated accumulated correspondence signal into said individual memory location upon said contents of said individual memory location being the minimum, and otherwise storing said calculated accumulated correspondence signal into said individual memory location;
(6) repeating substeps 3 through 5 to obtain the accumulated correspondence signals in descending order for each memory location of said group of memory locations;
(7) repeating substeps 3 through 6 to obtain the final accumulated correspondence signal of the sequence of said accumulated correspondence signals for the remaining ones of said plurality of unknown time frames;
obtaining a final accumulated correspondence signal for each of the remaining ones of said templates by said pattern matcher performing substeps 1 through 7 for each of the remaining ones of said templates; and
indicating by a decision unit, in response to the final accumulated correspondence signals, the speech pattern represented by said speech utterance.
2. A method for performing automatic voice recognition comprising the steps of:
initially storing in a reference memory a plurality of speech pattern templates, each comprising a plurality of reference time frames having sets of acoustic feature signals and each representative of a prescribed spoken reference speech pattern;
analyzing by a feature extraction unit a speech utterance to determine a plurality of unknown time frames to obtain sets of acoustic feature signals;
storing in an unknown word memory said sets of feature signals of said unknown time frames;
obtaining a final accumulated correspondence signal for one of said templates by a pattern matcher performing the substeps of:
(1) forming an initial sequence of accumulated correspondence signals in response to the sets of acoustic feature signals of one of said templates and one set of acoustic feature signals of a first one of said plurality of unknown time frames;
(2) storing said initial sequence in a sequential group of memory locations in descending chronological order with respect to said unknown time frames;
(3) saving in a temporary storage the absolute value of the contents of the first memory location following the memory location corresponding to each of said accumulated correspondence signals of the next sequence of said accumulated correspondence signals;
(4) setting a flag upon said absolute value of said contents of the first memory location being greater than the absolute value of said corresponding memory location and said value of said corresponding memory location being positive;
(5) conditionally saving in said temporary storage the negative of the value of said corresponding memory location upon said flag being set, thereby replacing the saved absolute value;
(6) calculating a local correspondence signal corresponding to each of said accumulated correspondence signals of said next sequence of said accumulated correspondence signals from sets of acoustic feature signals for said first one of said plurality of unknown time frames and said plurality of reference time frames;
(7) setting said flag upon the absolute value of the contents of the second memory location following the corresponding memory location being less than the absolute value of the saved value in said temporary storage;
(8) conditionally saving in said temporary storage said absolute value of said contents of said second memory location upon said flag being set, thereby replacing the saved absolute value;
(9) summing the calculated local correspondence signal and said saved absolute value in said temporary storage to determine the accumulated correspondence signal of said sequence;
(10) writing said accumulated correspondence signal into said corresponding memory location;
(11) repeating substeps 3 through 10 to obtain the final accumulated correspondence signal for the remaining ones of said plurality of unknown time frames;
obtaining a final accumulated correspondence signal for each of the remaining ones of said templates by said pattern matcher performing substeps 1 through 11 for each of the remaining ones of said templates; and
indicating by a decision unit, in response to the final accumulated correspondence signals, the speech pattern represented by said speech utterance.
Dependent claims: 3 and 4.
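One node update of this register-style variant could be sketched as below. The name `claim2_node_update`, the list-based memory, and the sign carried into the final write are my reading of substeps 3 through 10, not verbatim patent logic.

```python
def claim2_node_update(mem, m, local, big=1e9):
    """Register-style update of one memory location (illustrative sketch).

    mem holds the previous sequence of accumulated correspondence values in
    descending order; mem[m] is the location being recomputed, and
    mem[m + 1] and mem[m + 2] are the first and second following locations.
    The negative sign in the final write is an assumed reading of substeps
    5, 9, and 10, marking a horizontal path move as in claim 1.
    """
    # Substep 3: save the absolute value of the first following location.
    temp = abs(mem[m + 1]) if m + 1 < len(mem) else big
    # Substeps 4-5: if this location's own value is positive and smaller,
    # the horizontal predecessor wins; save its negative to mark the move.
    if temp > abs(mem[m]) and mem[m] >= 0:
        temp = -mem[m]
    # Substeps 7-8: the second following location may be smaller still.
    follow2 = abs(mem[m + 2]) if m + 2 < len(mem) else big
    if follow2 < abs(temp):
        temp = follow2
    # Substeps 9-10: add the local correspondence and write back, with a
    # negative sign when the horizontal predecessor supplied the minimum.
    acc = local + abs(temp)
    mem[m] = -acc if temp < 0 else acc
```

For example, with `mem = [1.0, 9.0, 9.0]` the horizontal predecessor `mem[0]` is the minimum, so the written value is negated; a location already holding a negative value never sets the flag in substep 4, which enforces the no-double-horizontal constraint.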
Specification