Time series association learning
First Claim
1. A method for recognizing human speech, comprising the steps of:
forming a series of acoustic waveforms functionally related to a concomitant set of uttered token human speech sounds;
forming a series of digital articulation representations functionally related to selected articulatory movement sets of a speaker uttering said token human speech sounds;
windowing each of said acoustic waveforms to form a time framed first digital speech signal series;
temporally aligning said digital articulation representations with said time framed first digital signal series;
inputting said speech signal series and said digital articulation representation to a neural network to form a learned relationship between said speech signal series as an input to said neural network and said digital articulation representation as an output from said neural network;
deriving selected acoustic features from each said time framed first digital speech signal;
associating said selected acoustic features with said selected articulatory movement sets to form a template parameter set that uniquely associates each one of said series of acoustic waveforms with each one of said selected articulatory movement sets;
forming from said human speech a second digital speech signal series;
inputting said second digital speech signal series to said neural network having said learned relationship for outputting a learned articulatory parameter set; and
comparing each said learned articulatory parameter set with each one of said template parameter sets to select one template parameter set having the best match with said learned articulatory parameter set.
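The windowing and template-comparison steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: the function names `frame_signal` and `best_template`, the Hamming taper, and the squared Euclidean distance used as the "best match" criterion are all assumptions chosen for illustration, since the claim does not fix a window shape or a distance measure. The neural network that maps acoustic frames to articulatory parameters is omitted; the sketch assumes its output is already available as a flat, fixed-length vector.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a waveform into overlapping frames, each tapered by a Hamming window.

    The Hamming taper is an assumption; the claim only requires windowing.
    frame_len must be at least 2.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


def best_template(inferred, templates):
    """Return the label of the template nearest to the inferred parameter set.

    `templates` maps a token label to a parameter vector of the same length
    as `inferred`. Squared Euclidean distance is a hypothetical stand-in for
    whatever match score an actual system would use.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(templates, key=lambda label: dist(inferred, templates[label]))


# Usage: frame a short constant signal, then match an inferred vector.
frames = frame_signal([1.0] * 10, frame_len=4, hop=2)
choice = best_template([0.9, 0.1], {"a": [1.0, 0.0], "b": [0.0, 1.0]})
```

In a fuller system the templates would be time series of articulatory movements rather than single vectors, so the comparison would run frame by frame over aligned sequences.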
Abstract
An acoustic input is recognized from inferred articulatory movements output by a learned relationship between training acoustic waveforms and articulatory movements. The inferred movements are compared with template patterns prepared from training movements when the relationship was learned to regenerate an acoustic recognition. In a preferred embodiment, the acoustic articulatory relationships are learned by a neural network. Subsequent input acoustic patterns then generate the inferred articulatory movements for use with the templates. Articulatory movement data may be supplemented with characteristic acoustic information, e.g. relative power and high frequency data, to improve template recognition.
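As a rough illustration of supplementing articulatory data with acoustic information, the sketch below appends two per-frame acoustic values to each articulatory vector. The specific measures are assumptions: relative power is taken as frame power divided by the peak frame power in the utterance, and zero-crossing rate is used as a crude proxy for high-frequency content. The abstract names neither formula, and the helper names `frame_power`, `zero_crossing_rate`, and `augment` are hypothetical.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Used here as an assumed stand-in for 'high frequency data'.
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def augment(articulatory, frames):
    """Append relative power and zero-crossing rate to each articulatory vector.

    `articulatory` and `frames` are assumed to be time-aligned, one entry per frame.
    """
    peak = max(frame_power(f) for f in frames) or 1.0  # avoid dividing by zero on silence
    return [list(art) + [frame_power(f) / peak, zero_crossing_rate(f)]
            for art, f in zip(articulatory, frames)]


# Usage: two frames with two articulatory values each.
frames = [[1.0, -1.0, 1.0, -1.0], [0.5, 0.5, 0.5, 0.5]]
articulatory = [[0.1, 0.2], [0.3, 0.4]]
augmented = augment(articulatory, frames)
```

The augmented vectors can then be compared against templates built the same way, so that frames with similar articulation but different acoustic character are easier to separate.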
Specification