Speaker-independent word recognizer

US 4,712,242 A
Filed: 04/13/1983
Issued: 12/08/1987
Est. Priority Date: 04/13/1983
Status: Expired due to Fees

Abstract

Speaker-independent word recognition is performed, based on a small acoustically distinct vocabulary, with minimal hardware requirements. After a simple preconditioning filter, the zero crossing intervals of the input speech are measured and sorted by duration, to provide a rough measure of the frequency distribution within each input frame. The distribution of zero crossing intervals is transformed into a binary feature vector, which is compared with each reference template using a modified Hamming distance measure. A dynamic time warping algorithm is used to permit recognition of various speaker rates, and to economize on the reference template storage requirements. A mask vector with each reference vector on a template is used to ignore insignificant (or speaker-dependent) features of the words detected.
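
The abstract pairs a masked Hamming distance with dynamic time warping. The sketch below illustrates that combination in Python under several assumptions that are not stated in this record: vectors are bit-packed integers, a mask bit of 1 means "significant", and the warping uses the plain textbook recurrence with no slope constraints. The names `masked_hamming`, `dtw_distance`, and `recognize` are hypothetical.

```python
from typing import Dict, List, Tuple

# One template = time-ordered (reference bits, mask bits) pairs, one per frame.
Template = List[Tuple[int, int]]

def masked_hamming(feature: int, reference: int, mask: int) -> int:
    """Count differing bits, ignoring positions where the mask bit is 0."""
    return bin((feature ^ reference) & mask).count("1")

def dtw_distance(features: List[int], template: Template) -> float:
    """Accumulated distance along the cheapest monotonic alignment path."""
    inf = float("inf")
    n, m = len(features), len(template)
    # cost[i][j]: best cost aligning the first i input frames
    # with the first j template pairs.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            ref, mask = template[j - 1]
            d = masked_hamming(features[i - 1], ref, mask)
            cost[i][j] = d + min(cost[i - 1][j],      # input frame repeats a pair
                                 cost[i][j - 1],      # template pair is skipped over
                                 cost[i - 1][j - 1])  # both advance together
    return cost[n][m]

def recognize(features: List[int], vocabulary: Dict[str, Template]) -> str:
    """Return the vocabulary word whose template warps to the lowest distance."""
    return min(vocabulary, key=lambda word: dtw_distance(features, vocabulary[word]))

# Toy 4-bit vocabulary (values are made up for illustration).
vocabulary = {
    "yes": [(0b1100, 0b1111), (0b0011, 0b1111)],
    "no":  [(0b1010, 0b1110), (0b0101, 0b0111)],
}
slow_yes = [0b1100, 0b1100, 0b0011]    # first frame held twice as long
print(recognize(slow_yes, vocabulary)) # yes
```

Because the warp lets one template pair absorb several input frames, a two-pair template can match a three-frame utterance at zero cost, which is how warping tolerates different speaking rates while keeping templates short.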

16 Claims

1. A word recognition system for identifying a spoken word represented by an analog speech signal, said word recognition system comprising:
- signal processing means for receiving an analog input speech signal and for producing feature vectors from the input speech signal to provide a sequence of feature vectors at predetermined speech frame intervals as an output therefrom;
  
  memory means storing a plurality of reference templates of digital speech data respectively representative of individual words and comprising the vocabulary of the word recognition system, each of said reference templates being defined by a predetermined plurality of reference vectors arranged in a predetermined sequence and comprising an acoustic description of an individual word in a time-ordered sequence, each of said reference templates being further defined by at least one mask vector respectively associated with each said sequence of reference vectors and being indicative of the significance of portions of the reference vector sequence associated therewith in establishing the identity of the word represented by the reference template of which said at least one mask vector is a component;
  
  means operably associated with said signal processing means for comparing each feature vector of said input speech signal with the corresponding reference vectors of each of said reference templates to provide a distance measure with respect to each of the feature vectors and the predetermined reference vector sequences defining acoustic descriptions of the respective words included in the vocabulary of the word recognition system, said comparing means being responsive to the status of the respective mask vectors comprising components of said plurality of reference templates to ignore elements of reference vectors included in respective reference templates which are indicated by the associated mask vector to be insignificant so as to provide said distance measure based upon significant elements of the reference vectors as included in the predetermined reference vector sequences; and
  
  word recognizing means operably associated with said comparing means for determining which one of the plurality of the reference templates is the closest match to said input speech signal based upon the distance measures of said reference vector sequences and successively received feature vectors corresponding to respective speech frames.
- Dependent claims: 2, 3, 4, 5, 6, 7, 8, 16
  - 2. A word recognition system as set forth in claim 1, wherein said feature vectors are binary, and wherein said distance measure corresponds to a Hamming distance measure between said feature vector and a reference vector as modified in accordance with said at least one mask vector associated with the reference vector sequence of which said reference vector is a component.
  - 3. A word recognition system as set forth in claim 1, wherein one mask vector is uniquely associated with each of said reference vectors such that each said reference template comprises a plurality of mask vectors corresponding in number to said predetermined plurality of reference vectors arranged in said predetermined sequence.
  - 4. A word recognition system as set forth in claim 3, wherein each said mask vector is a binary vector comprising a plurality of bits, and each said reference vector is a binary vector comprising a plurality of bits, and each said mask vector has the same number of bits as the reference vector associated therewith, each of said reference templates representative of a word thereby comprising a sequence of pairs of binary vectors including a reference vector and a mask vector in each said pair of binary vectors.
  - 5. A word recognition system as set forth in claim 1, wherein each of said mask vectors is a binary vector comprising a plurality of bits, each of said bits of said mask vector selectively assuming alternative predetermined values respectively indicating that a corresponding portion of said reference vector sequence with which said mask vector is associated is either significant or insignificant in the determination of said distance measure in establishing the identity of the word represented by the reference template to which the reference vector sequence corresponds.
  - 6. A word recognition system as set forth in claim 5, wherein one mask vector is uniquely associated with each of said reference vectors such that each said reference template comprises a plurality of mask vectors corresponding in number to said predetermined plurality of reference vectors arranged in said predetermined sequence.
  - 7. A word recognition system as set forth in claim 1, wherein said signal processing means comprises
    - signal conditioning means for receiving the analog input speech signal and performing filtering and signal processing operations thereon to place the input speech signal in a format compatible with the determination of the feature aspects thereof, and
    - means operably coupled to the output of said signal conditioning means for extracting feature vectors from the conditioned input speech signal in providing said sequence of feature vectors;
    - said comparing means being operably associated with said feature vector extracting means of said signal processing means in comparing each said feature vector of said input speech signal with the corresponding reference vectors of each of said reference templates.
  - 8. A word recognition system as set forth in claim 7, wherein said signal conditioning means is effective in performing filtering and signal processing operations on the analog input speech signal to produce a waveform sequence alternating between plus and minus polarity signs, said signal conditioning means further including a zero-crossing detector for counting the number of polarity transitions in the waveform sequence to obtain a zero-crossing count for each frame of the waveform sequence; and
    - said feature vector-extracting means providing said sequence of feature vectors from said conditioned input speech signal at predetermined speech frame intervals based upon the time duration intervals between zero-crossings of the waveform sequence.
  - 16. A word recognition system as set forth in claim 1, wherein each of said feature vectors, said reference vectors, and said mask vectors are binary vectors, one mask vector being uniquely associated with each of said reference vectors such that each said reference template is defined by a predetermined plurality of reference vectors and a plurality of mask vectors corresponding in number to said predetermined plurality of reference vectors;
    - each of said reference vectors and said mask vectors comprising respective pluralities of bits, and each said mask vector having the same number of bits as the reference vector associated therewith, each of said reference templates representative of a word thereby comprising a sequence of pairs of binary vectors including a reference vector and a mask vector in each said pair of binary vectors;
      
      each of said bits of each said mask vector selectively assuming alternatively predetermined values respectively indicating that a corresponding bit of said binary reference vector included in the pair of binary vectors therewith is either significant or insignificant; and
      
      said distance measure corresponding to a Hamming distance measure between each of said feature vectors and a respective reference vector as modified in accordance with said mask vector included in said pair of binary vectors therewith such that said Hamming distance measure is determined only with respect to the significant bits of said reference vectors in establishing the identity of the word represented by the feature vectors as determined by the reference template as defined by the sequence of pairs of binary vectors which is the closest match to said feature vectors.
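
Claims 2, 4, and 16 describe each template as a sequence of equal-width (reference vector, mask vector) pairs, with the Hamming distance taken only over significant bits. A minimal Python sketch of that distance follows. It assumes bit-packed integers and the convention that a mask bit of 1 marks a significant position; the claims only say the bits take "predetermined values", so that convention, the class name `ReferencePair`, and the sample bit patterns are all illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReferencePair:
    """One (reference vector, mask vector) pair of equal bit width."""
    reference: int  # binary reference vector packed into an int
    mask: int       # assumed convention: 1 = significant bit, 0 = ignored bit

def masked_hamming(feature: int, pair: ReferencePair) -> int:
    """Hamming distance restricted to the bits the mask marks significant."""
    mismatches = feature ^ pair.reference   # 1 wherever the two vectors differ
    significant = mismatches & pair.mask    # discard mismatches on ignored bits
    return bin(significant).count("1")

feature = 0b10010111
pair = ReferencePair(reference=0b10110010, mask=0b11110000)
unmasked = ReferencePair(reference=pair.reference, mask=0b11111111)

print(masked_hamming(feature, pair))      # 1  (low four bits are ignored)
print(masked_hamming(feature, unmasked))  # 3  (ordinary Hamming distance)
```

With an all-ones mask the function reduces to the ordinary Hamming distance, so the mask can only lower a score, never raise it.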

9. A word recognition system for identifying a spoken word represented by an analog speech signal, said word recognition system comprising:
- signal conditioning means for receiving an analog input speech signal and performing filtering and signal processing operations thereon to place the input speech signal in a format compatible with the determination of feature aspects thereof;
  
  means operably coupled to the output of said signal conditioning means for extracting feature vectors from said conditioned input speech signal to provide a sequence of feature vectors at predetermined speech frame intervals;
  
  memory means storing a plurality of reference templates of digital speech data respectively representative of individual words and comprising the vocabulary of the word recognition system, each of said reference templates being defined by a predetermined plurality of reference vectors arranged in a predetermined sequence and comprising an acoustic description of an individual word in a time-ordered sequence, each of said reference templates being further defined by a plurality of mask vectors corresponding in number to said predetermined plurality of reference vectors and respectively associated with a corresponding reference vector of said plurality of reference vectors, each said mask vector being indicative of the significance of the reference vector associated therewith in establishing the identity of the word represented by the reference template in which the reference vector occurs;
  
  means operably associated with said feature vector extracting means for comparing each feature vector of said input speech signal with the corresponding reference vectors of each of said reference templates to provide a distance measure with respect to each of said feature vectors and the predetermined reference vector sequences defining acoustic descriptions of the respective words included in the vocabulary of the word recognition system, said comparing means being responsive to the status of the respective mask vectors associated with the reference vectors to ignore elements of each said reference vector which are indicated by the associated mask vector corresponding thereto to be insignificant so as to provide said distance measure based upon significant elements of the reference vectors as included in the predetermined reference vector sequences; and
  
  word recognition means for determining which one of the plurality of the reference templates is the closest match to said input speech signal based upon the distance measures between each of said reference vector sequences and successively received feature vectors corresponding to respective speech frames.

10. A method for recognizing speech comprising:
- receiving an analog input speech signal;
  
  processing said analog input speech signal to provide a sequence of feature vectors from said input speech signal at predetermined speech frame intervals;
  
  associating at least one mask vector with each sequence of a plurality of reference vectors which have been organized in sequence with each of said reference vector sequences corresponding to a word which can be recognized, with said mask vector being indicative of the significance of portions of the reference vector sequence with which it is associated in establishing the identity of the word to which the respective reference vector sequence corresponds;
  
  comparing each of said feature vectors with each of said plurality of reference vectors in relation to the status of the respective mask vector associated with each said reference vector sequence;
  
  determining a distance measure with respect to each of said reference vectors for each successive feature vector in said sequence of feature vectors in response to the comparison therebetween wherein portions of each said reference vector sequence indicated by the associated at least one mask vector corresponding thereto to be insignificant are ignored such that said distance measure is based upon significant portions of the reference vector sequence; and
  
  recognizing words in accordance with the distance measures between each of said reference vector sequences and successively received feature vectors corresponding to respective speech frames.
- Dependent claims: 11, 12, 13
  - 11. A method as set forth in claim 10, wherein said feature vectors, said reference vectors and said mask vectors are binary, and wherein said distance measure-determining step comprises a Hamming distance measurement between said feature vector and a corresponding reference vector as modified in accordance with a respective mask vector associated therewith.
  - 12. A method as set forth in claim 11, wherein a plurality of mask vectors are respectively associated with said plurality of reference vectors arranged in each said reference vector sequence such that one mask vector is uniquely associated with a corresponding one of said reference vectors included in each said reference vector sequence.
  - 13. A method as set forth in claim 12, wherein the comparison of each of said feature vectors with each of said plurality of reference vectors organized in reference vector sequences is responsive to the status of the respective mask vectors associated with the reference vectors so as to ignore elements of each said reference vector indicated by the associated mask vector corresponding thereto to be insignificant.

14. A method for recognizing speech comprising:
- receiving an analog input speech signal;
  
  conditioning said analog speech signal to produce a sequence of rectangular waveforms of polarity signs alternating between plus and minus polarities as a digital waveform signal;
  
  counting the number of polarity transitions in the digital waveform signal to obtain a zero-crossing count for each frame of the digital waveform signal;
  
  measuring the time duration intervals between zero-crossings of the digital waveform signal;
  
  providing a sequence of binary feature vectors based upon the measurements of the time duration intervals between zero-crossings of the digital waveform signal and corresponding to respective frames of the digital waveform signal;
  
  associating at least one mask vector with each sequence of a plurality of reference vectors which have been organized in sequences with each of said reference vector sequences corresponding to a word which can be recognized, wherein said at least one mask vector is indicative of the significance of portions of the reference vector sequence with which it is associated in establishing the identity of the word to which the respective reference vector sequence corresponds;
  
  comparing each of said feature vectors with each of said plurality of reference vectors organized in sequences and said at least one mask vector associated therewith;
  
  determining a distance measure with respect to each of said reference vectors for each successive feature vector in said sequence of said feature vectors in response to the comparison therebetween, wherein portions of each said reference vector sequence indicated by the associated at least one mask vector corresponding thereto as being insignificant are ignored such that said distance measure is based upon significant portions of the respective reference vector sequence; and
  
  recognizing words in accordance with the distance measures between each of said reference vector sequences and successively received feature vectors corresponding to respective speech frames.
- Dependent claims: 15
  - 15. A method as set forth in claim 14, wherein a plurality of mask vectors are respectively associated with said plurality of reference vectors arranged in each said reference vector sequence such that one mask vector is uniquely associated with a corresponding one of said reference vectors included in each said reference vector sequence.
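
The front-end steps of claim 14 (hard-limiting to alternating polarities, counting zero-crossings per frame, measuring the intervals between them, and forming a binary feature vector) can be sketched roughly as follows. This is an illustration under assumptions, not the patented circuit: the duration bin edges, the rule of setting a bit when a bin holds at least `min_count` intervals, and all function names are hypothetical, since this record does not state how the interval distribution is binarized.

```python
from bisect import bisect_right
from typing import List, Sequence

def zero_crossings(frame: Sequence[float]) -> List[int]:
    """Sample indices at which the hard-limited signal changes polarity."""
    signs = [1 if s >= 0 else -1 for s in frame]  # rectangular +/- waveform
    return [i for i in range(1, len(signs)) if signs[i] != signs[i - 1]]

def crossing_intervals(frame: Sequence[float]) -> List[int]:
    """Durations, in samples, between successive zero-crossings."""
    marks = zero_crossings(frame)
    return [later - earlier for earlier, later in zip(marks, marks[1:])]

def binary_feature_vector(frame: Sequence[float],
                          bin_edges: Sequence[int] = (2, 4, 8),
                          min_count: int = 2) -> List[int]:
    """Sort intervals into duration bins, then threshold each bin to one bit.

    bin_edges and min_count are illustrative values, not taken from the patent.
    """
    counts = [0] * (len(bin_edges) + 1)
    for interval in crossing_intervals(frame):
        counts[bisect_right(bin_edges, interval)] += 1
    return [1 if count >= min_count else 0 for count in counts]

# A square wave that flips polarity every 3 samples.
frame = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0] * 2 + [1.0, 1.0, 1.0]
print(len(zero_crossings(frame)))    # 4
print(crossing_intervals(frame))     # [3, 3, 3]
print(binary_feature_vector(frame))  # [0, 1, 0, 0]
```

Short intervals correspond to high-frequency content and long intervals to low-frequency content, so the bit pattern serves as a coarse spectral signature for the frame.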

Current Assignee: Texas Instruments, Inc.
Original Assignee: Texas Instruments, Inc.
Inventors: Schalk, Thomas B.; Rajasekaran, Periagaram K.; Doddington, George R.
Primary Examiner(s): Kemeny, E. S. Matt
Application Number: US06/484,730
Time in Patent Office: 1,700 Days
Field of Search: 381/42, 381/43, 381/45, 364/513.5, 364/900
US Class Current: 704/253
CPC Class Codes:
G10L 15/00 Speech recognition G10L17/0...
G10L 25/00 Speech or voice analysis te...
