Apparatus and method for speech recognition in the presence of unnatural speech effects

US 5,742,928 A
Filed: 10/26/1995
Issued: 04/21/1998
Est. Priority Date: 10/28/1994
Status: Expired due to Fees

Abstract

A speech recognition apparatus recognizes utterances of unnatural speech with higher recognition accuracy while requiring a smaller amount of speech learning data. The speech recognition apparatus includes an acoustic-phonetic variability learn unit, a normal speech model memory, a spectrum smooth-modifier and a speech recognizer. An input speech signal is acoustically analyzed and transformed into a time-series feature vector. The acoustic-phonetic variability learn unit learns an acoustic-phonetic change of spectrum caused by unnatural speech and generates a plurality of acoustic-phonetic variability models. The normal speech model memory stores a normal speech model learned based on normal speech data. The spectrum smooth-modifier modifies the normal speech model based on a plurality of the acoustic-phonetic variability models and generates a plurality of spectrum-modified speech models. The speech recognizer recognizes the time-series feature vector based on the normal speech model and the spectrum-modified speech models.
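
The data flow described in the abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each speech model is reduced to a single mean spectrum vector, that an acoustic-phonetic variability model is an additive shift of that vector, and that recognition picks the closest model by Euclidean distance. The names `modify_spectrum` and `recognize` and all numeric values are hypothetical; the record above does not specify these forms.

```python
from math import dist

def modify_spectrum(normal_mean, shift):
    """Apply an additive spectral shift (assumed form of a variability model)."""
    return [m + s for m, s in zip(normal_mean, shift)]

def recognize(frame, normal_models, modified_models):
    """Return the label whose normal or spectrum-modified mean is closest to the frame."""
    best_label, best_distance = None, float("inf")
    for models in (normal_models, modified_models):
        for label, mean in models.items():
            d = dist(frame, mean)
            if d < best_distance:
                best_label, best_distance = label, d
    return best_label

# Hypothetical two-dimensional "spectra" for two phonological units.
normal_models = {"a": [1.0, 0.0], "i": [0.0, 1.0]}
variability = {"a": [0.5, 0.5], "i": [0.5, 0.5]}

modified_models = {
    label: modify_spectrum(mean, variability[label])
    for label, mean in normal_models.items()
}

# A frame of "unnatural" speech that lies near the shifted /a/ model.
result = recognize([1.4, 0.6], normal_models, modified_models)
```

Searching both the normal and the spectrum-modified models mirrors the last sentence of the abstract; a real recognizer would score whole sequences rather than single frames.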

40 Claims

1. A speech recognition apparatus comprising:
- an acoustic analyzer for analyzing an input speech signal of unnatural speech and extracting a time-series feature vector from said input speech signal;
  
  a normal speech model memory for storing a normal speech model learned based on normal speech data;
  
  an acoustic-phonetic variability model memory for storing a plurality of acoustic-phonetic variability models, each representing an acoustic-phonetic change of spectrum caused by unnatural speech; and
  
  speech recognition means for generating an unnatural speech model based on said normal speech model and at least one of said plurality of acoustic-phonetic variability models corresponding to another of said normal speech models, for recognizing said input speech signal of unnatural speech based on said time-series feature vector and said unnatural speech model, and for outputting a recognition result.
- - 2. The speech recognition apparatus of claim 1, wherein said speech recognition means generates said unnatural speech model based on said normal speech model and at least two of said plurality of acoustic-phonetic variability models.
  - 3. The speech recognition apparatus of claim 2, wherein said speech recognition means comprises:
    - spectrum modifying means for modifying a spectrum of said normal speech model based on at least two of said plurality of acoustic-phonetic variability models and for generating a spectrum-modified speech model of said unnatural speech model;
      
      a speech model synthesizer for synthesizing said spectrum-modified speech model and said normal speech model and for generating a synthesized speech model; and
      
      speech identifying means for calculating a similarity of said time-series feature vector based on said synthesized speech model and for outputting said recognition result based on said similarity.
  - 4. The speech recognition apparatus of claim 3 further comprising:
    - speech learning means for learning said acoustic-phonetic change based on said time-series feature vector and said normal speech model and for generating said acoustic-phonetic variability model.
  - 5. The speech recognition apparatus of claim 4, wherein said speech recognition apparatus processes Lombard speech including any forms of speech uttered in an unnatural environment and speech by the disabled.
  - 9. The speech recognition apparatus of claim 3 or 7, further comprising:
    - weight memory means for calculating a weight coefficient between one of said normal speech models and another of said normal speech models based upon spectral similarity, and for storing said weight coefficient;
      
      wherein said spectrum modifying means includes means for selecting a plurality of said normal speech models having highest values of said weight coefficient with said spectrum, and means for modifying said spectrum based on said corresponding models of said acoustic-phonetic variability models.
  - 10. The speech recognition apparatus of claim 9, wherein said weight memory means includes means for assigning a highest value of said weight coefficient to said one of said normal speech models when said one of said normal speech models and said another of said normal speech models are same models in said weight memory means.
  - 11. The speech recognition apparatus of claim 10, wherein said spectrum modifying means comprises:
    - a mean-value calculator for modifying said spectrum based on each of a plurality of said acoustic-phonetic variability models corresponding to said selected normal speech models, and for calculating a mean value of a plurality of said modified spectra of said normal speech model.
  - 12. The speech recognition apparatus of claim 11, wherein said mean-value calculator calculates said mean value based on said weight coefficient.
  - 13. The speech recognition apparatus of claim 3 or 7, wherein said spectrum modifying means includes means for modifying a spectrum of a learned one of said normal speech models based on a corresponding one of said acoustic-phonetic variability models to said learned one of said normal speech models and at least another of said acoustic-phonetic variability models.
  - 14. The speech recognition apparatus of claim 3 or 7, wherein said spectrum modifying means includes means for modifying a spectrum of an unlearned one of said normal speech models based on a plurality of said acoustic-phonetic variability models.

6. A speech recognition apparatus comprising:
- an acoustic analyzer for analyzing an input speech signal of unnatural speech and extracting a time-series feature vector from said input speech signal;
  
  a normal speech model memory for storing a normal speech model learned based on normal speech data;
  
  an acoustic-phonetic variability model memory for storing an acoustic-phonetic variability model representing an acoustic-phonetic change of spectrum caused by said unnatural speech; and
  
  speech learning means for learning said acoustic-phonetic change with said time-series feature vector based on said normal speech model and for generating said acoustic-phonetic variability model.
- - 7. The speech recognition apparatus of claim 6, wherein said speech learning means further comprises:
    - a reference speech model buffer for buffering a reference speech model;
      
      means for segmenting said time-series feature vector based on said reference speech model to generate segment data;
      
      a parameter calculator for calculating a parametric representation of said acoustic-phonetic variability model of each segment data of said time-series feature vector based on said reference speech model and said normal speech model;
      
      an acoustic-phonetic variability model buffer for buffering said acoustic-phonetic variability model; and
      
      spectrum modifying means for modifying a spectrum of said normal speech model based on a plurality of said parametric representations of said acoustic-phonetic variability models, for generating a spectrum-modified speech model and for outputting said spectrum-modified speech model to said reference speech model buffer.
  - 8. The speech recognition apparatus of claim 7, further comprising:
    - speech recognition means for generating an unnatural speech model based on said normal speech model and a plurality of said acoustic-phonetic variability models, for recognizing said input speech signal of unnatural speech based on said time-series feature vector and said unnatural speech model, and for outputting a recognition result.
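
The learning loop recited in claim 7 (segment, calculate parameters, modify the spectrum, write back to the reference buffer) might be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: segmentation is done frame by frame with a nearest-mean rule instead of a sequence alignment, the parametric representation is assumed to be an additive shift of the mean spectrum, and each model is modified only by its own shift rather than by a plurality of them. The function names and data are hypothetical.

```python
from math import dist

def segment(frames, reference):
    """Assign every frame to the closest reference model (assumed segmentation rule)."""
    segments = {label: [] for label in reference}
    for frame in frames:
        label = min(reference, key=lambda name: dist(frame, reference[name]))
        segments[label].append(frame)
    return segments

def learn_variability(frames, normal, iterations=3):
    """Iteratively estimate one additive shift per model from unnatural-speech frames."""
    reference = dict(normal)   # reference buffer starts as a copy of the normal models
    variability = {}           # variability model buffer
    for _ in range(iterations):
        segments = segment(frames, reference)
        for label, seg in segments.items():
            if not seg:
                continue       # no data for this model in this pass
            dim = len(normal[label])
            seg_mean = [sum(f[d] for f in seg) / len(seg) for d in range(dim)]
            shift = [seg_mean[d] - normal[label][d] for d in range(dim)]
            variability[label] = shift
            # The spectrum-modified model is written back to the reference buffer.
            reference[label] = [normal[label][d] + shift[d] for d in range(dim)]
    return variability

normal = {"a": [0.0, 0.0], "i": [10.0, 10.0]}
frames = [[1.0, 1.0], [3.0, 1.0], [9.0, 11.0]]
learned = learn_variability(frames, normal)
```

Feeding the modified model back into the reference buffer lets the next pass segment the input against models that already account for part of the spectral change.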

15. A speech recognition apparatus, comprising:
- an acoustic analyzer for analyzing an input speech signal of unnatural speech and extracting a time-series feature vector from said input speech signal;
  
  a normal speech model memory for storing a normal speech model learned based on normal speech data;
  
  speech learning means for learning an acoustic-phonetic change based on said time-series feature vector and said normal speech model and for generating an acoustic-phonetic variability model based upon said acoustic-phonetic change;
  
  duration change learning means for learning a duration change by unnatural speech on a phonological unit basis based on said acoustic-phonetic variability model and said normal speech model, and for generating duration change data based upon said duration change; and
  
  a duration memory for storing said duration change data.
- - 16. The speech recognition apparatus of claim 15, further comprising:
    - an acoustic-phonetic variability model memory for storing a plurality of acoustic-phonetic variability models, each representing an acoustic-phonetic change of spectrum caused by unnatural speech;
      
      a duration parameter modifier for modifying a parametric representation of normal duration of said normal speech model stored in said normal speech model memory based on said duration change data to generate a parameter-modified normal speech model; and
      
      speech recognition means for generating an unnatural speech model based on said normal speech model and at least one of said plurality of acoustic-phonetic variability models corresponding to another of said normal speech models, for recognizing said input speech signal of unnatural speech based on said time-series feature vector and said unnatural speech model, and for outputting a recognition result.
  - 17. The speech recognition apparatus of claim 15, wherein said speech recognition means generates said unnatural speech model based on said normal speech model and at least two of said plurality of acoustic-phonetic variability models.
  - 18. The speech recognition apparatus of claim 15, wherein said duration change learning means includes means for learning a duration change observed from a vowel.
  - 19. The speech recognition apparatus of claim 15, wherein said duration change learning means includes means for learning said duration change observed with five vowels of /a/, /e/, /i/, /o/ and /u/ and for calculating a mean value of said duration changes obtained from at least two of said five vowels.
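
A minimal sketch of the vowel-based duration learning of claims 18 and 19 is given below. It assumes that a "duration change" is expressed as the ratio of the observed mean segment length to the normal mean duration, measured in frames; the claims do not state this form, and the function name and numbers are hypothetical.

```python
VOWELS = ("a", "e", "i", "o", "u")

def learn_duration_change(observed, normal_duration):
    """Average the per-vowel duration ratios over the vowels that have data.

    observed: vowel -> list of segment lengths (in frames) from unnatural speech
    normal_duration: vowel -> mean duration (in frames) from normal speech
    """
    ratios = []
    for vowel in VOWELS:
        lengths = observed.get(vowel, [])
        if lengths:
            mean_observed = sum(lengths) / len(lengths)
            ratios.append(mean_observed / normal_duration[vowel])
    if len(ratios) < 2:
        # Claim 19 recites a mean over at least two of the five vowels.
        raise ValueError("need duration data for at least two vowels")
    return sum(ratios) / len(ratios)

normal_duration = {"a": 10.0, "e": 8.0, "i": 8.0, "o": 10.0, "u": 8.0}
observed = {"a": [14, 16], "i": [12], "o": []}
change = learn_duration_change(observed, normal_duration)
```

With these made-up values both /a/ and /i/ are 1.5 times longer than normal, so the stored duration change data would be 1.5.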

20. A speech recognition method, comprising the steps of:
- storing a plurality of normal speech models, a plurality of acoustic-phonetic variability models corresponding to some of said plurality of normal speech models, and a plurality of values of weight coefficients, each representing a similarity of one of said plurality of normal speech models to another of said plurality of normal speech models;
  
  selecting a selected plurality of said acoustic-phonetic variability models having highest values of said weight coefficient with one of said normal speech models, modifying a spectrum of said one of normal speech models based on each one of said selected plurality of acoustic-phonetic variability models, and generating a plurality of modified spectra of said one of said normal speech models;
  
  calculating a mean value of said plurality of modified spectra to generate a modified normal speech model based on said mean value; and
  
  comparing said mean value modified normal speech model with input unnatural speech data and outputting a comparison result.
- - 21. The speech recognition method of claim 20, further comprising a step of recognizing unnatural speech based on said comparison result.
  - 22. The speech recognition method of claim 20, further comprising a step of learning based on said comparison result.
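
The steps of claim 20 (store, select by weight, modify, average, compare) can be sketched in a few lines of Python. The sketch assumes that the weight coefficient is `exp(-distance)` between mean spectra, so that identical models receive the highest weight of 1.0, and that a variability model is an additive shift; it also uses a weight-based mean, which reduces to a plain mean when the selected weights are equal. None of these forms is stated in the claim, and all names and values are hypothetical.

```python
from math import dist, exp

def weight(spectrum_a, spectrum_b):
    """Assumed similarity weight: 1.0 for identical spectra, smaller as they differ."""
    return exp(-dist(spectrum_a, spectrum_b))

def mean_modified_model(target, normal, variability, k=2):
    """Modify `target` with the k variability models of its most similar normal models."""
    selected = sorted(
        variability,
        key=lambda name: weight(normal[target], normal[name]),
        reverse=True,
    )[:k]
    weights = [weight(normal[target], normal[name]) for name in selected]
    # One modified spectrum of the target per selected variability model.
    modified = [
        [x + s for x, s in zip(normal[target], variability[name])]
        for name in selected
    ]
    total = sum(weights)
    dim = len(normal[target])
    return [sum(w * m[d] for w, m in zip(weights, modified)) / total for d in range(dim)]

# /e/ has no variability model of its own; /a/, /i/ and /u/ do.
normal = {"e": [0.0, 0.0], "a": [1.0, 0.0], "i": [0.0, 1.0], "u": [5.0, 5.0]}
variability = {"a": [2.0, 0.0], "i": [0.0, 4.0], "u": [9.0, 9.0]}

model_e = mean_modified_model("e", normal, variability)
comparison = dist(model_e, [1.0, 2.5])  # distance to one frame of input unnatural speech
```

Here the distant /u/ model is excluded by the selection step, so a unit without its own variability model still receives a plausible modification borrowed from its spectral neighbours.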

23. A speech recognition method, comprising the steps of:
- analyzing an input speech signal to extract a time-series feature vector from said input speech signal;
  
  learning normal speech data, generating a normal speech model including a duration parameter, and storing said normal speech model;
  
  learning an acoustic-phonetic variability model representing an acoustic-phonetic change of spectrum caused by unnatural speech based on said normal speech model and said time-series feature vector;
  
  calculating a duration change by unnatural speech on a phonological unit basis based on said normal speech model and said acoustic-phonetic variability model, and storing duration change data;
  
  modifying said duration parameter of said normal speech model based on said duration change data, and generating a parameter-modified normal speech model; and
  
  recognizing said time-series feature vector based on said parameter-modified normal speech model and said acoustic-phonetic variability model, and outputting a recognition result.
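
The duration-modification step of claim 23 might look like the following sketch. It assumes that the duration parameter of a normal speech model is a Gaussian mean and variance in frames and that the stored duration change data is a multiplicative factor applied to the mean; both are illustrative assumptions, as are the names `modify_duration` and `duration_log_likelihood`.

```python
from math import log, pi

def modify_duration(model, change):
    """Return a parameter-modified copy of the model with a scaled mean duration."""
    return {**model, "dur_mean": model["dur_mean"] * change}

def duration_log_likelihood(model, observed_frames):
    """Gaussian log-likelihood of an observed segment length (assumed duration model)."""
    var = model["dur_var"]
    deviation = observed_frames - model["dur_mean"]
    return -0.5 * (log(2 * pi * var) + deviation ** 2 / var)

normal_model = {"dur_mean": 10.0, "dur_var": 4.0}
modified_model = modify_duration(normal_model, 1.5)  # hypothetical learned change

# A lengthened segment of 15 frames fits the modified model better.
score_normal = duration_log_likelihood(normal_model, 15)
score_modified = duration_log_likelihood(modified_model, 15)
```

In a full recognizer such a duration score would be combined with the spectral score obtained from the acoustic-phonetic variability model.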

24. A speech recognition apparatus for recognizing an input utterance having an acoustic-phonetic change of spectrum caused by unnatural speech, said speech recognition apparatus comprising:
- an acoustic analyzer for extracting a feature vector from said input utterance;
  
  a normal speech data memory for providing a learning result of normal speech data;
  
  a memory for providing a learning result of said acoustic-phonetic change; and
  
  a speech recognition unit having an input that receives said feature vector and an output that provides a recognition result, said speech recognition unit comprising:
  
  means for modifying said learning result of normal speech data based on at least one of said learning results of acoustic-phonetic change and for generating a modified speech model based on said acoustic-phonetic change, said at least one of said learning results of acoustic-phonetic change may not correspond to said learning result of normal speech model.
- - 25. The speech recognition apparatus of claim 24, wherein:
    - said learning result of normal speech data is a normal speech model on a phonological unit basis; and
      
      said learning result of said acoustic-phonetic change is an acoustic-phonetic variability model on said phonological unit basis.
  - 26. The speech recognition apparatus of claim 25, wherein said speech recognition unit further comprises:
    - means for generating a reference speech model based on said modified speech model; and
      
      a recognizer for recognizing said input utterance based on said feature vector and said reference speech model.
  - 27. The speech recognition apparatus of claim 26, wherein said means for generating the reference speech model includes a synthesizer for synthesizing said modified speech model and said normal speech model to generate a synthesized speech model as said reference speech model.
  - 28. The speech recognition apparatus of claim 26, wherein said recognizer comprises:
    - means for calculating decision data based on said feature vector and said reference speech model; and
      
      an identifier for identifying said input utterance based on said decision data.
  - 29. The speech recognition apparatus of claim 28, wherein said means for calculating includes means for calculating a similarity between said feature vector and said reference speech model to generate similarity data as said decision data.
  - 30. The speech recognition apparatus of claim 25, wherein said normal speech data memory has an output that provides a recognition vocabulary in a string of said normal speech models for recognition.
  - 31. The speech recognition apparatus of claim 30, wherein said identifier has an output that identifies said input utterance based on comparison of said decision data to said recognition vocabulary.
  - 32. The speech recognition apparatus of claim 31, further comprising:
    - a transfer switch including a learn mode and a recognize mode, for transferring said feature vector to said speech learning unit when said learn mode switch is in the learn mode, and for transferring said feature vector to said speech learning unit to said speech recognition unit when said learn mode switch is in the recognize mode.
  - 33. The speech recognition apparatus of claim 25, wherein each acoustic-phonetic variability model includes parameters representing changes in a spectral envelope of input speech caused by unnatural speech.
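
The transfer switch of claim 32 is essentially a router for feature vectors. A minimal sketch follows, assuming that the learning unit and the recognition unit can each be modelled as a callable that accepts a feature vector; the `Mode` enum and the `transfer` function are hypothetical names.

```python
from enum import Enum

class Mode(Enum):
    LEARN = "learn"
    RECOGNIZE = "recognize"

def transfer(feature_vector, mode, learning_unit, recognition_unit):
    """Route the feature vector to the unit selected by the switch position."""
    if mode is Mode.LEARN:
        return learning_unit(feature_vector)
    return recognition_unit(feature_vector)

# Stand-in units that only report which path was taken.
def learning_unit(vector):
    return ("learned", len(vector))

def recognition_unit(vector):
    return ("recognized", len(vector))

learn_out = transfer([0.1, 0.2], Mode.LEARN, learning_unit, recognition_unit)
recog_out = transfer([0.1, 0.2], Mode.RECOGNIZE, learning_unit, recognition_unit)
```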

34. A speech recognition apparatus comprising:
- an acoustic analyzer for extracting a feature vector from an input utterance;
  
  a normal speech data memory for providing a learning result of normal speech data; and
  
  a speech learning unit including:
  
  means for learning an acoustic-phonetic change of spectrum caused by unnatural speech to generate a learning result; and
  
  means for modifying said learning result of normal speech data based on at least one of said learning result of said acoustic-phonetic change to generate a reference speech model.
- - 35. The speech recognition apparatus of claim 34, wherein said learning result of normal speech data is a normal speech model on a phonological unit basis and said learning result of said acoustic-phonetic change is an acoustic-phonetic variability model on said phonological unit basis.
  - 36. The speech recognition apparatus of claim 35, wherein said speech learning unit further comprises:
    - means for calculating a parameter of spectral change of said feature vector based on said reference speech model and said normal speech model and for generating a parametric representation of said acoustic-phonetic variability model.
  - 37. The speech recognition apparatus of claim 36, wherein said speech learning unit further comprises:
    - a first buffer for buffering and outputting said acoustic-phonetic variability model to said modifying means during an execution of a learning loop performed by said means for learning and for buffering and outputting said acoustic-phonetic variability model output from said speech learning unit at an end of said learning loop.
  - 38. The speech recognition apparatus of claim 36, wherein said speech learning unit further comprises:
    - means for segmenting said feature vector based on said reference speech model, for generating segmented data and for providing said segmented data for said parametric calculation.
  - 39. The speech recognition apparatus of claim 38, wherein said speech learning unit further comprises:
    - a second buffer for buffering and outputting said reference speech model to said segmenting means for generating said segmented data of said feature vector and to said calculating means for said parametric calculation.
  - 40. The speech recognition apparatus of claim 38, wherein said normal speech data memory further provides normal duration data learned and stored based on normal speech data;
    - and wherein said segmenting means outputs said segment data during an execution of a learning loop performed by said means for learning;
      
      said speech recognition apparatus further comprising:
      
      a duration change learn unit for learning a duration change of said input utterance based on said segment data and said normal duration data.

Current Assignee: Mitsubishi Denki Kabushiki Kaisha (Mitsubishi Electric Corporation)
Original Assignee: Mitsubishi Denki Kabushiki Kaisha (Mitsubishi Electric Corporation)
Inventors: Suzuki, Tadashi
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Collins, Alphonso
Application Number: US08/548,702
Time in Patent Office: 908 Days
Field of Search: 395/2.34, 395/2.35, 395/2.42, 395/2.45, 395/2.51, 395/2.59, 395/2.65, 395/2.47-2.5, 395/2.53
US Class Current: 704/239
CPC Class Codes:
G10L 15/142   Hidden Markov Models [HMMs]
G10L 15/20   Speech recognition techniqu...
G10L 2021/03646   Stress or Lombard effect
