Apparatus and method for speech recognition in the presence of unnatural speech effects
Abstract
A speech recognition apparatus recognizes utterances of unnatural speech with higher recognition accuracy while requiring a smaller amount of speech learning data. The apparatus includes an acoustic-phonetic variability learn unit, a normal speech model memory, a spectrum smooth-modifier, and a speech recognizer. An input speech signal is acoustically analyzed and transformed into a time-series feature vector. The acoustic-phonetic variability learn unit learns the acoustic-phonetic change of spectrum caused by unnatural speech and generates a plurality of acoustic-phonetic variability models. The normal speech model memory stores a normal speech model learned from normal speech data. The spectrum smooth-modifier modifies the normal speech model based on a plurality of the acoustic-phonetic variability models and generates a plurality of spectrum-modified speech models. The speech recognizer recognizes the time-series feature vector based on the normal speech model and the spectrum-modified speech models.
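
The final recognition step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each speech model is a single mean feature vector and that recognition picks the label whose normal or spectrum-modified model lies closest to the input frames on average. The names `sequence_score` and `recognize` and the squared-Euclidean score are hypothetical; the abstract does not specify the model type or the scoring rule.

```python
from typing import Dict, List, Tuple

Frame = List[float]


def sequence_score(features: List[Frame], model_mean: Frame) -> float:
    """Average squared Euclidean distance from each frame to a model mean."""
    if not features:
        raise ValueError("features must contain at least one frame")
    total = 0.0
    for frame in features:
        total += sum((x - m) ** 2 for x, m in zip(frame, model_mean))
    return total / len(features)


def recognize(
    features: List[Frame],
    normal_models: Dict[str, Frame],
    modified_models: Dict[str, List[Frame]],
) -> Tuple[str, float]:
    """Return the best label, scoring both normal and modified models."""
    if not normal_models:
        raise ValueError("at least one normal speech model is required")
    best_label, best_score = "", float("inf")
    for label, normal_mean in normal_models.items():
        # Assumption: every label keeps its normal model as a candidate and
        # adds zero or more spectrum-modified variants of that model.
        candidates = [normal_mean] + modified_models.get(label, [])
        for candidate in candidates:
            score = sequence_score(features, candidate)
            if score < best_score:
                best_label, best_score = label, score
    return best_label, best_score
```

With normal models `{"yes": [0.0, 0.0], "no": [4.0, 4.0]}` and input frames `[[2.0, 2.0], [2.0, 3.0]]`, the sketch picks "no" when no modified models are supplied, and "yes" once a modified "yes" model at `[2.0, 2.0]` is added.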
40 Claims
1. A speech recognition apparatus comprising:
an acoustic analyzer for analyzing an input speech signal of unnatural speech and extracting a time-series feature vector from said input speech signal;
a normal speech model memory for storing a normal speech model learned based on normal speech data;
an acoustic-phonetic variability model memory for storing a plurality of acoustic-phonetic variability models, each representing an acoustic-phonetic change of spectrum caused by unnatural speech; and
speech recognition means for generating an unnatural speech model based on said normal speech model and at least one of said plurality of acoustic-phonetic variability models corresponding to another of said normal speech models, for recognizing said input speech signal of unnatural speech based on said time-series feature vector and said unnatural speech model, and for outputting a recognition result.
Dependent claims: 2, 3, 4, 5, 9, 10, 11, 12, 13, 14

6. A speech recognition apparatus comprising:
an acoustic analyzer for analyzing an input speech signal of unnatural speech and extracting a time-series feature vector from said input speech signal;
a normal speech model memory for storing a normal speech model learned based on normal speech data;
an acoustic-phonetic variability model memory for storing an acoustic-phonetic variability model representing an acoustic-phonetic change of spectrum caused by said unnatural speech; and
speech learning means for learning said acoustic-phonetic change with said time-series feature vector based on said normal speech model and for generating said acoustic-phonetic variability model.
Dependent claims: 7, 8

15. A speech recognition apparatus, comprising:
an acoustic analyzer for analyzing an input speech signal of unnatural speech and extracting a time-series feature vector from said input speech signal;
a normal speech model memory for storing a normal speech model learned based on normal speech data;
speech learning means for learning an acoustic-phonetic change based on said time-series feature vector and said normal speech model and for generating an acoustic-phonetic variability model based upon said acoustic-phonetic change;
duration change learning means for learning a duration change by unnatural speech on a phonological unit basis based on said acoustic-phonetic variability model and said normal speech model, and for generating duration change data based upon said duration change; and
a duration memory for storing said duration change data.
Dependent claims: 16, 17, 18, 19

20. A speech recognition method, comprising the steps of:
storing a plurality of normal speech models, a plurality of acoustic-phonetic variability models corresponding to some of said plurality of normal speech models, and a plurality of values of weight coefficients, each representing a similarity of one of said plurality of normal speech models to another of said plurality of normal speech models;
selecting a selected plurality of said acoustic-phonetic variability models having highest values of said weight coefficient with one of said normal speech models, modifying a spectrum of said one of normal speech models based on each one of said selected plurality of acoustic-phonetic variability models, and generating a plurality of modified spectra of said one of said normal speech models;
calculating a mean value of said plurality of modified spectra to generate a modified normal speech model based on said mean value; and
comparing said mean value modified normal speech model with input unnatural speech data and outputting a comparison result.
Dependent claims: 21, 22
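
The select, modify, and average steps of claim 20 can be sketched as follows. This is only an illustration under stated assumptions: each spectrum is a plain list of floats, each acoustic-phonetic variability model is treated as an additive spectral offset, and the comparison result is a Euclidean distance. The claim fixes none of these choices, and the names `modified_model`, `compare`, and the parameter `k` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def modified_model(
    target: str,
    normal_models: Dict[str, Vector],
    variability_models: Dict[str, Vector],
    weights: Dict[str, Dict[str, float]],
    k: int = 2,
) -> Vector:
    """Mean of the target spectrum as modified by its k most similar models."""
    # Select the k variability models whose normal models have the highest
    # weight coefficient (similarity) with the target normal model.
    selected = sorted(
        variability_models,
        key=lambda name: weights[target][name],
        reverse=True,
    )[:k]
    if not selected:
        raise ValueError("no variability model available for selection")
    base = normal_models[target]
    # Assumption: a variability model is an additive offset on the spectrum.
    modified_spectra = [
        [b + d for b, d in zip(base, variability_models[name])]
        for name in selected
    ]
    # Mean value of the modified spectra, one dimension at a time.
    return [sum(column) / len(column) for column in zip(*modified_spectra)]


def compare(model: Vector, observed: Vector) -> float:
    """Assumed comparison result: Euclidean distance to the input spectrum."""
    return sum((m - o) ** 2 for m, o in zip(model, observed)) ** 0.5
```

In this sketch the target model needs no variability model of its own: it borrows the models learned for the units most similar to it, which is how the claim gets by with variability models for only some of the normal speech models.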

23. A speech recognition method, comprising the steps of:
analyzing an input speech signal to extract a time-series feature vector from said input speech signal;
learning normal speech data, generating a normal speech model including a duration parameter, and storing said normal speech model;
learning an acoustic-phonetic variability model representing an acoustic-phonetic change of spectrum caused by unnatural speech based on said normal speech model and said time-series feature vector;
calculating a duration change by unnatural speech on a phonological unit basis based on said normal speech model and said acoustic-phonetic variability model, and storing duration change data;
modifying said duration parameter of said normal speech model based on said duration change data, and generating a parameter-modified normal speech model; and
recognizing said time-series feature vector based on said parameter-modified normal speech model and said acoustic-phonetic variability model, and outputting a recognition result.
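
The duration steps of claim 23 can be illustrated with a small sketch. It assumes that the duration parameter is a mean duration in frames per phonological unit, and that the duration change data is the ratio of the mean duration observed in unnatural speech to the normal mean. The claim does not state how the change is computed or represented, so the ratio form and the names `learn_duration_change` and `apply_duration_change` are hypothetical.

```python
from typing import Dict, List


def learn_duration_change(
    normal_duration: Dict[str, float],
    observed_durations: Dict[str, List[float]],
) -> Dict[str, float]:
    """Per-unit duration ratio: observed mean divided by normal mean."""
    change: Dict[str, float] = {}
    for unit, durations in observed_durations.items():
        # Skip units with no observations or no normal duration to compare to.
        if not durations or unit not in normal_duration:
            continue
        observed_mean = sum(durations) / len(durations)
        change[unit] = observed_mean / normal_duration[unit]
    return change


def apply_duration_change(
    normal_duration: Dict[str, float],
    change: Dict[str, float],
) -> Dict[str, float]:
    """Scale each unit's duration; units without change data stay as-is."""
    return {
        unit: duration * change.get(unit, 1.0)
        for unit, duration in normal_duration.items()
    }
```

Units that never occur in the unnatural-speech learning data keep their normal duration in this sketch, so the parameter-modified model is defined for every unit.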

24. A speech recognition apparatus for recognizing an input utterance having an acoustic-phonetic change of spectrum caused by unnatural speech, said speech recognition apparatus comprising:
an acoustic analyzer for extracting a feature vector from said input utterance;
a normal speech data memory for providing a learning result of normal speech data;
a memory for providing a learning result of said acoustic-phonetic change; and
a speech recognition unit having an input that receives said feature vector and an output that provides a recognition result, said speech recognition unit comprising:
means for modifying said learning result of normal speech data based on at least one of said learning results of acoustic-phonetic change and for generating a modified speech model based on said acoustic-phonetic change, said at least one of said learning results of acoustic-phonetic change may not correspond to said learning result of normal speech model.
Dependent claims: 25, 26, 27, 28, 29, 30, 31, 32, 33

34. A speech recognition apparatus comprising:
an acoustic analyzer for extracting a feature vector from an input utterance;
a normal speech data memory for providing a learning result of normal speech data; and
a speech learning unit including:
means for learning an acoustic-phonetic change of spectrum caused by unnatural speech to generate a learning result; and
means for modifying said learning result of normal speech data based on at least one of said learning result of said acoustic-phonetic change to generate a reference speech model.
Dependent claims: 35, 36, 37, 38, 39, 40
Specification