Speech recognition and synthesis systems which distinguish speech phonemes from noise
Abstract
A language is composed using phonemes that are easy to separate in an environment of noise and disturbing sound.
An acoustic signal comprising such phonemes is fed to the acoustic signal analyzer and analyzed there. From the result of the analysis, the tone name identifier recognizes the tone names, and once the tone names are recognized as a prescribed sentence, the sentence is fed to the utterance generator. In response, the utterance generator generates a prescribed time series of tone names comprising phonemes that are easy to separate in an environment of noise and disturbing sound, and inputs it to the acoustic signal generator. The acoustic signal generator synthesizes an acoustic signal corresponding to the time series and outputs it.
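The last stage described in the abstract, turning a time series of tone names into an acoustic signal, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not define a tone-name table, a waveform, a sample rate, or a duration, so `TONE_HZ`, `synthesize`, and their parameters are hypothetical and should not be read as the patented generator.

```python
import math

# Hypothetical tone-name table in Hz; the abstract does not define one.
TONE_HZ = {"C": 261.63, "D": 293.66, "E": 329.63, "G": 392.00}


def synthesize(tone_names, sample_rate=8000, samples_per_tone=800):
    """Concatenate one sine burst per tone name into a list of float samples.

    A plain sine burst is an assumed waveform, chosen only to keep the
    sketch short.
    """
    out = []
    for name in tone_names:
        freq = TONE_HZ[name]
        out.extend(math.sin(2 * math.pi * freq * n / sample_rate)
                   for n in range(samples_per_tone))
    return out


# Usage: a three-tone "sentence" becomes 3 * 800 samples.
signal = synthesize(["C", "E", "G"])
```

A real generator would presumably shape each tone so that it stays separable from noise; the sketch only shows the mapping from a tone-name time series to samples.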
26 Claims
1. A speech recognition system for recognizing speech in an environment of noise and disturbing sound, comprising:
means for receiving a sound signal, the sound signal including at least one of phonemes and time series of phonemes, and the sound signal further including at least one of noise and disturbing sound;
an analysis means for analyzing the sound signal received by the means for receiving; and
a recognition means for recognizing a signal representative of said phonemes and time series of phonemes;
wherein the analysis means analyzes the sound signal by performing a frequency analysis and the result of the frequency analysis is treated as a time series in determining if a feature point of the sound signal moves in a prescribed direction in a frequency analysis space relative to a time index axis for a prescribed time period; and
wherein the analysis means separates said noise and disturbing sound from said phonemes and time series of phonemes based upon said analyzing and outputs a signal representative of said phonemes and time series of phonemes to the recognition means.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A speech synthesis system for synthesizing speech in an environment of noise and disturbing sound, comprising:
a synthesis means for synthesizing speech;
an analysis means coupled to said synthesis means for analyzing a sound signal including at least one of phonemes and time series of phonemes, and the sound signal further including at least one of noise and disturbing sound; and
a recognition means for recognizing a signal representative of said phonemes and time series of phonemes;
wherein the analysis means analyzes the sound signal by performing a frequency analysis and the result of the frequency analysis is treated as a time series in determining if a feature point of the sound signal moves in a prescribed direction in a frequency analysis space relative to a time index axis for a prescribed time period; and
wherein the analysis means separates said noise and disturbing sound from said phonemes and time series of phonemes based upon said analyzing and outputs a signal representative of said phonemes and time series of phonemes to the recognition means.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A speech recognition and synthesis system for recognizing and synthesizing speech in an environment of noise and disturbing sound, which system is provided with an analysis means for analyzing a sound signal including at least one of phonemes and time series of phonemes, and the sound signal further including at least one of noise and disturbing sound, and the system also being provided with a recognition means for recognizing a signal representative of said phonemes and time series of phonemes, wherein the analysis means analyzes the sound signal by performing a frequency analysis and the result of the frequency analysis is treated as a time series in determining if a feature point of the sound signal moves in a prescribed direction in a frequency analysis space relative to a time index axis for a prescribed time period, and wherein the analysis means separates said noise and disturbing sound from said phonemes and time series of phonemes based upon said analyzing and outputs a signal representative of said phonemes and time series of phonemes to the recognition means, and the system also being provided with a synthesis means coupled to said recognition means for synthesizing speech.
26. A method for separation of speech from noise and other disturbing sound, comprising the steps of:
receiving a sound signal including phonemes and further including at least one of the following, noise and other disturbing sound;
converting the received sound signal to a digital signal, wherein some portions of the digital signal correspond to phonemes and other portions correspond to noise and other disturbing sound; and
analyzing the digital signal to distinguish portions of the digital signal which correspond to phonemes from portions of the digital signal which correspond to noise and other disturbing sound;
wherein the step of analyzing includes analyzing the sound signal by performing a frequency analysis and treating the result of the frequency analysis as a time series in determining if a feature point of the sound signal moves in a prescribed direction in a frequency analysis space relative to a time index axis for a prescribed time period; and
based upon said analyzing outputting a signal representative of said phonemes; and
recognizing said signal representative of said phonemes as speech.
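The analysis step recited in the independent claims (a frequency analysis whose result is treated as a time series, followed by a test of whether a feature point moves in a prescribed direction for a prescribed time period) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not say what the feature point is, so the sketch assumes it is the strongest DFT bin of each frame, and the names `peak_bin`, `peak_track`, `moves_in_direction`, and `stepped_tone` are hypothetical.

```python
import cmath
import math


def peak_bin(frame):
    """Index of the strongest DFT bin (1 .. N/2 - 1) of a real-valued frame.

    Assumption: this peak stands in for the claims' "feature point".
    """
    n = len(frame)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k


def peak_track(signal, frame_len):
    """Peak bin of each non-overlapping frame: the analysis as a time series."""
    return [peak_bin(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def moves_in_direction(track, direction, min_steps):
    """True if the peak moves in `direction` (+1 up, -1 down) for at least
    `min_steps` consecutive frame-to-frame steps."""
    run = 0
    for prev, cur in zip(track, track[1:]):
        if (cur - prev) * direction > 0:
            run += 1
            if run >= min_steps:
                return True
        else:
            run = 0
    return False


def stepped_tone(bins, frame_len):
    """Test signal: one frame of a pure sine per entry in `bins`."""
    return [math.sin(2 * math.pi * k * t / frame_len)
            for k in bins for t in range(frame_len)]


# Usage: a rising sweep passes the direction test, a steady tone does not.
rising = stepped_tone([3, 4, 5, 6, 7, 8], 32)
steady = stepped_tone([5] * 6, 32)
is_rising = moves_in_direction(peak_track(rising, 32), +1, 4)
is_steady_rising = moves_in_direction(peak_track(steady, 32), +1, 4)
```

Here `direction` and `min_steps` play the roles of the prescribed direction and prescribed time period. A sound whose peak does not move as prescribed would be treated as noise or disturbing sound under this simplified reading.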
Specification