Speech recognition and synthesis systems which distinguish speech phonemes from noise

US 5,966,690 A
Filed: 06/07/1996
Issued: 10/12/1999
Est. Priority Date: 06/09/1995
Status: Expired due to Fees

Abstract

The language is composed of phonemes that are easy to separate in an environment of noise and disturbing sound.

An acoustic signal composed of such phonemes is fed to the acoustic signal analyzer and analyzed there. From the result of the analysis, the tone name identifier recognizes the tone names, and once the tone names have been recognized as a prescribed sentence, that sentence is fed to the utterance generator. In response, the utterance generator generates a prescribed time series of tone names, likewise composed of phonemes that are easy to separate in an environment of noise and disturbing sound, and inputs it to the acoustic signal generator. The acoustic signal generator synthesizes and outputs the acoustic signal that corresponds to the time series.

26 Claims

1. A speech recognition system for recognizing speech in an environment of noise and disturbing sound, comprising:
- means for receiving a sound signal, the sound signal including at least one of phonemes and time series of phonemes, and the sound signal further including at least one of noise and disturbing sound;
  
  an analysis means for analyzing the sound signal received by the means for receiving; and
  
  a recognition means for recognizing a signal representative of said phonemes and time series of phonemes;
  
  wherein the analysis means analyzes the sound signal by performing a frequency analysis and the result of the frequency analysis is treated as a time series in determining if a feature point of the sound signal moves in a prescribed direction in a frequency analysis space relative to a time index axis for a prescribed time period; and
  
  wherein the analysis means separates said noise and disturbing sound from said phonemes and time series of phonemes based upon said analyzing and outputs a signal representative of said phonemes and time series of phonemes to the recognition means.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The speech recognition system as claimed in claim 1, wherein analysis results by said analysis means are arranged in the prescribed order individually in the analysis space having a prescribed dimension, and the position of the feature points of said analysis results moves according to a prescribed rule in the time series analysis space composed of said analysis space and the time axis.
  - 3. The speech recognition system as claimed in claim 2, wherein said feature point is a point selected from the maximum value and minimum value of said analysis result in said analysis space.
  - 4. The speech recognition system as claimed in claim 2, wherein said motion of the position of said feature point in said time series analysis space is a motion in the direction inclined with a prescribed angle to said time axis for a prescribed time or longer.
  - 5. The speech recognition system as claimed in claim 2, wherein said motion of the position of said feature point in said time series analysis space is a motion in the direction parallel to said time axis for a prescribed time or longer.
  - 6. The speech recognition system as claimed in claim 1, wherein the pitch of said sound does not change exceeding a prescribed time beyond a prescribed range.
  - 7. The speech recognition system as claimed in claim 1, wherein said phoneme is a vowel.
  - 8. The speech recognition system as claimed in claim 1, wherein said analysis means executes a linear convolution operation.
  - 9. The speech recognition system as claimed in claim 1, wherein said system is provided with the recognition means including a Hidden Markov Model, and said recognition means recognizes the sound which corresponds to said analysis result by said analysis means.
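
As an illustration only, the trajectory test recited in claims 1 to 5 might be sketched as follows. The sketch assumes a naive DFT as the "frequency analysis", the spectral maximum as the "feature point" (one of the options in claim 3), and hypothetical values for the frame length, the slope in bins per frame, and the minimum run length; none of these values or function names come from the patent.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), adequate for a sketch)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def peak_track(signal, frame_len):
    """Feature point per non-overlapping frame: the bin index of the spectral maximum."""
    track = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        spec = magnitude_spectrum(signal[start:start + frame_len])
        track.append(max(range(len(spec)), key=spec.__getitem__))
    return track

def longest_run(track, slope, tol=0):
    """Longest run of frames in which the peak moves by `slope` bins per frame (within tol).

    slope == 0 is motion parallel to the time axis; slope != 0 is motion inclined to it.
    """
    best = run = 1 if track else 0
    for prev, cur in zip(track, track[1:]):
        run = run + 1 if abs((cur - prev) - slope) <= tol else 1
        best = max(best, run)
    return best

def is_phoneme_like(track, slope=0, min_frames=4, tol=0):
    """Hypothetical decision rule: the feature point kept its direction long enough."""
    return longest_run(track, slope, tol) >= min_frames

def tone_frames(bins, frame_len):
    """Test signal: one frame of a pure sine per requested DFT bin."""
    return [math.sin(2 * math.pi * b * i / frame_len) for b in bins for i in range(frame_len)]

if __name__ == "__main__":
    steady = peak_track(tone_frames([8] * 6, 64), 64)               # peak stays put
    hopping = peak_track(tone_frames([3, 17, 9, 25, 5, 12], 64), 64)  # peak jumps around
    print(steady, is_phoneme_like(steady))
    print(hopping, is_phoneme_like(hopping))
```

A peak that stays in one bin, or climbs at a fixed rate, passes the test for the matching `slope`, while a peak that hops between unrelated bins does not. A practical analyzer would use an FFT, overlapping windows, and a tolerance greater than zero.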

10. A speech synthesis system for synthesizing speech in an environment of noise and disturbing sound, comprising:
- a synthesis means for synthesizing speech;
  
  an analysis means coupled to said synthesis means for analyzing a sound signal including at least one of phonemes and time series of phonemes, and the sound signal further including at least one of noise and disturbing sound; and
  
  a recognition means for recognizing a signal representative of said phonemes and time series of phonemes;
  
  wherein the analysis means analyzes the sound signal by performing a frequency analysis and the result of the frequency analysis is treated as a time series in determining if a feature point of the sound signal moves in a prescribed direction in a frequency analysis space relative to a time index axis for a prescribed time period; and
  
  wherein the analysis means separates said noise and disturbing sound from said phonemes and time series of phonemes based upon said analyzing and outputs a signal representative of said phonemes and time series of phonemes to the recognition means.
- Dependent claims (11, 12, 13, 14, 15, 16)
  - 11. The speech synthesis system as claimed in claim 10, wherein analysis results of said sounds by prescribed analysis method are arranged in the prescribed order individually in the analysis space having a prescribed dimension, and the position of the feature points of said analysis results moves according to a prescribed rule in the time series analysis space composed of said analysis space and the time axis.
  - 12. The speech synthesis system as claimed in claim 11, wherein said feature point is a point selected from the maximum value and minimum value of said analysis result in said analysis space.
  - 13. The speech synthesis system as claimed in claim 11, wherein said motion of the position of said feature point in said time series analysis space is a motion in the direction inclined with a prescribed angle to said time axis for a prescribed time or longer.
  - 14. The speech synthesis system as claimed in claim 11, wherein said motion of the position of said feature point in said time series analysis space is a motion in the direction parallel to said time axis for a prescribed time or longer.
  - 15. The speech synthesis system as claimed in claim 10, wherein the pitch of said sound does not change exceeding a prescribed time beyond a prescribed range.
  - 16. The speech synthesis system as claimed in claim 10, wherein said phoneme is a vowel.
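
On the synthesis side, claims 13 to 15 describe sounds whose feature point either holds still or moves at a fixed inclination. A minimal sketch, assuming a single sine partial stands in for the phoneme and that the sample rate, frequencies, and durations are all hypothetical, could generate such sounds by phase accumulation:

```python
import math

def synthesize(freq_start_hz, freq_end_hz, duration_s, sample_rate=8000):
    """Sine tone whose frequency moves linearly from start to end.

    Equal start and end frequencies give a steady pitch (motion parallel to
    the time axis); unequal values give a linear glide (inclined motion).
    """
    n = round(duration_s * sample_rate)
    samples, phase = [], 0.0
    for i in range(n):
        frac = i / max(n - 1, 1)
        freq = freq_start_hz + (freq_end_hz - freq_start_hz) * frac
        samples.append(math.sin(phase))
        # Accumulating phase keeps the waveform continuous while freq changes.
        phase += 2 * math.pi * freq / sample_rate
    return samples

def utterance(segments, sample_rate=8000):
    """Concatenate (start_hz, end_hz, duration_s) segments into one sample list."""
    out = []
    for f0, f1, dur in segments:
        out.extend(synthesize(f0, f1, dur, sample_rate))
    return out

if __name__ == "__main__":
    # A steady segment followed by an upward glide; values are illustrative only.
    audio = utterance([(440, 440, 0.1), (440, 660, 0.1)])
    print(len(audio))
```

Each segment restarts its phase at zero, so a real synthesizer would carry the phase across segment boundaries to avoid clicks.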

17. A speech recognition and synthesis system for recognizing and synthesizing speech in an environment of noise and disturbing sound, which system is provided with an analysis means for analyzing a sound signal including at least one of phonemes and time series of phonemes, and the sound signal further including at least one of noise and disturbing sound, and the system also being provided with a recognition means for recognizing a signal representative of said phonemes and time series of phonemes, wherein the analysis means analyzes the sound signal by performing a frequency analysis and the result of the frequency analysis is treated as a time series in determining if a feature point of the sound signal moves in a prescribed direction in a frequency analysis space relative to a time index axis for a prescribed time period, and wherein the analysis means separates said noise and disturbing sound from said phonemes and time series of phonemes based upon said analyzing and outputs a signal representative of said phonemes and time series of phonemes to the recognition means, and the system also being provided with a synthesis means coupled to said recognition means for synthesizing speech.
- Dependent claims (18, 19, 20, 21, 22, 23, 24, 25)
  - 18. The speech recognition synthesizing system as claimed in claim 17, wherein analysis results by said analysis means are arranged in the prescribed order individually in the analysis space having a prescribed dimension, and the position of the feature points of said analysis results moves according to a prescribed rule in the time series analysis space composed of said analysis space and the time axis.
  - 19. The speech recognition synthesizing system as claimed in claim 18, wherein said feature point is a point selected from the maximum value and minimum value of said analysis result in said analysis space.
  - 20. The speech recognition synthesizing system as claimed in claim 18, wherein said motion of the position of said feature point in said time series analysis space is a motion in the direction inclined with a prescribed angle to said time axis for a prescribed time or longer.
  - 21. The speech recognition synthesizing system as claimed in claim 17, wherein said motion of the position of said feature point in said time series analysis space is a motion in the direction parallel to said time axis for a prescribed time or longer.
  - 22. The speech recognition synthesizing system as claimed in claim 17, wherein the pitch of said sound does not change exceeding a prescribed time beyond a prescribed range.
  - 23. The speech recognition synthesizing system as claimed in claim 17, wherein said phoneme is a vowel.
  - 24. The speech recognition synthesizing system as claimed in claim 17, wherein said analysis means executes a linear convolution operation.
  - 25. The speech recognition synthesizing system as claimed in claim 17, wherein said system is additionally provided with a recognition means including a Hidden Markov Model, and said recognition means recognizes the sound which corresponds to said analysis result by said analysis means.
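
Claims 9 and 25 name a Hidden Markov Model as part of the recognition means but give no topology or parameters. The following is a generic Viterbi decoder, offered as a sketch of how such a recognizer could map per-frame analysis results to phoneme labels. The two vowel states, the "low"/"high" band observations, and every probability are hypothetical.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a discrete-observation HMM (log domain)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = best log-probability of any state path that ends in s.
    score = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    back = []  # one dict of back-pointers per time step after the first
    for obs in observations[1:]:
        prev_score, score, pointers = score, {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: prev_score[r] + log(trans_p[r][s]))
            score[s] = prev_score[best_prev] + log(trans_p[best_prev][s]) + log(emit_p[s][obs])
            pointers[s] = best_prev
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical model: two vowels, each mostly associated with one frequency band.
STATES = ("a", "i")
START = {"a": 0.5, "i": 0.5}
TRANS = {"a": {"a": 0.8, "i": 0.2}, "i": {"a": 0.2, "i": 0.8}}
EMIT = {"a": {"low": 0.9, "high": 0.1}, "i": {"low": 0.2, "high": 0.8}}

if __name__ == "__main__":
    frames = ["low", "low", "high", "low", "high", "high"]
    print(viterbi(frames, STATES, START, TRANS, EMIT))
```

With these made-up parameters the isolated "low" frame in the fourth position is absorbed into the surrounding "i" segment, which shows how the transition probabilities smooth over a single disturbed frame.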

26. A method for separation of speech from noise and other disturbing sound, comprising the steps of:
- receiving a sound signal including phonemes and further including at least one of the following, noise and other disturbing sound;
  
  converting the received sound signal to a digital signal, wherein some portions of the digital signal correspond to phonemes and other portions correspond to noise and other disturbing sound; and
  
  analyzing the digital signal to distinguish portions of the digital signal which correspond to phonemes from portions of the digital signal which correspond to noise and other disturbing sound;
  
  wherein the step of analyzing includes analyzing the sound signal by performing a frequency analysis and treating the result of the frequency analysis as a time series in determining if a feature point of the sound signal moves in a prescribed direction in a frequency analysis space relative to a time index axis for a prescribed time period; and
  
  based upon said analyzing outputting a signal representative of said phonemes; and
  
  recognizing said signal representative of said phonemes as speech.
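
The "converting the received sound signal to a digital signal" step of claim 26 is not detailed in the claim. As a hedged sketch, assuming uniform sampling and signed integer quantization with a hypothetical 8 kHz rate and 16-bit depth, it could look like this:

```python
import math

def sample_and_quantize(analog, duration_s, sample_rate=8000, bits=16):
    """Sample a continuous-time function and round each sample to a signed integer code.

    `analog` is any callable mapping time in seconds to an amplitude; values
    outside [-1, 1] are clipped before quantization.
    """
    full_scale = 2 ** (bits - 1) - 1
    codes = []
    for i in range(round(duration_s * sample_rate)):
        x = max(-1.0, min(1.0, analog(i / sample_rate)))
        codes.append(round(x * full_scale))
    return codes

if __name__ == "__main__":
    # One millisecond of a 1 kHz sine: eight samples at the assumed 8 kHz rate.
    print(sample_and_quantize(lambda t: math.sin(2 * math.pi * 1000 * t), 0.001))
```

The resulting integer sequence is what the later analyzing step would split into frames for frequency analysis.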

Current Assignee
Sony Corporation (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.)
Inventors
Inoe, Makoto; Fujita, Masahiro; Akabane, Makoto; Kageyama, Koji
Primary Examiner(s)
Hudspeth, David R.
Assistant Examiner(s)
Abebe, Daniel Demelash

Application Number
US08/661,396
Time in Patent Office
1,222 Days
Field of Search
395/2.4, 395/2.46, 395/2.55, 395/2.6, 395/2.65, 395/2.66, 395/2.67, 395/2.71, 395/2.77, 395/2.72, 704/231, 704/233, 704/208, 704/268
US Class Current
704/233
CPC Class Codes
G10L 13/04   Details of speech synthesis...
G10L 15/02   Feature extraction for spee...
G10L 2015/025   Phonemes, fenemes or fenone...
G10L 25/15   the extracted parameters be...
