Allophone vocoder
Abstract
An allophone vocoder that exploits the inherent redundancy of spoken language, together with the automatic filtering of speech performed by the human listener, to provide a speech compression and recognition system. An analog speech signal is broken up into its phoneme components, which are encoded for transmission. The encoded phoneme sequence represents the speech far more compactly than the original speech signal. The phoneme codes are then transmitted, stored, or used directly to generate an analogous allophone sequence that approximates the original speech signal. Because of the inherent redundancy of spoken language and the filtering effect of the human ear, variations or errors in approximating the phonemes of the original speech signal are inconsequential to the intelligibility of the final allophone-synthesized speech.
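The compression claim in the abstract can be made concrete with a rough bit-rate comparison. The abstract gives no figures, so every number in this sketch is an assumption chosen for illustration: an 8 kHz, 8-bit digitized waveform, an average speaking rate of 12 phonemes per second, and a 6-bit code (enough to index up to 64 reference phonemes).

```python
# Rough bit-rate comparison. Every number below is an illustrative assumption,
# not a figure taken from the patent.
SAMPLE_RATE_HZ = 8000        # assumed telephone-quality sampling rate
BITS_PER_SAMPLE = 8          # assumed PCM sample width
PHONEMES_PER_SECOND = 12     # assumed average speaking rate
BITS_PER_PHONEME_CODE = 6    # assumed code width: indexes up to 64 phonemes


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bits per second needed to send the digitized waveform itself."""
    return sample_rate_hz * bits_per_sample


def phoneme_bit_rate(phonemes_per_second: int, bits_per_code: int) -> int:
    """Bits per second needed to send one code per phoneme."""
    return phonemes_per_second * bits_per_code


waveform_bps = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
phoneme_bps = phoneme_bit_rate(PHONEMES_PER_SECOND, BITS_PER_PHONEME_CODE)
compression_ratio = waveform_bps / phoneme_bps

print(f"waveform: {waveform_bps} bit/s, phoneme codes: {phoneme_bps} bit/s")
print(f"compression ratio: about {compression_ratio:.0f}:1")
```

Under these assumed values the phoneme code stream needs a few tens of bits per second against tens of thousands for the waveform. The actual ratio depends on the sampling format, the speaking rate, and the size of the phoneme set.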
10 Claims
1. A speech recognition system comprising:
means for analyzing digital speech data representative of an analog speech signal to generate perceived phonemes representative of component parts of said digital speech data;
memory means having encoded digital speech data stored therein, said encoded digital speech data including phoneme codes representative of a plurality of respective reference phonemes, said memory means further having digital speech data stored therein representative of allophones analogous to said phoneme codes;
means operably coupled to said analyzing means and to said memory means for selecting encoded digital speech data representative of a particular reference phoneme from said memory means as the closest match for each of said perceived phonemes of said digital speech data to provide a phoneme code at least approximating each of said perceived phonemes; and
means operably coupled to said selecting means and said memory means for forming a phoneme code sequence of a plurality of said phoneme codes, said phoneme code sequence-forming means being responsive to said phoneme codes as determined by said selecting means to access digital speech data from said memory means representative of analogous allophones corresponding to said phoneme codes.
Dependent claims: 2
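The closest-match selection recited in claim 1 can be sketched as a nearest-neighbor lookup. The claim does not say how phonemes are represented or how closeness is measured, so this sketch assumes each phoneme is a short feature vector and uses Euclidean distance. The codes, labels, and feature values in `REFERENCE_PHONEMES` are hypothetical.

```python
import math

# Hypothetical reference library: phoneme code -> (label, feature vector).
# The codes, labels, and two-value features are made up for illustration.
REFERENCE_PHONEMES = {
    0x01: ("AA", (0.73, 1.09)),
    0x02: ("IY", (0.27, 2.29)),
    0x03: ("UW", (0.30, 0.87)),
}


def closest_phoneme_code(perceived, references=REFERENCE_PHONEMES):
    """Return the code of the reference phoneme nearest to the perceived features.

    Euclidean distance is an assumption; the claim only requires a closest match.
    """
    return min(
        references,
        key=lambda code: math.dist(perceived, references[code][1]),
    )


def encode(perceived_sequence, references=REFERENCE_PHONEMES):
    """Map each perceived phoneme to its closest code, forming a code sequence."""
    return [closest_phoneme_code(p, references) for p in perceived_sequence]


# Perceived features never match a reference exactly; the nearest one is chosen.
codes = encode([(0.70, 1.15), (0.25, 2.20), (0.32, 0.90)])
print([REFERENCE_PHONEMES[c][0] for c in codes])
```

Because every perceived phoneme is forced onto some reference code, an imprecise analysis still yields a valid code, which matches the claim language of a code "at least approximating" each perceived phoneme.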
3. A speech recognition and synthesis system comprising:
means for analyzing digital speech data representative of an analog speech signal to generate perceived phonemes representative of component parts of said digital speech data;
memory means having encoded digital speech data stored therein, said encoded digital speech data including phoneme codes representative of a plurality of respective reference phonemes, said memory means further having digital speech data stored therein representative of allophones analogous to said phoneme codes;
means operably coupled to said analyzing means and to said memory means for selecting encoded digital speech data representative of a particular reference phoneme from said memory means as the closest match for each of said perceived phonemes of said digital speech data to provide a phoneme code at least approximating each of said perceived phonemes;
means operably coupled to said selecting means and said memory means for forming a phoneme code sequence of a plurality of said phoneme codes, said phoneme code sequence-forming means being responsive to said phoneme codes as determined by said selecting means to access digital speech data from said memory means representative of analogous allophones corresponding to said phoneme codes;
speech synthesizer means operably coupled to the output of said phoneme code sequence-forming means for processing the digital speech data representative of allophones provided thereby to generate an analog speech signal; and
audio means coupled to said speech synthesizer means for converting said analog speech signal generated thereby into audible synthesized speech corresponding to the original analog speech signal.
Dependent claims: 4, 5
6. A vocoder comprising:
means for analyzing digital speech data representative of an analog speech signal and identifying phoneme components of said digital speech data;
library means storing digital speech data including encoded digital speech data in the form of phoneme codes representative of a plurality of reference phonemes comprising all of the recognized phonemes in a given spoken language, each of which has an associated set of allophone characteristics corresponding thereto stored as digital speech data in said library means;
comparator means operably coupled to said analyzing means and said library means for obtaining the closest match from said plurality of reference phonemes as represented by the encoded digital speech data stored in said library means to said phoneme components of said digital speech data to provide a phoneme code at least approximating each of said phoneme components of said digital speech data identified by said analyzing means;
means for providing a phoneme code sequence of connected phoneme codes corresponding to the respective reference phonemes from said phoneme codes stored in said library means which are the closest match to said phoneme components of said digital speech data representative of said analog speech signal;
said library means being responsive to said phoneme code sequence to provide a phoneme-to-allophone translation in communicating digital speech data representative of allophones to said phoneme code sequence-forming means;
speech synthesizer means connected to the output of said phoneme code sequence-forming means for processing the digital speech data representative of allophones provided thereby to generate an analog speech signal; and
audio means coupled to said speech synthesizer means for converting said analog speech signal generated thereby into audible synthesized speech corresponding to the original analog speech signal.
Dependent claims: 7, 8
9. A method of analyzing a speech signal and producing audible synthesized speech comprising:
providing an analog speech signal;
identifying phoneme component parts of said analog speech signal;
comparing each of the phoneme component parts as identified from said analog speech signal with a plurality of reference phonemes comprising all of the recognized phonemes in a given spoken language;
obtaining the closest match from said plurality of reference phonemes to each of the identified phoneme component parts of said analog speech signal to provide respective phoneme codes at least approximating each of the identified phoneme component parts;
forming a phoneme code sequence of connected phoneme codes as determined by the matching of the closest reference phoneme to each of the identified phoneme component parts of said analog speech signal;
translating the formed phoneme code sequence into an analogous allophone sequence thereto;
generating analog signals representative of synthesized speech from said allophone sequence; and
producing audible synthesized speech corresponding to the original analog speech signal from said analog signals representative of synthesized speech.
Dependent claims: 10
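The translating step of claim 9, which turns a phoneme code sequence into an analogous allophone sequence, might look like a table lookup. The claims do not state how a particular allophone is chosen for a given phoneme, so this sketch assumes a simple rule based on position in the utterance. The `ALLOPHONE_LIBRARY` table, its labels, and the positional rule are all hypothetical.

```python
# Hypothetical allophone library: phoneme code -> allophone variant by position.
# The table contents and the positional rule are assumptions for illustration;
# the claims only require that the code sequence be translated into allophones.
ALLOPHONE_LIBRARY = {
    "T": {"initial": "T_aspirated", "medial": "T_flap", "final": "T_unreleased"},
    "AA": {"initial": "AA_initial", "medial": "AA_medial", "final": "AA_final"},
}


def position_of(index: int, length: int) -> str:
    """Classify a slot in the sequence as initial, medial, or final."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def to_allophones(phoneme_codes, library=ALLOPHONE_LIBRARY):
    """Translate a phoneme code sequence into an allophone sequence."""
    n = len(phoneme_codes)
    return [
        library[code][position_of(i, n)]
        for i, code in enumerate(phoneme_codes)
    ]


print(to_allophones(["T", "AA", "T"]))
```

In this sketch the same phoneme code "T" maps to two different allophones depending on where it falls, which is the kind of one-to-many translation a phoneme-to-allophone library would need to support. A real rule set would likely look at neighboring phonemes as well.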
Specification