Allophone vocoder

US 4,661,915 A
Filed: 08/03/1981
Issued: 04/28/1987
Est. Priority Date: 08/03/1981
Status: Expired due to Term

Abstract

An allophone vocoder which utilizes the inherent redundancy of the spoken language together with the automatic human filtering of speech so as to obtain a speech compression and recognition system. An analog speech signal is broken up into its phoneme components and encoded for transmission. The encoded phoneme sequence has a much higher compression rate than the analog speech signal. The phonemes are then either transmitted, stored, or used to generate directly an analogous allophone sequence so as to approximate the original speech signal. Due to the inherent redundancy of the spoken language, and the filtering effect of the human ear, variations or errors in the approximations of the phonemes received from the original speech signal are inconsequential to the comprehension ability of the final allophone synthesized speech.
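
The encode-then-translate flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the two-element feature vectors, the Euclidean distance measure, the position-based allophone rule, and every name in the code (`REFERENCE_PHONEMES`, `ALLOPHONES`, `closest_phoneme`, `encode`, `to_allophones`) are assumptions made for the example and do not come from the patent text.

```python
import math

# Hypothetical reference library: phoneme code -> feature vector.
# The vectors are made-up illustration values, not data from the patent.
REFERENCE_PHONEMES = {
    "K": (0.9, 0.1),
    "AE": (0.2, 0.8),
    "T": (0.8, 0.3),
}

# Hypothetical phoneme-to-allophone table keyed by (phoneme code, position).
ALLOPHONES = {
    ("K", "initial"): "K_aspirated",
    ("K", "other"): "K_plain",
    ("AE", "initial"): "AE",
    ("AE", "other"): "AE",
    ("T", "initial"): "T_aspirated",
    ("T", "other"): "T_unreleased",
}


def closest_phoneme(features):
    """Return the code of the reference phoneme nearest to `features`."""
    return min(
        REFERENCE_PHONEMES,
        key=lambda code: math.dist(features, REFERENCE_PHONEMES[code]),
    )


def encode(perceived):
    """Map each perceived phoneme (a feature vector) to its closest code."""
    return [closest_phoneme(features) for features in perceived]


def to_allophones(codes):
    """Translate a phoneme code sequence into an allophone sequence."""
    return [
        ALLOPHONES[(code, "initial" if index == 0 else "other")]
        for index, code in enumerate(codes)
    ]


# Slightly "noisy" perceived phonemes still land on the intended codes.
perceived = [(0.85, 0.15), (0.25, 0.7), (0.75, 0.35)]
codes = encode(perceived)
allophones = to_allophones(codes)
print(codes)       # ['K', 'AE', 'T']
print(allophones)  # ['K_aspirated', 'AE', 'T_unreleased']
```

Only the short code sequence needs to be transmitted or stored; the receiver looks up allophone data locally. Because every input is snapped to the nearest reference phoneme, small analysis errors are absorbed rather than propagated, which is the tolerance the abstract relies on.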

43 Citations

10 Claims

1. A speech recognition system comprising:
- means for analyzing digital speech data representative of an analog speech signal to generate perceived phonemes representative of component parts of said digital speech data;
  
  memory means having encoded digital speech data stored therein, said encoded digital speech data including phoneme codes representative of a plurality of respective reference phonemes, said memory means further having digital speech data stored therein representative of allophones analogous to said phoneme codes;
  
  means operably coupled to said analyzing means and to said memory means for selecting encoded digital speech data representative of a particular reference phoneme from said memory means as the closest match for each of said perceived phonemes of said digital speech data to provide a phoneme code at least approximating each of said perceived phonemes; and
  
  means operably coupled to said selecting means and said memory means for forming a phoneme code sequence of a plurality of said phoneme codes, said phoneme code sequence-forming means being responsive to said phoneme codes as determined by said selecting means to access digital speech data from said memory means representative of analogous allophones corresponding to said phoneme codes.
- - 2. A speech recognition system as set forth in claim 1, wherein the digital speech data operated upon by said analyzing means is representative of an analog speech signal normalized for pitch and speed such that the allophones represented by the digital speech data as accessed from said memory means by said phoneme code sequence-forming means more nearly approximate the original analog speech signal.

3. A speech recognition and synthesis system comprising:
- means for analyzing digital speech data representative of an analog speech signal to generate perceived phonemes representative of component parts of said digital speech data;
  
  memory means having encoded digital speech data stored therein, said encoded digital speech data including phoneme codes representative of a plurality of respective reference phonemes, said memory means further having digital speech data stored therein representative of allophones analogous to said phoneme codes;
  
  means operably coupled to said analyzing means and to said memory means for selecting encoded digital speech data representative of a particular reference phoneme from said memory means as the closest match for each of said perceived phonemes of said digital speech data to provide a phoneme code at least approximating each of said perceived phonemes;
  
  means operably coupled to said selecting means and said memory means for forming a phoneme code sequence of a plurality of said phoneme codes, said phoneme code sequence-forming means being responsive to said phoneme codes as determined by said selecting means to access digital speech data from said memory means representative of analogous allophones corresponding to said phoneme codes;
  
  speech synthesizer means operably coupled to the output of said phoneme code sequence-forming means for processing the digital speech data representative of allophones provided thereby to generate an analog speech signal; and
  
  audio means coupled to said speech synthesizer means for converting said analog speech signal generated thereby into audible synthesized speech corresponding to the original analog speech signal.
- - 4. A speech recognition and synthesis system as set forth in claim 3, wherein the digital speech data operated upon by said analyzing means is representative of an analog speech signal normalized for pitch and speed such that the allophones represented by the digital speech data as accessed from said memory means by said phoneme code sequence-forming means more nearly approximate the original analog speech signal.
  - 5. A speech recognition and synthesis system as set forth in claim 4, wherein the digital speech data representative of allophones as stored in said memory means comprises speech parameters including linear predictive coding reflection coefficients, and said speech synthesizer means is a linear predictive coding speech synthesizer.

6. A vocoder comprising:
- means for analyzing digital speech data representative of an analog speech signal and identifying phoneme components of said digital speech data;
  
  library means storing digital speech data including encoded digital speech data in the form of phoneme codes representative of a plurality of reference phonemes comprising all of the recognized phonemes in a given spoken language, each of which has an associated set of allophone characteristics corresponding thereto stored as digital speech data in said library means;
  
  comparator means operably coupled to said analyzing means and said library means for obtaining the closest match from said plurality of reference phonemes as represented by the encoded digital speech data stored in said library means to said phoneme components of said digital speech data to provide a phoneme code at least approximating each of said phoneme components of said digital speech data identified by said analyzing means;
  
  means for providing a phoneme code sequence of connected phoneme codes corresponding to the respective reference phonemes from said phoneme codes stored in said library means which are the closest match to said phoneme components of said digital speech data representative of said analog speech signal;
  
  said library means being responsive to said phoneme code sequence to provide a phoneme-to-allophone translation in communicating digital speech data representative of allophones to said phoneme code sequence-forming means;
  
  speech synthesizer means connected to the output of said phoneme code sequence-forming means for processing the digital speech data representative of allophones provided thereby to generate an analog speech signal; and
  
  audio means coupled to said speech synthesizer means for converting said analog speech signal generated thereby into audible synthesized speech corresponding to the original analog speech signal.
- - 7. A vocoder as set forth in claim 6, wherein the digital speech data operated upon by said analyzing means is representative of an analog speech signal normalized for pitch and speed such that the allophones represented by the digital speech data communicated from said library means to said phoneme code sequence-forming means more nearly approximate the original analog speech signal.
  - 8. A vocoder as set forth in claim 7, wherein the digital speech data stored in said library means and representative of allophones comprises speech parameters including linear predictive coding reflection coefficients, and said speech synthesizer means is a linear predictive coding speech synthesizer.

9. A method of analyzing a speech signal and producing audible synthesized speech comprising:
- providing an analog speech signal;
  
  identifying phoneme component parts of said analog speech signal;
  
  comparing each of the phoneme component parts as identified from said analog speech signal with a plurality of reference phonemes comprising all of the recognized phonemes in a given spoken language;
  
  obtaining the closest match from said plurality of reference phonemes to each of the identified phoneme component parts of said analog speech signal to provide respective phoneme codes at least approximating each of the identified phoneme component parts;
  
  forming a phoneme code sequence of connected phoneme codes as determined by the matching of the closest reference phoneme to each of the identified phoneme component parts of said analog speech signal;
  
  translating the formed phoneme code sequence into an analogous allophone sequence thereto;
  
  generating analog signals representative of synthesized speech from said allophone sequence; and
  
  producing audible synthesized speech corresponding to the original analog speech signal from said analog signals representative of synthesized speech.
- - 10. A method as set forth in claim 9, further including normalizing said analog speech signal by setting the pitch and speed thereof in accordance with the voice of a user prior to the identification of said phoneme component parts thereof such that the subsequent translation of said phoneme code sequence to said allophone sequence enables the audible synthesized speech produced therefrom to more nearly approximate the original analog speech signal.
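
Claims 5 and 8 recite allophone data stored as linear predictive coding reflection coefficients and played back through a linear predictive coding synthesizer. The sketch below shows one common way such coefficients can drive an all-pole lattice filter. It is offered as an illustration under stated assumptions, not as the synthesizer the patent describes: the function name `lattice_synthesize`, the sign convention, the fixed (non-time-varying) coefficients, and the impulse excitation are all choices made for this example.

```python
def lattice_synthesize(excitation, reflection):
    """Run `excitation` through an all-pole lattice filter.

    Assumed convention, for stage i = p .. 1 at each sample n:
        f[i-1](n) = f[i](n) - k[i] * b[i-1](n-1)
        b[i](n)   = b[i-1](n-1) + k[i] * f[i-1](n)
    with f[p](n) the excitation and f[0](n) = b[0](n) the output sample.
    """
    order = len(reflection)
    backward = [0.0] * (order + 1)  # b[i](n-1), delayed backward errors
    output = []
    for sample in excitation:
        forward = sample
        next_backward = [0.0] * (order + 1)
        for i in range(order, 0, -1):
            k = reflection[i - 1]
            forward = forward - k * backward[i - 1]
            next_backward[i] = backward[i - 1] + k * forward
        next_backward[0] = forward
        backward = next_backward
        output.append(forward)
    return output


# Impulse response of a first-order filter with k1 = 0.5,
# which under this convention obeys y[n] = x[n] - 0.5 * y[n-1].
print(lattice_synthesize([1.0, 0.0, 0.0, 0.0], [0.5]))
# [1.0, -0.5, 0.25, -0.125]
```

A practical synthesizer would typically update the coefficients frame by frame and excite the filter with pitch pulses or noise rather than a single impulse. One reason reflection coefficients suit storage is that the filter stays stable whenever every coefficient has magnitude below one.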

Current Assignee: Texas Instruments, Inc.
Original Assignee: Texas Instruments, Inc.
Inventors: Ott, Granville E.
Primary Examiner(s): Kemeny, E. S. Matt
Application Number: US06/289,604
Time in Patent Office: 2,094 Days
Field of Search: 179/1 SA, 179/1 SB, 179/1 SC, 179/1 SD, 179/1 SE, 179/1 SC, 364/51 B, 340/146.3 WD, 340/146.3 AQ
US Class Current: 704/254
CPC Class Codes:
G10L 19/00 Speech or audio signals ana...
G10L 19/0018 Speech coding using phoneti...
