Voice source for synthetic speech system

US 5,400,434 A
Filed: 04/18/1994
Issued: 03/21/1995
Est. Priority Date: 09/04/1990
Status: Expired due to Term

Abstract

The voice source for the synthetic speech system is human generated speech waveforms that are inverse filtered to produce glottal waveforms representing larynx sound. These glottal waveforms are modified in pitch and amplitude, as required, to produce the desired sound. The human quality of the synthetically generated voice is further brought out by adding vocal tract effects, as desired. The pitch control is effected in one of two alternate ways, a loop method, or a concatenation method.

47 Claims

1. In a synthetic voice generating system, the improvement therein comprising:
- a plurality of glottal pulses, each glottal pulse having a different desired frequency and being a selected portion of a speech waveform, said speech waveform being created by measuring sound pressures of a human spoken sound at successive sample points in time and inverse-filtering the measurements to remove vocal tract components;
  
  storage means for storing said plurality of glottal pulses; and
  
  means for utilizing said plurality of glottal pulses to generate a synthetic voice signal.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The improvement in said synthetic voice generating system of claim 1 wherein said storage means comprises:
    - a memory look-up table containing a plurality of sample points for each one of said glottal pulses.
  - 3. The improvement in said synthetic voice generating system of claim 2 wherein said means for utilizing comprises:
    - pitch control means for modifying said glottal pulses to vary the pitch of the glottal pulses, said glottal pulses being modified by uniformly interpolating between sample points of said glottal pulses to produce a modified glottal pulse having more or fewer sample points.
  - 4. The improvement in said synthetic voice generating system of claim 3 wherein said means for utilizing further comprises:
    - amplitude control means for increasing or decreasing the amplitude of the time-domain glottal pulses modified by said pitch control means.
  - 5. The improvement in said synthetic voice generating system of claim 1 wherein said storage means comprises:
    - a memory means for storing a plurality of glottal pulses in time-domain form, each glottal pulse having therefor a different pitch period.
  - 6. The improvements in said synthetic voice generating system of claim 5 wherein said means for utilizing comprises:
    - pitch control means for selecting a particular sequence of glottal pulses and concatenating them together.
  - 7. The improvements in said synthetic voice generating system of claim 6 wherein said means for utilizing further comprises:
    - amplitude control means for increasing or decreasing the amplitude of the time-domain glottal pulses concatenated by said pitch control means.
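
The pitch and amplitude control recited in claims 3 and 4 can be illustrated with a minimal sketch. It assumes a stored pulse is a plain list of samples and uses linear interpolation between neighboring sample points; the claims say only "uniformly interpolating", so the linear scheme and the names `resample_pulse` and `scale_amplitude` are assumptions made for illustration, not the patent's implementation.

```python
def resample_pulse(pulse, new_length):
    """Stretch or shrink one stored pulse to new_length samples.

    Output positions are spaced uniformly across the stored pulse and each
    value is linearly interpolated between the two nearest stored samples.
    """
    if new_length < 2 or len(pulse) < 2:
        raise ValueError("need at least two samples in and out")
    step = (len(pulse) - 1) / (new_length - 1)
    out = []
    for i in range(new_length):
        pos = i * step
        left = min(int(pos), len(pulse) - 2)  # keep left + 1 in range
        frac = pos - left
        out.append((1.0 - frac) * pulse[left] + frac * pulse[left + 1])
    return out


def scale_amplitude(pulse, gain):
    """Increase or decrease the amplitude of a time-domain pulse."""
    return [gain * sample for sample in pulse]


stored = [0.0, 1.0, 0.0]
longer = resample_pulse(stored, 5)      # more samples per pulse
louder = scale_amplitude(longer, 2.0)
```

At a fixed playback sample rate, a pulse with more sample points has a longer period and therefore a lower pitch; a pulse with fewer sample points has a higher pitch.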

8. In a synthetic voice generating system, the improvement therein comprising:
- a plurality of glottal pulses stored in a storage means, each glottal pulse having a desired frequency and being a selected portion of a speech waveform, said speech waveform being created by measuring sound pressures of a human spoken sound at successive sample points in time and inverse-filtering the measurements to remove vocal tract components;
  
  a voice source means for generating a signal representing the sound produced by a human larynx by combining a plurality of said stored glottal pulses; and
  
  a vocal tract simulating means for modifying the signals from said voice source means to simulate the effect of a human vocal tract on said voice source signals.
- View Dependent Claims (9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36)
- - 9. The improvement of claim 8 wherein said vocal tract simulating means comprises:
    - a cascade of second order digital filters.
  - 10. The improvement of claim 9 wherein besides said voice source signal, said digital filters receive signals from a noise source means which generates signals representing air turbulence in the voice tract.
  - 11. The improvement of claim 10 wherein said noise source means comprises:
    - an aspiration source means for generating signals representing air turbulence at the vocal cords; and
      
      a frication source means using frications from real speech for generating signals representing air turbulence in vocal cavities of the pharynx, mouth and nose.
  - 12. The improvements of claim 8 wherein the voice source means comprises:
    - storage means for storing a plurality of different time domain glottal pulses derived from a human source; and
      
      means for utilizing the glottal pulses in said storage means to generate a synthetic voice signal.
  - 13. The improvement of claim 12 wherein said storage means comprises:
    - a plurality of memory look-up tables, each table containing a plurality of sample points representing a small group of glottal pulses, in code form.
  - 14. The improvement of claim 13 wherein said utilizing means comprises:
    - means for cross-fading between a departing memory look-up table and an entering memory look-up table according to the relation;
      
      S.P. = A X_n + B Y_n
      
      wherein A and B are fractions that total 1, X_n is a sample point near the end of the departing look-up table, Y_n is a sample point near the beginning of the entry look-up table, and S.P. is the resulting sample point.
  - 15. The improvement of claim 12 wherein said storage means comprises:
    - a memory look-up table containing a plurality of sample points for each one of said time domain glottal pulses.
  - 16. The improvement of claim 15 wherein said utilizing means comprises:
    - pitch control means for modifying said glottal pulses by varying the pitch period of each glottal pulse by uniformly interpolating between the sample points of a selected glottal pulse to produce a modified glottal pulse having more sample points.
  - 17. The improvement of claim 16 wherein said utilizing means further comprises:
    - amplitude control means for increasing or decreasing the amplitude of the time-domain glottal pulses modified by said pitch control means.
  - 18. The improvement of claim 17 wherein said vocal tract simulating means comprises a cascade of second order digital filters.
  - 19. The improvement of claim 18 wherein besides said voice source signal, said digital filters receive signals from a noise source means which generates signals representing air turbulence in the voice tract.
  - 20. The improvement of claim 19 wherein said one noise source means comprises:
    - an aspiration source means for generating signals representing air turbulence at the vocal cords; and
      
      a frication source means using frications from real speech for generating signals representing air turbulence in vocal cavities of the pharynx, mouth and nose.
  - 21. The improvement of claim 12 wherein said storage means comprises:
    - a memory means for storing a plurality of glottal pulses in time-domain form, each glottal pulse having a different pitch period.
  - 22. The improvement of claim 21 wherein said utilizing means comprises:
    - pitch control means for selecting a particular sequence of glottal pulses and concatenating them together.
  - 23. The improvement of claim 22 wherein said utilizing means further comprises:
    - means for cross-fading between an ending glottal pulse and a beginning glottal pulse to be concatenated together, according to the relation;
      
      S.P. = A X_n + B Y_n
      
      wherein A and B are fractions that always total 1, X_n is a point on the ending glottal pulse to be joined to the beginning glottal pulse, Y_n is a point on the beginning glottal pulse, and S.P. is the resulting sample point which is a combination of the ending glottal pulse and the beginning glottal pulse.
  - 24. The improvement of claim 22 wherein said means for utilizing further comprises:
    - amplitude control means for increasing or decreasing the amplitude of the glottal pulses concatenated by said pitch control means.
  - 25. The improvement of claim 24 wherein said vocal tract simulating means comprises a cascade of second order digital filters.
  - 26. The improvement of claim 25 wherein besides said voice source signal, said digital filters receive signals from a noise source means which generates signals representing air turbulence in the voice tract.
  - 27. The improvement of claim 26 wherein said one noise source means comprises:
    - an aspiration source means for generating signals representing air turbulence at the vocal cords; and
      
      a frication source means using frications from real speech for generating signals representing air turbulence in vocal cavities of the pharynx, mouth and nose.
  - 28. The improvement of claim 12 wherein said storage means comprises:
    - a memory means for storing a plurality of glottal pulses in code form.
  - 29. The improvement of claim 28 wherein said utilizing means comprises:
    - pitch control means for selecting a particular sequence of glottal pulses and concatenating them together.
  - 30. The improvement of claim 29 further comprising an address look-up table for said memory means, said address look-up table providing addresses to certain glottal pulses stored in said memory means in response to the parameters of period and amplitude.
  - 31. The method of claim 30, further comprising, after said measuring step, the step of filtering the measured human speech sounds by an antialias filter.
  - 32. The improvement of claim 29 wherein said memory means stores the addresses of a plurality of other possible neighbor glottal pulses along with each glottal pulse stored, whereby only the neighbor glottal pulses are selected for concatenating with said stored glottal pulse.
  - 33. The improvement of claim 32 wherein said utilizing means further comprises:
    - means for cross-fading between a selected ending glottal pulse and a selected beginning glottal pulse to be concatenated together, according to the relation;
      
      S.P. = A X_n + B Y_n
      
      wherein A and B are functions that always total 1, X_n is a point on the ending glottal pulse, Y_n is a point on the beginning glottal pulse, and S.P. is the resulting sample point which is a combination of the ending and beginning glottal pulses.
  - 34. The improvement of claim 29 wherein said memory means stores the address of one other glottal pulse along with each glottal pulse stored, effectively providing a list of glottal pulses, whereby the stored glottal pulses and the list of glottal pulses are examined to determine which one best meets the requirement.
  - 35. The improvement of claim 34 wherein said utilizing means further comprises:
    - means for cross-fading between a selected ending glottal pulse and a selected beginning glottal pulse to be concatenated together, according to the relation;
      
      S.P. = A X_n + B Y_n
      
      wherein A and B are fractions that always total 1, X_n is a point on the ending glottal pulse, Y_n is a point on the beginning glottal pulse, and S.P. is the resulting sample point which is a combination of the starting and beginning glottal pulses.
  - 36. The improvement of claim 29 further comprising an address look-up table for said memory means, said address look-up table providing addresses to certain glottal pulses stored in said memory means in response to the parameters of period, amplitude, and phoneme.
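
The cross-fade relation S.P. = A X_n + B Y_n recited in claims 14, 23, 33 and 35 can be sketched as follows. The claims require only that A and B total 1; the linear ramp for B, the `overlap` parameter, and the function name `cross_fade` are assumptions chosen for illustration.

```python
def cross_fade(ending, beginning, overlap):
    """Join two pulses, blending `overlap` samples with S.P. = A*X_n + B*Y_n.

    X_n comes from the tail of the ending pulse and Y_n from the head of the
    beginning pulse. B ramps up linearly (an assumed weighting) and A = 1 - B,
    so the two weights always total 1.
    """
    if overlap < 1 or overlap > min(len(ending), len(beginning)):
        raise ValueError("overlap must fit inside both pulses")
    start = len(ending) - overlap
    blended = []
    for n in range(overlap):
        b = (n + 1) / (overlap + 1)
        a = 1.0 - b
        x_n = ending[start + n]
        y_n = beginning[n]
        blended.append(a * x_n + b * y_n)
    return ending[:start] + blended + beginning[overlap:]


joined = cross_fade([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], 3)
```

The result is shorter than the two inputs laid end to end by `overlap` samples, because the blended region replaces the tail of one pulse and the head of the other.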

37. In a synthetic voice generating system, the improvement therein comprising:
- a plurality of glottal pulses said glottal pulses having different desired frequencies and being a selected portion of an inverse-filtered human speech waveform;
  
  storage means for storing said glottal pulses;
  
  means for retrieving said glottal pulses from said storage means; and
  
  means for applying said glottal pulses to a synthesis filter to generate a synthetic voice signal.
- View Dependent Claims (38, 39)
- - 38. The improved synthetic noise generating system of claim 37 wherein said speech waveform is created by measuring the sound pressure of a human spoken sound at successive points in time.
  - 39. The improved synthetic voice generating system of claim 38 wherein said vocal tract components are removed by inverse filtering.

40. In a synthetic voice generating system, the improvement comprising:
- a plurality of stored glottal pulses, each stored glottal pulse having a desired frequency and being a selected portion of a speech waveform, said speech waveform created by measuring sound pressures of a human spoken sound at successive sample points in time and inverse-filtering the measurements to remove vocal tract components;
  
  a noise source means for generating a signal representing the sound produced by a human larynx by combining a plurality of said stored glottal pulses; and
  
  a vocal tract simulating means for modifying the signals from said noise source means to simulate the effect of a human vocal tract on said noise source signals.
- View Dependent Claims (41, 42)
- - 41. The improved synthetic noise generating system of claim 40 wherein said speech waveform is created by measuring the sound pressure of a human spoken sound at successive points in time.
  - 42. The improved synthetic voice generating system of claim 40 wherein said vocal tract components are removed by inverse filtering.

43. In a synthetic voice generating system, the improvement therein comprising:
- a plurality of glottal pulses in a storage means, said pulses comprising portions of glottal waveforms generated by inverse filtering time-domain representations of human speech with a plurality of second-order, finite-impulse-response filters with zeros chosen to cancel human vocal tract resonance components therefrom, each of said plurality of glottal pulses having a desired frequency and including frequency domain and time domain characteristics of human speech;
  
  pitch control means for receiving said plurality of glottal pulses and generating pitch-modified glottal pulses;
  
  amplitude control means for receiving said pitch-modified glottal pulses and increasing or decreasing an amplitude of said pitch-modified glottal pulses to generate amplitude-modified glottal pulses; and
  
  vocal tract simulating means for modifying said amplitude-modified glottal pulses received from said amplitude control means to simulate human vocal tract resonances on said amplitude-modified glottal pulses.
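
Claim 43 pairs second-order finite-impulse-response inverse filters, whose zeros cancel vocal tract resonances, with a vocal tract simulating means that puts resonances back. A minimal sketch of that pairing is given below, assuming each resonance is modeled as a two-pole section y[n] = x[n] + c1 y[n-1] + c2 y[n-2] and each inverse section places its zeros on the same pole locations. The formant frequencies, bandwidths, sample rate and all function names are hypothetical values chosen for the example; they do not come from the patent.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Feedback coefficients (c1, c2) for poles at radius r, angle theta."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * freq_hz / sample_rate
    return 2.0 * r * math.cos(theta), -r * r


def resonate(signal, c1, c2):
    """Second-order recursive section: adds one resonance."""
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + c1 * y1 + c2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def anti_resonate(signal, c1, c2):
    """Second-order FIR section whose zeros cancel that resonance."""
    out, x1, x2 = [], 0.0, 0.0
    for x in signal:
        out.append(x - c1 * x1 - c2 * x2)
        x2, x1 = x1, x
    return out


def cascade(signal, sections, stage):
    """Run the signal through one stage per (c1, c2) pair, in series."""
    for c1, c2 in sections:
        signal = stage(signal, c1, c2)
    return signal


# Hypothetical formants: (center frequency in Hz, bandwidth in Hz) at 10 kHz.
sections = [resonator_coeffs(f, bw, 10000.0)
            for f, bw in [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]]
source = [1.0, 0.5, -0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
speech = cascade(source, sections, resonate)          # vocal tract simulation
recovered = cascade(speech, sections, anti_resonate)  # inverse filtering
```

Because every inverse section exactly undoes one resonator, `recovered` matches `source` up to floating-point rounding. With real recorded speech the resonances are not known in advance, so they would have to be estimated before the inverse filters could be set.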

44. A method of generating speech comprising the steps of:
- extracting glottal pulses from speech, each glottal pulse having a different frequency;
  
  storing said glottal pulses in a memory;
  
  reading said glottal pulses from said memory; and
  
  applying the glottal pulses read from memory to a synthesis filter for outputting speech.
- View Dependent Claims (45)
- - 45. The method of generating speech according to claim 44, wherein the step of storing the glottal pulses includes a step of storing at least one glottal pulse for each desired frequency.
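
The storing and reading steps of claims 44 and 45 can be pictured as a table holding one pulse per available pitch period. The sketch below assumes a pulse's period equals its length in samples and that the reader picks the stored period nearest the requested frequency; both choices, and the names `build_pulse_table` and `read_pulse`, are illustrative assumptions rather than details taken from the patent.

```python
def build_pulse_table(pulses):
    """Index stored pulses by period in samples (here, the pulse length)."""
    table = {}
    for pulse in pulses:
        table.setdefault(len(pulse), pulse)  # keep the first pulse per period
    return table


def read_pulse(table, sample_rate, desired_hz):
    """Return the stored pulse whose period is closest to the desired pitch."""
    target_period = sample_rate / desired_hz
    best = min(table, key=lambda period: abs(period - target_period))
    return table[best]


# Placeholder pulses of three different periods (contents are dummy zeros).
table = build_pulse_table([[0.0] * 80, [0.0] * 100, [0.0] * 125])
pulse = read_pulse(table, 10000.0, 85.0)
```

At a 10 kHz sample rate, 85 Hz corresponds to a period of about 117.6 samples, so the 125-sample pulse is the nearest stored entry.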

46. A method of generating synthetic speech having various pitches from inverse-filtered speech waveforms, comprising the following steps:
- reading a first glottal pulse from a memory containing a plurality of glottal pulses, each stored glottal pulse having a different period, said first glottal pulse having a first period that corresponds to a first desired pitch;
  
  reading a second glottal pulse from said memory, said second glottal pulse having a second period that corresponds to a second desired pitch;
  
  concatenating the two glottal pulses to form a resulting waveform; and
  
  applying the resulting waveform to a synthesis filter to generate speech with varying pitch.
- View Dependent Claims (47)
- - 47. The method of generating synthetic speech according to claim 46, wherein the step of concatenating the two glottal pulses includes the step of segmenting the two glottal pulses at zero crossings and joining the two pulses at the segmentation.
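
The zero-crossing concatenation of claim 47 can be sketched as below. The claim does not say which zero crossings to use, so cutting the first pulse at its last crossing and the second pulse at its first crossing is an assumption, as are the function names.

```python
def zero_crossings(samples):
    """Indices i where the signal is exactly zero or changes sign at i."""
    found = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if cur == 0.0 or prev * cur < 0.0:
            found.append(i)
    return found


def join_at_zero_crossings(first, second):
    """Segment both pulses at a zero crossing and join them at the cut."""
    first_cuts = zero_crossings(first)
    second_cuts = zero_crossings(second)
    if not first_cuts or not second_cuts:
        raise ValueError("each pulse needs at least one zero crossing")
    cut_a = first_cuts[-1]  # last crossing of the first pulse (assumed choice)
    cut_b = second_cuts[0]  # first crossing of the second pulse (assumed choice)
    return first[:cut_a] + second[cut_b:]


first = [0.0, 1.0, 0.5, -0.5, -1.0]
second = [0.8, 0.2, -0.3, -0.6, 0.0]
joined = join_at_zero_crossings(first, second)
```

Joining where both waveforms pass through zero keeps the step at the seam small, which is the usual reason for cutting at zero crossings rather than at arbitrary sample positions.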

Current Assignee
Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Original Assignee
Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors
Pearson, Steve
Primary Examiner(s)
Knepper, David D.

Application Number
US08/228,954
Time in Patent Office
337 Days
Field of Search
395/2, 395/2.67, 395/2.7, 395/2.73, 395/2.76, 395/2.77, 395/2.75, 381/51-53
US Class Current
704/264
CPC Class Codes
G10L 13/06 Elementary speech units use...
