System and method for compressing concatenative acoustic inventories for speech synthesis

US 20030212555A1
Filed: 05/09/2002
Published: 11/13/2003
Est. Priority Date: 05/09/2002
Status: Active Grant

Abstract

A system and method are used to compress concatenative acoustic inventories for speech synthesis. Instead of using general-purpose signal compression methods such as vector quantization, the method of the invention exploits multiple properties of acoustic inventories to reduce their size. One is the close acoustic match property: acoustic units that share the same phone are acoustically similar at the points where these units may be concatenated. Another is that acoustic units are labeled with sufficiently fine distinctions that, between any two phones, no events occur that are substantially distinct from these two phones. By utilizing these properties, the number of parameters per unit that are stored as LPC parameters is minimized. Because the storage requirements are reduced, smaller storage devices may be used.

18 Claims

1. A method for compressing concatenative acoustic inventories for speech synthesis, comprising:
- creating an acoustic inventory comprising a plurality of natural speech intervals;
  
  determining a set of peak components for each basis vector in the plurality of natural speech intervals;
  
  determining start and end vectors for the plurality of natural speech intervals;
  
  defining a mapping between a first peak index set associated with the start vector and a second peak index set associated with the end vector such that respective peak points in the first peak index set and the second peak index set are associated with each other;
  
  creating an extended mapping based on the mapping between the first peak index set and the second peak index set;
  
  performing a comparison between a complete morph mapping and a peak morph mapping to determine whether an index is located within the first peak index set;
  
  creating a sequence of approximation vectors based on the complete morph mapping;
  
  determining a time warp function and a corresponding vector in a sequence vector which is proximal to the sequence of approximation vectors;
  
  parameterizing the time warp function by way of a first straight line and a second straight line to approximate a curve which extends through a predetermined space; and
  
  storing the parameters index function and names of the acoustic units.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 15, 16, 17)
- - 2. The method of claim 1, further comprising the steps of:
    - determining a next higher index and a next lower index which are each located within the first peak index set; and
      
      performing an interpolation between peak morph mapping values to obtain the complete morph mapping.
  - 3. The method of claim 1, wherein the plurality of natural speech intervals are sequences of vectors in a vector space.
  - 4. The method of claim 3, wherein the vector space is an acoustic space.
  - 5. The method of claim 2, wherein the vector space comprises a 128 point power spectra.
  - 6. The method of claim 1, wherein the basis vectors are associated with one of phonemes and allophones in the plurality of natural speech intervals.
  - 7. The method of claim 1, wherein the extended mapping ranges from a first parameter to a second parameter.
  - 8. The method of claim 7, wherein the first parameter and the second parameter range from 1 to 128, respectively.
  - 9. The method of claim 1, wherein said step of creating a sequence of approximation vectors is performed in accordance with the relationships:
    - M_t[i] = (T − t)/T * i + t/T * M(i), and V(T) whose M_t[i]-th component is (T − t)/T * x[i] + t/T * y[M(i)].
  - 10. The method of claim 9, where M_t[i] is rounded to the nearest integer between 1 and 128, for each time frame t = 0, ..., T, and T is the number of time frames within the plurality of natural speech intervals.
  - 11. The method of claim 1, wherein a starting point for parameterizing the time warp function is located such that one line extends from a first point to a second point and another line extends from the second point to another point.
  - 15. The method of claim 1, wherein the speech intervals are sequences of vectors in a vector space.
  - 16. The method of claim 15, wherein the vector space is an acoustic space.
  - 17. The method of claim 16, wherein the vector space comprises a 128 point power spectra.
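
The morphing steps recited in claims 1, 2, 9 and 10 can be illustrated with a short sketch. It is not the patented implementation, only one reading of the claim language. The function names, the 8-point toy spectra (the claims mention 128 points), the pinning of the first and last index to themselves, the rounding of M(i) before indexing y, and the rules for colliding or unfilled components are all assumptions made for the example.

```python
import math


def complete_morph_mapping(peak_map, n):
    """Extend a peak-to-peak mapping to every index 1..n (1-based).

    peak_map maps peak indices of the start vector to peak indices of the
    end vector. Assumption: indices 1 and n map to themselves when they are
    not peaks, so every other index has a lower and a higher neighbour.
    """
    anchors = {1: 1, n: n}
    anchors.update(peak_map)
    keys = sorted(anchors)
    mapping = {}
    for i in range(1, n + 1):
        if i in anchors:
            mapping[i] = float(anchors[i])
            continue
        lo = max(k for k in keys if k < i)  # next lower index in the peak set
        hi = min(k for k in keys if k > i)  # next higher index in the peak set
        w = (i - lo) / (hi - lo)
        mapping[i] = (1 - w) * anchors[lo] + w * anchors[hi]
    return mapping


def round_to_index(value, n):
    """Round half up, then clamp to the valid index range 1..n."""
    return min(n, max(1, math.floor(value + 0.5)))


def approximation_vector(x, y, mapping, t, T):
    """Build the frame-t approximation between start vector x and end vector y."""
    n = len(x)
    v = [None] * n
    for i in range(1, n + 1):
        m = mapping[i]
        pos = round_to_index((T - t) / T * i + t / T * m, n)  # M_t[i]
        # Assumption: M(i) is rounded before it is used as an index into y.
        value = (T - t) / T * x[i - 1] + t / T * y[round_to_index(m, n) - 1]
        # Assumption: if two indices land on the same component, keep the larger.
        if v[pos - 1] is None or value > v[pos - 1]:
            v[pos - 1] = value
    # Assumption: components that no index maps to are left at zero.
    return [0.0 if c is None else c for c in v]


x = [1.0, 2.0, 10.0, 2.0, 1.0, 1.0, 1.0, 1.0]  # start vector, peak at index 3
y = [1.0, 1.0, 1.0, 1.0, 2.0, 9.0, 2.0, 1.0]   # end vector, peak at index 6
mapping = complete_morph_mapping({3: 6}, len(x))
for t in range(5):
    print(t, approximation_vector(x, y, mapping, t, 4))
```

In this toy run the peak slides from index 3 toward index 6 as t goes from 0 to T, rather than fading out at one position and fading in at the other as a plain cross-fade would.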

12. A system for compressing concatenative acoustic inventories for speech synthesis, comprising:
- an acoustic element retrieval processor, said processor creating an acoustic inventory comprising a plurality of natural speech intervals received from an acoustic element database;
  
  an element processing and concatenation processor;
  
  said element processor performing the steps of;
  
  determining a set of peak components for each basis vector in the plurality of natural speech intervals;
  
  determining start and end vectors for each basis vector in the natural speech intervals;
  
  defining a mapping between a first peak index set associated with the start vector and a second peak index set associated with the end vector such that respective peak points in the first peak index set and the second peak index set are associated with each other;
  
  creating an extended mapping based on the mapping between the first peak index set and the second peak index set;
  
  performing a comparison between a complete morph mapping and a peak morph mapping to determine whether an index is located within the first peak index set;
  
  creating a sequence of approximation vectors based on the complete morph mapping;
  
  determining a time warp function and a corresponding vector in a sequence vector which is proximal to the sequence of approximation vectors; and
  
  parameterizing the time warp function by way of a first straight line and a second straight line to approximate a curve which extends through a predetermined space; and
  
  an acoustic storage device for storing the parameters index function and names of the acoustic units.
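
The time-warp steps shared by claims 1, 11 and 12 can be sketched in the same spirit. The sketch below assumes that "proximal" means smallest Euclidean distance, that the two straight lines meet at a knot lying on the warp curve, and that the first and last warp values are kept as endpoints. None of these choices is stated in the claims, and the function names are hypothetical.

```python
def nearest_index_warp(frames, approximations):
    """For each recorded frame, return the index of the closest approximation vector."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [
        min(range(len(approximations)), key=lambda k: dist2(frame, approximations[k]))
        for frame in frames
    ]


def evaluate_two_lines(params, T, t):
    """Evaluate the two-segment approximation at frame t (0 <= t <= T)."""
    start, (b, vb), end = params
    if t <= b:
        return start + (vb - start) * t / b          # first straight line
    return vb + (end - vb) * (t - b) / (T - b)       # second straight line


def fit_two_lines(warp):
    """Pick the interior knot that minimises the squared error to the warp curve.

    Returns (start value, (knot frame, knot value), end value); these few
    numbers stand in for the full warp function when the unit is stored.
    """
    T = len(warp) - 1
    if T < 2:
        raise ValueError("need at least three frames to place an interior knot")
    best_err, best_params = None, None
    for b in range(1, T):
        params = (warp[0], (b, warp[b]), warp[T])
        err = sum((evaluate_two_lines(params, T, t) - warp[t]) ** 2 for t in range(T + 1))
        if best_err is None or err < best_err:
            best_err, best_params = err, params
    return best_params


warp = [0, 0, 0, 1, 2, 3, 4]   # toy warp: flat for two frames, then linear
print(fit_two_lines(warp))
```

The sketch does not force the warp to be monotonic; a real system would probably want that constraint before fitting the lines.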

13. A method for compressing concatenative acoustic inventories for speech synthesis, comprising:
- determining a set of phonemes;
  
  determining for each phoneme a set of at least one phones, said set of at least one phones comprising at least one of phonemes which may occur as neighbors of said phoneme in a speech synthesis output and contextual descriptors;
  
  determining an inventory specification comprising a plurality of specifications of a phone sequence which is required by a synthesis input domain;
  
  obtaining a set of human speech recordings containing speech intervals which correspond to sequences of phones which include all phone sequences in the inventory specification;
  
  obtaining a parametric representation of the speech intervals which are obtained such that each speech interval is represented as a trajectory through an acoustic parameter space;
  
  for each phone, obtaining at least one basis vector in the acoustic parameter space from stored trajectories such that one of an initial and final vector of a trajectory of each speech interval is approximated by a corresponding basis vector;
  
  said speech interval having corresponding phone sequences that include a phone in one of an initial and final position;
  
  approximating each stored trajectory by a time varying mathematical combination of basis vectors for a phone which is associated with a stored trajectory to generate approximate trajectories; and
  
  constraining the approximate trajectories such that all approximate trajectories that correspond to acoustic units which start or terminate with a given phone possess substantially identical initial or final frames.
- Dependent claims (14, 18)
- - 14. The method of claim 13, wherein the textual contextual descriptors are one of lexical stress and location in a speech phrase.
  - 18. The method of claim 13, wherein the at least one basis vector is associated with one of phonemes and allophones in the speech intervals.
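
The endpoint constraint at the end of claim 13 can be illustrated with a small example. It assumes one basis vector per phone, obtained by averaging every boundary frame observed for that phone, and a simple linear blend as the "time varying mathematical combination". Both are illustrative choices, not details taken from the claims, and the function names are hypothetical.

```python
def boundary_basis(units):
    """Average the boundary frames observed for each phone.

    units: list of (start_phone, end_phone, trajectory), where trajectory
    is a list of equal-length frames in the acoustic parameter space.
    """
    sums, counts = {}, {}
    for start, end, trajectory in units:
        for phone, frame in ((start, trajectory[0]), (end, trajectory[-1])):
            acc = sums.setdefault(phone, [0.0] * len(frame))
            for d, value in enumerate(frame):
                acc[d] += value
            counts[phone] = counts.get(phone, 0) + 1
    return {p: [s / counts[p] for s in sums[p]] for p in sums}


def approximate_trajectory(start, end, basis, weights):
    """Blend the two basis vectors; weights run from 0.0 (start) to 1.0 (end)."""
    b0, b1 = basis[start], basis[end]
    return [[(1 - w) * p + w * q for p, q in zip(b0, b1)] for w in weights]


units = [
    ("a", "b", [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]),
    ("c", "b", [[5.0, 5.0], [3.0, 3.0], [2.0, 4.0]]),
]
basis = boundary_basis(units)
u1 = approximate_trajectory("a", "b", basis, [0.0, 0.5, 1.0])
u2 = approximate_trajectory("c", "b", basis, [0.0, 0.5, 1.0])
print(u1[-1], u2[-1])   # both units end on the basis vector for phone "b"
```

Because every unit that ends in a given phone is rebuilt from the same basis vector, the final frames match exactly, so only the per-unit weights need to be stored alongside the shared basis.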

Current Assignee: Oregon Health & Science University
Original Assignee: Oregon Health & Science
Inventors: van Santen, Jan P.H.
Granted Patent: US 7,010,488 B2
US Class Current: 704/241
CPC Class Codes: G10L 13/04 Details of speech synthesis...
