Method and apparatus for improved duration modeling of phonemes

US 6,553,344 B2
Filed: 02/22/2002
Issued: 04/22/2003
Est. Priority Date: 12/18/1997
Status: Expired due to Term

Abstract

A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model. An inverse of the non-exponential functional transformation is applied to duration observations, or training data. Coefficients are generated for use with the generalized additive model. The generalized additive model comprising the coefficients is applied to at least one phoneme of the received text resulting in the generation of at least one phoneme having a duration. An acoustic sequence is generated comprising speech signals that are representative of the received text.
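The coefficient-generation and model-application steps in the abstract can be illustrated with a minimal sketch. It assumes purely additive main effects for categorical contextual factors (no product terms), assumes the duration observations have already been mapped into the transformed domain, and fits the coefficients by simple backfitting. The factor names (`stress`, `position`), the function names, and the toy numbers are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def fit_additive(observations, n_iter=50):
    """Fit an intercept plus one offset per (factor, level) by backfitting.

    observations: list of (factors, y) pairs. `factors` maps a factor name
    to its level (every observation must list the same factor names), and
    `y` is a duration already mapped into the transformed domain.
    """
    intercept = sum(y for _, y in observations) / len(observations)
    names = sorted({name for factors, _ in observations for name in factors})
    effects = {name: defaultdict(float) for name in names}
    for _ in range(n_iter):
        for name in names:
            sums, counts = defaultdict(float), defaultdict(int)
            for factors, y in observations:
                # Residual left after the intercept and all other factors.
                others = sum(effects[o][factors[o]] for o in names if o != name)
                sums[factors[name]] += y - intercept - others
                counts[factors[name]] += 1
            for level, total in sums.items():
                effects[name][level] = total / counts[level]
    return intercept, effects


def predict(intercept, effects, factors):
    """Sum the intercept and the offsets of the given factor levels."""
    return intercept + sum(effects[n][lvl] for n, lvl in factors.items())


# Hypothetical, exactly additive toy data (transformed-domain values).
training = [
    ({"stress": "stressed", "position": "final"}, 130.0),
    ({"stress": "stressed", "position": "medial"}, 110.0),
    ({"stress": "unstressed", "position": "final"}, 90.0),
    ({"stress": "unstressed", "position": "medial"}, 70.0),
]
intercept, effects = fit_additive(training)
estimate = predict(intercept, effects, {"stress": "stressed", "position": "final"})
```

In a full system the value returned by `predict` would still have to be mapped through the functional transformation to obtain a duration; that step is left out here so the additive part stays isolated.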

260 Citations

24 Claims

1. A method for modeling phoneme durations comprising:
- calculating durations for a phoneme using a generalized additive model that incorporates influences of contextual factors on the durations, the generalized additive model including a functional transformation that describes a shape containing an inflection point.
2. The method of claim 1 further comprising:
3. The method of claim 1, wherein control parameters for the functional transformation define a location on the shape for the inflection point and a slope of the shape at the inflection point.
4. The method of claim 3 further comprising:
- determining the control parameters by applying an inverse of the functional transformation to durations of the phoneme appearing in training data.
5. The method of claim 1, wherein the functional transformation comprises a root sinusoidal transformation.
6. The method of claim 5, wherein the functional transformation comprises:
- $F(x) = {{\frac{B - A}{2} [\cos (\pi \frac{x - A}{B - A})]}^{\alpha} + \frac{A + B}{2}}^{\beta}$ wherein x is a duration for the phoneme, A is a minimum duration for the phoneme, B is a maximum duration for the phoneme, α controls a slope of the shape at the inflection point, and β controls a location on the shape of the inflection point.
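As a rough illustration of the transformation recited in claim 6, the sketch below evaluates it numerically. It assumes the brace grouping of the formula is read literally (the scaled cosine term is raised to α, the midpoint is added, and the sum is raised to β), and it is only exercised with α = β = 1, because a negative cosine term raised to a non-integer power is not a real number. The function name and the 40 ms and 200 ms bounds are hypothetical.

```python
import math


def root_sinusoid(x, a, b, alpha=1.0, beta=1.0):
    """Evaluate the claimed form under a literal reading of its grouping.

    x: phoneme duration, a: minimum duration, b: maximum duration.
    alpha and beta are the control parameters; this sketch is only
    checked for alpha = beta = 1.
    """
    if not a < b:
        raise ValueError("minimum duration must be below maximum duration")
    if not a <= x <= b:
        raise ValueError("duration must lie within [a, b]")
    half_range = (b - a) / 2.0
    midpoint = (a + b) / 2.0
    inner = half_range * math.cos(math.pi * (x - a) / (b - a))
    return (inner ** alpha + midpoint) ** beta


# Hypothetical range: 40 ms minimum, 200 ms maximum.
at_min = root_sinusoid(40.0, 40.0, 200.0)
at_mid = root_sinusoid(120.0, 40.0, 200.0)
at_max = root_sinusoid(200.0, 40.0, 200.0)
```

Under this reading with α = β = 1, the curve runs from B at x = A down to A at x = B and passes through the midpoint (A + B)/2 at x = (A + B)/2, which is where the cosine changes curvature, that is, the inflection point.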

7. A computer-readable medium having executable instructions to cause a computer to perform a method comprising:
- calculating durations for a phoneme using a generalized additive model that incorporates influences of contextual factors on the durations, the generalized additive model including a functional transformation that describes a shape containing an inflection point.
8. The computer-readable medium of claim 7, wherein the method further comprises:
9. The computer-readable medium of claim 7, wherein control parameters for the functional transformation define a location on the shape for the inflection point and a slope of the shape at the inflection point.
10. The computer-readable medium of claim 9, wherein the method further comprises:
- determining the control parameters by applying an inverse of the functional transformation to durations of the phoneme appearing in training data.
11. The computer-readable medium of claim 7, wherein the functional transformation comprises a root sinusoidal transformation.
12. The computer-readable medium of claim 11, wherein the functional transformation comprises:
- $F(x) = {{\frac{B - A}{2} [\cos (\pi \frac{x - A}{B - A})]}^{\alpha} + \frac{A + B}{2}}^{\beta}$ wherein x is a duration for the phoneme, A is a minimum duration for the phoneme, B is a maximum duration for the phoneme, α controls a slope of the shape at the inflection point, and β controls a location on the shape of the inflection point.
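Claim 10 refers to applying an inverse of the functional transformation to training durations. A minimal sketch of that single step follows, assuming α = β = 1 so that the inverse has a closed form through the arccosine. It does not show how the control parameters themselves would be chosen. The function name and the sample durations are hypothetical.

```python
import math


def inverse_root_sinusoid(y, a, b):
    """Closed-form inverse of the cosine form for alpha = beta = 1.

    Solves y = (b - a)/2 * cos(pi * (x - a)/(b - a)) + (a + b)/2 for x.
    """
    half_range = (b - a) / 2.0
    midpoint = (a + b) / 2.0
    ratio = (y - midpoint) / half_range
    # Clamp to guard against rounding slightly outside acos's domain.
    ratio = max(-1.0, min(1.0, ratio))
    return a + (b - a) * math.acos(ratio) / math.pi


# Hypothetical observed durations (ms) for one phoneme, range 40..200 ms.
observed_ms = [40.0, 75.0, 120.0, 200.0]
transformed = [inverse_root_sinusoid(d, 40.0, 200.0) for d in observed_ms]
```

For other values of α and β a closed form may not be available, and a numerical inversion such as bisection would be one option.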

13. A system comprising:
- a processor coupled to a memory through a bus; and
  
  a process executed from the memory by the processor to cause the processor to calculate durations for a phoneme using a generalized additive model that incorporates influences of contextual factors on the durations, the generalized additive model including a functional transformation that describes a shape containing an inflection point.
14. The system of claim 13, wherein the process further causes the processor to measure durations of the phoneme appearing in training data to identify a duration range for the functional transformation.
15. The system of claim 13, wherein control parameters for the functional transformation define a location on the shape for the inflection point and a slope of the shape at the inflection point.
16. The system of claim 15, wherein the process further causes the processor to determine the control parameters by applying an inverse of the functional transformation to durations of the phoneme appearing in training data.
17. The system of claim 13, wherein the functional transformation comprises a root sinusoidal transformation.
18. The system of claim 17, wherein the functional transformation comprises:
- $F(x) = {{\frac{B - A}{2} [\cos (\pi \frac{x - A}{B - A})]}^{\alpha} + \frac{A + B}{2}}^{\beta}$
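The duration-range measurement in claim 14 can be sketched as a single pass over labeled training samples that records the smallest and largest observed duration per phoneme. These would serve as A and B in the transformation. The phoneme labels, the millisecond values, and the function name are hypothetical.

```python
def duration_ranges(samples):
    """Return {phoneme: (min_duration, max_duration)} from training samples.

    samples: iterable of (phoneme_label, duration) pairs.
    """
    ranges = {}
    for phoneme, duration in samples:
        low, high = ranges.get(phoneme, (duration, duration))
        ranges[phoneme] = (min(low, duration), max(high, duration))
    return ranges


# Hypothetical training observations: (phoneme, duration in ms).
samples = [
    ("AA", 95.0),
    ("AA", 180.0),
    ("T", 30.0),
    ("AA", 60.0),
    ("T", 75.0),
]
ranges = duration_ranges(samples)
```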

19. An apparatus comprising:
- means for calculating durations for a phoneme using a generalized additive model that incorporates influences of contextual factors on the durations, the generalized additive model including a functional transformation that describes a shape containing an inflection point.
20. The apparatus of claim 19 further comprising:
21. The apparatus of claim 19, wherein control parameters for the functional transformation define a location on the shape for the inflection point and a slope of the shape at the inflection point.
22. The apparatus of claim 21 further comprising:
- means for determining the control parameters by applying an inverse of the functional transformation to durations of the phoneme appearing in training data.
23. The apparatus of claim 21, wherein the functional transformation comprises a root sinusoidal transformation.
24. The apparatus of claim 23, wherein the functional transformation comprises:
- $F(x) = {{\frac{B - A}{2} [\cos (\pi \frac{x - A}{B - A})]}^{\alpha} + \frac{A + B}{2}}^{\beta}$ wherein x is a duration for the phoneme, A is a minimum duration for the phoneme, B is a maximum duration for the phoneme, α controls a slope of the shape at the inflection point, and β controls a location on the shape of the inflection point.

Current Assignee: Apple Inc.
Original Assignee: Apple Computer Incorporated (Apple Inc.)
Inventors: Bellegarda, Jerome R.; Silverman, Kim
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Lerner, Martin
Application Number: US10/082,438
Publication Number: US 20020138270A1
Time in Patent Office: 424 Days
Field of Search: 704/258, 704/266, 704/267, 704/269, 704/236, 704/211
US Class Current: 704/267
CPC Class Codes:
G10L 13/04   Details of speech synthesis...
G10L 13/08   Text analysis or generation...
G10L 13/10   Prosody rules derived from ...
