Method and apparatus for prosody for synthetic speech prosody determination

US 5,796,916 A
Filed: 05/26/1995
Issued: 08/18/1998
Est. Priority Date: 01/21/1993
Status: Expired due to Term

Abstract

In a synthetic speech system intonation of a natural utterance is automatically applied to a synthesized utterance. The present invention applies the desired intonation of the natural utterance to the synthesized utterance by aligning voicing sections of the natural utterance to the synthesized utterance. The voicing sections are initially delineated by voiced versus unvoiced, based on default voicing specifications for the synthetic utterance and on pitch tracker analysis of the natural utterance, and an attempt is made to align individual sections thereby. If no initial alignment occurs then a further attempt is made by varying the default voicing specifications of the synthesized utterance. If alignment is still not achieved, then each of the utterances, natural and synthetic, is considered a single large voicing section, which thus forces alignment therebetween. Once alignment occurs, the intonation of the natural utterance is applied to the synthetic utterance thereby providing the synthetic utterance with the desired, more natural, intonation. Further, the synthetic utterance having intonation specification can be graphically displayed so that the user may view and interactively and graphically modify the intonation specification for the synthetic utterance.
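
The fallback order in the abstract can be sketched in Python under some loud assumptions. The names `Section`, `align_sections`, `merge`, and `apply_intonation` are hypothetical, "alignment" is simplified to requiring identical voiced/unvoiced sequences, and the intermediate step of varying the default voicing specifications is left out. This is an illustration of the idea, not the patented procedure.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Section:
    voiced: bool                  # True for a voiced section, False for unvoiced
    duration_ms: float            # duration of the section
    pitch_hz: Tuple[float, ...]   # pitch samples across the section


def align_sections(natural: List[Section],
                   synthetic: List[Section]) -> Optional[List[Tuple[Section, Section]]]:
    """Pair sections one-to-one when the voicing sequences match, else None."""
    if [s.voiced for s in natural] != [s.voiced for s in synthetic]:
        return None
    return list(zip(natural, synthetic))


def merge(sections: List[Section]) -> Section:
    """Treat a whole utterance as a single large voicing section."""
    pitch = tuple(p for s in sections for p in s.pitch_hz)
    return Section(True, sum(s.duration_ms for s in sections), pitch)


def apply_intonation(natural: List[Section],
                     synthetic: List[Section]) -> List[Section]:
    """Give each synthetic section the pitch and duration of its natural partner."""
    pairs = align_sections(natural, synthetic)
    if pairs is None:
        # Last resort: one section per utterance, which forces alignment.
        pairs = [(merge(natural), merge(synthetic))]
    return [replace(syn, duration_ms=nat.duration_ms, pitch_hz=nat.pitch_hz)
            for nat, syn in pairs]
```

When the voicing sequences match, the result has one section per original section; otherwise it has a single section that carries the total natural duration and the concatenated natural pitch samples.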

24 Claims

1. A method for specifying synthetic speech intonation, comprising the steps of:
- (a) obtaining natural pitch and duration values for a natural voicing section of a natural utterance;
  
  (b) obtaining synthetic pitch and duration values for a synthetic voicing section of a synthetic equivalent to the natural utterance;
  
  (c) aligning the natural voicing section to the synthetic voicing section; and
  
  (d) replacing the synthetic pitch and duration values of the synthetic voicing section with the natural pitch and duration values.
- - 2. The method of claim 1 wherein step (a) comprises using a pitch tracker to take pitch measurements of the natural utterance over n pitch periods.
  - 3. The method of claim 2 wherein step (a) further comprises interpolating pitch measurements between voiced portions of the natural voicing section.
  - 4. The method of claim 1 wherein step (b) comprises retrieving predetermined phonetic duration and pitch values from a look-up table.
  - 5. The method of claim 1 wherein step (c) comprises sequentially aligning alternating voiced and unvoiced types of the natural voicing section to alternating voiced and unvoiced types of the synthetic voicing section.
  - 6. The method of claim 1 wherein step (c) comprises:
    - i) varying voicing possibilities for the synthetic voicing section until one or more alignments are reached between alternating voiced and unvoiced types of the synthetic voicing section and alternating voiced and unvoiced types of the natural voicing section; and
      
      ii) sequentially aligning the alternating voiced and unvoiced types of the natural voicing section to the alternating voiced and unvoiced types of the synthetic voicing section until a best reached alignment is achieved.
  - 7. The method of claim 6 wherein the best reached alignment is the alignment with a:
    - i) lowest accumulated error between the natural voicing section and the synthetic voicing section;
      
      ii) fewest variable voicing possibilities actually varied; and
      
      iii) fewest natural voicing sections which fall outside a predetermined duration range.
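
Claim 7 names three properties of the best reached alignment but does not say how they are weighed against one another. The sketch below assumes, purely for illustration, a lexicographic priority in the order listed; the `Candidate` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    label: str
    accumulated_error: float     # total mismatch between paired sections
    voicings_varied: int         # variable voicing possibilities actually varied
    out_of_range_sections: int   # natural sections outside the duration range


def best_alignment(candidates: List[Candidate]) -> Candidate:
    """Pick the candidate with the smallest (error, varied, out-of-range) tuple.

    Assumption: ties on an earlier criterion are broken by the next one.
    """
    if not candidates:
        raise ValueError("no candidate alignments to choose from")
    return min(
        candidates,
        key=lambda c: (c.accumulated_error, c.voicings_varied, c.out_of_range_sections),
    )
```

A weighted sum of the three quantities would be an equally plausible reading; the tuple ordering is only the simplest one to show.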

8. An apparatus for intonation specification comprising:
- (a) means for obtaining natural pitch and duration values for a natural voicing section of a natural utterance;
  
  (b) means for obtaining synthetic pitch and duration values for a synthetic voicing section of a synthetic equivalent to the natural utterance;
  
  (c) means for aligning the natural voicing section to the synthetic voicing section; and
  
  (d) means for substituting the natural pitch and duration values of the natural voicing section for the synthetic pitch and duration values.
- - 9. The apparatus of claim 8 wherein element (a) comprises a pitch tracker capable of taking pitch measurements of the natural utterance over n pitch periods.
  - 10. The apparatus of claim 9 wherein element (a) further comprises means for interpolating pitch measurements between voiced portions of the natural voicing section.
  - 11. The apparatus of claim 8 wherein element (b) comprises a look-up table of predetermined phonetic duration and pitch values.
  - 12. The apparatus of claim 8 wherein element (c) comprises means for sequentially aligning alternating voiced and unvoiced types of the natural voicing section to alternating voiced and unvoiced types of the synthetic voicing section.
  - 13. The method of claim 8 wherein step (c) comprises:
    - i) means for varying voicing possibilities for the synthetic voicing section until one or more alignments are reached between sequentially voiced and unvoiced types of the synthetic voicing section and alternating voiced and unvoiced types of the natural voicing section; and
      
      ii) means for sequentially aligning alternating voiced and unvoiced types of the natural voicing section to alternating voiced and unvoiced types of the synthetic voicing section until a best reached alignment is achieved.
  - 14. The apparatus of claim 13 wherein the best reached alignment is the alignment with a:
    - i) lowest accumulated error between the natural voicing section and the synthetic voicing section;
      
      ii) fewest variable voicing possibilities actually varied; and
      
      iii) fewest natural voicing sections which fall outside a predetermined duration range.
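
Claims 4 and 11 obtain the synthetic values from a look-up table of predetermined phonetic duration and pitch values. A minimal sketch of that idea follows; the table contents, the phoneme labels, and the function `synthetic_sections` are invented placeholders, not values from the patent.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Hypothetical defaults: phoneme -> (duration in ms, pitch in Hz, voiced by default).
PHONEME_DEFAULTS: Dict[str, Tuple[float, float, bool]] = {
    "s": (90.0, 0.0, False),
    "iy": (120.0, 110.0, True),
    "t": (60.0, 0.0, False),
    "ah": (100.0, 105.0, True),
    "m": (70.0, 100.0, True),
}


def synthetic_sections(phonemes: List[str]) -> List[dict]:
    """Look up each phoneme, then merge neighbours that share a voicing type."""
    rows = [PHONEME_DEFAULTS[p] for p in phonemes]
    sections = []
    for voiced, group in groupby(rows, key=lambda row: row[2]):
        members = list(group)
        sections.append({
            "voiced": voiced,
            "duration_ms": sum(row[0] for row in members),
            # Unvoiced sections carry no pitch values in this sketch.
            "pitch_hz": [row[1] for row in members] if voiced else [],
        })
    return sections
```

Grouping consecutive phonemes by voicing yields the alternating voiced and unvoiced sections that the alignment claims operate on.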

15. A method for intonation specification comprising the following steps:
- a) obtaining natural voiced pitch and duration values for a natural voiced portion of a natural utterance;
  
  b) obtaining natural unvoiced pitch and duration values for a natural unvoiced portion of the natural utterance;
  
  c) obtaining synthetic voiced and unvoiced pitch and duration values for synthetic voiced and unvoiced portions of a synthetic equivalent to the natural utterance;
  
  d) aligning the natural voiced and unvoiced portion to the synthetic voiced and unvoiced portions; and
  
  e) substituting the natural voiced and unvoiced pitch and duration values for the synthetic voiced and unvoiced pitch and duration values.
- - 16. The method of claim 15 wherein step (a) comprises using a pitch tracker to take pitch measurements of the natural utterance over n pitch periods.
  - 17. The method of claim 15 wherein the natural utterance includes multiple natural voiced portions, and step (b) comprises interpolating pitch measurements between the natural voiced portions.
  - 18. The method of claim 15 wherein step (c) uses a look-up to a table of a set of predetermined phonetic duration and pitch values.
  - 19. The method of claim 15 wherein step (d) comprises sequentially aligning alternating natural voiced and unvoiced portions to alternating synthetic voiced and unvoiced portions.
  - 20. The method of claim 15 wherein step (d) comprises:
    - i) varying voicing possibilities of the synthetic voiced and unvoiced portions until one or more alignments are reached between the alternating synthetic voiced and unvoiced portions and the alternating natural voiced and unvoiced portions; and
      
      ii) sequentially aligning the alternating natural voiced and unvoiced portions to the alternating synthetic voiced and unvoiced portions until a best reached alignment is achieved.
  - 21. The method of claim 20 wherein the best reached alignment is the alignment with a:
    - i) lowest accumulated error between the natural voiced and unvoiced portions and the synthetic voiced and unvoiced portions;
      
      ii) fewest variable voicing possibilities actually varied;
      
      iii) fewest natural voiced portions which fall outside a predetermined duration range.
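
Claims 3, 10, and 17 interpolate pitch measurements between voiced portions, since a pitch tracker reports no pitch where the signal is unvoiced. The claims do not name an interpolation method; the sketch below assumes linear interpolation over a frame-based track in which `None` marks an unvoiced frame. The function name and the track layout are hypothetical.

```python
from typing import List, Optional


def interpolate_unvoiced(track: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps that lie between two voiced frames by linear interpolation.

    Gaps before the first or after the last voiced frame are left as None,
    because there is no second endpoint to interpolate toward.
    """
    out = list(track)
    voiced = [i for i, value in enumerate(track) if value is not None]
    for left, right in zip(voiced, voiced[1:]):
        span = right - left
        for i in range(left + 1, right):
            step = (track[right] - track[left]) * (i - left) / span
            out[i] = track[left] + step
    return out
```

For example, a track of `[None, 100.0, None, None, 130.0, None]` has one interior gap of two frames, which is filled with values rising evenly from 100 Hz to 130 Hz.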

22. A method for intonation specification in a synthetic speech system comprising the following steps:
- a) obtaining a set of pitch and duration values of one or more voicing sections of a natural utterance;
  
  b) obtaining a set of pitch and duration values of one or more voicing sections of a synthetic equivalent to the natural utterance;
  
  c) aligning the one or more voicing sections of the natural utterance to the one or more voicing sections of the synthetic equivalent to the natural utterance, including the steps of
  
  i) varying voicing possibilities of the one or more voicing sections of the synthetic equivalent to the natural utterance until one or more alignments are reached between sequentially voiced and unvoiced types of the one or more voicing sections of the synthetic equivalent to the natural utterance and alternating voiced and unvoiced types of the one or more voicing sections of the natural utterance; and
  
  ii) sequentially aligning alternating voiced and unvoiced types of the one or more voicing sections of the natural utterance to alternating voiced and unvoiced types of the one or more voicing sections of the synthetic equivalent to the natural utterance for the best reached alignment between sequentially voiced and unvoiced types of the one or more voicing sections of the synthetic equivalent to the natural utterance and alternating voiced and unvoiced types of the one or more voicing sections of the natural utterance, the best reached alignment being the alignment with the
  
  i) lowest accumulated error between the one or more voicing sections of the natural utterance and the one or more voicing sections of the synthetic equivalent to the natural utterance;
  
  ii) fewest voicing possibilities actually varied; and
  
  iii) fewest of the one or more voicing sections of the natural utterance which fell outside a predetermined duration range; and
  
  d) substituting the pitch and duration values of the one or more voicing sections of the natural utterance for the pitch and duration values of the one or more voicing sections of the synthetic equivalent to the natural utterance.
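
Step c) i) of claim 22 varies the voicing possibilities of the synthetic sections until one or more alignments are reached. One way to picture this, as a sketch only: some synthetic phonemes are marked as having variable voicing, and their default flags are flipped in growing combinations until the collapsed voiced/unvoiced pattern equals the natural one. The function names, the boolean representation, and the search order (fewest flips first) are assumptions, not details taken from the patent.

```python
from itertools import combinations, groupby
from typing import List, Sequence, Tuple


def voicing_pattern(flags: Sequence[bool]) -> List[bool]:
    """Collapse per-phoneme voicing flags into alternating section types."""
    return [kind for kind, _ in groupby(flags)]


def find_alignments(default_flags: Sequence[bool],
                    variable_idx: Sequence[int],
                    natural_pattern: List[bool]) -> List[Tuple[int, ...]]:
    """Return every flip set of the smallest size whose pattern matches.

    default_flags: default voicing per synthetic phoneme (True = voiced).
    variable_idx:  positions whose voicing is allowed to vary.
    An empty list means no variation produces a match.
    """
    for count in range(len(variable_idx) + 1):
        matches = []
        for flips in combinations(variable_idx, count):
            flags = list(default_flags)
            for i in flips:
                flags[i] = not flags[i]
            if voicing_pattern(flags) == natural_pattern:
                matches.append(flips)
        if matches:
            return matches
    return []
```

Trying the smallest flip sets first keeps the number of voicing possibilities actually varied low, which is one of the properties the claim uses to rank alignments. An empty result corresponds to the case where each utterance would be treated as a single voicing section.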

23. An apparatus for intonation specification in a synthetic speech system comprising:
- a) means for obtaining a set of pitch and duration values of one or more voicing sections of a natural utterance;
  
  b) means for obtaining a set of pitch and duration values of one or more voicing sections of a synthetic equivalent to the natural utterance;
  
  c) means for aligning the one or more voicing sections of the natural utterance to the one or more voicing sections of the synthetic equivalent to the natural utterance, the means for aligning including
  
  i) means for varying voicing possibilities of the one or more voicing sections of the synthetic equivalent to the natural utterance until one or more alignments are reached between sequentially voiced and unvoiced types of the one or more voicing sections of the synthetic equivalent to the natural utterance and alternating voiced and unvoiced types of the one or more voicing sections of the natural utterance; and
  
  ii) means for sequentially aligning alternating voiced and unvoiced types of the one or more voicing sections of the natural utterance to alternating voiced and unvoiced types of the one or more voicing sections of the synthetic equivalent to the natural utterance for the best reached alignment between sequentially voiced and unvoiced types of the one or more voicing sections of the synthetic equivalent to the natural utterance and alternating voiced and unvoiced types of the one or more voicing sections of the natural utterance, wherein the best reached alignment is the alignment with the
  
  i) lowest accumulated error between the one or more voicing sections of the natural utterance and the one or more voicing sections of the synthetic equivalent to the natural utterance;
  
  ii) fewest voicing possibilities actually varied; and
  
  iii) fewest of the one or more voicing sections of the natural utterance which fell outside a predetermined duration range; and
  
  d) means for substituting the pitch and duration values of the one or more voicing sections of the natural utterance for the pitch and duration values of the one or more voicing sections of the synthetic equivalent to the natural utterance.

24. A method for intonation specification in a synthetic speech system comprising the following steps:
- a) obtaining a set of pitch and duration values of one or more voiced portions of a natural utterance;
  
  b) obtaining a set of pitch and duration values of one or more unvoiced portions of a natural utterance;
  
  c) obtaining a set of pitch and duration values of one or more voiced and one or more unvoiced portions of a synthetic equivalent to the natural utterance;
  
  d) aligning the one or more voiced portions of the natural utterance to the one or more voiced and unvoiced portions of the synthetic equivalent to the natural utterance, the step of aligning including
  
  i) varying voicing possibilities of the one or more voicing sections of the synthetic equivalent to the natural utterance until one or more alignments are reached between sequentially voiced and unvoiced types of the one or more voicing sections of the synthetic equivalent to the natural utterance and alternating voiced and unvoiced types of the one or more voicing sections of the natural utterance; and
  
  ii) sequentially aligning alternating voiced and unvoiced types of the one or more voicing sections of the natural utterance to alternating voiced and unvoiced types of the one or more voicing sections of the synthetic equivalent to the natural utterance for the best reached alignment between sequentially voiced and unvoiced types of the one or more voicing sections of the synthetic equivalent to the natural utterance and alternating voiced and unvoiced types of the one or more voicing sections of the natural utterance, the best reached alignment being the alignment with the
  
  i) lowest accumulated error between the one or more voicing sections of the natural utterance and the one or more voicing sections of the synthetic equivalent to the natural utterance;
  
  ii) fewest voicing possibilities actually varied; and
  
  iii) fewest of the one or more voicing sections of the natural utterance which fell outside a predetermined duration range; and
  
  e) substituting the pitch and duration values of the one or more voiced portions of the natural utterance for the pitch and duration values of the one or more voiced and unvoiced portions of the synthetic equivalent to the natural utterance.

Current Assignee: Apple Computer Incorporated (Apple Inc.)
Original Assignee: Apple Computer Incorporated (Apple Inc.)
Inventors: Meredith, Scott E.
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Mattson, Robert
Application Number: US08/451,617
Time in Patent Office: 1,180 Days
Field of Search: 395/2, 395/2.1, 395/2.14, 395/2.15, 395/2.16, 395/2.2, 395/2.67, 395/2.69, 395/2.76, 395/2.77, 395/2.75, 395/2.79, 395/2.85, 395/2.87, 381/51-53
US Class Current: 704/258
CPC Class Codes:
G10L 13/04   Details of speech synthesis...
G10L 13/08   Text analysis or generation...
G10L 13/10   Prosody rules derived from ...
