Speech synthesizer having an acoustic element database

US 5,751,907 A
Filed: 08/16/1995
Issued: 05/12/1998
Est. Priority Date: 08/16/1995
Status: Expired due to Term

Abstract

A speech synthesis method employs an acoustic element database that is established from phonetic sequences occurring in an interval of a speech signal. In establishing the database, trajectories are determined for each of the phonetic sequences containing a phonetic segment that corresponds to a particular phoneme. A tolerance region is then identified based on a concentration of trajectories that correspond to different phoneme sequences. The acoustic elements for the database are formed from portions of the phonetic sequences by identifying cut points in the phonetic sequences which correspond to time points along the respective trajectories proximate the tolerance region. In this manner, it is possible to concatenate acoustic elements having common junction phonemes such that perceptible discontinuities at the junction phonemes are minimized. Computationally simple and fast methods for determining the tolerance region are also disclosed.
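
The tolerance-region step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `cell_of` and `find_tolerance_cell`, the 100 Hz cell size, the sequence labels, and the toy three-formant values are all invented for the example. Each trajectory point is quantized into a grid cell, each cell records which distinct phoneme sequences pass through it, and the cell crossed by the most distinct sequences is taken as the tolerance region. For simplicity the resolution region around each time point is a single cell.

```python
from collections import defaultdict

def cell_of(point, cell_size):
    """Map a point in formant space (Hz) to the integer index of its grid cell."""
    return tuple(int(coord // cell_size) for coord in point)

def find_tolerance_cell(trajectories, cell_size=100.0):
    """Return (cell, count) for the cell crossed by the most distinct sequences.

    `trajectories` maps a phoneme-sequence label to a list of points, one per
    time step. The labels and the data layout are hypothetical.
    """
    visitors = defaultdict(set)  # cell -> labels of sequences crossing it
    for label, points in trajectories.items():
        for point in points:
            # A set ignores repeat visits by the same phoneme sequence.
            visitors[cell_of(point, cell_size)].add(label)
    # Ties on the count are broken by cell index so the result is deterministic.
    best = max(visitors, key=lambda c: (len(visitors[c]), c))
    return best, len(visitors[best])

# Made-up (F1, F2, F3) trajectories, in Hz, for sequences sharing one phoneme.
toy_trajectories = {
    "p-i": [(250, 2100, 2900), (310, 2250, 3050), (340, 2290, 3080)],
    "i-t": [(330, 2270, 3010), (400, 2000, 2800)],
    "k-i": [(500, 1900, 2700), (390, 2210, 3090), (320, 2240, 3020)],
}

print(find_tolerance_cell(toy_trajectories))  # ((3, 22, 30), 3)
```

Counting distinct sequences rather than raw points keeps one long trajectory that lingers in a cell from outweighing several different sequences that merely pass through it.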

22 Claims

1. A method for producing synthesized speech, the method including an acoustic element database containing acoustic elements that are concatenated to produce synthesized speech, the acoustic element database established by the steps comprising:
- for at least one phoneme corresponding to particular phonetic segments contained in a plurality of phonetic sequences occurring in an interval of a speech signal, determining a relative positioning of a tolerance region within a representational space based on a concentration of trajectories of the phonetic sequences that correspond to different phoneme sequences which intersect the region, wherein each trajectory represents an acoustic characteristic of at least a part of a respective phonetic sequence that contains the particular phonetic segment; and
  
  forming acoustic elements from the phonetic sequences by identifying cut points in the phonetic sequences at respective time points along the corresponding trajectories based on the proximity of the time points to the tolerance region.
  - 2. The method of claim 1 further comprising the step of selecting at least one phonetic sequence from the plurality of phonetic sequences which have portions corresponding to a particular phoneme sequence based on the proximity of the corresponding trajectories to the tolerance region, wherein an acoustic element is formed from the portion of the selected phonetic sequence.
  - 3. The method of claim 1 wherein the step of forming the acoustic elements identifies the cut points of each of the phonetic sequences at a respective time point along the corresponding trajectory that is approximately the closest to or within the tolerance region.
  - 4. The method of claim 3 wherein the step of forming the acoustic elements identifies the cut points of each of the phonetic sequences at a respective time point along the corresponding trajectory that is approximately the closest to a center point of the tolerance region.
  - 5. The method of claim 1 wherein an acoustic element is formed for each anticipated phoneme sequence for a particular language.
  - 6. The method of claim 1 wherein the trajectories are based on formants of the phonetic sequences.
  - 7. The method of claim 1 wherein the trajectories are based on three-formant representations and the representational space is a three-formant space.
  - 8. The method of claim 1 wherein the representational space is an N-dimensional space that includes a plurality of contiguous N-dimensional cells and wherein the step of determining the tolerance region further comprises performing a grid search to determine a region of at least one cell that is intersected by the substantially largest number of trajectories corresponding to different phoneme sequences.
  - 9. The method of claim 1 wherein the representational space is an N-dimensional space that includes a plurality of contiguous N-dimensional cells and wherein the step of determining the tolerance region comprises:
    - identifying those cells that are within a resolution region surrounding time points along each trajectory;
      
      for each identified cell within the resolution region, updating a list maintained for that cell with an identification of the phoneme sequence that corresponds to the trajectory if such identification does not appear in the list for that cell; and
      
      determining the tolerance region corresponding to at least one cell having a greater than average number of identifications on its list.
  - 10. The method of claim 9 wherein the step of identifying those cells that are within a resolution region comprises processing the time points along the trajectories and updating lists associated with the cells within the corresponding resolution regions.
  - 11. The method of claim 9 wherein the resolution region and the tolerance region are of the same size.
  - 12. The method of claim 1 wherein the representational space is an N-dimensional space that includes a plurality of contiguous N-dimensional cells and wherein the step of determining the tolerance region comprises:
    - identifying those cells that are within a resolution region surrounding time points along each trajectory;
      
      for each identified cell within the resolution region, updating a list maintained for that cell with an identification of the phoneme sequence that corresponds to the trajectory;
      
      removing multiple identifications from each cell list; and
      
      determining the tolerance region corresponding to at least one cell having a greater than average number of identifications on its list.
  - 13. The method of claim 12 wherein the step of identifying those cells that are within a resolution region comprises processing the time points along the trajectories and updating lists associated with the cells within the corresponding resolution regions.
  - 14. The method of claim 12 wherein the resolution region and the tolerance region are the same size.
  - 15. The method of claim 1 wherein at least two phonetic sequences of the plurality of phonetic sequences have portions corresponding to a particular phoneme sequence, the method further comprising the step of:
    - determining a value for each section of the phonetic sequences based on the corresponding trajectories' proximity to the tolerance region, wherein the acoustic element for the particular phoneme sequence is formed from one of the corresponding portions of the phonetic sequences based on the determined values.
  - 16. The method of claim 15 wherein the step of determining the values is further based on a quality measure of the corresponding phonetic sequence.
  - 17. The method of claim 16 wherein the quality measure is determined from the proximity of a trajectory to a tolerance region for the phonetic sequence corresponding to a different boundary phoneme.
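
The cut-point and selection steps of claims 3, 4 and 15 can be illustrated with a short sketch. It is offered only as one plausible reading: the claims speak of "proximity" without naming a metric, so the Euclidean distance used here is an assumption, as are the function names `cut_point` and `pick_candidate`, the recording names, and the toy formant values. The quality measure of claims 16 and 17 is not modeled.

```python
import math

def cut_point(trajectory, center):
    """Return (time_index, distance) of the trajectory point nearest `center`."""
    distances = [math.dist(point, center) for point in trajectory]
    index = min(range(len(distances)), key=distances.__getitem__)
    return index, distances[index]

def pick_candidate(candidates, center):
    """Among several recordings of one phoneme sequence, pick the closest one.

    `candidates` maps a hypothetical recording name to its trajectory.
    Returns (name, cut_index) for the recording whose nearest point lies
    closest to the center of the tolerance region.
    """
    scored = {name: cut_point(traj, center) for name, traj in candidates.items()}
    name = min(scored, key=lambda n: scored[n][1])
    return name, scored[name][0]

# Assumed center of a tolerance region in (F1, F2, F3) space, in Hz.
center = (350.0, 2250.0, 3050.0)

# Two made-up recordings containing the same phoneme sequence.
candidates = {
    "take_a": [(250, 2100, 2900), (340, 2240, 3040), (420, 2350, 3150)],
    "take_b": [(300, 2200, 3000), (380, 2300, 3100)],
}

print(pick_candidate(candidates, center))  # ('take_a', 1)
```

Here `take_a` wins because its second time point lies about 17 Hz from the center, while the best point of `take_b` is about 77 Hz away. Cutting both neighbors of a junction at points this close to a shared region is what keeps the spliced trajectories nearly continuous.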

18. An apparatus for producing synthesized speech, the apparatus including an acoustic element database containing acoustic elements that are concatenated to produce synthesized speech, the acoustic element database established by the steps comprising:
- for at least one phoneme corresponding to particular phonetic segments contained in a plurality of phonetic sequences occurring in an interval of a speech signal, determining a relative positioning of a tolerance region within a representational space based on a concentration of trajectories of the phonetic sequences that correspond to different phoneme sequences which intersect the region, wherein each trajectory represents an acoustic characteristic of at least a part of a respective phonetic sequence that contains the particular phonetic segment; and
  
  forming acoustic elements from the phonetic sequences by identifying cut points in the phonetic sequences at respective time points along the corresponding trajectories based on the proximity of the time points to the tolerance region.
  - 19. The apparatus of claim 18 wherein the representational space is an N-dimensional space that includes a plurality of contiguous N-dimensional cells and wherein the step of determining the tolerance region comprises:
    - identifying those cells that are within a resolution region surrounding time points along each trajectory;
      
      for each identified cell within the resolution region, updating a list maintained for that cell with an identification of the phoneme sequence that corresponds to the trajectory if such identification does not appear in the list for that cell; and
      
      determining the tolerance region corresponding to at least one cell having a greater than average number of identifications on its list.
  - 20. The apparatus of claim 19 wherein the step of identifying those cells that are within a resolution region comprises processing the time points along the trajectories and updating lists associated with the cells within the corresponding resolution regions.
  - 21. The apparatus of claim 18 wherein the representational space is an N-dimensional space that includes a plurality of contiguous N-dimensional cells and wherein the step of determining the tolerance region comprises:
    - identifying those cells that are within a resolution region surrounding time points along each trajectory;
      
      for each identified cell within the resolution region, updating a list maintained for that cell with an identification of the phoneme sequence that corresponds to the trajectory;
      
      removing multiple identifications from each cell list; and
      
      determining the tolerance region corresponding to at least one cell having a greater than average number of identifications on its list.
  - 22. The apparatus of claim 21 wherein the step of identifying those cells that are within a resolution region comprises processing the time points along the trajectories and updating lists associated with the cells within the corresponding resolution regions.
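
Claims 19 and 21 differ only in how each cell's list is kept free of repeats: claim 19 checks before adding, claim 21 adds unconditionally and removes multiples afterwards. The sketch below, with hypothetical function names and made-up two-dimensional cell indices, suggests that both bookkeeping styles yield the same lists, and shows one way to apply the "greater than average number of identifications" test. Averaging over non-empty cells only is an assumption; the claims do not say which cells enter the average.

```python
from collections import defaultdict

def lists_check_first(visits):
    """Claim 19 style: add a label only if the cell's list lacks it."""
    lists = defaultdict(list)
    for cell, label in visits:
        if label not in lists[cell]:
            lists[cell].append(label)
    return dict(lists)

def lists_dedupe_after(visits):
    """Claim 21 style: always append, then remove repeats in a second pass."""
    lists = defaultdict(list)
    for cell, label in visits:
        lists[cell].append(label)
    # dict.fromkeys keeps the first occurrence of each label, in order.
    return {cell: list(dict.fromkeys(labels)) for cell, labels in lists.items()}

def above_average_cells(lists):
    """Cells whose list is longer than the mean length over non-empty cells."""
    mean = sum(len(labels) for labels in lists.values()) / len(lists)
    return sorted(cell for cell, labels in lists.items() if len(labels) > mean)

# (cell, phoneme-sequence label) pairs, as if produced by scanning the time
# points of several trajectories; values are invented for illustration.
visits = [((3, 22), "p-i"), ((3, 22), "p-i"), ((3, 22), "k-i"),
          ((4, 20), "i-t"), ((3, 22), "i-t"), ((5, 19), "k-i")]

print(lists_check_first(visits) == lists_dedupe_after(visits))  # True
print(above_average_cells(lists_check_first(visits)))           # [(3, 22)]
```

The check-first variant keeps every list short at the cost of a membership test per update, while the dedupe-after variant makes updates trivial and defers the cleanup to a single pass.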

Current Assignee: Alcatel-Lucent USA, Inc. (Nokia Corporation)
Original Assignee: Lucent Technologies, Inc. (Nokia Corporation)
Inventors: Olive, Joseph Philip; Moebius, Bernd; VanSanten, Jan Pieter; Tanenblatt, Michael Abraham
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Chawan, Vijay B
Application Number: US08/515,887
Time in Patent Office: 1,000 Days
Field of Search: 395/2.69, 395/2.75, 395/2.76, 395/2.77, 395/2.63, 381/43
US Class Current: 704/267
CPC Class Codes: G10L 13/02 Methods for producing synth...
