Speech synthesizer having an acoustic element database
First Claim
1. A method for producing synthesized speech, the method including an acoustic element database containing acoustic elements that are concatenated to produce synthesized speech, the acoustic element database established by the steps comprising:
- for at least one phoneme corresponding to particular phonetic segments contained in a plurality of phonetic sequences occurring in an interval of a speech signal,determining a relative positioning of a tolerance region within a representational space based on a concentration of trajectories of the phonetic sequences that correspond to different phoneme sequences which intersect the region, wherein each trajectory represents an acoustic characteristic of at least a part of a respective phonetic sequence that contains the particular phonetic segment; and
forming acoustic elements from the phonetic sequences by identifying cut points in the phonetic sequences at respective time points along the corresponding trajectories based on the proximity of the time points to the tolerance region.
7 Assignments
0 Petitions
Accused Products
Abstract
A speech synthesis method employs an acoustic element database that is established from phonetic sequences occurring in an interval of a speech signal. In establishing the database, trajectories are determined for each of the phonetic sequences containing a phonetic segment that corresponds to a particular phoneme. A tolerance region is then identified based on a concentration of trajectories that correspond to different phoneme sequences. The acoustic elements for the database are formed from portions of the phonetic sequences by identifying cut points in the phonetic sequences which correspond to time points along the respective trajectories proximate the tolerance region. In this manner, it is possible to concatenate the acoustic elements having a common junction phonemes such that perceptible discontinuities at the junction phonemes are minimized. Computationally simple and fast methods for determining the tolerance region are also disclosed.
-
Citations
22 Claims
-
1. A method for producing synthesized speech, the method including an acoustic element database containing acoustic elements that are concatenated to produce synthesized speech, the acoustic element database established by the steps comprising:
for at least one phoneme corresponding to particular phonetic segments contained in a plurality of phonetic sequences occurring in an interval of a speech signal, determining a relative positioning of a tolerance region within a representational space based on a concentration of trajectories of the phonetic sequences that correspond to different phoneme sequences which intersect the region, wherein each trajectory represents an acoustic characteristic of at least a part of a respective phonetic sequence that contains the particular phonetic segment; and forming acoustic elements from the phonetic sequences by identifying cut points in the phonetic sequences at respective time points along the corresponding trajectories based on the proximity of the time points to the tolerance region. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
-
18. An apparatus for producing synthesized speech, the apparatus including an acoustic element database containing acoustic elements that are concatenated to produce synthesized speech, the acoustic element database established by the steps comprising:
for at least one phoneme corresponding to particular phonetic segments contained in a plurality of phonetic sequences occurring in an interval of a speech signal, determining a relative positioning of a tolerance region within a representational space based on a concentration of trajectories of the phonetic sequences that correspond to different phoneme sequences which intersect the region, wherein each trajectory represents an acoustic characteristic of at least a part of a respective phonetic sequence that contains the particular phonetic segment; and forming acoustic elements from the phonetic sequences by identifying cut points in the phonetic sequences at respective time points along the corresponding trajectories based on the proximity of the time points to the tolerance region. - View Dependent Claims (19, 20, 21, 22)
Specification