Post processing timing of rhythm in synthetic speech
Abstract
A method for generating synthetic speech uses detection of natural timing boundaries in words to be spoken by the synthetic speech system, to produce natural timing intervals. Phonemes are identified in the natural timing intervals. Time durations are assigned for each of the phonemes. A time duration of a selected phoneme is changed to achieve a desired time duration for a selected natural timing interval containing the phoneme. The natural timing interval may be selected to be a syllable. The natural timing interval may be selected to be the interval between two stressed phonemes. The natural timing intervals may be set to substantially the same duration between timing boundaries by changing the phoneme durations in accordance with rhythm of the language of synthesized speech. Durations of preselected phonemes, however, may remain unchanged.
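The adjustment described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible rule: the adjustable phonemes in one natural timing interval are scaled uniformly so that the interval reaches a desired duration, while preselected phonemes keep their assigned durations. The `Phoneme` class, the `fixed` flag, and `equalize_interval` are hypothetical names chosen for illustration; the source does not specify a scaling rule.

```python
from dataclasses import dataclass


@dataclass
class Phoneme:
    symbol: str           # phoneme label (illustrative only)
    duration_ms: float    # first (assigned) time duration
    fixed: bool = False   # hypothetical flag: duration must remain unchanged


def equalize_interval(phonemes, target_ms):
    """Return adjusted durations for one natural timing interval.

    Assumption: adjustable phonemes are scaled by a single common factor.
    Fixed phonemes are returned unchanged.
    """
    fixed_total = sum(p.duration_ms for p in phonemes if p.fixed)
    flex_total = sum(p.duration_ms for p in phonemes if not p.fixed)
    if flex_total == 0:
        # Nothing can be adjusted, so keep the assigned durations.
        return [p.duration_ms for p in phonemes]
    # Clamp at zero so a very short target never yields negative durations.
    scale = max(target_ms - fixed_total, 0.0) / flex_total
    return [p.duration_ms if p.fixed else p.duration_ms * scale
            for p in phonemes]
```

For a syllable with assigned durations of 60, 120, and 60 ms in which only the middle phoneme is adjustable, a 200 ms target shortens the middle phoneme to about 80 ms and leaves the other two untouched.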
17 Claims
1. A synthetic speech system comprising:
means for detecting natural timing boundaries in words to be spoken by said synthetic speech system, to produce natural timing intervals;
means for identifying phonemes in said natural timing intervals;
means for assigning first time durations for each of said phonemes;
means for changing a selected first time duration of a selected phoneme to achieve a desired time duration for a selected natural timing interval containing said selected phoneme; and
means for setting a plurality of said natural timing intervals to substantially the same second time duration, a particular phoneme having a computed time duration in response to number of phonemes within said selected natural timing interval and said second time duration;
wherein at least said selected first time duration is based upon an elasticity parameter indicative of degree to which said selected first time duration may be adjusted without undesirably degrading speech produced by said system.
Dependent claims: 2, 3
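One way to picture the elasticity parameter recited in claim 1 is sketched below. The sketch assumes that elasticity is a number between 0 and 1 giving the maximum fraction by which a phoneme's duration may be stretched or compressed; the claim only says the parameter indicates how far a duration may be adjusted, so this reading and the name `adjust_with_elasticity` are assumptions, not the patented method.

```python
def adjust_with_elasticity(durations_ms, elasticities, target_ms):
    """Move an interval toward target_ms, respecting per-phoneme elasticity.

    Assumption: elasticity e in [0, 1] means a phoneme of duration d may
    change by at most e * d in either direction. The required change is
    shared among phonemes in proportion to that allowance.
    """
    slack = [d * e for d, e in zip(durations_ms, elasticities)]
    total_slack = sum(slack)
    if total_slack == 0:
        # No phoneme may be adjusted.
        return list(durations_ms)
    delta = target_ms - sum(durations_ms)
    # Use only as much of the allowance as needed, never more than all of it.
    ratio = max(-1.0, min(1.0, delta / total_slack))
    return [d + ratio * s for d, s in zip(durations_ms, slack)]
```

Under this reading, an inelastic phoneme (elasticity 0) is never changed, and a target that lies beyond the combined allowance is approached but not reached.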
4. A method for generating synthetic speech, comprising:
detecting natural timing boundaries in words to be spoken by a synthetic speech system, to produce natural timing intervals;
identifying phonemes in said natural timing intervals;
assigning first time durations for each of said phonemes;
changing a selected first time duration of a selected phoneme to achieve a desired time duration for a selected natural timing interval containing said selected phoneme; and
setting a plurality of said natural timing intervals to substantially the same second time duration, a particular phoneme having a computed time duration in response to number of phonemes within said selected natural timing interval and said second time durations;
wherein at least said selected first time duration is based upon a predetermined parameter indicative of degree to which said selected first time duration may be adjusted without undesirably degrading speech produced by said system.
Dependent claims: 5, 6
7. A synthetic speech system comprising:
means for storing speech to be synthesized in a computer memory;
means for a processor to read said speech from said computer memory and for said processor to detect natural timing boundaries in words to be spoken by said synthetic speech system, to produce natural timing intervals;
means for identifying phonemes in said natural timing intervals;
means for assigning first time durations for each of said phonemes;
means for changing a selected first time duration of a selected phoneme to achieve a desired time duration for a selected natural timing interval containing said selected phoneme;
means for setting a plurality of said natural timing intervals to substantially the same second time duration, a particular phoneme having a computed time duration in response to number of phonemes within said selected natural timing interval and said second time duration; and
means for applying said synthesized speech to an electromechanical acoustic coupler to make audible speech;
wherein respective time durations of at least certain respective phonemes are based upon respective selectable parameters indicative of respective degrees to which said respective time durations may be adjusted without undesirably degrading speech produced by said system.
Dependent claims: 8, 9, 10, 11
12. A synthetic speech system comprising:
a computer process for detecting natural timing boundaries in words to be spoken by said synthetic speech system, to produce natural timing intervals, said words stored in a computer memory;
a computer process for identifying phonemes in said natural timing intervals;
a computer process for assigning first time durations for each of said phonemes;
a computer process for changing a selected first time duration of a selected phoneme to achieve a desired time duration for a selected natural timing interval containing said selected phoneme; and
a computer process for setting a plurality of said natural timing intervals to substantially the same second time duration, a particular phoneme having a computed time duration in response to number of phonemes within said selected natural timing interval and said second time duration;
wherein at least one respective time duration of at least one respective phoneme is based upon a selectable parameter indicative of degree to which the at least one respective time duration is adjustable without undesirably degrading speech produced by the system.
Dependent claims: 13, 14, 15
16. A synthetic speech system comprising:
a computer process for detecting natural timing boundaries in words including syllables to be spoken by said synthetic speech system, to produce natural timing intervals, said words being stored in a computer memory and each of said natural timing intervals involving a respective syllable;
a computer process for identifying phonemes in each syllable, said phonemes including flexible and inflexible phonemes;
a computer process for assigning first time durations for each of said phonemes; and
a computer process for achieving a selected percentage of syllable timed rhythm in synthesized speech by adjusting a respective inherent time interval for each respective flexible phoneme to adjust a respective syllable time duration to be within said selected percentage of a reference duration, said reference duration being computed from a desired speaking rate, and respective time durations of said inflexible phonemes not being adjusted based upon said selected percentage.
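Claim 16 ties each syllable's duration to a reference duration derived from the speaking rate. The sketch below shows one possible reading, with two assumptions that the claim does not state: the speaking rate is given in syllables per minute, and "within said selected percentage" means a tolerance band around the reference. Both function names are hypothetical.

```python
def reference_duration_ms(syllables_per_minute):
    """Assumed conversion: one reference syllable lasts 60 s divided by the rate."""
    return 60_000.0 / syllables_per_minute


def target_within_percentage(current_ms, reference_ms, percentage):
    """Return the syllable duration nearest current_ms inside the band.

    Assumption: the band is reference_ms plus or minus `percentage` percent.
    A syllable already inside the band keeps its current duration.
    """
    fraction = percentage / 100.0
    low = reference_ms * (1.0 - fraction)
    high = reference_ms * (1.0 + fraction)
    return min(max(current_ms, low), high)
```

The returned value would then serve as the desired syllable duration that only the flexible phonemes are adjusted to meet. At 300 syllables per minute the reference is 200 ms, so with a 10 percent band a 300 ms syllable would be brought down to about 220 ms, while a 190 ms syllable would be left alone.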
17. A synthetic speech system comprising:
a computer process for detecting natural timing boundaries in words including syllables to be spoken by said synthetic speech system, to produce natural timing intervals, said words being stored in a computer memory, and each natural timing interval involving a respective stressed time interval between respective stressed syllables;
a computer process for identifying phonemes in said stressed time interval, said phonemes including fixed and flexible phonemes;
a computer process for assigning first time durations for each of said phonemes; and
a computer process for achieving a selected percentage of stress timed rhythm in synthesized speech by adjusting an inherent time interval for at least one flexible phoneme in a respective stressed time interval to adjust a respective time duration of said respective stressed time interval to be within said selected percentage of a reference duration, said reference duration being computed from a desired speaking rate, and respective time durations of said fixed phonemes not being adjusted based upon said selected percentage.
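For the stress-timed case in claim 17, the natural timing intervals run from one stressed position to the next. A minimal sketch of that segmentation follows, assuming each phoneme arrives already marked as stressed or unstressed and that any unstressed material before the first stress forms its own leading interval. The function name and the phoneme labels in the usage note are illustrative only.

```python
def stress_intervals(phonemes):
    """Group (symbol, stressed) pairs into intervals that start at a stress.

    Assumption: a new interval opens at every stressed phoneme; phonemes
    before the first stress are collected into a leading interval.
    """
    intervals = []
    for symbol, stressed in phonemes:
        if stressed or not intervals:
            intervals.append([])
        intervals[-1].append(symbol)
    return intervals
```

Given the sequence dh, ax, k, AE, t, s, AE, t with stress on each AE, the function yields three intervals: [dh, ax, k], [AE, t, s], and [AE, t]. Each interval could then be adjusted toward the reference duration by changing only its flexible phonemes.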
Specification