Phonetic unit duration adjustment for text-to-speech system

US 6,330,538 B1
Filed: 12/11/1997
Issued: 12/11/2001
Est. Priority Date: 06/13/1995
Status: Expired due to Term

Abstract

Input text is converted to a sequence of representations of syllables or other phonetic units and stored portions of data are retrieved to generate waveforms corresponding to the syllables. In order to determine durations for the syllables, a constant duration is defined corresponding to a regular beat period and adjusted in accordance with the nature of the syllable and/or its context within the sequence.

26 Claims

1. A speech synthesis method comprising:
- supplying a sequence of representations of phonetic units;
  
  retrieving stored portions of data to generate waveforms corresponding to the phonetic units;
  
  determining durations for the phonetic units; and
  
  processing the portions of data to adjust the time durations of the waveforms according to the determined durations;
  
  wherein the determining step is operable to define a constant duration for said phonetic unit, said constant duration corresponding to a regular beat period and selectively in dependence on the intrinsic duration of the phonetic unit and/or its context within the sequence, to carry out a constant duration regulation calculation.
2. A speech synthesis method as in claim 1 further comprising:
3. A speech synthesis method as in claim 1 in which the phonetic units are syllables.
4. A speech synthesis method as in claim 1 including:
- storing items of data representing waveforms corresponding to phonetic sub-units, the retrieving step retrieving for each phonetic unit, one or more portions of data each corresponding to a sub-unit thereof, and further storing for each sub-unit statistical duration data including a maximum value and a minimum value;
  
  wherein the determining step computes for each phonetic unit the sum of the minimum duration values and the sum of the maximum duration values for the constituent sub-unit(s) thereof and adjusts the said constant duration such that it neither falls below the sum of the minimum values nor exceeds the sum of the maximum values.
5. A speech synthesis method as in claim 4 in which the sub-units are phonemes.
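
The bounding step recited in claim 4 can be illustrated with a minimal sketch. The names `SubUnitStats` and `regulate_duration`, the millisecond units, and the numeric values are assumptions chosen for illustration; the claim does not specify any of them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SubUnitStats:
    """Hypothetical duration statistics for one sub-unit (e.g. a phoneme), in ms."""
    minimum: float
    maximum: float


def regulate_duration(beat_period: float, sub_units: Sequence[SubUnitStats]) -> float:
    """Keep the constant (beat) duration within [sum of minima, sum of maxima]."""
    lower = sum(s.minimum for s in sub_units)
    upper = sum(s.maximum for s in sub_units)
    return min(max(beat_period, lower), upper)


# Illustrative values only: a syllable made of two sub-units.
syllable = [SubUnitStats(40.0, 90.0), SubUnitStats(60.0, 180.0)]
print(regulate_duration(200.0, syllable))  # 200.0 (within bounds, unchanged)
print(regulate_duration(80.0, syllable))   # 100.0 (raised to the sum of minima)
print(regulate_duration(300.0, syllable))  # 270.0 (lowered to the sum of maxima)
```

In this sketch a syllable keeps the regular beat period whenever its sub-units can plausibly be stretched or compressed to fit it, and departs from the beat only when they cannot.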

6. A speech synthesis method comprising:
- supplying a sequence of representations of phonetic units;
  
  retrieving stored portions of data to generate waveforms corresponding to the phonetic units;
  
  determining durations for the phonetic units;
  
  processing the portions of data to adjust the time durations of the waveforms according to the determined durations;
  
  wherein the determining step is operable to define a constant duration corresponding to a regular beat period and to adjust that duration in dependence on the intrinsic duration of the phonetic unit and/or its context within the sequence, storing items of data representing waveforms corresponding to phonetic sub-units, the retrieving step retrieving for each phonetic unit, one or more portions of data each corresponding to a sub-unit thereof, and further storing for each sub-unit statistical duration data including a maximum value and a minimum value;
  
  wherein the determining step computes for each phonetic unit the sum of the minimum duration values and the sum of the maximum duration values for the constituent sub-unit(s) thereof and adjusts the said constant duration such that it neither falls below the sum of the minimum values nor exceeds the sum of the maximum values;
  
  wherein said determining step adjusts the said constant duration value such that it does not fall below a modified minimum value which exceeds the sum of the minimum values to an extent determined by the context of the phonetic unit.
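
The context-dependent floor added by claim 6 might be sketched as follows. The function name `regulate_with_context` and the `context_margin` parameter are hypothetical; the claim says only that the modified minimum exceeds the sum of the minimum values "to an extent determined by the context", without saying how that extent is computed. Capping the modified minimum at the sum of the maxima is a further assumption made here so that the two bounds cannot conflict.

```python
def regulate_with_context(
    beat_period: float,
    min_sum: float,
    max_sum: float,
    context_margin: float,
) -> float:
    """Clamp the beat duration using a context-raised lower bound.

    context_margin is a hypothetical, non-negative amount (same units as the
    durations) by which the context raises the floor above min_sum.
    """
    # Assumption: the raised floor is capped so it never exceeds the ceiling.
    modified_min = min(min_sum + context_margin, max_sum)
    return min(max(beat_period, modified_min), max_sum)


# Illustrative values only (milliseconds).
print(regulate_with_context(110.0, 100.0, 270.0, 30.0))  # 130.0
print(regulate_with_context(200.0, 100.0, 270.0, 30.0))  # 200.0
```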

7. A speech synthesis method comprising:
- supplying a sequence of representations of phonetic units;
  
  retrieving stored portions of data to generate waveforms corresponding to the phonetic units;
  
  determining durations for the phonetic units;
  
  processing the portions of data to adjust the time durations of the waveforms according to the determined durations;
  
  wherein the determining step is operable to define a constant duration corresponding to a regular beat period and to adjust that duration in dependence on the intrinsic duration of the phonetic unit and/or its context within the sequence, storing items of data representing waveforms corresponding to phonetic sub-units, the retrieving step retrieving for each phonetic unit, one or more portions of data each corresponding to a sub-unit thereof, and further storing for each sub-unit statistical duration data including a maximum value and a minimum value;
  
  wherein the determining step computes for each phonetic unit the sum of the minimum duration values and the sum of the maximum duration values for the constituent sub-unit(s) thereof and adjusts the said constant duration such that it neither falls below the sum of the minimum values nor exceeds the sum of the maximum values;
  
  wherein the statistical duration data include for each sub-unit a central value, and each sub-unit of a phonetic unit is assigned a duration which is a fraction of the adjusted constant value for that phonetic unit in proportion to the ratio of the central value for that sub-unit to the sum of the central values for the constituent sub-units of that phonetic unit.
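
The proportional allocation in claim 7 amounts to giving each sub-unit the share of the adjusted duration equal to its central value divided by the sum of the central values. A minimal sketch follows; the function name `allocate_sub_unit_durations` and the sample numbers are assumptions, and the claim does not say whether the central value is a mean, a median, or something else.

```python
from typing import List, Sequence


def allocate_sub_unit_durations(
    adjusted_duration: float, central_values: Sequence[float]
) -> List[float]:
    """Split a phonetic unit's adjusted duration across its sub-units.

    Each sub-unit receives adjusted_duration * central_i / sum(central values).
    """
    total = sum(central_values)
    if total <= 0:
        raise ValueError("central values must sum to a positive number")
    return [adjusted_duration * c / total for c in central_values]


# Illustrative values only: a 200 ms syllable with three sub-units.
durations = allocate_sub_unit_durations(200.0, [50.0, 100.0, 50.0])
print(durations)  # [50.0, 100.0, 50.0]
```

By construction the allocated durations sum to the adjusted duration of the phonetic unit, so the beat-level timing is preserved after the split.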

8. A speech synthesis method comprising:
- supplying a sequence of representations of phonetic units;
  
  retrieving stored portions of data to generate waveforms corresponding to the phonetic units;
  
  determining durations for the phonetic units; and
  
  processing the portions of data to adjust the time durations of the waveforms according to the determined durations;
  
  wherein the determining step is operable to:
  
  a) determine bounds for said duration, said bounds depending on the intrinsic duration of the phonetic unit and/or its context within the sequence; and
  
  b) assign a constant duration corresponding to a regular beat period to said phonetic unit provided said constant duration does not transgress said bounds.
9. A speech synthesis method as in claim 8 in which the phonetic units are syllables.
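
Claim 8 frames the same idea as a conditional: the regular beat period is assigned only if it does not transgress the bounds. The sketch below makes that condition explicit. The name `assign_beat_duration` is hypothetical, and the fallback to the nearest bound is an assumption, since claim 8 itself does not state what duration is used when the beat period falls outside the bounds.

```python
from typing import Tuple


def assign_beat_duration(
    beat_period: float, lower: float, upper: float
) -> Tuple[float, bool]:
    """Return (duration, on_beat) for one phonetic unit.

    on_beat is True when the regular beat period could be assigned as-is.
    """
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    if lower <= beat_period <= upper:
        return beat_period, True
    # Assumption: when the beat period transgresses the bounds,
    # fall back to whichever bound is nearest.
    return (lower if beat_period < lower else upper), False


# Illustrative values only (milliseconds).
print(assign_beat_duration(200.0, 120.0, 260.0))  # (200.0, True)
print(assign_beat_duration(200.0, 220.0, 300.0))  # (220.0, False)
```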

10. A speech synthesis method comprising:
- supplying a sequence of representations of phonetic units;
  
  retrieving stored portions of data to generate waveforms corresponding to the phonetic units;
  
  determining durations for the phonetic units; and
  
  processing the portions of data to adjust the time durations of the waveforms according to the determined durations;
  
  wherein the determining step is operable to:
  
  a) determine bounds for said duration, said bounds depending on the intrinsic duration of the phonetic unit and/or its context within the sequence; and
  
  b) assign a constant duration corresponding to a regular beat period to said phonetic unit provided said constant duration does not transgress said bounds, the retrieving step retrieving for each phonetic unit, one or more portions of data each corresponding to a sub-unit thereof, and the determining step computing for each phonetic unit the sum of minimum duration values and the sum of maximum duration values for the constituent sub-unit(s) thereof and correcting the said constant duration if the computed constant duration falls below the sum of the minimum values or exceeds the sum of the maximum values.
11. A speech synthesis method as in claim 10 in which the sub-units are phonemes.
12. A speech synthesis method as in claim 10 in which the determining step is operable to adjust the said constant duration value such that it does not fall below a modified minimum value which exceeds the sum of the minimum values to an extent determined by the context of the phonetic unit.
13. A speech synthesis method as in claim 10 in which:

14. A speech synthesiser comprising:
- means for supplying a sequence of representations of phonetic units;
  
  means for retrieving stored portions of data to generate waveforms corresponding to the phonetic units;
  
  means for determining durations for the phonetic units; and
  
  means for processing the portions of data to adjust the time durations of the waveforms according to the determined durations;
  
  wherein the determining means is operable to define a constant duration for said phonetic unit, said constant duration corresponding to a regular beat period and selectively in dependence on the intrinsic duration of the phonetic unit and/or its context within the sequence, to carry out a constant duration regulation calculation.
15. A speech synthesiser as in claim 14 further comprising:
16. A speech synthesiser as in claim 14 in which the phonetic units are syllables.
17. A speech synthesiser as in claim 14 including:
- a store containing items of data representing waveforms corresponding to phonetic sub-units, the retrieving means being operable to retrieve, for each phonetic unit one or more portions of data each corresponding to a sub-unit thereof, and a further store containing for each sub-unit statistical duration data including a maximum value and a minimum value, wherein the determining means is operable to compute for each phonetic unit the sum of the minimum duration values and the sum of the maximum duration values for the constituent sub-unit(s) thereof and to adjust the said constant duration such that it neither falls below the sum of the minimum values nor exceeds the sum of the maximum values.
18. A speech synthesiser as in claim 17 in which the sub-units are phonemes.

19. A speech synthesiser comprising:
- means for supplying a sequence of representations of phonetic units;
  
  means for retrieving stored portions of data to generate waveforms corresponding to the phonetic units;
  
  means for determining durations for the phonetic units;
  
  means for processing the portions of data to adjust the time durations of the waveforms according to the determined durations;
  
  wherein the determining means is operable to define a constant duration corresponding to a regular beat period and to adjust that duration in dependence on the nature of the phonetic unit and/or its context within the sequence;
  
  a store containing items of data representing waveforms corresponding to phonetic sub-units, the retrieving means being operable to retrieve, for each phonetic unit, one or more portions of data each corresponding to a sub-unit thereof, and a further store containing for each sub-unit statistical duration data including a maximum value and a minimum value, wherein the determining means is operable to compute for each phonetic unit the sum of the minimum duration values and the sum of the maximum duration values for the constituent sub-unit(s) thereof and to adjust the said constant duration such that it neither falls below the sum of the minimum values nor exceeds the sum of the maximum values; and
  
  wherein the determining means is operable to adjust the said constant duration value such that it does not fall below a modified minimum value which exceeds the sum of the minimum values to an extent determined by the context of the phonetic unit.

20. A speech synthesiser comprising:
- means for supplying a sequence of representations of phonetic units;
  
  means for retrieving stored portions of data to generate waveforms corresponding to the phonetic units;
  
  means for determining durations for the phonetic units;
  
  means for processing the portions of data to adjust the time durations of the waveforms according to the determined durations;
  
  wherein the determining means is operable to define a constant duration corresponding to a regular beat period and to adjust that duration in dependence on the nature of the phonetic unit and/or its context within the sequence;
  
  a store containing items of data representing waveforms corresponding to phonetic sub-units, the retrieving means being operable to retrieve, for each phonetic unit, one or more portions of data each corresponding to a sub-unit thereof, and a further store containing for each sub-unit statistical duration data including a maximum value and a minimum value, wherein the determining means is operable to compute for each phonetic unit the sum of the minimum duration values and the sum of the maximum duration values for the constituent sub-unit(s) thereof and to adjust the said constant duration such that it neither falls below the sum of the minimum values nor exceeds the sum of the maximum values; and
  
  wherein the statistical duration data include for each sub-unit a central value, and means to assign to each sub-unit of a phonetic unit a duration which is a fraction of the adjusted constant value for that phonetic unit in proportion to the ratio of the central value for that sub-unit to the sum of the central values for the constituent sub-units of that phonetic unit.

21. A speech synthesizer comprising:
- means for supplying a sequence of representations of phonetic units;
  
  means for retrieving stored portions of data to generate waveforms corresponding to the phonetic units;
  
  means for determining durations for the phonetic units; and
  
  means for processing the portions of data to adjust the time durations of the waveforms according to the determined durations;
  
  wherein the determining means is operable to:
  
  a) determine bounds for said duration, said bounds depending on the intrinsic duration of the phonetic unit and/or its context within the sequence; and
  
  b) assign a constant duration corresponding to a regular beat period to said phonetic unit provided said constant duration does not transgress said bounds.
22. A speech synthesizer as in claim 21 in which the phonetic units are syllables.

23. A speech synthesizer comprising:
- means for supplying a sequence of representations of phonetic units;
  
  means for retrieving stored portions of data to generate waveforms corresponding to the phonetic units;
  
  means for determining durations for the phonetic units; and
  
  means for processing the portions of data to adjust the time durations of the waveforms according to the determined durations;
  
  wherein the determining means is operable to:
  
  a) determine bounds for said duration, said bounds depending on the intrinsic duration of the phonetic unit and/or its context within the sequence; and
  
  b) assign a constant duration corresponding to a regular beat period to said phonetic unit provided said constant duration does not transgress said bounds, a store containing items of data representing waveforms corresponding to phonetic sub-units, the retrieving means being operable to retrieve, for each phonetic unit, one or more portions of data each corresponding to a sub-unit thereof, and a further store containing for each sub-unit statistical duration data including a maximum value and a minimum value, wherein the determining means is operable to compute for each phonetic unit the sum of the minimum duration values and the sum of the maximum duration values for the constituent sub-unit(s) thereof and to correct the said constant duration if the computed constant duration falls below the sum of the minimum values or exceeds the sum of the maximum values.
24. A speech synthesizer as in claim 23 in which the sub-units are phonemes.
25. A speech synthesizer as in claim 23 in which the determining means is operable to adjust the said constant duration value such that it does not fall below a modified minimum value which exceeds the sum of the minimum values to an extent determined by the context of the phonetic unit.
26. A speech synthesizer as in claim 23 in which:

Current Assignee: British Telecommunications PLC (BT Group PLC)
Original Assignee: British Telecommunications PLC (BT Group PLC)
Inventors: Breen, Andrew P
Primary Examiner(s): Smits, Talivaldis Ivars
Application Number: US08/973,737
Time in Patent Office: 1,461 Days
Field of Search: 704/260, 704/267
US Class Current: 704/260
CPC Class Codes: G10L 13/08 Text analysis or generation...
