System and method for converting text-to-voice

US 20020072908A1
Filed: 03/27/2001
Published: 06/13/2002
Est. Priority Date: 10/19/2000
Status: Active Grant

Abstract

A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules is provided. Multiple voice recordings correspond to a single speech item and represent various inflections of that single speech item. The method includes determining syllable count and impact value for each speech item in a sequence of speech items. A desired inflection for each speech item is determined based on the syllable count and the impact value and further based on a set of playback rules. A sequence of voice recordings is determined by determining a voice recording for each speech item based on the desired inflection and based on the available voice recordings that correspond to the particular speech item. Voice data are generated based on a sequence of voice recordings by concatenating adjacent recordings in the sequence of voice recordings.
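The selection and concatenation steps summarized in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Recording` type, the `VOICE_LIBRARY` contents, the inflection labels, the byte-string audio payloads, and in particular the fallback to the first available recording when the desired inflection was never recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    inflection: str  # hypothetical label such as "standard" or "question"
    samples: bytes   # stand-in for audio data; the real format is not specified


# Hypothetical digital voice library: each speech item maps to one or more
# recordings, and multiple recordings of one item differ only in inflection.
VOICE_LIBRARY = {
    "hello": [Recording("standard", b"\x01\x02"), Recording("question", b"\x03")],
    "world": [Recording("standard", b"\x04\x05")],
}


def pick_recording(item, desired_inflection):
    """Choose a recording from the desired inflection and what is available."""
    recordings = VOICE_LIBRARY[item]
    for recording in recordings:
        if recording.inflection == desired_inflection:
            return recording
    # Assumption: fall back to the first available recording.
    return recordings[0]


def concatenate(items_with_inflections):
    """Join the chosen recordings end to end to form the voice data."""
    return b"".join(
        pick_recording(item, inflection).samples
        for item, inflection in items_with_inflections
    )
```

Calling `concatenate([("hello", "question"), ("world", "question")])` uses the question recording of "hello" and, because "world" has no question recording in this toy library, its standard recording.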

20 Claims

1. A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules, the digital voice library including a plurality of speech items and a corresponding plurality of voice recordings wherein each speech item corresponds to at least one available voice recording wherein multiple voice recordings that correspond to a single speech item represent various inflections of that single speech item, the method including receiving text data, converting the text data into a sequence of speech items in accordance with the digital voice library, the method further comprising:
- determining a syllable count for each speech item in the sequence of speech items;
  
  determining an impact value for each speech item in the sequence of speech items;
  
  determining a desired inflection for each speech item in the sequence of speech items based on the syllable count and the impact value for the particular speech item and further based on the set of playback rules;
  
  determining a sequence of voice recordings by determining a voice recording for each speech item based on the desired inflection for the particular speech item and based on the available voice recordings that correspond to the particular speech item; and
  
  generating voice data based on the sequence of voice recordings by concatenating adjacent recordings in the sequence of voice recordings.
- Dependent claims:
  - 2. The method of claim 1 wherein a plurality of the speech items are glue items and a plurality of the speech items are payload items, the method further comprising:
    - setting a flag for any speech item in the sequence of speech items that is a glue item, wherein the playback rules dictate that the desired inflection for a glue item is based on the desired inflection for surrounding payload items in the sequence of speech items and that the desired inflection for a payload item is based on the desired inflection for nearest payload items in the sequence of speech items.
  - 3. The method of claim 2 wherein the plurality of speech items includes a plurality of phrases.
  - 4. The method of claim 3 wherein the plurality of speech items includes a plurality of words.
  - 5. The method of claim 4 wherein the plurality of speech items includes a plurality of syllables.
  - 6. The method of claim 1 wherein multiple voice recordings that correspond to a single speech item represent various inflections of that single speech item and wherein the various inflections belong to various inflection groups including at least one standard inflection group, at least one emphatic inflection group, and at least one question inflection group.
  - 7. The method of claim 6 wherein the at least one question inflection group includes a single word question inflection group and a multiple word question inflection group.
  - 8. The method of claim 1 wherein the plurality of speech items includes a plurality of words, the method further comprising:
    - determining a pitch value for each speech item in the sequence of speech items by normalizing the impact value for the particular speech item, wherein the desired inflection for each speech item is further based on the pitch value for the particular speech item.
  - 9. The method of claim 8 wherein the pitch value for each speech item is between one and five.
  - 10. The method of claim 9 further comprising:
    - remodulating the pitch values for the sequence of speech items such that no more than two consecutive words have the same pitch value except when the particular consecutive words lead a sentence.
  - 11. The method of claim 9 further comprising:
    - remodulating the pitch values for the sequence of speech items such that there are at least two words between any two words having pitch values of five.
  - 12. The method of claim 9 further comprising:
    - remodulating the pitch values for the sequence of speech items such that there is at least one word between any two words having pitch values of four.
  - 13. The method of claim 9 further comprising:
    - remodulating the pitch values for the sequence of speech items such that any word that is at the beginning of a sentence has a pitch value of at least three.
  - 14. The method of claim 9 further comprising:
    - remodulating the pitch values for the sequence of speech items such that any word that immediately precedes a comma or semi-colon has a pitch value of not more than three.
  - 15. The method of claim 9 further comprising:
    - remodulating the pitch values for the sequence of speech items such that any word that is at the end of a sentence ending in a period or exclamation point has a pitch value of one.
  - 17. The method of claim 16 wherein the plurality of speech items includes a plurality of phrases.
  - 18. The method of claim 17 wherein the plurality of speech items includes a plurality of words.
  - 19. The method of claim 18 wherein the plurality of speech items includes a plurality of syllables.
  - 20. The method of claim 19 wherein multiple voice recordings that correspond to a single speech item represent various inflections of that single speech item and wherein the various inflections belong to various inflection groups including at least one standard inflection group, at least one emphatic inflection group, and at least one question inflection group.
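The pitch handling recited in claims 8, 9 and 11 to 15 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the claims say only that the pitch value comes from normalizing the impact value, so the linear min-max scaling below is an assumption, as are the `Word` type and all function names. The spacing constraints of claims 11 and 12 are only checked here, because the claims do not say how a violation is resolved.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    pitch: int
    trailing: str = ""  # punctuation immediately after the word, if any


def normalize_impact(impacts, low=1, high=5):
    """Map raw impact values onto pitch values low..high (assumed linear scaling)."""
    smallest, largest = min(impacts), max(impacts)
    if largest == smallest:
        return [(low + high) // 2 for _ in impacts]  # assumption: use the midpoint
    scale = (high - low) / (largest - smallest)
    return [low + round((value - smallest) * scale) for value in impacts]


def apply_boundary_rules(words):
    """Apply the sentence-boundary rules of claims 13, 14 and 15."""
    result = []
    starts_sentence = True
    for word in words:
        pitch = word.pitch
        if starts_sentence:
            pitch = max(pitch, 3)  # claim 13: sentence start is at least three
        if word.trailing in {",", ";"}:
            pitch = min(pitch, 3)  # claim 14: not more than three before , or ;
        if word.trailing in {".", "!"}:
            pitch = 1              # claim 15; assumed to win over claim 13
        result.append(Word(word.text, pitch, word.trailing))
        starts_sentence = word.trailing in {".", "!", "?"}
    return result


def spacing_ok(pitches, value, min_gap):
    """True if at least min_gap words separate any two words with this pitch."""
    positions = [i for i, pitch in enumerate(pitches) if pitch == value]
    return all(b - a > min_gap for a, b in zip(positions, positions[1:]))
```

With these helpers, claim 11 corresponds to `spacing_ok(pitches, 5, 2)` and claim 12 to `spacing_ok(pitches, 4, 1)`. A one-word sentence triggers both claim 13 and claim 15; this sketch lets the end-of-sentence rule take precedence, which is a design choice rather than something the claims state.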

16. A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules, the digital voice library including a plurality of speech items, including glue items and payload items, and a corresponding plurality of voice recordings wherein each speech item corresponds to at least one available voice recording wherein multiple voice recordings that correspond to a single speech item represent various inflections of that single speech item, the method including receiving text data, converting the text data into a sequence of speech items in accordance with the digital voice library, the method further comprising:
- determining a syllable count for each speech item in the sequence of speech items;
  
  determining an impact value for each speech item in the sequence of speech items;
  
  determining a pitch value within a range for each speech item in the sequence of speech items by normalizing the impact value for the particular speech item;
  
  determining a desired inflection for each speech item in the sequence of speech items based on the syllable count and the pitch value for the particular speech item and further based on the set of playback rules wherein the playback rules dictate that the desired inflection for a glue item is based on the desired inflection for surrounding payload items and that the desired inflection for a payload item is based on the desired inflection for nearest payload items with priority being given to speech items having a greater pitch value such that the desired inflections are determined first for speech items having the greatest pitch value and, thereafter, are determined for speech items in order of descending pitch;
  
  determining a sequence of voice recordings by determining a voice recording for each speech item based on the desired inflection for the particular speech item and based on the available voice recordings that correspond to the particular speech item; and
  
  generating voice data based on the sequence of voice recordings by concatenating adjacent recordings in the sequence of voice recordings.
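The processing order described in claim 16 can be sketched in Python as shown below. The claim fixes the order for payload items (greatest pitch first, then descending pitch) and states what each kind of item depends on, but it does not give the rules themselves. The `SpeechItem` type, the function names, the two example rules, and the decision to resolve glue items after all payload items are therefore assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechItem:
    text: str
    pitch: int
    is_glue: bool
    inflection: Optional[str] = None


def nearest_resolved_payload(items, index):
    """Closest payload item that already has an inflection, or None."""
    candidates = [
        j for j, item in enumerate(items)
        if j != index and not item.is_glue and item.inflection is not None
    ]
    if not candidates:
        return None
    return items[min(candidates, key=lambda j: abs(j - index))]


def assign_inflections(items, payload_rule, glue_rule):
    """Resolve payload items in descending pitch order, then glue items."""
    payload_order = sorted(
        (i for i, item in enumerate(items) if not item.is_glue),
        key=lambda i: -items[i].pitch,  # stable sort: ties keep text order
    )
    for i in payload_order:
        items[i].inflection = payload_rule(items[i], nearest_resolved_payload(items, i))
    for i, item in enumerate(items):
        if item.is_glue:
            # Surrounding payload items: nearest on the left and on the right.
            left = next((x for x in reversed(items[:i]) if not x.is_glue), None)
            right = next((x for x in items[i + 1:] if not x.is_glue), None)
            item.inflection = glue_rule(item, left, right)
    return payload_order


# Hypothetical rules, used only to exercise the ordering logic.
def example_payload_rule(item, neighbor):
    return "emphatic" if item.pitch >= 4 else "standard"


def example_glue_rule(item, left, right):
    source = right or left
    return source.inflection if source else "standard"


sentence = [
    SpeechItem("the", 2, True),
    SpeechItem("train", 3, False),
    SpeechItem("to", 1, True),
    SpeechItem("Denver", 5, False),
]
order = assign_inflections(sentence, example_payload_rule, example_glue_rule)
```

In this example "Denver" (pitch five) is resolved before "train" (pitch three), and each glue item then copies the inflection of the payload item that follows it.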

Current Assignee: Qwest Communications International Incorporated (Lumen Technologies, Inc.)
Original Assignee: Qwest Communications International Incorporated (Lumen Technologies, Inc.)
Inventors: Case, Eliot M.; Phillips, Richard P.; Weirauch, Judith L.
Granted Patent: US 6,990,450 B2
US Class Current: 704/260
CPC Class Codes:
- G10L 13/04 Details of speech synthesis...
- G10L 13/07 Concatenation rules
