System and method for converting text-to-voice
First Claim
1. A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules, the digital voice library including a plurality of speech items and a corresponding plurality of voice recordings wherein each speech item corresponds to at least one available voice recording wherein multiple voice recordings that correspond to a single speech item represent various inflections of that single speech item, the method including receiving text data, converting the text data into a sequence of speech items in accordance with the digital voice library, the method further comprising:
determining a syllable count for each speech item in the sequence of speech items;
determining an impact value for each speech item in the sequence of speech items;
determining a desired inflection for each speech item in the sequence of speech items based on the syllable count and the impact value for the particular speech item and further based on the set of playback rules;
determining a sequence of voice recordings by determining a voice recording for each speech item based on the desired inflection for the particular speech item and based on the available voice recordings that correspond to the particular speech item; and
generating voice data based on the sequence of voice recordings by concatenating adjacent recordings in the sequence of voice recordings.
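The claim leaves the playback rules, the set of inflections, and the source of impact values unspecified. The sketch below fills them in with hypothetical stand-ins to show how the steps of claim 1 could fit together: a rule picks a desired inflection from each item's syllable count and impact value, the nearest available take is used when the desired inflection was never recorded, and the chosen recordings are concatenated in order. Recordings are modeled as plain lists of samples. `SpeechItem`, `desired_inflection`, `choose_recording`, `items_to_voice`, the 0.7 threshold, and the three inflection labels are all assumptions, not taken from the patent, and text-to-item conversion is assumed to have happened already.

```python
from dataclasses import dataclass

# Hypothetical inflection labels, ordered from rising to falling.
# The claim does not name specific inflections.
INFLECTIONS = ("rising", "neutral", "falling")


@dataclass
class SpeechItem:
    text: str        # key into the voice library
    syllables: int   # syllable count for the item
    impact: float    # impact value, assumed to be computed upstream


def desired_inflection(item: SpeechItem, is_last: bool) -> str:
    """Hypothetical playback rules standing in for the patent's rule set."""
    if is_last:
        return "falling"  # close the utterance on a falling tone
    if item.impact >= 0.7 and item.syllables > 1:
        return "rising"   # lift long, high-impact items
    return "neutral"


def choose_recording(item: SpeechItem, wanted: str, library: dict) -> list:
    """Return the recording for `wanted`, else the nearest available inflection."""
    available = library[item.text]  # inflection -> recording (list of samples)
    if wanted in available:
        return available[wanted]
    target = INFLECTIONS.index(wanted)
    best = min(available, key=lambda inf: abs(INFLECTIONS.index(inf) - target))
    return available[best]


def items_to_voice(items: list, library: dict) -> list:
    """Select one recording per item, then concatenate them in order."""
    recordings = []
    for i, item in enumerate(items):
        wanted = desired_inflection(item, is_last=(i == len(items) - 1))
        recordings.append(choose_recording(item, wanted, library))
    voice = []
    for rec in recordings:
        voice.extend(rec)  # join adjacent recordings end to end
    return voice
```

For the items "your", "balance", "is", "low" with impact values 0.2, 0.9, 0.1, 0.8, these stand-in rules request neutral, rising, neutral, and falling takes. Plain list concatenation is the simplest reading of "concatenating adjacent recordings"; the claim does not say whether the joins are smoothed.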
4 Assignments
0 Petitions
Abstract
A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules is provided. Multiple voice recordings correspond to a single speech item and represent various inflections of that single speech item. The method includes determining a syllable count and an impact value for each speech item in a sequence of speech items. A desired inflection for each speech item is determined based on the syllable count and the impact value and further based on the set of playback rules. A sequence of voice recordings is determined by determining a voice recording for each speech item based on the desired inflection and on the available voice recordings that correspond to that speech item. Voice data are generated from the sequence of voice recordings by concatenating adjacent recordings in the sequence.
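The abstract does not say how the syllable count is obtained. One simple possibility, shown here purely as an assumption, is a vowel-group heuristic for English words; a pronouncing-dictionary lookup would be more accurate for a fixed voice library. `count_syllables` and `item_syllables` are hypothetical helpers, not part of the patent.

```python
import re

VOWELS = "aeiouy"


def count_syllables(word: str) -> int:
    """Rough English syllable estimate: count vowel groups, then drop a silent final 'e'."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    count = len(re.findall(r"[aeiouy]+", w))
    # "-le" after a consonant ("table") is usually its own syllable.
    consonant_le = len(w) > 2 and w.endswith("le") and w[-3] not in VOWELS
    # Otherwise a final 'e' (but not "ee") is usually silent ("balance", "smile").
    if count > 1 and w.endswith("e") and not w.endswith("ee") and not consonant_le:
        count -= 1
    return max(count, 1)


def item_syllables(item: str) -> int:
    """Syllable count for a multi-word speech item such as "seventeen dollars"."""
    return sum(count_syllables(w) for w in item.split())
```

The heuristic still misses words such as "recipe" (three syllables, estimated as two), which is why a lookup table is preferable whenever the set of speech items is known in advance.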
20 Claims
16. A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules, the digital voice library including a plurality of speech items, including glue items and payload items, and a corresponding plurality of voice recordings wherein each speech item corresponds to at least one available voice recording wherein multiple voice recordings that correspond to a single speech item represent various inflections of that single speech item, the method including receiving text data, converting the text data into a sequence of speech items in accordance with the digital voice library, the method further comprising:
determining a syllable count for each speech item in the sequence of speech items;
determining an impact value for each speech item in the sequence of speech items;
determining a pitch value within a range for each speech item in the sequence of speech items by normalizing the impact value for the particular speech item;
determining a desired inflection for each speech item in the sequence of speech items based on the syllable count and the pitch value for the particular speech item and further based on the set of playback rules wherein the playback rules dictate that the desired inflection for a glue item is based on the desired inflection for surrounding payload items and that the desired inflection for a payload item is based on the desired inflection for nearest payload items with priority being given to speech items having a greater pitch value such that the desired inflections are determined first for speech items having the greatest pitch value and, thereafter, are determined for speech items in order of descending pitch;
determining a sequence of voice recordings by determining a voice recording for each speech item based on the desired inflection for the particular speech item and based on the available voice recordings that correspond to the particular speech item; and
generating voice data based on the sequence of voice recordings by concatenating adjacent recordings in the sequence of voice recordings.
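Claim 16 adds two concrete steps: impact values are normalized into a pitch range, and inflections are assigned in descending pitch order, with glue items following their surrounding payload items and payload items following their nearest payload items. The sketch below assumes min-max normalization into [0, 1] and a hypothetical rule set: the most prominent payload gets a `peak` (or `high` if it has one syllable), a payload before its nearest already-assigned payload rises toward it, one after it falls, and a glue item rises into a rising or peaked payload on its right. None of these specific rules come from the claim; `Item`, `normalize`, and `assign_inflections` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    kind: str        # "payload" or "glue"
    syllables: int
    impact: float


def normalize(impacts: list, lo: float = 0.0, hi: float = 1.0) -> list:
    """Min-max normalize impact values into the pitch range [lo, hi]."""
    mn, mx = min(impacts), max(impacts)
    if mx == mn:
        return [(lo + hi) / 2] * len(impacts)  # no contrast: use mid-range
    return [lo + (v - mn) / (mx - mn) * (hi - lo) for v in impacts]


def assign_inflections(items: list) -> tuple:
    pitch = normalize([it.impact for it in items])
    # Highest pitch first; sorted() is stable, so ties keep text order.
    order = sorted(range(len(items)), key=lambda i: -pitch[i])
    payloads = [i for i, it in enumerate(items) if it.kind == "payload"]
    inflection = [None] * len(items)

    for i in order:
        if items[i].kind == "payload":
            done = [j for j in payloads if inflection[j] is not None]
            if not done:
                # Most prominent payload; a one-syllable take gets "high" instead of a rise-fall.
                inflection[i] = "peak" if items[i].syllables > 1 else "high"
                continue
            nearest = min(done, key=lambda j: abs(j - i))
            if i < nearest and inflection[nearest] != "falling":
                inflection[i] = "rising"    # build toward the nearest assigned payload
            elif i > nearest:
                inflection[i] = "falling"   # come down after it
            else:
                inflection[i] = "neutral"
        else:
            # Glue follows the payload items on either side (if already assigned).
            left = max((j for j in payloads if j < i), default=None)
            right = min((j for j in payloads if j > i), default=None)
            if right is not None and inflection[right] in ("rising", "peak", "high"):
                inflection[i] = "rising"
            elif left is not None and inflection[left] == "falling":
                inflection[i] = "falling"
            else:
                inflection[i] = "neutral"
    return pitch, inflection
```

For "your balance is seventeen dollars" tagged glue, payload, glue, payload, payload with impacts 1, 4, 0, 9, 5, the pitches come out as roughly 0.11, 0.44, 0, 1, 0.56. Items are visited in the order "seventeen", "dollars", "balance", "your", "is", and the assigned inflections are rising, rising, rising, peak, falling: "seventeen" is settled first and everything else is resolved relative to it.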
Specification