Method, apparatus and computer program providing a multi-speaker database for concatenative text-to-speech synthesis
First Claim
1. A method to generate an audible speech word that corresponds to text, comprising:
- providing a text word; and
- in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word.
Abstract
Disclosed are a method, an apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system. The data structure includes a plurality of speech segments derived from a plurality of speakers, where each speech segment has an associated attribute vector comprising at least one attribute vector element that identifies the speaker from which the speech segment was derived.
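The cost-based selection summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Segment` fields, the pitch-based `target_cost`, the `join_cost` with its speaker-change penalty, and the dynamic-programming search in `select_segments` are all assumptions chosen for illustration, since the text above only states that "at least one cost function" guides the concatenation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    phone: str       # phonetic unit this segment realizes (assumed granularity)
    speaker: str     # speaker from which the segment was derived
    pitch_hz: float  # illustrative acoustic feature (assumption)
    audio: tuple     # placeholder audio samples


def target_cost(seg, wanted_pitch_hz):
    # Hypothetical target cost: distance from a desired pitch.
    return abs(seg.pitch_hz - wanted_pitch_hz) / 100.0


def join_cost(prev, cur, speaker_penalty=1.0):
    # Hypothetical concatenation cost: pitch mismatch at the join, plus a
    # penalty when adjacent segments come from different speakers.
    cost = abs(prev.pitch_hz - cur.pitch_hz) / 100.0
    if prev.speaker != cur.speaker:
        cost += speaker_penalty
    return cost


def select_segments(phones, database, wanted_pitch_hz=120.0):
    """Choose one segment per phone so that the summed cost is minimal."""
    best = []  # (accumulated cost, path) for each candidate of the last phone
    for pos, phone in enumerate(phones):
        candidates = [s for s in database if s.phone == phone]
        if not candidates:
            raise ValueError(f"no segment for phone {phone!r}")
        new_best = []
        for cand in candidates:
            tc = target_cost(cand, wanted_pitch_hz)
            if pos == 0:
                new_best.append((tc, [cand]))
            else:
                # Cheapest way to reach this candidate from any earlier path.
                cost, path = min(
                    ((c + join_cost(p[-1], cand), p) for c, p in best),
                    key=lambda item: item[0],
                )
                new_best.append((cost + tc, path + [cand]))
        best = new_best
    return min(best, key=lambda item: item[0])[1]


def concatenate(segments):
    # Form the audio data by joining the chosen segments in order.
    audio = []
    for seg in segments:
        audio.extend(seg.audio)
    return audio


# Toy two-speaker database for a three-phone word (values are invented).
DATABASE = [
    Segment("k", "A", 118.0, (1, 2)),
    Segment("k", "B", 180.0, (3, 4)),
    Segment("ae", "A", 125.0, (5, 6)),
    Segment("ae", "B", 121.0, (7, 8)),
    Segment("t", "B", 119.0, (9, 10)),
]

chosen = select_segments(["k", "ae", "t"], DATABASE)
audio = concatenate(chosen)
```

In this toy database only speaker B recorded the final phone, so the search has to cross a speaker boundary somewhere; the penalty in `join_cost` determines where that switch is cheapest.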
19 Claims
1. A method to generate an audible speech word that corresponds to text, comprising:
- providing a text word; and
- in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word.
Dependent claims: 2, 3, 4, 5, 6
7. A concatenative text-to-speech system comprising:
- a data processor coupled to a memory that stores a database of speech segments derived from a plurality of speakers, said data processor being responsive to an input text word to selectively concatenate together speech segments from said database based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word.
Dependent claims: 8, 9, 10, 11, 12
13. A data structure embodied in a computer readable medium for use in a concatenative text-to-speech system, comprising a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.
18. A program storage device readable by machine and tangibly embodying a program of instructions executable by the machine to operate a concatenative text-to-speech apparatus, comprising operations of:
- responsive to a text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function; and
- forming audio data for generating an audible speech word that corresponds to the text word, where each pre-recorded speech segment comprises an attribute vector, and each attribute vector comprises a vector element that identifies the speaker from which the speech segment was derived.
Dependent claims: 19
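The data structure recited in claims 13 and 18 pairs each speech segment with an attribute vector that contains at least one element identifying the source speaker. A minimal Python sketch of one possible layout follows. The class names `AttributeVector`, `SpeechSegment` and `SegmentDatabase`, the integer speaker identifier, and the `extra` field are all hypothetical; the claims do not specify an encoding.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeVector:
    speaker_id: int    # element identifying the speaker of origin
    extra: tuple = ()  # hypothetical further elements (not specified above)


@dataclass(frozen=True)
class SpeechSegment:
    unit: str                    # speech unit label (assumed granularity)
    samples: tuple               # placeholder audio samples
    attributes: AttributeVector  # attribute vector associated with the segment


class SegmentDatabase:
    """Multi-speaker segment store indexed by speech unit."""

    def __init__(self):
        self._by_unit = defaultdict(list)

    def add(self, segment):
        self._by_unit[segment.unit].append(segment)

    def candidates(self, unit, speaker_id=None):
        # Return every segment for the unit, optionally limited to one speaker.
        segments = self._by_unit.get(unit, [])
        if speaker_id is None:
            return list(segments)
        return [s for s in segments if s.attributes.speaker_id == speaker_id]

    def speakers(self):
        # Distinct speaker identifiers present in the database.
        return sorted(
            {s.attributes.speaker_id
             for segments in self._by_unit.values()
             for s in segments}
        )


db = SegmentDatabase()
db.add(SpeechSegment("ae", (5, 6), AttributeVector(speaker_id=0)))
db.add(SpeechSegment("ae", (7, 8), AttributeVector(speaker_id=1)))
db.add(SpeechSegment("t", (9, 10), AttributeVector(speaker_id=1)))
```

Keeping the speaker identifier inside the attribute vector lets a cost function compare the speakers of adjacent candidates without a separate lookup table.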
Specification