Method and system for synthesizing speech
Abstract
A method and system for synthesizing acoustic waveforms in, for example, a text-to-speech system is disclosed, which employs the concatenation of a very large number of very small, sub-phoneme acoustic units. Such sub-phoneme-sized audio segments, called wavelets, can be individually spectrally analyzed and labelled as fenones. Fenones are clustered into logically related groups called fenemes. Sequences of fenemes can be matched with individual phonemes, and hence with words. In the case of a text-to-speech system, the required phonemes are determined from prior linguistic analysis of the input words in the text. Suitable sequences of fenemes are predicted for each phoneme in its own context using hidden Markov modelling techniques. A complete output waveform is constructed by concatenating wavelets to produce a very long sequence thereof, each wavelet corresponding to its respective feneme. The advantage of using a feneme set extracted from a training script read by a single human speaker is that natural-sounding speech can be generated using a finite-sized codebook.
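The labelling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each wavelet has already been reduced to a short spectral feature vector and that labelling means picking the nearest codebook entry by Euclidean distance; the abstract does not state the distance measure or the feature set, so `CODEBOOK`, `label_wavelet`, and `label_sequence` are hypothetical names with toy values.

```python
import math

# Hypothetical codebook: fenone label -> centroid of a 2-D spectral feature vector.
# Real feature vectors would have many more dimensions; these values are toy data.
CODEBOOK = {
    "F1": (0.0, 0.0),
    "F2": (1.0, 0.0),
    "F3": (0.0, 1.0),
}


def label_wavelet(features, codebook=CODEBOOK):
    """Return the label of the codebook centroid nearest to `features`.

    Assumption: nearest-centroid matching with Euclidean distance.
    """
    return min(codebook, key=lambda label: math.dist(features, codebook[label]))


def label_sequence(feature_vectors, codebook=CODEBOOK):
    """Label every wavelet feature vector in order."""
    return [label_wavelet(vector, codebook) for vector in feature_vectors]
```

Calling `label_sequence` on the feature vectors of a recorded training script would yield one label per wavelet, which is the kind of sequence that could then be grouped and matched against phonemes.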
25 Claims
1. A method for synthesizing speech from text, comprising the steps of:
generating a sequence of sub-phoneme elements from text, each sub-phoneme element representing a corresponding acoustic waveform; and
concatenating said sub-phoneme elements to produce an output waveform, wherein said generating step comprises the steps of;
generating from said text corresponding speech elements; and
mapping each speech element to one or more of a plurality of sub-phoneme elements to produce said sequence.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
13. A system for synthesizing speech from text, the system comprising:
means for generating a sequence of sub-phoneme elements from text, each sub-phoneme element representing a corresponding acoustic waveform; and
means for concatenating said sub-phoneme elements to produce an output waveform, wherein said means for generating comprises;
means for generating from said text corresponding speech elements; and
means for mapping each speech element to one or more of a plurality of sub-phoneme elements to produce said sequence.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24.
25. A method for synthesizing speech from text, comprising the steps of:
converting the text into a sequence of phonemes representative of the text;
generating a sequence of fenemes representative of the sequence of phonemes;
transforming the sequence of fenemes into a sequence of wavelets; and
concatenating the sequence of wavelets to produce an acoustic waveform representative of the text.
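The four steps recited in claim 25 can be sketched as a chain of small functions. This is an illustration under stated assumptions, not the patented implementation: every table below is hypothetical toy data, and the feneme sequence for each phoneme comes from a fixed lookup table, whereas the abstract describes predicting it in context with hidden Markov modelling.

```python
# Hypothetical toy tables; none of these values come from the patent.
LEXICON = {"hi": ["HH", "AY"]}  # word -> phonemes
PHONEME_TO_FENEMES = {  # phoneme -> feneme labels (fixed lookup, not an HMM)
    "HH": ["f1", "f2"],
    "AY": ["f3", "f3", "f4"],
}
FENEME_TO_WAVELET = {  # feneme -> short run of waveform samples
    "f1": [0.0, 0.1],
    "f2": [0.2, 0.1],
    "f3": [0.5, -0.5],
    "f4": [0.1, 0.0],
}


def text_to_phonemes(text):
    """Step 1: convert the text into a sequence of phonemes."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def phonemes_to_fenemes(phonemes):
    """Step 2: generate a feneme sequence for the phoneme sequence."""
    fenemes = []
    for phoneme in phonemes:
        fenemes.extend(PHONEME_TO_FENEMES[phoneme])
    return fenemes


def fenemes_to_wavelets(fenemes):
    """Step 3: transform each feneme into its wavelet."""
    return [FENEME_TO_WAVELET[feneme] for feneme in fenemes]


def concatenate(wavelets):
    """Step 4: join the wavelets end to end into one list of samples."""
    waveform = []
    for wavelet in wavelets:
        waveform.extend(wavelet)
    return waveform


def synthesize(text):
    """Run the four steps in order."""
    return concatenate(fenemes_to_wavelets(phonemes_to_fenemes(text_to_phonemes(text))))
```

Keeping each step as a separate function mirrors the claim language, so a context-dependent predictor could replace `phonemes_to_fenemes` without touching the other three steps.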
Specification