Speech synthesis system
First Claim
1. A method for generating synthesized speech from input text, the method comprising the steps of:
decomposing the input text into a sequence of speech units;
estimating a duration value for each speech unit in the sequence of speech units;
synthesizing speech based on said sequence of speech units and duration values;
characterized in that said estimating step utilizes a Hidden Markov Model (HMM) to determine the most likely sequence of duration values given said sequence of speech units, wherein each state of the HMM represents a duration value and each output from the HMM is a speech unit.
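The estimating step of the claim can be written as a decoding problem. The formulation below is a sketch that assumes a standard first-order HMM; the symbols (u_i for the i-th speech unit, d_i for its duration state, and \pi for an initial-state distribution) are illustrative notation and do not appear in the claim.

```latex
\hat{d}_{1:N}
  = \arg\max_{d_{1:N}} P(d_{1:N} \mid u_{1:N})
  = \arg\max_{d_{1:N}} \; \pi(d_1)\, P(u_1 \mid d_1)
    \prod_{i=2}^{N} P(d_i \mid d_{i-1})\, P(u_i \mid d_i)
```

Here P(d_i | d_{i-1}) is the state transition distribution over duration values and P(u_i | d_i) is the output distribution over speech units. The second equality holds because P(u_{1:N}) does not depend on the durations, so maximizing the joint probability gives the same sequence.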
Abstract
A speech synthesis unit comprises a text processor which breaks down text into phonemes, a prosodic processor which assigns properties such as length and pitch to the phonemes based on context, and a synthesis unit which outputs an audio signal representing the sequence of phonemes according to the specified properties. The prosodic processor includes a Hidden Markov Model (HMM) to predict the durations of the phonemes. Each state of the HMM represents a duration, and the outputs are phonemes. The HMM is trained on a set of data consisting of phonemes of known identity and duration, to allow the state transition and output distributions to be calculated. The HMM can then be used for any given input sequence of phonemes to predict a most likely sequence of corresponding durations.
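The training and prediction steps described in the abstract can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: it assumes durations are quantized to a small set of discrete values (here milliseconds), estimates the distributions by counting with add-one smoothing, and decodes with the Viterbi algorithm. The function names, the smoothing choice, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def _log_probs(count_of, keys, smoothing):
    """Turn counts into smoothed log-probabilities over `keys`."""
    total = sum(count_of(k) for k in keys) + smoothing * len(keys)
    return {k: math.log((count_of(k) + smoothing) / total) for k in keys}


def train(sequences, smoothing=1.0):
    """Estimate HMM parameters from sequences of (phoneme, duration) pairs.

    States are duration values; outputs are phonemes.
    """
    durations = sorted({d for seq in sequences for _, d in seq})
    phonemes = sorted({p for seq in sequences for p, _ in seq})
    start, trans, emit = Counter(), Counter(), Counter()
    for seq in sequences:
        prev = None
        for phoneme, dur in seq:
            if prev is None:
                start[dur] += 1
            else:
                trans[(prev, dur)] += 1
            emit[(dur, phoneme)] += 1
            prev = dur
    log_start = _log_probs(lambda d: start[d], durations, smoothing)
    log_trans = {d: _log_probs(lambda e, d=d: trans[(d, e)], durations, smoothing)
                 for d in durations}
    log_emit = {d: _log_probs(lambda p, d=d: emit[(d, p)], phonemes, smoothing)
                for d in durations}
    return durations, log_start, log_trans, log_emit


def predict_durations(phoneme_seq, model):
    """Viterbi decoding: most likely duration sequence for the phonemes.

    Assumes every phoneme in `phoneme_seq` was seen during training.
    """
    durations, log_start, log_trans, log_emit = model
    if not phoneme_seq:
        return []
    # best[d] is the log score of the best path ending in duration state d.
    best = {d: log_start[d] + log_emit[d][phoneme_seq[0]] for d in durations}
    back = []
    for phoneme in phoneme_seq[1:]:
        new_best, pointers = {}, {}
        for d in durations:
            prev = max(durations, key=lambda e: best[e] + log_trans[e][d])
            new_best[d] = best[prev] + log_trans[prev][d] + log_emit[d][phoneme]
            pointers[d] = prev
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(durations, key=lambda d: best[d])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy corpus: phonemes labelled with durations in milliseconds.
toy_corpus = [
    [("k", 50), ("ae", 150), ("t", 50)],
    [("t", 50), ("ae", 150), ("k", 50)],
]
model = train(toy_corpus)
print(predict_durations(["k", "ae", "t"], model))  # prints [50, 150, 50]
```

Working in log space avoids numerical underflow on long phoneme sequences. A practical system would also need a policy for phonemes that never occurred in the training data, which this sketch leaves out.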
68 Citations
10 Claims
1. A method for generating synthesized speech from input text, the method comprising the steps of:
decomposing the input text into a sequence of speech units;
estimating a duration value for each speech unit in the sequence of speech units;
synthesizing speech based on said sequence of speech units and duration values;
characterized in that said estimating step utilizes a Hidden Markov Model (HMM) to determine the most likely sequence of duration values given said sequence of speech units, wherein each state of the HMM represents a duration value and each output from the HMM is a speech unit.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A speech synthesis system for generating synthesized speech from input text comprising:
a text processor for decomposing the input text into a sequence of speech units;
a prosodic processor for estimating a duration value for each speech unit in the sequence of speech units;
a synthesis unit for synthesizing speech based on said sequence of speech units and duration values; and
characterized in that said prosodic processor utilizes a Hidden Markov Model (HMM) to determine the most likely sequence of duration values given said sequence of speech units, wherein each state of the HMM represents a duration value and each output from the HMM is a speech unit.
Specification