SPEECH SYNTHESIZER, SPEECH SYNTHESIZING METHOD AND PROGRAM PRODUCT
First Claim
1. A speech synthesizer comprising:
an analyzer that performs a text analysis of an input document and extracts a linguistic feature used for prosody control;
a first estimator that selects a first prosody model adapted to the extracted linguistic feature from predetermined first prosody models that are models of speech prosody information and that estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model;
a selector that selects, from a speech unit storage storing speech units, a plurality of speech units that minimizes a cost function determined in accordance with the prosody information estimated by the first estimator;
a generator that generates a second prosody model that is a model of prosody information of the selected speech units;
a second estimator that estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model; and
a synthesizer that generates synthetic speech by concatenating the selected speech units on the basis of the prosody information estimated by the second estimator.
Abstract
According to one embodiment, a speech synthesizer includes an analyzer, a first estimator, a selector, a generator, a second estimator, and a synthesizer. The analyzer analyzes text and extracts a linguistic feature. The first estimator selects a first prosody model adapted to the linguistic feature and estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model. The selector selects speech units that minimize a cost function determined in accordance with the prosody information. The generator generates a second prosody model that is a model of the prosody information of the speech units. The second estimator estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model. The synthesizer generates synthetic speech by concatenating the speech units on the basis of the prosody information estimated by the second estimator.
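The second estimation step can be illustrated with a small numerical sketch. The abstract does not specify the model family or how the third likelihood is formed, so the code below assumes that each prosody model is a one-dimensional Gaussian over a single prosody value (for example a duration in ms) and that the third likelihood is a weighted sum of the two log-likelihoods. The names `Gaussian`, `fit_second_model`, `estimate_combined`, and the weights `w1` and `w2` are hypothetical and serve illustration only.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass(frozen=True)
class Gaussian:
    """Hypothetical 1-D prosody model: mean and variance of one prosody value."""
    mean: float
    var: float


def fit_second_model(unit_values, var_floor=1e-3):
    """Fit a Gaussian to the prosody values of the selected speech units.

    Assumption: the second prosody model is the sample mean and (floored)
    population variance of the selected units' values.
    """
    return Gaussian(fmean(unit_values), max(pvariance(unit_values), var_floor))


def estimate_combined(first, second, w1=1.0, w2=1.0):
    """Return the value maximizing w1*log N(x; first) + w2*log N(x; second).

    For Gaussians the maximizer has a closed form: the precision-weighted
    average of the two means.
    """
    p1 = w1 / first.var
    p2 = w2 / second.var
    return (p1 * first.mean + p2 * second.mean) / (p1 + p2)


# First estimate: the mean of the first model maximizes the first likelihood.
first_model = Gaussian(mean=100.0, var=4.0)
# Second model built from the values of the units actually selected.
second_model = fit_second_model([108.0, 110.0, 112.0])
refined = estimate_combined(first_model, second_model)
```

Under these assumptions the refined estimate lies between the text-driven prediction and the prosody of the selected units, and it moves closer to whichever model has the smaller variance.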
6 Claims
1. A speech synthesizer comprising:
an analyzer that performs a text analysis of an input document and extracts a linguistic feature used for prosody control;
a first estimator that selects a first prosody model adapted to the extracted linguistic feature from predetermined first prosody models that are models of speech prosody information and that estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model;
a selector that selects, from a speech unit storage storing speech units, a plurality of speech units that minimizes a cost function determined in accordance with the prosody information estimated by the first estimator;
a generator that generates a second prosody model that is a model of prosody information of the selected speech units;
a second estimator that estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model; and
a synthesizer that generates synthetic speech by concatenating the selected speech units on the basis of the prosody information estimated by the second estimator.
Dependent claims: 2, 3, 4
5. A speech synthesis method comprising:
performing a text analysis of an input document and extracting a linguistic feature used for prosody control;
selecting a first prosody model adapted to the extracted linguistic feature from predetermined first prosody models that are models of speech prosody information, and first estimating in which prosody information that maximizes a first likelihood representing probability of the selected first prosody model is estimated;
selecting, from a speech unit storage storing speech units, a plurality of speech units that minimizes a cost function determined in accordance with the prosody information estimated in the first estimating;
generating a second prosody model that is a model of prosody information of the selected speech units;
second estimating in which prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model is estimated; and
generating synthetic speech by concatenating the selected speech units on the basis of the prosody information estimated in the second estimating.
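The unit selection step of the method can be sketched as a dynamic-programming search. The claim states only that the selected units minimize a cost function determined in accordance with the estimated prosody. The particular cost used here (an absolute prosody mismatch plus an F0 discontinuity penalty between neighbouring units) and the names `Unit`, `target_cost`, `join_cost`, and `select_units` are assumptions made for illustration, and the sketch expects at least one candidate per segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical stored speech unit with its own prosody values."""
    name: str
    f0: float        # fundamental frequency in Hz
    duration: float  # duration in ms


def target_cost(unit, target_f0, target_dur):
    # Mismatch between the unit and the prosody estimated in the first estimating.
    return abs(unit.f0 - target_f0) + abs(unit.duration - target_dur)


def join_cost(prev, cur):
    # Assumed concatenation penalty: F0 jump between adjacent units.
    return abs(prev.f0 - cur.f0)


def select_units(candidates, targets):
    """Pick one unit per segment so that the summed cost is minimal.

    candidates[i] is the list of stored units for segment i;
    targets[i] is the (f0, duration) pair estimated for segment i.
    """
    # best[i][j] = (cheapest cumulative cost ending in candidates[i][j], backpointer)
    best = [[(target_cost(u, *targets[0]), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(u, *targets[i]), back))
        best.append(row)

    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path


candidates = [
    [Unit("a1", 100.0, 50.0), Unit("a2", 120.0, 50.0)],
    [Unit("b1", 125.0, 60.0), Unit("b2", 100.0, 80.0)],
]
targets = [(100.0, 50.0), (110.0, 60.0)]
selected = select_units(candidates, targets)
```

In this toy input the second segment's closest match to its target is `b1`, but the search prefers `b2` because it joins to `a1` without an F0 jump. The prosody of the units chosen this way is what the subsequent generating step would model.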
6. A program product having a computer readable medium including programmed instructions, wherein the instructions, when executed by a computer, cause the computer to perform:
performing a text analysis of an input document and extracting a linguistic feature used for prosody control;
selecting a first prosody model adapted to the extracted linguistic feature from predetermined first prosody models that are models of speech prosody information, and first estimating in which prosody information that maximizes a first likelihood representing probability of the selected first prosody model is estimated;
selecting, from a speech unit storage storing speech units, a plurality of speech units that minimizes a cost function determined in accordance with the prosody information estimated in the first estimating;
generating a second prosody model that is a model of prosody information of the selected speech units;
second estimating in which prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model is estimated; and
generating synthetic speech by concatenating the selected speech units on the basis of the prosody information estimated in the second estimating.
Specification