SPEECH MODEL GENERATING APPARATUS, SPEECH SYNTHESIS APPARATUS, SPEECH MODEL GENERATING PROGRAM PRODUCT, SPEECH SYNTHESIS PROGRAM PRODUCT, SPEECH MODEL GENERATING METHOD, AND SPEECH SYNTHESIS METHOD
Abstract
According to one embodiment, a speech model generating apparatus includes a spectrum analyzer, a chunker, a parameterizer, a clustering unit, and a model training unit. The spectrum analyzer acquires a speech signal corresponding to text information and calculates a set of spectral coefficients for each frame of the speech signal. The chunker acquires boundary information indicating the beginning and end of linguistic units and chunks the speech signal into those linguistic units. On the basis of the spectral coefficients, the parameterizer calculates a set of spectral trajectory parameters describing the trajectory of the spectral coefficients over each linguistic unit. The clustering unit clusters the spectral trajectory parameters calculated for the linguistic units into clusters on the basis of linguistic information. The model training unit obtains a trained spectral trajectory model that indicates a characteristic of each cluster, based on the spectral trajectory parameters belonging to that cluster.
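The training pipeline described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The excerpt does not say which transform the parameterizer uses, so a truncated DCT-II over time of each spectral coefficient is assumed. Clustering is reduced to grouping units that share an exact context key, which stands in for whatever context-based clustering (for example, a decision tree) a full system would use. Each cluster's statistical distribution is assumed to be a diagonal Gaussian. The helpers `dct_matrix`, `chunk`, `parameterize`, and `train`, together with the toy boundaries and contexts, are hypothetical.

```python
import numpy as np
from collections import defaultdict
from typing import Callable, Hashable, Sequence


def dct_matrix(n_frames: int, n_coefs: int) -> np.ndarray:
    """Orthonormal DCT-II basis, shape (n_coefs, n_frames)."""
    assert n_frames >= n_coefs, "a unit must span at least n_coefs frames"
    k = np.arange(n_coefs)[:, None]
    t = np.arange(n_frames)[None, :]
    basis = np.cos(np.pi * k * (2 * t + 1) / (2 * n_frames))
    basis[0] *= np.sqrt(1.0 / n_frames)
    basis[1:] *= np.sqrt(2.0 / n_frames)
    return basis


def chunk(frames: np.ndarray, boundaries: Sequence[tuple[int, int]]) -> list[np.ndarray]:
    """Chunker: split a (T, D) frame matrix into units at [start, end) frame indices."""
    return [frames[start:end] for start, end in boundaries]


def parameterize(unit: np.ndarray, n_coefs: int) -> np.ndarray:
    """Parameterizer (assumed): truncated DCT over time of every spectral coefficient.

    Dividing by sqrt(n_frames) makes the 0th parameter equal to the unit's mean
    value, so parameter scales do not grow with unit length.
    """
    n_frames = unit.shape[0]
    coefs = dct_matrix(n_frames, n_coefs) @ unit / np.sqrt(n_frames)  # (n_coefs, D)
    return coefs.ravel()


def train(
    params: Sequence[np.ndarray],
    contexts: Sequence[Hashable],
    cluster_of: Callable[[Hashable], Hashable],
    var_floor: float = 1e-6,
) -> dict:
    """Clustering + training: group units by cluster label, fit a diagonal Gaussian each."""
    groups = defaultdict(list)
    for p, ctx in zip(params, contexts):
        groups[cluster_of(ctx)].append(p)
    model = {}
    for label, members in groups.items():
        x = np.stack(members)
        # Mean and floored diagonal variance of the cluster's trajectory parameters.
        model[label] = (x.mean(axis=0), x.var(axis=0) + var_floor)
    return model


# Toy usage: 30 frames of 4 spectral coefficients, split into 3 hypothetical units.
rng = np.random.default_rng(0)
frames = rng.normal(size=(30, 4))
boundaries = [(0, 10), (10, 18), (18, 30)]
contexts = [("a", "stressed"), ("i", "unstressed"), ("a", "stressed")]
units = chunk(frames, boundaries)
params = [parameterize(u, n_coefs=3) for u in units]
model = train(params, contexts, cluster_of=lambda ctx: ctx)
print({label: mean.shape for label, (mean, _) in model.items()})
```

Because each unit is summarized by a fixed number of DCT coefficients per spectral dimension, units of different lengths (10, 8, and 12 frames here) yield parameter vectors of the same size, which is what allows them to be pooled into one distribution per cluster. The variance floor keeps single-member clusters from having zero variance.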
29 Citations
10 Claims
1. A speech model generating apparatus comprising:
a text analyzer that acquires text information and performs a text analysis of the text information to generate linguistic context of the text information;
a spectrum analyzer that acquires a speech signal corresponding to the text information and calculates a set of spectral coefficients that describe a spectrum shape of each frame of the speech signal;
a chunker that acquires boundary information indicating a beginning and an end of linguistic units and chunks the speech signal into the linguistic units on the basis of the boundary information, each linguistic unit extending over multiple frames of the speech signal;
a parameterizer that calculates a set of spectral trajectory parameters for a trajectory of the spectral coefficients associated with the linguistic unit;
a clustering unit that clusters a plurality of spectral trajectory parameters calculated for each of the linguistic units into a plurality of clusters on the basis of the linguistic context; and
a model training unit that obtains a trained spectral trajectory model indicating for each cluster a statistical distribution of the spectral trajectory parameters belonging to that cluster.
5. A speech synthesis apparatus comprising:
a text analyzer that acquires text information, which is a speech synthesis target, and performs a text analysis of the text information to generate linguistic context indicating content of language in the text information;
a model selector that, on the basis of the linguistic context of a linguistic unit in the text information, selects a spectral trajectory model of a cluster to which the linguistic unit belongs, from a storage unit storing spectral trajectory models clustered into a plurality of clusters on the basis of the linguistic context of a plurality of the linguistic units, the spectral trajectory model indicating a statistical distribution of a plurality of spectral trajectory parameters of a plurality of speech signals on the text information, and each linguistic unit having a plurality of frames; and
a generator that generates the spectral trajectory parameters of the linguistic unit on the basis of the spectral trajectory model selected by the model selector and obtains spectral coefficients by an inverse transformation of the spectral trajectory parameters.
7. A speech model generating program product having a computer readable medium including programmed instructions, wherein the instructions, when executed by a computer, cause the computer to perform:
acquiring text information and performing a text analysis of the text information to generate linguistic context indicating content of language in the text information;
acquiring a speech signal corresponding to the text information and calculating a set of spectral coefficients that describe a spectrum shape of each frame of the speech signal;
acquiring boundary information that indicates a beginning and an end of linguistic units and chunking the speech signal into the linguistic units on the basis of the boundary information, each linguistic unit extending over multiple frames of the speech signal;
calculating a set of spectral trajectory parameters for a trajectory of the spectral coefficients associated with the linguistic unit;
clustering a plurality of the spectral trajectory parameters calculated for each of the linguistic units into a plurality of clusters on the basis of the linguistic context; and
obtaining a trained spectral trajectory model that indicates for each cluster a statistical distribution of the spectral trajectory parameters belonging to that cluster.
8. A speech synthesis program product having a computer readable medium including programmed instructions, wherein the instructions, when executed by a computer, cause the computer to perform:
acquiring text information, which is a speech synthesis target, and performing a text analysis of the text information to generate linguistic context that indicates content of language in the text information;
selecting, on the basis of the linguistic context of a linguistic unit in the text information, a spectral trajectory model of a cluster to which the linguistic unit belongs, from a storage unit that stores spectral trajectory models clustered into a plurality of clusters on the basis of the linguistic context of a plurality of linguistic units, the spectral trajectory model indicating a statistical distribution of a plurality of spectral trajectory parameters of a plurality of speech signals on the text information, and each linguistic unit having a plurality of frames; and
generating the spectral trajectory parameters of the linguistic unit on the basis of the selected spectral trajectory model and obtaining spectral coefficients by an inverse transformation of the spectral trajectory parameters.
9. A speech model generating method comprising:
acquiring text information and performing a text analysis of the text information to generate linguistic context indicating content of language in the text information;
acquiring a speech signal corresponding to the text information and calculating a set of spectral coefficients that describe a spectrum shape of each frame of the speech signal;
acquiring boundary information that indicates a beginning and an end of linguistic units and chunking the speech signal into the linguistic units on the basis of the boundary information, each linguistic unit extending over multiple frames of the speech signal;
calculating a set of spectral trajectory parameters for a trajectory of the spectral coefficients associated with the linguistic unit;
clustering a plurality of the spectral trajectory parameters calculated for each of the linguistic units into a plurality of clusters on the basis of the linguistic context; and
obtaining a trained spectral trajectory model that indicates for each cluster a statistical distribution of the spectral trajectory parameters belonging to that cluster.
10. A speech synthesis method comprising:
acquiring text information, which is a speech synthesis target, and performing a text analysis of the text information to generate linguistic context that indicates content of language in the text information;
selecting, on the basis of the linguistic context of a linguistic unit in the text information, a spectral trajectory model of a cluster to which the linguistic unit belongs, from a storage unit that stores spectral trajectory models clustered into a plurality of clusters on the basis of the linguistic context of a plurality of the linguistic units, the spectral trajectory model indicating a statistical distribution of a plurality of spectral trajectory parameters of a plurality of speech signals on the text information, and each linguistic unit having a plurality of frames; and
generating the spectral trajectory parameters of the linguistic unit on the basis of the selected spectral trajectory model and obtaining spectral coefficients by an inverse transformation of the spectral trajectory parameters.
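The synthesis side described in claims 5, 8, and 10 (model selection by linguistic context, then generation followed by an inverse transformation) can be sketched in Python under several assumptions that the excerpt does not confirm. The stored models are assumed to be per-cluster diagonal Gaussians over length-normalized, truncated DCT-II coefficients of each spectral coefficient's trajectory. The model selector is assumed to look up a cluster by an exact context key. The generator is simplified to take the selected cluster's mean as the unit's trajectory parameters. The number of frames per unit is supplied from outside, since the excerpt does not say where durations come from. The functions `dct_matrix`, `inverse_trajectory`, and `synthesize` and the toy model values are hypothetical.

```python
import numpy as np
from typing import Callable, Hashable, Sequence


def dct_matrix(n_frames: int, n_coefs: int) -> np.ndarray:
    """Orthonormal DCT-II basis, shape (n_coefs, n_frames)."""
    assert n_frames >= n_coefs, "a unit must span at least n_coefs frames"
    k = np.arange(n_coefs)[:, None]
    t = np.arange(n_frames)[None, :]
    basis = np.cos(np.pi * k * (2 * t + 1) / (2 * n_frames))
    basis[0] *= np.sqrt(1.0 / n_frames)
    basis[1:] *= np.sqrt(2.0 / n_frames)
    return basis


def inverse_trajectory(params: np.ndarray, n_coefs: int, n_dims: int, n_frames: int) -> np.ndarray:
    """Inverse transformation (assumed): length-normalized DCT parameters -> (n_frames, n_dims)."""
    coefs = params.reshape(n_coefs, n_dims)
    return np.sqrt(n_frames) * dct_matrix(n_frames, n_coefs).T @ coefs


def synthesize(
    contexts: Sequence[Hashable],
    durations: Sequence[int],
    model: dict,
    cluster_of: Callable[[Hashable], Hashable],
    n_coefs: int,
    n_dims: int,
) -> np.ndarray:
    """Model selector + generator for a sequence of linguistic units."""
    pieces = []
    for ctx, n_frames in zip(contexts, durations):
        mean, _var = model[cluster_of(ctx)]  # select the model of the unit's cluster
        # Simplification: use the cluster mean as the unit's trajectory parameters.
        pieces.append(inverse_trajectory(mean, n_coefs, n_dims, n_frames))
    return np.concatenate(pieces, axis=0)


# Toy model: 3 DCT coefficients x 2 spectral dimensions per cluster (values are made up).
model = {
    ("a", "stressed"): (np.array([1.0, 0.5, 0.0, -0.2, 0.0, 0.0]), np.ones(6)),
    ("i", "unstressed"): (np.array([-1.0, 0.0, 0.3, 0.0, 0.0, 0.1]), np.ones(6)),
}
spectra = synthesize(
    [("a", "stressed"), ("i", "unstressed")], [12, 8], model,
    cluster_of=lambda ctx: ctx, n_coefs=3, n_dims=2,
)
print(spectra.shape)
```

Because the parameters are normalized by unit length, the same cluster model can be rendered at any number of frames; the 0th coefficient always becomes the unit's mean level. A fuller generator might also use the stored variances, for example to smooth transitions between adjacent units; this sketch ignores them.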
Specification