Unit selection module and method of Chinese text-to-speech synthesis
First Claim
1. A Chinese Text-To-Speech (TTS) synthesis system comprising:
a computer system implementing a word pre-processing module configured to receive a text defining a Chinese sentence, a unit selection module, a speech generation module, an automatic speech unit-parsing module, and a speech output module; and
a corpus stored in a database accessible by said computer system;
wherein said unit selection module comprises:
a probabilistic context free grammar (PCFG) parser, a latent semantic indexing (LSI) module, and a modified variable-length unit selection scheme;
said PCFG parser parses said Chinese sentence to obtain a context free grammar (CFG) of said Chinese sentence as its target unit;
said automatic speech unit-parsing module automatically labels the location of nodes of every syllable of the Chinese sentence;
said LSI module estimates the structural distance between the candidate synthesis units and the target unit in said corpus, and conducts a vectorization for estimating the structural distance, said vectorization transforming all the corpus words into ordered vectors and storing them in a CFG data matrix in the dimension of RxQ, wherein R stands for a number of grammar rules in a grammar G of the entire PCFG, and Q stands for the number of sentences in the corpus; and
through said modified variable-length unit selection scheme, tagged with a dynamic programming algorithm, the units are searched to find the best synthesis unit concatenation sequence of said Chinese sentence;
wherein said speech output module is adapted to generate a synthesized speech output according to said concatenation sequence; and
wherein a Chomsky Normal Form is used to simplify and describe the PCFG parser and to simplify the estimation of the structural distance.
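Claim 1 states that a Chomsky Normal Form (CNF) grammar is used to simplify the PCFG parser. One standard way CNF simplifies PCFG parsing is the probabilistic CKY (Viterbi) algorithm: because every rule is either A -> B C or A -> word, the parser only has to combine two adjacent spans at a time and can keep just the most probable sub-parse for each span and nonterminal, so the highest-probability parse falls out directly. The sketch below is illustrative only. The toy grammar, its probabilities, and the name `cky_best_parse` are hypothetical, the input is assumed to be already segmented into words by the word pre-processing module, and the patent does not say that this is its parser.

```python
from typing import Dict, Tuple

# Hypothetical toy PCFG in Chomsky Normal Form.
BINARY: Dict[Tuple[str, Tuple[str, str]], float] = {  # A -> B C
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 1.0,
}
LEXICAL: Dict[Tuple[str, str], float] = {  # A -> word
    ("NP", "我"): 0.5,
    ("NP", "中文"): 0.5,
    ("V", "喜歡"): 1.0,
}


def cky_best_parse(words, lexical, binary, start="S"):
    """Return (probability, tree) of the most probable parse, or (0.0, None)."""
    n = len(words)
    # best[i][j][A] = (probability, backpointer) for the best A spanning words[i:j]
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (lhs, word), p in lexical.items():
            if word == w and p > best[i][i + 1].get(lhs, (0.0, None))[0]:
                best[i][i + 1][lhs] = (p, w)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for (lhs, (b, c)), p in binary.items():
                    if b in best[i][k] and c in best[k][j]:
                        prob = p * best[i][k][b][0] * best[k][j][c][0]
                        if prob > best[i][j].get(lhs, (0.0, None))[0]:
                            best[i][j][lhs] = (prob, (k, b, c))
    if start not in best[0][n]:
        return 0.0, None

    def build(i, j, sym):
        back = best[i][j][sym][1]
        if isinstance(back, str):  # preterminal: (label, word)
            return (sym, back)
        k, b, c = back
        return (sym, build(i, k, b), build(k, j, c))

    return best[0][n][start][0], build(0, n, start)


if __name__ == "__main__":
    print(cky_best_parse(["我", "喜歡", "中文"], LEXICAL, BINARY))
```

Keeping only the maximum per cell is what makes this a Viterbi parse rather than an inside-probability computation, which would sum over sub-parses instead.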
Abstract
A unit selection module for Chinese Text-to-Speech (TTS) synthesis includes a probabilistic context-free grammar (PCFG) parser, a latent semantic indexing (LSI) module, and a modified variable-length unit selection scheme. An input Chinese sentence is first parsed into a context-free grammar (CFG) by the PCFG parser. Because every Chinese sentence admits several possible CFGs, the CFG (that is, the syntactic structure) with the highest probability is taken as the best CFG of the sentence. The LSI module then calculates the structural distance between the target unit and every candidate synthesis unit in a corpus. Finally, the modified variable-length unit selection scheme, combined with a dynamic programming algorithm, searches the units to find the best synthesis unit concatenation sequence.
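The abstract describes a dynamic-programming search for the best concatenation sequence. A common formulation, shown here as an assumption rather than the patent's exact scheme, is a Viterbi search over the candidate units at each target position that minimizes the sum of a target cost (for example a structural distance) and a concatenation cost. The unit representation `(sentence_id, position)`, the cost values, and the rule that joining units adjacent in the same corpus sentence is free are all hypothetical; that last rule is one simple way to make the search favour longer contiguous stretches of recorded speech.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Unit = Hashable


def select_units(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
) -> Tuple[List[Unit], float]:
    """Viterbi search for the lowest total-cost unit sequence.

    candidates[t] lists the candidate corpus units for target position t.
    """
    cost = [{u: target_cost(0, u) for u in candidates[0]}]
    back: List[dict] = [{}]
    for t in range(1, len(candidates)):
        cost.append({})
        back.append({})
        for u in candidates[t]:
            # Cheapest way to reach u from any unit chosen at position t-1.
            prev, c = min(
                ((p, cost[t - 1][p] + join_cost(p, u)) for p in candidates[t - 1]),
                key=lambda pc: pc[1],
            )
            cost[t][u] = c + target_cost(t, u)
            back[t][u] = prev
    # Trace the backpointers from the cheapest final unit.
    last = min(cost[-1], key=cost[-1].get)
    path = [last]
    for t in range(len(candidates) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, cost[-1][last]


# Hypothetical example: units are (corpus_sentence_id, syllable_position).
CANDIDATES = [[(1, 0), (2, 3)], [(1, 1), (3, 0)], [(1, 2), (3, 1)]]
TARGET_COST = {
    (0, (1, 0)): 0.2, (0, (2, 3)): 0.1,
    (1, (1, 1)): 0.3, (1, (3, 0)): 0.1,
    (2, (1, 2)): 0.2, (2, (3, 1)): 0.2,
}


def join(p, u):
    # Free join when u directly follows p in the same recorded sentence.
    return 0.0 if p[0] == u[0] and p[1] + 1 == u[1] else 1.0


path, total = select_units(CANDIDATES, lambda t, u: TARGET_COST[(t, u)], join)
```

In this example the search returns the three contiguous units from sentence 1, even though positions 0 and 1 each have a candidate with a lower target cost, because every non-contiguous join adds 1.0.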
16 Claims
1. A Chinese Text-To-Speech (TTS) synthesis system, as set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A method for Chinese Text-To-Speech (TTS) synthesis comprising:
inputting a text defining one or more Chinese sentences;
performing a word pre-processing of said Chinese sentences;
parsing a CFG of said Chinese sentences after they have been subjected to said word pre-processing;
building a target unit structural tree of said CFG;
from a corpus, building a plurality of candidate unit structural trees;
conducting a vectorization for estimating the structural distance, the vectorization transforming all the corpus words into ordered vectors and storing them in a CFG data matrix in the dimension of RxQ, wherein R stands for the number of grammar rules in the Model G of the entire PCFG, and Q stands for the number of sentences in the corpus;
estimating a structural distance between the target unit structural tree and said plurality of candidate synthesis unit structural trees, wherein a Chomsky Normal Form is used to simplify the estimation;
searching the units so as to find the best synthesis unit concatenation sequence of said Chinese sentence; and
outputting a synthesized speech according to said concatenation sequence.
Dependent claim: 9.
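Claim 8 stores the vectorized corpus in an R×Q CFG data matrix. The claims do not define the matrix entries, so the sketch below assumes, hypothetically, that entry (r, q) counts how often grammar rule r is used in the parse of corpus sentence q, and it measures structural distance as one minus the cosine similarity of two rule-count vectors. A full LSI module would normally also apply a truncated singular value decomposition to the matrix before comparing columns; that rank-reduction step is omitted here, so this is plain vector-space matching rather than LSI itself. The rule strings and function names are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence


def build_cfg_matrix(corpus_rules: Sequence[Sequence[str]],
                     grammar_rules: Sequence[str]) -> List[List[int]]:
    """Return an R x Q matrix whose entry [r][q] counts rule r in sentence q."""
    index = {rule: r for r, rule in enumerate(grammar_rules)}
    matrix = [[0] * len(corpus_rules) for _ in grammar_rules]
    for q, rules in enumerate(corpus_rules):
        for rule, count in Counter(rules).items():
            matrix[index[rule]][q] = count
    return matrix


def column(matrix: List[List[int]], q: int) -> List[int]:
    """Vector of corpus sentence q (one column of the R x Q matrix)."""
    return [row[q] for row in matrix]


def structural_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical grammar (R = 4) and corpus parses (Q = 3).
GRAMMAR = ["S->NP VP", "VP->V NP", "NP->NP NP", "VP->ADV VP"]
CORPUS = [
    ["S->NP VP", "VP->V NP"],
    ["S->NP VP", "VP->ADV VP", "VP->V NP"],
    ["S->NP VP", "NP->NP NP", "NP->NP NP", "VP->V NP"],
]
M = build_cfg_matrix(CORPUS, GRAMMAR)
target = [1, 1, 0, 0]  # target parse uses S->NP VP and VP->V NP once each
ranking = sorted(range(len(CORPUS)),
                 key=lambda q: structural_distance(target, column(M, q)))
```

`ranking` orders the corpus sentences from structurally closest to farthest; in this toy corpus it is sentence 0, then 1, then 2.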
10. A unit selection module used in the Chinese Text-To-Speech (TTS) synthesis system comprising:
a computer system implementing a probabilistic context free grammar (PCFG) parser, a latent semantic indexing (LSI) module, a modified variable-length unit selection scheme, and an automatic speech unit-parsing module;
wherein said PCFG parser parses a Chinese sentence to obtain the CFG of said Chinese sentence as its target unit;
said automatic speech unit-parsing module automatically labels the location of nodes of every syllable of the Chinese sentence;
said LSI module estimates the structural distance between the candidate synthesis units and the target unit in a corpus accessible by said computer system, and conducts a vectorization for estimating the structural distance, said vectorization transforming all the corpus words into ordered vectors and storing them in a CFG data matrix in the dimension of RxQ, wherein R stands for the number of grammar rules in a grammar G of the entire PCFG, and Q stands for the number of sentences in the corpus; and
through said modified variable-length unit selection scheme, tagged with a dynamic programming algorithm, the units are searched to find the best synthesis unit concatenation sequence of said Chinese sentence.
Dependent claims: 11, 12, 13.
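Claim 10 names a modified variable-length unit selection scheme without detailing its cost. As a hypothetical illustration of the variable-length idea only, the dynamic program below covers a target syllable sequence with the fewest contiguous corpus segments, so longer recorded stretches are preferred and fewer concatenation points are needed. The pinyin syllables, the corpus, the `max_len` limit, and the fewest-units objective are all assumptions; a practical scheme would also fold in costs such as the structural distance.

```python
from typing import List, Optional, Sequence, Tuple

Location = Tuple[int, int]  # (corpus sentence id, start syllable)


def find_segment(segment: Sequence[str],
                 corpus: Sequence[Sequence[str]]) -> Optional[Location]:
    """Return the first corpus location where segment occurs contiguously."""
    n = len(segment)
    for sid, sent in enumerate(corpus):
        for s in range(len(sent) - n + 1):
            if list(sent[s:s + n]) == list(segment):
                return sid, s
    return None


def variable_length_select(target: Sequence[str],
                           corpus: Sequence[Sequence[str]],
                           max_len: int = 4):
    """Cover target with the fewest corpus segments of up to max_len syllables."""
    n = len(target)
    inf = float("inf")
    best = [0] + [inf] * n       # best[i]: fewest units covering target[:i]
    back = [None] * (n + 1)      # back[i]: (segment start, corpus location)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] == inf:
                continue
            loc = find_segment(target[j:i], corpus)
            if loc is not None and best[j] + 1 < best[i]:
                best[i] = best[j] + 1
                back[i] = (j, loc)
    if best[n] == inf:
        return None              # some syllable never occurs in the corpus
    units: List[Tuple[Sequence[str], Location]] = []
    i = n
    while i > 0:
        j, loc = back[i]
        units.append((target[j:i], loc))
        i = j
    return units[::-1]


# Hypothetical pinyin target and corpus.
TARGET = ["wo3", "xi3", "huan1", "zhong1", "wen2"]
SYLLABLE_CORPUS = [
    ["ta1", "xi3", "huan1", "zhong1", "guo2"],
    ["wo3", "xue2", "zhong1", "wen2"],
    ["wo3", "ai4"],
]
units = variable_length_select(TARGET, SYLLABLE_CORPUS)
```

Here the five target syllables are covered by three units (one, two, and two syllables long) instead of five single-syllable units.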
14. A unit selection method for the Chinese Text-To-Speech (TTS) synthesis system comprising:
inputting a context free grammar (CFG) of a Chinese sentence into a computer system;
parsing the CFG of a Chinese sentence;
building the target unit structural tree of said CFG of said Chinese sentence;
from a corpus readable by said computer system, building a plurality of candidate unit structural trees;
estimating the structural distance between said target unit structural tree and a plurality of said candidate synthesis unit structural trees, wherein a Chomsky Normal Form is used to simplify the estimation of the structural distance;
searching the units to generate the best synthesis unit concatenation sequence of said Chinese sentence; and
conducting a vectorization for estimating the structural distance, wherein said vectorization transforms all the corpus words into ordered vectors and stores them in a CFG data matrix in the dimension of RxQ, wherein R stands for the number of grammar rules in a grammar G of an entire PCFG, and Q stands for the number of sentences in the corpus.
Dependent claims: 15, 16.
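Claim 14 builds structural trees for the target and for the candidate units and relies on CNF to simplify the distance estimation. One plausible reading, assumed here, is that under CNF every internal node contributes exactly one binary rule, so each tree reduces to a count vector over the R grammar rules, the same shape as one column of the R×Q matrix. The nested-tuple tree format, the rule-string format, and the helper names `tree_rules` and `rule_vector` are hypothetical.

```python
from typing import List, Sequence, Tuple

Tree = Tuple  # (label, word) for a preterminal, (label, left, right) otherwise


def tree_rules(tree: Tree) -> List[str]:
    """List the binary rules 'A->B C' used in a CNF parse tree (lexical rules skipped)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return []  # preterminal A -> word
    if len(children) != 2:
        raise ValueError(f"node {label!r} is not in Chomsky Normal Form")
    left, right = children
    return [f"{label}->{left[0]} {right[0]}"] + tree_rules(left) + tree_rules(right)


def rule_vector(tree: Tree, grammar_rules: Sequence[str]) -> List[int]:
    """Count vector of the tree's rules, ordered like the rows of the R x Q matrix."""
    rules = tree_rules(tree)
    return [rules.count(r) for r in grammar_rules]


GRAMMAR = ["S->NP VP", "VP->V NP", "NP->NP NP"]
target_tree = ("S", ("NP", "我"), ("VP", ("V", "喜歡"), ("NP", "中文")))

if __name__ == "__main__":
    print(rule_vector(target_tree, GRAMMAR))  # [1, 1, 0]
```

Rejecting non-binary nodes up front keeps every tree comparable on the same fixed rule axis; the resulting vectors can then be compared with any vector distance, such as one minus cosine similarity.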
Specification