Unit selection module and method of Chinese text-to-speech synthesis

US 7,574,360 B2
Filed: 07/22/2005
Issued: 08/11/2009
Est. Priority Date: 11/04/2004
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

A unit selection module for Chinese Text-to-Speech (TTS) synthesis includes a probabilistic context free grammar (PCFG) parser, a latent semantic indexing (LSI) module, and a modified variable-length unit selection scheme. An input Chinese sentence is first parsed into a context-free grammar (CFG) by the PCFG parser. Because each Chinese sentence can have several possible CFGs, the CFG (that is, the syntactic structure) with the highest probability is taken as the best CFG of the sentence. The LSI module then calculates the structural distance between each candidate synthesis unit in a corpus and the target unit. Finally, the modified variable-length unit selection scheme, combined with a dynamic programming algorithm, searches the units to find the best synthesis unit concatenation sequence.
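
The abstract does not say how the PCFG parser finds the most probable CFG. A standard technique for a grammar in Chomsky Normal Form (which the claims say is used) is Viterbi CKY parsing, and the sketch below illustrates that technique only; it is not the patent's parser. The grammar, its probabilities, and the function name `viterbi_cky` are hypothetical. The phrase 咬死了猎人的狗 is used because it has two readings: "bit the hunter's dog to death" and "the dog that bit the hunter to death".

```python
# Hypothetical toy PCFG in Chomsky Normal Form (probabilities are made up).
# Binary rules A -> B C, indexed by left-hand side.
BINARY_RULES = {
    "NP": [("Mod", "NP", 0.3)],                     # modifier + noun phrase
    "Mod": [("NP", "DE", 0.6), ("VP", "DE", 0.4)],  # X + 的
    "VP": [("V", "NP", 1.0)],
}
# Lexical rules A -> w, indexed by word.
LEXICAL_RULES = {
    "咬死了": [("V", 1.0)],
    "的": [("DE", 1.0)],
    "猎人": [("NP", 0.35)],
    "狗": [("NP", 0.35)],
}


def viterbi_cky(words):
    """Return (probability, tree) of the most probable parse of `words`.

    Trees are nested tuples: (label, word) or (label, left, right).
    For simplicity, any label spanning the whole input may be the root.
    """
    n = len(words)
    # chart[i][j][label] = (best probability, backpointer) for words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for lhs, p in LEXICAL_RULES.get(word, []):
            if p > chart[i][i + 1].get(lhs, (0.0, None))[0]:
                chart[i][i + 1][lhs] = (p, word)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for lhs, expansions in BINARY_RULES.items():
                    for left, right, p in expansions:
                        if left in chart[i][k] and right in chart[k][j]:
                            prob = p * chart[i][k][left][0] * chart[k][j][right][0]
                            if prob > chart[i][j].get(lhs, (0.0, None))[0]:
                                chart[i][j][lhs] = (prob, (k, left, right))

    def build(i, j, label):
        back = chart[i][j][label][1]
        if isinstance(back, str):  # lexical rule A -> w
            return (label, back)
        k, left, right = back
        return (label, build(i, k, left), build(k, j, right))

    if not chart[0][n]:
        return 0.0, None  # no complete parse
    root = max(chart[0][n], key=lambda label: chart[0][n][label][0])
    return chart[0][n][root][0], build(0, n, root)


prob, tree = viterbi_cky(["咬死了", "猎人", "的", "狗"])
print(tree[0], round(prob, 5))  # VP 0.02205
```

With these made-up probabilities the verb-phrase reading scores 0.3 × 0.6 × 0.35 × 0.35 = 0.02205 and the relative-clause reading scores 0.3 × 0.4 × 0.35 × 0.35 = 0.0147, so the verb-phrase tree is kept as the target structure.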

16 Claims

1. A Chinese Text-To-Speech (TTS) synthesis system comprising:
- a computer system implementing a word pre-processing module configured to receive a text defining a Chinese sentence, a unit selection module, a speech generation module, an automatic speech unit-parsing module, and a speech output module; and
  
  a corpus stored in database accessible by said computer system;
  
  wherein said unit selection module comprises:
  
  a probabilistic context free grammar (PCFG) parser, a latent semantic indexing (LSI) module, and a modified variable-length unit selection scheme;
  
  said PCFG parser parses said Chinese sentence to obtain a context free grammar (CFG) of said Chinese sentence as its target unit;
  
  said automatic speech unit-parsing module automatically labels the location of nodes of every syllable of the Chinese sentence;
  
  said LSI module estimates the structural distance between the candidate synthesis units and the target unit in said corpus, and conducts a vectorization for estimating the structural distance, said vectorization transforming all the corpus words into ordered vectors and storing them in a CFG data matrix in the dimension of RxQ, wherein R stands for a number of grammar rules in a grammar G of the entire PCFG, and Q stands for the number of sentences in the corpus; and
  
  through said modified variable-length unit selection scheme, tagged with a dynamic program algorithm, the units are searched to find the best synthesis unit concatenation sequence of said Chinese sentence;
  
  wherein said speech output module is adapted to generate a synthesized speech output according to said concatenation sequence; and
  
  wherein a Chomsky Normal Form is used to simplify and describe the PCFG parser and to simplify the estimation of the structural distance.
- Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The Chinese Text-To-Speech (TTS) synthesis system as claimed in claim 1, wherein said word pre-processing module comprises:
    - word input processing and text format pre-processing.
  - 3. The Chinese Text-To-Speech (TTS) synthesis system as claimed in claim 1, wherein said corpus comprises Chinese sentences having a large number of vocabulary and their corresponding sound files.
  - 4. The Chinese Text-To-Speech (TTS) synthesis system as claimed in claim 1, wherein said corpus comprises Chinese sentences having a large number of vocabulary and the parallel corpus corresponding to the speech of said Chinese sentences.
  - 5. The Chinese Text-To-Speech (TTS) synthesis system as claimed in claim 1, wherein said PCFG parser builds the candidate synthesis unit structural trees and the target unit structural tree in said corpus.
  - 6. The Chinese Text-To-Speech (TTS) synthesis system as claimed in claim 5, wherein said LSI module conducts vector processing for the candidate synthesis unit structural trees and the target unit structural tree, to estimate the structural distance between them.
  - 7. The Chinese Text-To-Speech (TTS) synthesis system as claimed in claim 1, wherein said speech generation module generates the best synthesis unit concatenation sequence.
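
Claim 1 stores every corpus sentence as a column of an R×Q CFG data matrix with one row per rule of the grammar G. The claims do not say what each entry holds or which distance is used, so the sketch below assumes raw rule-usage counts and cosine distance between the target's rule vector and each column. The grammar, trees, and helper names (`rules_of`, `rule_vector`, `build_rule_matrix`) are hypothetical, and only the vectorization is shown, not the LSI dimensionality reduction.

```python
import math
from collections import Counter


def rules_of(tree):
    """List the CNF rules used in a tree of nested tuples.

    Leaves are (label, word); internal nodes are (label, left, right).
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [f"{label} -> {children[0]}"]
    rules = [f"{label} -> {' '.join(child[0] for child in children)}"]
    for child in children:
        rules.extend(rules_of(child))
    return rules


def rule_vector(tree, grammar):
    """Length-R vector: how often each rule of grammar G occurs in the tree."""
    counts = Counter(rules_of(tree))
    return [counts[rule] for rule in grammar]


def build_rule_matrix(grammar, trees):
    """R x Q matrix: row r is grammar rule r, column q is corpus sentence q."""
    columns = [rule_vector(t, grammar) for t in trees]
    return [[col[r] for col in columns] for r in range(len(grammar))]


def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical grammar G (R = 8 rules) and corpus (Q = 2 sentences).
GRAMMAR = ["NP -> Mod NP", "Mod -> NP DE", "Mod -> VP DE", "VP -> V NP",
           "V -> 咬死了", "DE -> 的", "NP -> 猎人", "NP -> 狗"]
corpus = [
    ("VP", ("V", "咬死了"), ("NP", "猎人")),
    ("NP", ("Mod", ("NP", "猎人"), ("DE", "的")), ("NP", "狗")),
]
target = ("VP", ("V", "咬死了"), ("NP", "狗"))

matrix = build_rule_matrix(GRAMMAR, corpus)  # 8 x 2
t_vec = rule_vector(target, GRAMMAR)
distances = [cosine_distance(t_vec, [row[q] for row in matrix])
             for q in range(len(corpus))]
# The first sentence is structurally closer: 1/3 versus about 0.74.
```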

8. A method for Chinese Text-To-Speech (TTS) synthesis comprising:
- inputting a text defining one or more Chinese sentences;
  
  performing a word pre-processing of said Chinese sentences;
  
  parsing a CFG of said Chinese sentences after they have been subject to said word pre-processing;
  
  building a target unit structural tree of said CFG;
  
  from a corpus, building a plurality of candidate unit structural trees;
  
  conducting a vectorization for estimating the structural distance, the vectorization transforming all the corpus words into ordered vectors and storing them in a CFG data matrix in the dimension of RxQ, wherein R stands for the number of grammar rules in the Model G of the entire PCFG, and Q stands for the number of sentences in the corpus;
  
  estimating a structural distance between the target unit structural tree and said plurality of candidate synthesis unit structural trees, wherein a Chomsky Normal Form is used to simplify the estimation;
  
  searching the units so as to find the best synthesis unit concatenation sequence of said Chinese sentence; and
  
  outputting a synthesized speech according to said concatenation sequence.
- Dependent Claims (9)
- - 9. The method for Chinese Text-To-Speech (TTS) synthesis as claimed in claim 8, comprising:
    - an automatic speech unit-parsing module, which automatically labels the location of the nodes of every syllable of the Chinese sentence in said corpus by means of said speech-parsing module.
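
Claim 8 ends by searching for the best synthesis unit concatenation sequence, which the claims perform with a modified variable-length unit selection scheme and a dynamic programming algorithm. The modification itself is not described here, so the sketch below shows only a generic Viterbi-style search over variable-length candidates. Each hypothetical `Unit` covers a span of target syllables and carries an assumed target cost (for example, the structural distance); the assumed join cost is zero when two units were contiguous in the same recorded sentence and a fixed penalty otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical corpus candidate covering target syllables [start, end)."""
    name: str
    start: int
    end: int
    target_cost: float  # assumed, e.g. structural distance to the target
    sentence: int       # corpus sentence the unit was cut from
    offset: int         # position of the unit's first syllable in that sentence


def join_cost(prev, unit, penalty=1.0):
    """Assumed concatenation cost: free if the two units were adjacent in the
    same recorded sentence, otherwise a fixed penalty."""
    natural = (prev.sentence == unit.sentence
               and prev.offset + (prev.end - prev.start) == unit.offset)
    return 0.0 if natural else penalty


def select_units(n, candidates):
    """Minimum-cost sequence of variable-length units covering syllables 0..n-1."""
    best = {}  # unit -> (cost of best path ending with this unit, previous unit)
    # Sorting by end position guarantees every predecessor is scored first.
    for unit in sorted(candidates, key=lambda u: u.end):
        if unit.start == 0:
            best[unit] = (unit.target_cost, None)
            continue
        options = [(best[p][0] + join_cost(p, unit) + unit.target_cost, p)
                   for p in best if p.end == unit.start]
        if options:
            best[unit] = min(options, key=lambda o: o[0])
    finals = [u for u in best if u.end == n]
    if not finals:
        return None, float("inf")  # no sequence covers the whole target
    last = min(finals, key=lambda u: best[u][0])
    total, path = best[last][0], []
    while last is not None:  # follow backpointers to recover the sequence
        path.append(last)
        last = best[last][1]
    return path[::-1], total


units = [Unit("A", 0, 2, 0.2, 1, 0), Unit("B", 2, 4, 0.3, 1, 2),
         Unit("C", 0, 4, 0.9, 2, 0), Unit("D", 0, 1, 0.0, 3, 0),
         Unit("E", 1, 4, 0.1, 4, 5)]
path, total = select_units(4, units)
print([u.name for u in path], total)  # ['A', 'B'] 0.5
```

Here the two short units A and B win over the single long unit C because they were recorded contiguously and so incur no join penalty.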

10. A unit selection module used in the Chinese Text-To-Speech (TTS) synthesis system comprising:
- a computer system implementing a probabilistic context free grammar (PCFG) parser, a latent semantic indexing (LSI) module, and a modified variable-length unit selection scheme, and an automatic speech unit-parsing module;
  
  wherein said PCFG parser parses a Chinese sentence to obtain the CFG of said Chinese sentence as its target unit;
  
  said automatic speech unit-parsing module automatically labels the location of nodes of every syllable of the Chinese sentence;
  
  said LSI module estimates the structural distance between the candidate synthesis units and the target unit in a corpus accessible by said computer system, and conducts a vectorization for estimating the structural distance, said vectorization transforming all the corpus words into ordered vectors and storing them in a CFG data matrix in the dimension of RxQ, wherein R stands for the number of grammar rules in a grammar G of the entire PCFG, and Q stands for the number of sentences in the corpus; and
  
  through said modified variable-length unit selection scheme, tagged with a dynamic program algorithm, the units are searched to find the best synthesis unit concatenation sequence of said Chinese sentence.
- Dependent Claims (11, 12, 13)
- - 11. The unit selection module as claimed in claim 10, wherein said PCFG parser builds the candidate synthesis unit structural trees and the target unit structural tree in said corpus.
  - 12. The unit selection module as claimed in claim 11, wherein said LSI module conducts vector processing for the candidate synthesis unit structural trees and the target unit structural tree, to estimate the structural distance between them.
  - 13. The unit selection module as claimed in claim 10, wherein said PCFG parser calculates the plurality of possible CFG probabilities of said Chinese sentence, and then takes the CFG with the highest probability as the target unit.
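
The LSI module of claims 10 to 12 compares structures after vectorization, but the claims do not give the decomposition details. The sketch below assumes the textbook LSI approach: take the top-k left singular vectors of the R×Q rule matrix and compare vectors after projecting them into that k-dimensional space. To stay dependency-free it finds the singular vectors by power iteration on A·Aᵀ with deflation; a library SVD such as `numpy.linalg.svd` would normally replace this. The matrix values, k, and all function names are hypothetical.

```python
import math
import random


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_left_singular_vectors(a, k, iters=200, seed=0):
    """Top-k left singular vectors of the R x Q matrix `a`, found by power
    iteration on A·Aᵀ with deflation (a small stand-in for a library SVD)."""
    r = len(a)
    m = [[sum(x * y for x, y in zip(a[i], a[j])) for j in range(r)] for i in range(r)]
    rng = random.Random(seed)
    vectors = []
    for _ in range(k):
        u = [rng.random() for _ in range(r)]
        for _ in range(iters):
            w = mat_vec(m, u)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining spectrum is numerically zero
                return vectors
            u = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(u, mat_vec(m, u)))  # eigenvalue (sigma^2)
        m = [[m[i][j] - lam * u[i] * u[j] for j in range(r)] for i in range(r)]
        vectors.append(u)
    return vectors


def project(vec, basis):
    """Coordinates of an R-dimensional rule vector in the latent space."""
    return [sum(b * x for b, x in zip(u, vec)) for u in basis]


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical R x Q rule matrix (5 rules, 3 corpus sentences).
A = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0], [1, 1, 0]]
basis = top_left_singular_vectors(A, k=2)
target = [1, 1, 0, 0, 1]  # same rule profile as corpus sentence 0
columns = [[row[q] for row in A] for q in range(len(A[0]))]
latent_distances = [cosine_distance(project(target, basis), project(c, basis))
                    for c in columns]
```

Keeping only k dimensions lets sentences that share correlated rules look close even when their raw rule counts differ, which is the usual motivation for LSI over plain vector matching.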

14. A unit selection method for the Chinese Text-To-Speech (TTS) synthesis system comprising:
- inputting a context free grammar (CFG) of a Chinese sentence into a computer system;
  
  parsing the CFG of a Chinese sentence;
  
  building the target unit structural tree of said CFG of said Chinese sentence;
  
  from a corpus readable by said computer system, building a plurality of candidate unit structural trees;
  
  estimating the structural distance between said target unit structural tree and a plurality of said candidate synthesis unit structural trees, wherein a Chomsky Normal Form is used to simplify the estimation of the structural distance;
  
  searching the units to generate the best synthesis unit concatenation sequence of said Chinese sentence; and
  
  conducting a vectorization for estimating the structural distance, wherein said vectorization transforms all the corpus words into ordered vectors and stores them in a CFG data matrix in the dimension of RxQ, wherein R stands for the number of grammar rules in a grammar G of an entire PCFG, and Q stands for the number of sentences in the corpus.
- Dependent Claims (15, 16)
- - 15. The unit selection method as claimed in claim 14, comprising:
    - the plurality of possible CFG probabilities of said Chinese sentence are calculated, and then the CFG with the highest probability is taken as the target unit.
  - 16. The unit selection method as claimed in claim 14, comprising:
    - vector processing for the candidate synthesis unit structural trees and the target unit structural tree, to estimate the structural distance between them.

Current Assignee
National Cheng KUNG University (Government of The Republic of China)
Original Assignee
National Cheng KUNG University (Government of The Republic of China)
Inventors
Wu, Chung Hsien, Wang, Jhing Fa, Hsia, Chi Chun, Chen, Jiun Fu
Primary Examiner(s)
Dorvil; Richemond
Assistant Examiner(s)
Godbold; Douglas C

Application Number

US11/186,876
Publication Number

US 20060095264A1
Time in Patent Office

1,481 Days
Field of Search

704/251, 704/257, 704/9, 704/258, 704/231, 704/255, 704/266, 704/260
US Class Current

704/260
CPC Class Codes

G10L 13/06 Elementary speech units use...
