Multi-lingual text-to-speech system and method
First Claim
1. A multi-lingual text-to-speech system, comprising:
- an acoustic-prosodic model selection module, for an inputted text to be synthesized and containing a second-language (L2) portion, and an L2 phonetic unit transcription corresponding to the L2 portion of the inputted text, sequentially finds a second acoustic-prosodic model corresponding to each phonetic unit of the L2 phonetic unit transcription in an L2 acoustic-prosodic model set, searches an L2-to-L1 phonetic unit transformation table, L1 being a first language, and uses at least a controllable accent weighting parameter to determine a transformation combination to select a corresponding L1 phonetic unit transcription and sequentially find a first acoustic-prosodic model corresponding to each phonetic unit of said L1 phonetic unit transcription in an L1 acoustic-prosodic model set;
an acoustic-prosodic model mergence module that merges said first and said second acoustic-prosodic models into a merged acoustic-prosodic model according to said at least a controllable accent weighting parameter, sequentially processes all the transformations in said transformation combination, then sequentially arranges each merged acoustic-prosodic model to generate a merged acoustic-prosodic model sequence; and
a speech synthesizer, wherein said merged acoustic-prosodic model sequence is applied to said speech synthesizer to synthesize said inputted text into an L2 speech with an L1 accent based at least partly on the transformation combination determined by the controllable accent weighting parameter.
1 Assignment
0 Petitions
Accused Products
Abstract
A multi-lingual text-to-speech system and method processes a text to be synthesized via an acoustic-prosodic model selection module and an acoustic-prosodic model mergence module, and obtains a phonetic unit transformation table. In an online phase, the acoustic-prosodic model selection module, according to the text and a phonetic unit transcription corresponding to the text, uses at least a set controllable accent weighting parameter to select a transformation combination and find a second and a first acoustic-prosodic models. The acoustic-prosodic model mergence module merges the two acoustic-prosodic models into a merged acoustic-prosodic model, according to the at least a controllable accent weighting parameter, processes all transformations in the transformation combination and generates a merged acoustic-prosodic model sequence. A speech synthesizer and the merged acoustic-prosodic model sequence are further applied to synthesize the text into an L1-accent L2 speech.
16 Citations
14 Claims
-
1. A multi-lingual text-to-speech system, comprising:
-
an acoustic-prosodic model selection module, for an inputted text to be synthesized and containing a second-language (L2) portion, and an L2 phonetic unit transcription corresponding to the L2 portion of the inputted text, sequentially finds a second acoustic-prosodic model corresponding to each phonetic unit of the L2 phonetic unit transcription in an L2 acoustic-prosodic model set, searches an L2-to-L1 phonetic unit transformation table, L1 being a first language, and uses at least a controllable accent weighting parameter to determine a transformation combination to select a corresponding L1 phonetic unit transcription and sequentially find a first acoustic-prosodic model corresponding to each phonetic unit of said L1 phonetic unit transcription in an L1 acoustic-prosodic model set; an acoustic-prosodic model mergence module that merges said first and said second acoustic-prosodic models into a merged acoustic-prosodic model according to said at least a controllable accent weighting parameter, sequentially processes all the transformations in said transformation combination, then sequentially arranges each merged acoustic-prosodic model to generate a merged acoustic-prosodic model sequence; and a speech synthesizer, wherein said merged acoustic-prosodic model sequence is applied to said speech synthesizer to synthesize said inputted text into an L2 speech with an L1 accent based at least partly on the transformation combination determined by the controllable accent weighting parameter. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A multi-lingual text-to-speech system, executed on a computer system, said computer system having a memory device for storing at least a first and a second language acoustic-prosodic model sets, said multi-lingual text-to-speech system comprising:
a processor having an acoustic-prosodic model selection module, an acoustic-prosodic model mergence module and a speech synthesizer, wherein for an inputted text to be synthesized and containing a second-language (L2) portion, and an L2 phonetic unit transcription corresponding to the L2 portion of the inputted text, said acoustic-prosodic model selection module sequentially finds a second acoustic-prosodic model corresponding to each phonetic unit of the L2 phonetic unit transcription in an L2 acoustic-prosodic model set, searches an L2-to-L1 phonetic unit transformation, L1 being a first language, and uses at least a controllable accent weighting parameter to determine a transformation combination to select a corresponding L1 phonetic unit transcription and sequentially find a first acoustic-prosodic model corresponding to each phonetic unit of said L1 phonetic unit transcription in an L1 acoustic-prosodic model set, said acoustic-prosodic model mergence module merges said first and said second acoustic-prosodic models into a merged acoustic-prosodic model according to said at least a controllable accent weighting parameter, sequentially processes all the transformations in said transformation combination, then sequentially arranges each merged acoustic-prosodic model to generate a merged acoustic-prosodic model sequence, and said merged acoustic-prosodic model sequence is further applied to said speech synthesizer to synthesize said inputted text into an L2 speech with an L1 accent based at least partly on the transformation combination determined by the controllable accent weighting parameter.
-
7. A multi-lingual text-to-speech method, executed on a computer system, said computer system having a memory device for storing at least a first and a second language acoustic-prosodic model sets, said method comprising:
-
for an inputted text with second-language (L2) and L2 phonetic unit transcription corresponding to said inputted text to be synthesized, finding a second acoustic-prosodic model corresponding to each phonetic unit of said L2 phonetic unit transcription in an L2 acoustic-prosodic model set, searching an L2-to-L1 phonetic unit transformation table, L1 being a first language, and using at least a controllable accent weighting parameter to determine a transformation combination to select a corresponding L1 phonetic unit transcription and find a first acoustic-prosodic model corresponding to each phonetic unit of said L1 phonetic unit transcription in an L1 acoustic-prosodic model set; merging said first and said second acoustic-prosodic models into a merged acoustic-prosodic model according to said at least a controllable accent weighting parameter, processing all transformations in said transformation combination, and generating a merged acoustic-prosodic model sequence; and applying said merged acoustic-prosodic model set to a speech synthesizer to synthesize said inputted text into an LI-accent L2 speech based at least partly on the transformation combination determined by the controllable accent weighting parameter. - View Dependent Claims (8, 9, 10, 11, 12, 13, 14)
-
Specification