Method and system for preselection of suitable units for concatenative speech
Abstract
A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. Prior to initiating the “real time” synthesis, a database is created of all possible triphones (there are approximately 10,000 in the English language) and their associated preselection costs. At run time, therefore, only the most likely candidates are selected from the triphone database, significantly reducing the calculations that are required to be performed in real time.
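As an illustration of the run-time lookup described in the abstract, the following minimal Python sketch builds a triphone key for each phoneme in an utterance and fetches its precomputed candidate list. The names `triphone_keys` and `preselect`, the `"sil"` padding symbol at utterance boundaries, and the dictionary layout of the database are assumptions made for illustration; the patent text here does not specify them.

```python
from typing import Dict, List, Sequence, Tuple

TriphoneKey = Tuple[str, str, str]


def triphone_keys(phonemes: Sequence[str], pad: str = "sil") -> List[TriphoneKey]:
    """Return one (left, center, right) key per phoneme.

    The padding symbol used at the utterance edges is an assumption.
    """
    padded = [pad] + list(phonemes) + [pad]
    return [(padded[i], padded[i + 1], padded[i + 2]) for i in range(len(phonemes))]


def preselect(
    phonemes: Sequence[str],
    database: Dict[TriphoneKey, List[int]],
) -> List[List[int]]:
    """Look up the precomputed candidate units for each phoneme in context.

    A missing key yields an empty list; a real system would need a fallback.
    """
    return [database.get(key, []) for key in triphone_keys(phonemes)]


if __name__ == "__main__":
    # Toy database: unit identifiers are arbitrary integers.
    toy_db = {("sil", "k", "ae"): [11, 12], ("k", "ae", "t"): [21]}
    print(preselect(["k", "ae", "t"], toy_db))
```

Because the expensive cost computation is done offline, the run-time work per phoneme in this sketch reduces to one dictionary lookup.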
7 Claims
1. A method of synthesizing speech from an input text using phonemes, the method comprising the steps:
a) creating a triphone preselection cost database including a plurality of all likely triphone combinations and generating a key to index each triphone in the database, wherein creating the triphone preselection cost database further comprises;
1) selecting a predetermined triphone sequence u1-u2-u3; and
2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in the database and the units ua and ub vary over the entire phoneme universe;
b) retrieving a portion of the input text for synthesis as a phoneme sequence;
c) comparing a retrieved phoneme, in context with its neighboring phonemes, with a plurality of N least cost triphone keys stored in the triphone preselection cost database;
d) choosing, as candidates for synthesis, a list of units from the triphone preselection cost database that comprise a matching triphone key;
e) repeating steps b) through d) for each phoneme in the input text;
f) selecting the least cost path through the network of candidates;
g) processing the phonemes selected in step f) into synthesized speech; and
h) outputting the synthesized speech to an output device.
3) determining a plurality of N least cost database units for the particular 5-phoneme context;
4) performing the union of the N least cost units for all combinations of ua and ub;
5) storing the union created in step 4) in a triphone preselection cost database; and
6) repeating steps 1)-5) for each possible triphone sequence.
4. The method as defined in claim 3, wherein in performing step a4), N=50.
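Step f) of claim 1, selecting the least cost path through the network of candidates, can be sketched as a dynamic-programming (Viterbi-style) search. This is only a sketch under stated assumptions: the claims do not say how path cost is decomposed, so the split into a per-unit `target_cost` and a pairwise `join_cost`, and the function name `least_cost_path`, are hypothetical choices made here for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def least_cost_path(
    candidates: Sequence[Sequence[int]],
    target_cost: Callable[[int, int], float],
    join_cost: Callable[[int, int], float],
) -> Tuple[float, List[int]]:
    """Return (total_cost, unit_path) with one unit chosen per position.

    candidates[i] holds the preselected units for the i-th phoneme.
    target_cost(i, unit) and join_cost(prev_unit, unit) are assumed cost
    functions supplied by the caller.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate unit")

    # best holds, for each unit at the current position, the cheapest
    # (cost, path) ending in that unit.
    best = [(target_cost(0, u), [u]) for u in candidates[0]]

    for i in range(1, len(candidates)):
        new_best = []
        for u in candidates[i]:
            # Cheapest way to reach unit u from any unit at position i - 1.
            cost, path = min(
                ((c + join_cost(p[-1], u), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + target_cost(i, u), path + [u]))
        best = new_best

    return min(best, key=lambda item: item[0])
```

With preselection limiting each position to a short candidate list, the inner minimization runs over far fewer units than a search over every identically labeled unit in the database would.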
5. A method of creating a preselection cost database of triphones to be used in speech synthesis, the method comprising the steps of:
a) selecting a predetermined triphone sequence u1-u2-u3;
b) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in the database and the units ua and ub vary over the entire phoneme universe;
c) determining a plurality of N least cost database units for the particular 5-phoneme context;
d) performing the union of the plurality of N least cost database units determined in step c);
e) storing the union created in step d) in a triphone preselection cost database; and
f) repeating steps a)-e) for each possible triphone sequence.
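Steps a) through f) of claim 5 can be sketched in Python as follows. The sketch assumes integer unit identifiers, a mapping `units_by_label` from phoneme label to the units carrying that label, and a caller-supplied `preselection_cost(unit, context)` function; none of these names or data layouts come from the patent, and the actual cost function is not given in this section. The default `n=50` follows claim 4.

```python
import heapq
from itertools import product
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Triphone = Tuple[str, str, str]
Context = Tuple[str, str, str, str, str]


def build_preselection_db(
    triphones: Iterable[Triphone],
    phoneme_universe: Sequence[str],
    units_by_label: Dict[str, List[int]],
    preselection_cost: Callable[[int, Context], float],
    n: int = 50,
) -> Dict[Triphone, List[int]]:
    """Build a triphone-keyed table of preselected candidate units."""
    database: Dict[Triphone, List[int]] = {}
    for u1, u2, u3 in triphones:  # steps a) and f): visit every triphone
        # u2 may match any identically labeled unit in the database.
        pool = units_by_label.get(u2, [])
        union = set()
        # ua and ub vary over the entire phoneme universe.
        for ua, ub in product(phoneme_universe, repeat=2):
            context = (ua, u1, u2, u3, ub)
            # Steps b) and c): the N least cost units for this 5-phoneme context.
            best = heapq.nsmallest(
                n, pool, key=lambda unit: preselection_cost(unit, context)
            )
            union.update(best)  # step d): union over all (ua, ub)
        database[(u1, u2, u3)] = sorted(union)  # step e): store under the key
    return database
```

The stored list for a triphone can hold more than N units, since different outer contexts (ua, ub) may favor different units and the union keeps all of them.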
Specification