METHOD AND SYSTEM FOR PRESELECTION OF SUITABLE UNITS FOR CONCATENATIVE SPEECH
Abstract
A system and method for improving the response time of text-to-speech synthesis uses “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate left and right neighbors) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method includes generating a triphone preselection cost database for use in speech synthesis by 1) selecting a triphone sequence u1-u2-u3; 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 may match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe; and 3) storing the group of selected triphone sequences with the lowest costs in the triphone preselection cost database.
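The preselection step described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the `Unit` record, the weighted context-mismatch cost in `context_cost`, its `WEIGHTS`, and the `keep` parameter are all hypothetical. We also read "storing the lowest-cost group" as keeping, for each triphone, the ids of the database units labeled u2 that rank cheapest in at least one 5-phoneme context ua-u1-u2-u3-ub.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Unit:
    """A recorded phoneme unit plus the phoneme labels around it in the recording."""
    unit_id: int
    context: tuple[str, str, str, str, str]  # (l2, l1, label, r1, r2)

    @property
    def label(self) -> str:
        return self.context[2]


# Hypothetical mismatch weights: outer neighbors matter less than inner ones.
WEIGHTS = (0.5, 1.0, 0.0, 1.0, 0.5)


def context_cost(unit: Unit, target: tuple[str, str, str, str, str]) -> float:
    """Hypothetical preselection cost: weighted count of mismatched context labels."""
    return sum(w for w, have, want in zip(WEIGHTS, unit.context, target) if have != want)


def build_preselection_db(units, phonemes, triphones, keep=2):
    """Map each triphone u1-u2-u3 to the unit ids that are cheapest in some 5-phoneme context."""
    db = {}
    for u1, u2, u3 in triphones:
        # u2 may match any identically labeled unit in the database.
        candidates = [u for u in units if u.label == u2]
        chosen = set()
        # ua and ub range over the whole phoneme inventory.
        for ua, ub in product(phonemes, repeat=2):
            target = (ua, u1, u2, u3, ub)
            # Rank by cost, breaking ties by unit id for a deterministic result.
            ranked = sorted(candidates, key=lambda u: (context_cost(u, target), u.unit_id))
            chosen.update(u.unit_id for u in ranked[:keep])
        db[(u1, u2, u3)] = sorted(chosen)
    return db


# Toy database: "#" stands for silence.
UNITS = [
    Unit(0, ("#", "k", "ae", "t", "#")),
    Unit(1, ("s", "k", "ae", "t", "s")),
    Unit(2, ("#", "b", "ae", "d", "#")),
    Unit(3, ("#", "h", "ae", "t", "#")),
]
print(build_preselection_db(UNITS, ["#", "s"], [("k", "ae", "t")], keep=1))
# {('k', 'ae', 't'): [0, 1]}
```

In this toy run, unit 0 wins the silence-bounded contexts and unit 1 wins the s-bounded one, so both survive preselection while units 2 and 3 are never considered at synthesis time for k-ae-t.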
17 Claims
1. A method of speech synthesis, the method comprising:
receiving input text;
selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text;
if candidate phonemes are available in the triphone unit selection database, applying a cost process to select a set of phonemes from the candidate phonemes;
if no candidate phonemes are available in the triphone unit selection database, applying a single phoneme approach to select phonemes for synthesis; and
synthesizing speech using the set of phonemes from the candidate phonemes and/or the selected phonemes for synthesis from the single phoneme approach.
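The control flow of claim 1 (triphone candidates when available, a single-phoneme fallback otherwise, then a cost process) can be illustrated with the Python sketch below. The claim does not define the cost process or the single-phoneme approach, so both are assumptions here: the cost process is taken to be a Viterbi search minimizing the sum of target and join costs, the usual formulation in unit-selection synthesis, and the fallback simply offers every unit carrying the same phoneme label. `select_units`, the `#` silence padding, and the two cost callbacks are hypothetical names, and waveform concatenation is omitted.

```python
from collections.abc import Callable

Triphone = tuple[str, str, str]


def select_units(
    phones: list[str],
    triphone_db: dict[Triphone, list[int]],
    units_by_phone: dict[str, list[int]],
    target_cost: Callable[[int, int], float],  # (position, unit id) -> cost
    join_cost: Callable[[int, int], float],    # (previous unit id, unit id) -> cost
    n: int = 3,
) -> list[int]:
    """Return one unit id per phone, chosen by a Viterbi search over the candidates."""
    if not phones:
        return []
    padded = ["#"] + phones + ["#"]  # hypothetical silence label at both ends
    candidates = []
    for i, phone in enumerate(phones):
        triphone = (padded[i], phone, padded[i + 2])
        cands = triphone_db.get(triphone, [])[:n]  # up to N preselected units
        if not cands:
            # Assumed single-phoneme fallback: every unit with this label.
            cands = units_by_phone.get(phone, [])
        if not cands:
            raise ValueError(f"no units available for phoneme {phone!r}")
        candidates.append(cands)

    # best holds (accumulated cost, path) for each candidate at the current position.
    best = [(target_cost(0, u), [u]) for u in candidates[0]]
    for i in range(1, len(candidates)):
        step = []
        for u in candidates[i]:
            # Cheapest way to reach u from any path ending at the previous position.
            cost, path = min(
                ((c + join_cost(p[-1], u), p) for c, p in best),
                key=lambda t: t[0],
            )
            step.append((cost + target_cost(i, u), path + [u]))
        best = step
    return min(best, key=lambda t: t[0])[1]


TARGET = {10: 0.0, 11: 0.5, 0: 0.2, 1: 0.1, 20: 0.0}
GOOD_JOINS = {(11, 0), (0, 20)}
path = select_units(
    ["k", "ae", "t"],
    triphone_db={("k", "ae", "t"): [0, 1]},
    units_by_phone={"k": [10, 11], "ae": [0, 1, 2], "t": [20]},
    target_cost=lambda i, u: TARGET[u],
    join_cost=lambda a, b: 0.0 if (a, b) in GOOD_JOINS else 1.0,
)
print(path)  # [11, 0, 20]
```

Here only the middle phone has a preselected triphone entry, so the first and last positions fall back to all units with their labels; the search then prefers unit 11 despite its higher target cost because it joins cheaply into unit 0.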
7. A computing device for synthesizing speech from text using a triphone unit selection database, the computing device comprising:
a processor;
a module configured to control the processor to receive input text;
a module configured to control the processor to select a plurality of N phoneme units from the triphone unit selection database as candidate phonemes for synthesized speech based on the input text;
a module configured to control the processor to apply a cost process to select a set of phonemes from the candidate phonemes; and
a module configured to control the processor to synthesize speech using a selected set of phonemes.
11. A tangible computer-readable medium storing instructions for controlling a computing device to synthesize speech from text using a triphone unit selection database, the instructions comprising:
receiving input text;
selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text;
applying a cost process to select a set of phonemes from the candidate phonemes; and
synthesizing speech using the set of phonemes.
13. The tangible computer-readable medium of claim 12, wherein, subsequent to the step of receiving the input text, the following step is performed:
parsing the received text into recognizable units.
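The parsing step of claim 13 can be pictured with the small Python sketch below. It assumes the "recognizable units" are dictionary words that are then expanded into the phoneme labels used for triphone lookup; the toy `LEXICON` and the `parse_text` function are hypothetical. A real front end would also normalize numbers and abbreviations and apply letter-to-sound rules to words missing from the dictionary.

```python
import re

# Hypothetical toy lexicon mapping words to phoneme labels.
LEXICON = {
    "the": ["dh", "ax"],
    "cat": ["k", "ae", "t"],
    "sat": ["s", "ae", "t"],
}


def parse_text(text: str) -> list[str]:
    """Split text into lowercase words and expand each word into its phoneme labels."""
    words = re.findall(r"[a-z']+", text.lower())
    phones: list[str] = []
    for word in words:
        if word not in LEXICON:
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(LEXICON[word])
    return phones


print(parse_text("The cat sat."))  # ['dh', 'ax', 'k', 'ae', 't', 's', 'ae', 't']
```

The resulting phoneme sequence is the kind of input a triphone-based selection step would consume, one triphone per position.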
15. A computing device for synthesizing speech, the computing device comprising:
a processor;
a module configured to control the processor to receive input text;
a module configured to control the processor to select a plurality of N phonemes from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text;
a module configured to control the processor, if candidate phonemes are available in the triphone unit selection database, to apply a cost process to select a set of phonemes from the candidate phonemes;
a module configured to control the processor, if no candidate phonemes are available in the triphone unit selection database, to apply a single phoneme approach to select phonemes for synthesis; and
a module configured to control the processor to synthesize speech using the set of phonemes from the candidate phonemes and/or the selected phonemes for synthesis from the single phoneme approach.
Specification