Method and system for preselection of suitable units for concatenative speech
First Claim
1. A method comprising:
- receiving input text;
when candidate phonemes for synthesizing speech based on the input text are available from a top N triphone units, applying, using a processor, a cost process to select a set of phonemes from the candidate phonemes, wherein the top N triphone units are determined, prior to receiving the input text, from a database comprising a plurality of triphone units, and wherein the top N triphone units comprise those triphone units having lowest target costs when each triphone unit is individually combined into a 5-phoneme combination;
when no candidate phonemes are available in the top N triphone units, applying a single phoneme approach to select single phonemes for synthesis; and
synthesizing speech using at least one of the set of phonemes from the candidate phonemes and the single phonemes, which, when used, are used independent of a triphone structure.
10 Assignments
0 Petitions
Accused Products
Abstract
A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes receiving input text, selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text, wherein the triphone unit selection database comprises triphone units each comprising three phones and if the candidate phonemes are available in the triphone unit selection database, applying a cost process to select a set of phonemes from the candidate phonemes. If no candidate phonemes are available in the triphone unit selection database, the method includes applying a single phoneme approach to select single phonemes for synthesis, which single phonemes are used in synthesis independent of a triphone structure. The method also includes synthesizing speech using at least one of the set of phonemes from the candidate phonemes and the selected single phonemes for synthesis from the single phoneme approach.
45 Citations
15 Claims
-
1. A method comprising:
-
receiving input text; when candidate phonemes for synthesizing speech based on the input text are available from a top N triphone units, applying, using a processor, a cost process to select a set of phonemes from the candidate phonemes, wherein the top N triphone units are determined, prior to receiving the input text, from a database comprising a plurality of triphone units, and wherein the top N triphone units comprise those triphone units having lowest target costs when each triphone unit is individually combined into a 5-phoneme combination; when no candidate phonemes are available in the top N triphone units, applying a single phoneme approach to select single phonemes for synthesis; and synthesizing speech using at least one of the set of phonemes from the candidate phonemes and the single phonemes, which, when used, are used independent of a triphone structure. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A system comprising:
-
a processor; a non-transitory computer-readable storage medium storing instructions which, when executed on the processor, perform a method comprising; receiving input text; when candidate phonemes for synthesizing speech based on the input text are available from a top N triphone units, applying a cost process to select a set of phonemes from the candidate phonemes, wherein the top N triphone units are determined, prior to receiving the input text, from a database comprising a plurality of triphone units, and wherein the top N triphone units comprise those triphone units having lowest target costs when each triphone unit is individually combined into a 5-phoneme combination; when no candidate phonemes are available in the top N triphone units, applying a single phoneme approach to select single phonemes for synthesis; and synthesizing speech using at least one of the set of phonemes from the candidate phonemes and the single phonemes, which, when used, are used independent of a triphone structure. - View Dependent Claims (8, 9, 10)
-
-
11. A non-transitory computer-readable medium storing instructions which, when executed by a computing device, cause the computing device to perform steps comprising:
-
receiving input text; when candidate phonemes are available in the top N triphone units applying a cost process to select a set of phonemes from the candidate phonemes, wherein the top N triphone units are determined, prior to receiving the input text, from a database comprising a plurality of triphone units, and wherein the top N triphone units comprise those triphone units having lowest target costs when each triphone unit is individually combined into a 5-phoneme combination; when no candidate phonemes are available in the top N triphone units, applying a single phoneme approach to select single phonemes for synthesis; and synthesizing speech using at least one of the set of phonemes from the candidate phonemes and the single phonemes, which, when used, are used independent of a triphone structure. - View Dependent Claims (12, 13, 14, 15)
-
Specification