Method and system for performing concatenative speech synthesis using half-phonemes
Abstract
A method and system are provided for performing concatenative speech synthesis using half-phonemes. Half-phonemes allow full use of both diphone synthesis and unit selection techniques, so that the synthesis quality can combine the intelligibility achieved by diphone synthesis with the naturalness achieved by unit selection. The concatenative speech synthesis system may include a speech synthesizer comprising a linguistic processor, a unit selector and a speech processor. A speech training module may supply trained speech data to the unit selector off-line. The system may normalize the input text in order to distinguish sentence boundaries from abbreviations. The normalized text is then grammatically analyzed to identify the syntactic structure of each constituent phrase. Orthographic characters used in normal text are mapped into appropriate strings of phonetic segments representing units of sound and speech. Prosody is then determined, and timing and intonation patterns are assigned to each of the half-phonemes. Once the text is converted into half-phonemes, the unit selector compares the requested half-phoneme sequence with the units stored in the database in order to generate a candidate list for each half-phoneme. The candidate lists are then input into a Viterbi searcher, which determines the best match for all half-phonemes in the sequence. The selected string of units is then output to the speech processor, which processes it and outputs audio to a speaker.
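The half-phoneme representation described in the abstract can be illustrated with a minimal Python sketch. It assumes the linguistic processor has already produced a phoneme sequence and simply splits each phoneme into a left half and a right half; the `_L`/`_R` suffixes and both function names are hypothetical and do not come from this patent. Pairing each right half with the following left half recovers units that span the same stretch of speech as diphones (from the middle of one phoneme to the middle of the next), which is one way to see how half-phonemes can serve both diphone-style and unit-selection-style synthesis.

```python
from typing import List, Tuple


def to_half_phonemes(phonemes: List[str]) -> List[str]:
    """Split each phoneme into a left (initial) and right (final) half.

    The "_L"/"_R" suffix convention is a hypothetical labeling scheme.
    """
    halves: List[str] = []
    for ph in phonemes:
        halves.append(f"{ph}_L")  # first half: phoneme start to its midpoint
        halves.append(f"{ph}_R")  # second half: midpoint to phoneme end
    return halves


def diphone_pairs(halves: List[str]) -> List[Tuple[str, str]]:
    """Pair each right half with the next left half.

    Each pair crosses one phoneme boundary, covering the same span a
    diphone would (midpoint of one phoneme to midpoint of the next).
    """
    # halves[1::2] are right halves; halves[2::2] are the following left halves
    return list(zip(halves[1::2], halves[2::2]))
```

For example, `to_half_phonemes(["h", "eh", "l", "ow"])` yields eight half-phoneme labels, and `diphone_pairs` on that result gives the three boundary-spanning pairs `("h_R", "eh_L")`, `("eh_R", "l_L")` and `("l_R", "ow_L")`.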
20 Claims
1. A method of synthesizing speech using half-phonemes, comprising:
receiving input text;
converting the input text into a sequence of half-phonemes;
comparing the half-phonemes in the sequence with a plurality of half-phonemes stored in a database;
selecting one of the plurality of half-phonemes from the database for each of the half-phonemes in the sequence based on statistical measurements;
processing the selected half-phonemes into synthesized speech; and
outputting the synthesized speech to an output device.
normalizing the input text to distinguish sentence boundaries from abbreviations;
grammatically analyzing the input text to syntactically identify parts-of-speech;
mapping the input text into phonetic segments of speech and sound; and
assigning timing and intonation patterns to each of the phonetic segments.
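The normalizing step above, distinguishing sentence boundaries from abbreviations, can be sketched as a simple lexicon-based rule. This is an illustrative assumption, not the patent's normalizer: the `ABBREVIATIONS` set and the `split_sentences` function are hypothetical, and a token ending in `.`, `?` or `!` is treated as a sentence boundary unless it appears in the abbreviation list.

```python
from typing import List

# Hypothetical abbreviation lexicon, for illustration only.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "st.", "e.g.", "i.e."}


def split_sentences(text: str) -> List[str]:
    """Split text at '.', '?' or '!' unless the token is a known abbreviation."""
    sentences: List[str] = []
    current: List[str] = []
    for token in text.split():
        current.append(token)
        ends_with_stop = token.endswith((".", "?", "!"))
        # A period after a known abbreviation is not treated as a boundary.
        if ends_with_stop and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text with no final punctuation
        sentences.append(" ".join(current))
    return sentences
```

For example, `split_sentences("Dr. Smith lives on Elm St. near the park. He is home.")` keeps "Dr." and "St." inside the first sentence and returns two sentences. A purely lexical rule cannot tell when an abbreviation also ends a sentence, so a fuller normalizer would need more context, such as the capitalization of the following word.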
3. The method of claim 1, wherein the comparing step produces a pre-selected candidate list of half-phonemes from the database.
4. The method of claim 3, wherein the comparing step pre-selects candidate half-phonemes based on a predetermined threshold.
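Threshold-based preselection, as in claims 3 and 4, might look like the following sketch. The `Unit` fields (duration and F0), the normalizing constants, and the `target_cost` formula are assumptions chosen for illustration; the claims state only that candidates are pre-selected against a predetermined threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """A stored half-phoneme unit (all fields are hypothetical)."""
    unit_id: int
    label: str           # half-phoneme label, e.g. "eh_L"
    duration_ms: float
    f0_hz: float


def target_cost(unit: Unit, want_dur: float, want_f0: float) -> float:
    # Hypothetical cost: prosodic mismatch scaled by assumed tolerances
    # of 50 ms for duration and 20 Hz for F0.
    return abs(unit.duration_ms - want_dur) / 50.0 + abs(unit.f0_hz - want_f0) / 20.0


def preselect(db: List[Unit], label: str, want_dur: float, want_f0: float,
              threshold: float) -> List[Unit]:
    """Return units with the requested label whose target cost is within threshold."""
    return [u for u in db
            if u.label == label and target_cost(u, want_dur, want_f0) <= threshold]
```

Lowering `threshold` shrinks each candidate list and therefore the work left for the later search, at the risk of discarding units that would have joined well with their neighbors.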
5. The method of claim 3, wherein the selecting step selects half-phonemes from the candidate half-phonemes using a Viterbi search mechanism.
6. The method of claim 1, wherein the selecting step selects half-phonemes based on the statistical measurements computed for individual half-phonemes and spectral measurements computed based on the relationship between half-phonemes in the sequence of half-phonemes.
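Claims 5 and 6 combine a Viterbi search with two kinds of measurement: costs computed for individual half-phonemes and spectral measurements between adjacent half-phonemes. The sketch below assumes that each candidate already carries a precomputed per-unit `target_cost` and spectral feature vectors at its two edges, and it uses the Euclidean distance between the end of one unit and the start of the next as a hypothetical join cost. The `Candidate` record and the function names are illustrative, not taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical candidate record produced by preselection."""
    unit_id: int
    target_cost: float             # per-unit (statistical) cost
    left_spec: Tuple[float, ...]   # spectral features at the unit's start
    right_spec: Tuple[float, ...]  # spectral features at the unit's end


def join_cost(prev: Candidate, cur: Candidate) -> float:
    # Assumed spectral measure: distance between the adjacent unit edges.
    return math.dist(prev.right_spec, cur.left_spec)


def viterbi_select(lattice: Sequence[Sequence[Candidate]]) -> Tuple[List[int], float]:
    """Return the unit ids minimizing total target + join cost, and that cost."""
    if not lattice:
        return [], 0.0
    # best[j]: cheapest cost of any path ending at candidate j of the current column
    best = [c.target_cost for c in lattice[0]]
    back: List[List[int]] = []  # back[t - 1][j]: best predecessor of column t, candidate j
    for t in range(1, len(lattice)):
        prev_col, cur_col = lattice[t - 1], lattice[t]
        new_best: List[float] = []
        pointers: List[int] = []
        for cur in cur_col:
            costs = [best[i] + join_cost(p, cur) for i, p in enumerate(prev_col)]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] + cur.target_cost)
            pointers.append(i_min)
        best = new_best
        back.append(pointers)
    # Trace back from the cheapest candidate in the final column.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [lattice[t][k].unit_id for t, k in enumerate(path)], total
```

The sketch assumes every candidate list is non-empty, since `min` raises an error on an empty column; the preselection threshold therefore has to leave at least one unit per half-phoneme.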
7. The method of claim 1, further comprising:
computing statistical measurements between half-phonemes of training speech contained in a database; and
outputting the statistical measurements for performing the selecting step.
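One possible reading of the statistical measurements in claim 7 is sketched below. For every pair of half-phoneme labels that occur next to each other in the training speech, it records the mean and standard deviation of the spectral distance across that natural boundary, values that could later be used to normalize join costs. This interpretation, the `TrainingUnit` tuple layout, and the use of Euclidean distance are all assumptions.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical training unit: (label, start-edge spectrum, end-edge spectrum)
TrainingUnit = Tuple[str, Tuple[float, ...], Tuple[float, ...]]


def join_statistics(
    utterances: Sequence[Sequence[TrainingUnit]],
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Mean and population standard deviation of the spectral distance
    across each naturally occurring pair of adjacent half-phoneme labels."""
    dists: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for utt in utterances:
        # Walk consecutive unit pairs within one recorded utterance.
        for (lab_a, _, right_a), (lab_b, left_b, _) in zip(utt, utt[1:]):
            dists[(lab_a, lab_b)].append(math.dist(right_a, left_b))
    return {pair: (statistics.fmean(d), statistics.pstdev(d))
            for pair, d in dists.items()}
```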
8. The method of claim 6, further comprising:
indexing the half-phonemes in the database based on timing measurements.
9. The method of claim 1, wherein the processing step synthesizes speech using one of Linear Predictive Coding, Time-Domain Pitch-Synchronous Overlap Add, or Harmonic Plus Noise methods.
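Claim 9 names Linear Predictive Coding, TD-PSOLA and Harmonic Plus Noise synthesis. Those methods can also modify pitch and duration, which is beyond a short example, so the sketch below shows only a simpler and plainly different technique: joining the selected unit waveforms with a short linear crossfade (overlap-add over a few samples) and no prosody modification. The function name and the `overlap` parameter are hypothetical.

```python
from typing import List, Sequence


def crossfade_concat(units: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Concatenate sample sequences, overlap-adding up to `overlap` samples
    at each join with complementary linear fade-out/fade-in weights."""
    out: List[float] = []
    for unit in units:
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(unit))
        for k in range(n):
            w = (k + 1) / (n + 1)  # fade-in weight of the new unit, strictly in (0, 1)
            idx = len(out) - n + k
            out[idx] = out[idx] * (1.0 - w) + unit[k] * w
        out.extend(unit[n:])  # append the non-overlapping remainder
    return out
```

Because the two weights always sum to one, a constant signal passes through a join unchanged, and each join shortens the output by the number of overlapped samples.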
10. A system for synthesizing speech using half-phonemes, comprising:
a linguistic processor that receives input text and converts the input text into a sequence of half-phonemes;
a unit selector, coupled to the linguistic processor, that compares the half-phonemes in the sequence with a plurality of half-phonemes stored in a database and selects one of the plurality of half-phonemes from the database for each of the half-phonemes in the sequence based on statistical measurements; and
a speech processor, coupled to the unit selector, that processes the selected half-phonemes into synthesized speech and outputs the synthesized speech to an output device.
a text normalizer that receives and normalizes the input text to distinguish sentence boundaries from abbreviations;
a syntactic parser, coupled to the text normalizer, that grammatically analyzes the input text to syntactically identify parts-of-speech;
a word pronunciation module, coupled to the syntactic parser, that maps the input text into phonetic segments of speech and sound; and
a prosodic determination module, coupled to the word pronunciation module, that assigns timing and intonation patterns to each of the phonetic segments.
12. The system of claim 10, wherein the unit selector further comprises:
a preselector that selects a candidate list of half-phonemes from the database.
13. The system of claim 12, wherein the preselector selects candidate half-phonemes based on a predetermined threshold.
14. The system of claim 13, wherein the unit selector further comprises:
a Viterbi searcher, coupled to the preselector, that selects half-phonemes from the candidate half-phonemes using Viterbi search mechanisms.
15. The system of claim 14, wherein the Viterbi searcher selects half-phonemes based on the statistical measurements computed for individual half-phonemes and spectral measurements computed based on the relationship between half-phonemes in the sequence of half-phonemes.
16. The system of claim 10, further comprising:
a speech training module, coupled to the unit selector, that computes statistical measurements between half-phonemes of training speech contained in a database, and outputs the statistical measurements to the unit selector.
17. The system of claim 16, wherein the speech training module indexes the half-phonemes in the database based on timing measurements.
18. The system of claim 10, wherein the speech processor synthesizes speech using one of Linear Predictive Coding, Time-Domain Pitch-Synchronous Overlap Add, or Harmonic Plus Noise methods.
19. A system for synthesizing speech using half-phonemes, comprising:
linguistic processing means for receiving input text and converting the input text into a sequence of half-phonemes;
unit selecting means for comparing the half-phonemes in the sequence with a plurality of half-phonemes stored in a database and selecting one of the plurality of half-phonemes from the database for each of the half-phonemes in the sequence based on statistical measurements; and
speech processing means for processing the selected half-phonemes into synthesized speech and outputting the synthesized speech to an output device.
text normalizing means for normalizing the input text to distinguish sentence boundaries from abbreviations;
syntactic parsing means for grammatically analyzing the input text to syntactically identify parts-of-speech;
word pronunciation means for mapping the input text into phonetic segments of speech and sound;
prosodic determination means for assigning timing and intonation patterns to each of the phonetic segments;
preselection means for selecting a candidate list of half-phonemes from the database; and
Viterbi search means for selecting half-phonemes from the candidate list using Viterbi search mechanisms.
Specification