Speech duration processing method and apparatus for Chinese text-to-speech system
Abstract
The duration of speech varies with the characteristics of the speech being pronounced and the pronouncing habits of the speaker. In developing the speech duration processing method and apparatus of this invention, a large amount of natural speech was analyzed, and it was found that the speech duration of a monosyllable varies with factors such as the phonemes and tone of the syllable, the phrase construction, the location in the phrase, the location in the sentence, and the preceding and following connected phonemes. These factors are used to construct a “speech duration parameter storage portion” that holds the speech duration parameters. By retrieving these parameters and combining them with the basic speech duration of a syllable during syllable speech duration calculation, the speech duration of each monosyllable in any sentence can be accurately decided. Experimental results indicate that a text-to-speech system using the speech duration processing apparatus of this invention can synthesize speech with natural speech duration.
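The combination of stored parameters with a basic duration can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how the parameters are combined with the basic duration, so the multiplicative form, the table layout, the names `BASIC_DURATION_MS`, `DURATION_PARAMS`, and `syllable_duration`, and every numeric value are assumptions chosen only for illustration.

```python
from math import prod

# Hypothetical basic duration (ms) per syllable; the value is illustrative only.
BASIC_DURATION_MS = {"ma": 200.0}

# Hypothetical "speech duration parameter storage portion":
# one scaling factor per (factor name, factor value) pair.
DURATION_PARAMS = {
    ("tone", 3): 1.10,
    ("phrase_construction", "two_syllable"): 0.95,
    ("location_in_phrase", "final"): 1.20,
    ("location_in_sentence", "middle"): 1.00,
    ("adjacent_class", "nasal"): 0.90,
}


def syllable_duration(syllable, context):
    """Scale the basic duration by every parameter that matches the context.

    Multiplication is an assumption; the source only says the parameters
    are combined with the basic speech duration.
    """
    base = BASIC_DURATION_MS[syllable]
    # Factors with no stored parameter default to 1.0 (no change).
    factors = [DURATION_PARAMS.get((name, value), 1.0)
               for name, value in context.items()]
    return base * prod(factors)


context = {
    "tone": 3,
    "phrase_construction": "two_syllable",
    "location_in_phrase": "final",
    "location_in_sentence": "middle",
    "adjacent_class": "nasal",
}
print(round(syllable_duration("ma", context), 2))
```

Keying the table by (factor, value) pairs keeps each varying factor independent, so a context with no stored parameter for some factor simply leaves the basic duration unchanged for that factor.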
4 Claims
1. A speech duration processing method for a Chinese text-to-speech system using Chinese phonemes as a basic processing unit, the method comprising:
constructing a dictionary that stores Chinese vocabulary and corresponding information including phonetic markers, parts of speech, and expansion syntax;
constructing a syllable-phoneme look-up portion that stores information including at least one of consonant designated numbers and vowel designated numbers corresponding to each Chinese syllable;
constructing a basic speech duration storage portion that stores basic speech duration information classified according to phonemes;
constructing a speech duration parameter storage portion that stores speech duration parameters associated with tones of the syllables to which each of the phonemes belong, phrase construction, locations in the phrases, locations in the sentence, and class of the adjacent phonemes;
inspecting positions of the syllables of each vocabulary in an input sentence of a variable length by comparison with the vocabulary stored in the dictionary;
generating a phonetic representation of each syllable of each inspected vocabulary according to the phonetic markers stored in the dictionary;
inspecting the part of speech and the expansion syntax of each inspected vocabulary with reference to the dictionary;
combining the vocabulary in the input sentence into phrases according to the expansion syntax and relationship of the parts of speech of adjacent ones of the vocabulary;
inspecting each syllable in the generated phonetic representation by reference to tone markers;
inspecting the phoneme formation of each inspected syllable with reference to the information in the syllable-phoneme look-up portion;
retrieving the basic speech duration information of each inspected phoneme from the basic speech duration storage portion; and
calculating the speech duration of each of the inspected phonemes that form each of the inspected syllables from the basic speech duration information and the parameters associated with the tones, the phrase constructions, the locations in the phrases, the locations in the sentence, and the class of the adjacent phonemes of the inspected phonemes, and combining the speech duration of the inspected phonemes to obtain the speech duration of each of the inspected syllables.
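The phoneme-level calculation recited in claim 1 might be sketched as follows. The designated numbers, durations, and parameter values are hypothetical, and two choices are assumptions the claim does not fix: parameters are applied as multiplicative factors, and the phoneme durations are "combined" by summation.

```python
# Hypothetical syllable-phoneme look-up: syllable -> (consonant number, vowel number).
# None marks a syllable with no initial consonant ("at least one of" in the claim).
SYLLABLE_PHONEMES = {
    "ma": (3, 21),
    "a": (None, 21),
}

# Hypothetical basic durations (ms), classified by phoneme designated number.
BASIC_PHONEME_MS = {3: 60.0, 21: 140.0}

# Hypothetical parameters keyed by (phoneme number, factor name, factor value).
PHONEME_PARAMS = {
    (21, "tone", 3): 1.2,
    (3, "location_in_phrase", "initial"): 1.1,
}


def phoneme_duration(phoneme, context):
    """Apply every matching parameter to the phoneme's basic duration."""
    duration = BASIC_PHONEME_MS[phoneme]
    for name, value in context.items():
        # Missing parameters default to 1.0 (assumed multiplicative model).
        duration *= PHONEME_PARAMS.get((phoneme, name, value), 1.0)
    return duration


def syllable_duration_from_phonemes(syllable, context):
    """Sum the durations of the phonemes that form the syllable."""
    phonemes = [p for p in SYLLABLE_PHONEMES[syllable] if p is not None]
    return sum(phoneme_duration(p, context) for p in phonemes)


context = {"tone": 3, "location_in_phrase": "initial"}
print(round(syllable_duration_from_phonemes("ma", context), 1))
```

Because the parameters are looked up per phoneme, the same context can lengthen the vowel while leaving the consonant untouched, which a syllable-level table cannot express.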
2. A speech duration processing method for a Chinese text-to-speech system using Chinese syllables as a basic processing unit, the method comprising:
constructing a dictionary that stores Chinese vocabulary and corresponding information including phonetic markers, parts of speech, and expansion syntax;
constructing a basic speech duration storage portion that stores basic speech duration information classified according to syllables;
constructing a speech duration parameter storage portion that stores speech duration parameters associated with tones of each of the syllables, phrase constructions, locations in the phrases, locations in the sentence, and class of the adjacent syllables;
inspecting positions of the syllables of each vocabulary in an input sentence of variable length by comparison with the vocabulary stored in the dictionary;
generating a phonetic representation of each syllable of each inspected vocabulary according to the phonetic markers stored in the dictionary;
inspecting the part of speech and the expansion syntax of each inspected vocabulary with reference to the dictionary;
combining the vocabulary in the input sentence into phrases according to the expansion syntax and relationship of the parts of speech of adjacent ones of the vocabulary;
inspecting each syllable in the generated phonetic representation by reference to tone markers;
retrieving the basic speech duration information of each inspected syllable from the basic speech duration storage portion; and
calculating the speech duration of each of the inspected syllables from the basic speech duration information and the parameters associated with the tones, the phrase construction, the locations in the phrases, the locations in the sentence, and the class of the adjacent syllables of the inspected syllables.
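Before the syllable-level calculation in claim 2 can retrieve parameters, each syllable needs its tone and its locations in the phrase and in the sentence. One hedged way to derive them is sketched below. The three-way position labels, the use of phrase length as a stand-in for "phrase construction", and the names `position_label` and `syllable_contexts` are all assumptions, not details from the claims.

```python
def position_label(index, length):
    """Coarse position label; the initial/medial/final split is an assumption."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def syllable_contexts(phrases):
    """Return a list of (syllable, context) pairs for one sentence.

    `phrases` is a list of phrases; each phrase is a list of
    (syllable, tone) pairs in spoken order.
    """
    total = sum(len(phrase) for phrase in phrases)
    sentence_index = 0
    result = []
    for phrase in phrases:
        for i, (syllable, tone) in enumerate(phrase):
            result.append((syllable, {
                "tone": tone,
                # Phrase length stands in for "phrase construction" here.
                "phrase_length": len(phrase),
                "location_in_phrase": position_label(i, len(phrase)),
                "location_in_sentence": position_label(sentence_index, total),
            }))
            sentence_index += 1
    return result


# Hypothetical phrasing of "wo3 ai4 zhong1 guo2".
sentence = [[("wo", 3)], [("ai", 4)], [("zhong", 1), ("guo", 2)]]
for syllable, context in syllable_contexts(sentence):
    print(syllable, context)
```

In this sketch a one-syllable phrase is labeled "initial"; a real parameter table might prefer a separate label for that case.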
3. A speech duration processing apparatus for a Chinese text-to-speech system using Chinese phonemes as a basic processing unit, the apparatus comprising:
a dictionary that stores Chinese vocabulary and corresponding information including phonetic markers, parts of speech, and expansion syntax;
a syllable-phoneme look-up portion that stores information including at least one of consonant designated numbers and vowel designated numbers corresponding to each Chinese syllable;
a basic speech duration storage portion that stores basic speech duration information classified according to the phonemes;
a speech duration parameter storage portion that stores speech duration parameters associated with tones of the syllables to which each of the phonemes belong, phrase construction, locations in the phrases, locations in the sentence, and class of the adjacent phonemes;
a vocabulary inspector that inspects positions of the syllables of each vocabulary in an input sentence of variable length by comparison with the vocabulary stored in the dictionary;
a phonetic marker generator that generates a phonetic representation of each syllable of each inspected vocabulary according to the phonetic markers stored in the dictionary;
a part of speech/expansion syntax inspector that inspects the part of speech and the expansion syntax of each inspected vocabulary with reference to the dictionary;
a phrase expander that combines the vocabulary in the input sentence into phrases according to the expansion syntax and relationship of the parts of speech of adjacent ones of the vocabulary;
a tone/syllable inspector that inspects each syllable in the generated phonetic representation by reference to tone markers;
a phoneme inspector that inspects the phoneme formation of each of the inspected syllables with reference to the information in the syllable-phoneme look-up portion;
a basic speech duration decider that retrieves the basic speech duration information of each of the inspected phonemes from the basic speech duration storage portion; and
a syllable speech duration calculator that calculates the speech duration of each of the inspected phonemes that form each of the inspected syllables from the basic speech duration information and the parameters associated with the tones, the phrase constructions, the locations in the phrases, the locations in the sentence, and the class of the adjacent phonemes of the inspected phonemes, and that combines the speech duration of the inspected phonemes to obtain the speech duration of each of the inspected syllables.
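The vocabulary inspector and phonetic marker generator of claim 3 could be approximated as below. The claim only requires comparison with the vocabulary stored in the dictionary; forward maximum matching (taking the longest dictionary word at each position) is a swapped-in, commonly used segmentation technique, not one the source names. The dictionary entries and the names `DICTIONARY`, `segment`, and `phonetic_representation` are hypothetical.

```python
# Hypothetical dictionary: word -> phonetic markers (pinyin with tone digit)
# and part of speech. Real entries would also carry expansion syntax.
DICTIONARY = {
    "我": {"phonetic": ["wo3"], "pos": "pronoun"},
    "爱": {"phonetic": ["ai4"], "pos": "verb"},
    "中国": {"phonetic": ["zhong1", "guo2"], "pos": "noun"},
    "中": {"phonetic": ["zhong1"], "pos": "noun"},
}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(sentence):
    """Forward maximum matching over a sentence of any length.

    Returns (start position, word) pairs; raises KeyError when no
    dictionary word starts at the current position.
    """
    words, i = [], 0
    while i < len(sentence):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(MAX_WORD_LEN, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if candidate in DICTIONARY:
                words.append((i, candidate))
                i += length
                break
        else:
            raise KeyError(f"no dictionary entry at position {i}: {sentence[i]!r}")
    return words


def phonetic_representation(words):
    """Concatenate the stored phonetic markers of each matched word."""
    return [syl for _, word in words for syl in DICTIONARY[word]["phonetic"]]


words = segment("我爱中国")
print(words)                           # [(0, '我'), (1, '爱'), (2, '中国')]
print(phonetic_representation(words))  # ['wo3', 'ai4', 'zhong1', 'guo2']
```

Longest-first matching makes "中国" win over the shorter entry "中", so the two syllables stay inside one vocabulary item for the later phrase and location steps.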
4. A speech duration processing apparatus for a Chinese text-to-speech system using Chinese syllables as a basic processing unit, the apparatus comprising:
a dictionary that stores Chinese vocabulary and corresponding information including phonetic markers, parts of speech, and expansion syntax;
a basic speech duration storage portion that stores basic speech duration information classified according to syllables;
a speech duration parameter storage portion that stores speech duration parameters associated with tones of each of the syllables, phrase construction, locations in the phrases, locations in the sentence, and class of the adjacent syllables;
a vocabulary inspector that inspects positions of the syllables of each vocabulary in an input sentence of variable length by comparison with the vocabulary stored in the dictionary;
a phonetic marker generator that generates a phonetic representation of each syllable of each inspected vocabulary according to the phonetic markers stored in the dictionary;
a part of speech/expansion syntax inspector that inspects the part of speech and the expansion syntax of each inspected vocabulary with reference to the dictionary;
a phrase expander that combines the vocabulary in the input sentence into phrases according to the expansion syntax and relationship of the parts of speech of adjacent ones of the vocabulary;
a tone/syllable inspector that inspects each syllable in the generated phonetic representation by reference to tone markers;
a basic speech duration decider that retrieves the basic speech duration information of each inspected syllable from the basic speech duration storage portion; and
a syllable speech duration calculator that calculates the speech duration of each of the inspected syllables from the basic speech duration information and the parameters associated with the tones, the phrase constructions, the locations in the phrases, the locations in the sentence, and the class of the adjacent syllables of the inspected syllables.
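The phrase expander recited in claims 3 and 4 combines adjacent vocabulary into phrases. The claims do not define the expansion syntax or which part-of-speech relationships allow a merge, so the rule table `MERGEABLE` and the function `combine_into_phrases` below are stand-ins invented purely to show the shape of such a step.

```python
# Hypothetical rule set: adjacent part-of-speech pairs allowed to join one phrase.
# A real system would derive this from the dictionary's expansion syntax.
MERGEABLE = {
    ("adjective", "noun"),
    ("adverb", "verb"),
    ("noun", "noun"),
}


def combine_into_phrases(tagged_words):
    """Group adjacent (word, pos) pairs into phrases using MERGEABLE.

    A word joins the current phrase when the pair (previous pos, its pos)
    is mergeable; otherwise it starts a new phrase.
    """
    phrases = []
    for word, pos in tagged_words:
        if phrases and (phrases[-1][-1][1], pos) in MERGEABLE:
            phrases[-1].append((word, pos))
        else:
            phrases.append([(word, pos)])
    return phrases


tagged = [("我", "pronoun"), ("很", "adverb"), ("爱", "verb"), ("中国", "noun")]
for phrase in combine_into_phrases(tagged):
    print("".join(word for word, _ in phrase))
```

The resulting phrase boundaries are what give each syllable its location in the phrase, one of the factors used to select speech duration parameters.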
Specification