System and method for supporting text-to-speech
First Claim
1. A method of supporting text-to-speech synthesis, the method comprising:
acquiring first frequency data set in a language processing unit, the first frequency data indicating appearance frequencies of readings corresponding to text wordings;
recognizing speech produced by a user reading a learning text;
generating first learning data by associating recognized readings from the speech with portions of the learning text, or by recognizing both wordings and readings of phrases from the speech;
generating, based on the first learning data, second frequency data indicating appearance frequencies of readings corresponding to wordings of phrases from the speech;
generating a plurality of frequency data candidates, each frequency data candidate indicating, for at least one combination of a plurality of continuously-written phrases, an appearance frequency of at least one combination of readings, the appearance frequency of the at least one combination of readings comprising a weighted average of an appearance frequency of the at least one combination of readings from the first frequency data with an appearance frequency of the at least one combination of readings from the second frequency data, wherein each of the plurality of frequency data candidates uses different weights for the weighted average;
for each one of the plurality of frequency data candidates using different weights for the weighted average, using the language processing unit to generate a set of readings corresponding to the learning text using the one of the plurality of frequency data candidates, wherein the set of readings comprises a subset of readings that match readings of the first learning data, and calculating a ratio of the subset of readings to the set of readings, wherein a first frequency data candidate of the plurality of frequency data candidates has a highest calculated ratio;
updating frequency data in the language processing unit using the first frequency data candidate with the highest calculated ratio; and
setting the updated frequency data in the language processing unit.
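The candidate-generation and selection steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: it simplifies the claim's "combinations of continuously-written phrases" to single phrases, and the names `blend`, `predict`, `match_ratio`, and `select_candidate`, the table layout, and the weight grid are all assumptions introduced for illustration. The `predict` function is a toy stand-in for the language processing unit that simply picks the most frequent reading for each wording.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table layout: (wording, reading) -> appearance frequency.
FreqTable = Dict[Tuple[str, str], float]


def blend(first: FreqTable, second: FreqTable, w: float) -> FreqTable:
    """Weighted average (1 - w) * first + w * second; missing entries count as 0."""
    keys = set(first) | set(second)
    return {k: (1 - w) * first.get(k, 0.0) + w * second.get(k, 0.0) for k in keys}


def predict(freq: FreqTable, wordings: List[str]) -> List[str]:
    """Toy language processing unit: most frequent reading per wording."""
    readings = []
    for wording in wordings:
        options = {r: f for (wd, r), f in freq.items() if wd == wording}
        readings.append(max(options, key=options.get) if options else "")
    return readings


def match_ratio(predicted: List[str], reference: List[str]) -> float:
    """Ratio of generated readings that match the learning-data readings."""
    if not predicted:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


def select_candidate(
    first: FreqTable,
    second: FreqTable,
    wordings: List[str],
    reference: List[str],
    weights: List[float],
) -> Optional[Tuple[float, float, FreqTable]]:
    """Build one candidate per weight and keep the one with the highest ratio.

    Ties are resolved in favor of the earliest weight in the list.
    """
    best = None
    for w in weights:
        candidate = blend(first, second, w)
        ratio = match_ratio(predict(candidate, wordings), reference)
        if best is None or ratio > best[1]:
            best = (w, ratio, candidate)
    return best
```

In this sketch, `first` plays the role of the first frequency data already set in the language processing unit, `second` the frequency data derived from the user's speech, and the returned table is what would be set back into the unit.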
Abstract
A system for generating high-quality synthesized speech from text includes a learning data generating unit, a frequency data generating unit, and a setting unit. The learning data generating unit recognizes inputted speech and generates first learning data in which wordings of phrases are associated with their readings. The frequency data generating unit generates, based on the first learning data, frequency data indicating appearance frequencies of both wordings and readings of phrases. The setting unit sets the generated frequency data in a language processing unit so that the speech output by text-to-speech approximates the inputted speech. The language processing unit generates, from a wording of text, a reading corresponding to the wording on the basis of the appearance frequencies.
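As a rough illustration of the frequency data generating unit described above, the following sketch counts wording and reading pairs in the learning data and normalizes them per wording. The function name `build_frequency_data`, the pair-based input format, and the choice of per-wording relative frequency are assumptions made for this example; the abstract does not specify how appearance frequencies are computed.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_frequency_data(
    learning_data: List[Tuple[str, str]],
) -> Dict[Tuple[str, str], float]:
    """Turn (wording, reading) pairs into relative frequencies per wording.

    Assumption: "appearance frequency" is modeled as the share of a
    wording's occurrences that were spoken with a given reading.
    """
    pair_counts = Counter(learning_data)
    wording_totals = Counter(wording for wording, _ in learning_data)
    return {
        (wording, reading): count / wording_totals[wording]
        for (wording, reading), count in pair_counts.items()
    }
```

A table built this way could then be set in the language processing unit, which would favor the readings the speaker actually used.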
Specification