Speech synthesis apparatus and selection method
Abstract
A speech synthesizer includes plural synthesis engines each having different characteristics and converting text-form utterances into speech form. One of the synthesis engines is selected as the current operative engine for producing speech-form utterances for a speech application. If the overall quality of the speech-form utterance produced by the text-to-speech converter of the current operative synthesis engine becomes inadequate, a different engine is selected as the current operative synthesis engine.
Claims
1. Speech synthesis apparatus arranged to process an input to produce corresponding speech-form utterances, the apparatus comprising:
a plurality of synthesis engines having different characteristics and each comprising a text-to-speech converter arranged to convert text-form utterances into speech form;
a synthesis-engine selector arranged to select one of the synthesis engines as the current operative engine, the selected synthesis engine being arranged to receive said input and to produce speech-form utterances for a speech application in response thereto; and
an assessment arrangement arranged to assess the overall quality of the speech-form utterances produced by the current operative synthesis engine and to provide an action indicator to the synthesis-engine selector, without changing said input, in response to the current speech form being inadequate;
the synthesis-engine selector being arranged to be responsive to the action indicator provided thereto to select a different synthesis engine from said plurality to serve as the current operative engine.
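The selector and assessment loop of claim 1 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: each engine is modeled as a hypothetical `SynthesisEngine` wrapping a text-to-speech callable, the quality check is a caller-supplied `is_adequate` predicate standing in for the assessment arrangement, and the "different" engine is simply the next one in round-robin order (claim 9 describes a characteristics-based choice instead). The same, unchanged input text is passed to every engine tried.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SynthesisEngine:
    """Hypothetical engine: a name plus a text-to-speech function."""
    name: str
    tts: Callable[[str], bytes]  # converts a text-form utterance to audio bytes


class EngineSelector:
    """Tracks the current operative engine and switches on an action indicator."""

    def __init__(self, engines: List[SynthesisEngine]):
        self.engines = engines
        self.index = 0  # start with the first engine

    @property
    def current(self) -> SynthesisEngine:
        return self.engines[self.index]

    def on_action_indicator(self) -> None:
        # Assumption: select the next engine in round-robin order.
        self.index = (self.index + 1) % len(self.engines)


def synthesize(text: str, selector: EngineSelector,
               is_adequate: Callable[[bytes], bool]) -> Optional[bytes]:
    """Try engines in turn on the same input until one produces adequate speech."""
    for _ in range(len(selector.engines)):
        audio = selector.current.tts(text)  # input text is never modified
        if is_adequate(audio):
            return audio
        selector.on_action_indicator()  # assessment judged the output inadequate
    return None  # no engine produced adequate speech
```

Because the selector keeps its state, an engine that has just proved adequate stays the current operative engine for the next utterance.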
a respective classifier for each text-to-speech converter, each classifier being arranged to be responsive to the feature values generated by the corresponding text-to-speech converter when constituting at least part of the current operative synthesis engine, to provide a confidence measure of the speech form of the utterance concerned; and
a comparator for comparing confidence measures, produced by the classifier associated with the current operative synthesis engine, against one or more stored threshold values in order to determine whether to produce a said action indicator.
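A minimal sketch of the claim 2 assessment arrangement, assuming each classifier is a plain Python callable that maps the feature vector of its own text-to-speech converter to a confidence in [0, 1]. `PerEngineAssessor`, the engine names, and the single `threshold` are hypothetical; the claim allows one or more stored thresholds and does not say how the classifiers are built.

```python
from typing import Callable, Dict, Sequence

# Hypothetical classifier type: feature vector -> confidence in [0, 1].
Classifier = Callable[[Sequence[float]], float]


class PerEngineAssessor:
    """One classifier per text-to-speech converter, plus a threshold comparator."""

    def __init__(self, classifiers: Dict[str, Classifier], threshold: float):
        self.classifiers = classifiers
        self.threshold = threshold

    def action_indicator(self, engine_name: str, features: Sequence[float]) -> bool:
        # Use only the classifier belonging to the current operative engine.
        confidence = self.classifiers[engine_name](features)
        # Raise the action indicator when confidence falls below the threshold.
        return confidence < self.threshold
```

Keeping one classifier per converter lets each model weigh the features that matter for its own engine, since feature values from different converters need not be comparable.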
3. Apparatus according to claim 2, wherein the synthesis-engine selector is operative to cause the threshold values used by the comparator to be changed to match the currently selected synthesis engine.
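Claim 3 ties the comparator's thresholds to the selected engine. A minimal sketch, assuming a hypothetical per-engine threshold table (`thresholds`) that the selector loads into the comparator whenever it changes engines; `Comparator` and `ThresholdAwareSelector` are illustrative names only.

```python
from typing import Dict


class Comparator:
    """Compares a confidence measure against the currently loaded threshold."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def needs_switch(self, confidence: float) -> bool:
        return confidence < self.threshold


class ThresholdAwareSelector:
    """Selects engines and loads each engine's own threshold into the comparator."""

    def __init__(self, thresholds: Dict[str, float], comparator: Comparator):
        self.thresholds = thresholds  # hypothetical per-engine calibration table
        self.comparator = comparator
        self.current = ""

    def select(self, engine_name: str) -> None:
        self.current = engine_name
        # Changing engines also changes the threshold the comparator uses.
        self.comparator.threshold = self.thresholds[engine_name]
```

A likely motivation is that confidence scores from different engines need not share a scale, so a threshold calibrated for one engine may not suit another.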
4. Apparatus according to claim 1, wherein the text-to-speech converter of each synthesis engine is arranged to generate, in the course of converting a text-form utterance into speech form, values of predetermined features which, for that text-to-speech converter, are indicative of the overall quality of the speech form of the utterance, the assessment arrangement comprising:
a classifier arranged to be responsive to the feature values generated by the text-to-speech converter of the current operative synthesis engine, to provide a confidence measure of the speech form of the utterance concerned; and
a comparator for comparing confidence measures produced by the classifier against one or more stored threshold values in order to determine whether to produce a said action indicator.
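For the single shared classifier of claim 4, one common choice would be logistic regression over the feature values; the patent does not specify the classifier type, so this is an assumption. In the sketch below, `weights` and `bias` are hypothetical parameters assumed to come from offline training, and a single threshold stands in for the "one or more stored threshold values".

```python
import math
from typing import Sequence


def logistic_confidence(features: Sequence[float], weights: Sequence[float],
                        bias: float) -> float:
    """Hypothetical linear classifier: sigmoid(w . x + b), a confidence in [0, 1]."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    # Numerically stable sigmoid: avoid overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def needs_action(confidence: float, threshold: float) -> bool:
    """Comparator: produce an action indicator when confidence is too low."""
    return confidence < threshold
```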
5. Apparatus according to claim 4, wherein the synthesis-engine selector is operative to cause the threshold values used by the comparator to be changed to match the currently selected synthesis engine.
6. Apparatus according to claim 1, wherein the text-to-speech converter of each synthesis engine includes a concatenative speech generator which, in generating a speech-form utterance, is arranged to produce an accumulated unit selection cost in respect of the speech units used to make up the speech-form utterance;
the assessment arrangement comprising a comparator for comparing the selection cost produced by the speech generator of the current operative synthesis engine against one or more stored threshold values, in order to determine whether to produce a said action indicator.
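Claim 6 relies on the accumulated unit selection cost of a concatenative generator. The sketch below swaps in the standard dynamic-programming (Viterbi) formulation of unit selection, which is an assumption: `target_cost` scores how well a candidate unit fits position `i`, `join_cost` scores the concatenation of two adjacent units, and both are hypothetical callables. The lowest total cost is then compared against a threshold; since a higher cost suggests worse output, the action indicator is raised when the cost exceeds the threshold.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(candidates: List[Sequence[str]],
                 target_cost: Callable[[int, str], float],
                 join_cost: Callable[[str, str], float]) -> Tuple[List[str], float]:
    """Viterbi search for the unit sequence with the lowest accumulated cost."""
    # best holds (accumulated cost, path) for each candidate at the current position.
    best = [(target_cost(0, u), [u]) for u in candidates[0]]
    for i in range(1, len(candidates)):
        new_best = []
        for u in candidates[i]:
            # Cheapest predecessor once the join cost into u is added.
            cost, path = min(((c + join_cost(p[-1], u), p) for c, p in best),
                             key=lambda t: t[0])
            new_best.append((cost + target_cost(i, u), path + [u]))
        best = new_best
    cost, path = min(best, key=lambda t: t[0])
    return path, cost


def cost_exceeds_threshold(accumulated_cost: float, threshold: float) -> bool:
    # A high total cost suggests poorly matching units or audible joins.
    return accumulated_cost > threshold
```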
7. Apparatus according to claim 6, wherein the synthesis-engine selector is operative to cause the threshold values used by the comparator to be changed to match the currently selected synthesis engine.
8. Apparatus according to claim 1, further comprising an output buffer for temporarily storing the latest speech-form utterance generated by the text-to-speech converter of the current operative synthesis engine, the assessment arrangement being arranged for releasing this speech-form utterance for output only in response to the assessment arrangement not producing an action indicator for causing the selection of a different synthesis engine.
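A minimal sketch of the claim 8 output buffer, assuming the buffered utterance is discarded when an action indicator is raised (the claim only says it is released when no indicator is produced). `OutputBuffer` and its methods are hypothetical names.

```python
from typing import Optional


class OutputBuffer:
    """Holds the latest utterance until the assessment clears it for output."""

    def __init__(self) -> None:
        self._pending: Optional[bytes] = None

    def store(self, utterance: bytes) -> None:
        self._pending = utterance  # only the latest utterance is kept

    def release_if_ok(self, action_indicator: bool) -> Optional[bytes]:
        """Return the utterance for playback only if no action indicator was raised."""
        utterance, self._pending = self._pending, None
        return None if action_indicator else utterance
```

Gating output this way means the listener never hears an utterance the assessment has judged inadequate, at the cost of one assessment's latency per utterance.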
9. Apparatus according to claim 1, wherein the synthesis-engine selector is arranged for carrying out its selection of the synthesis engine next to constitute the current operative synthesis engine on the basis of the characteristics of the engines and of the current speech application.
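Claim 9 selects the next engine from the engines' characteristics and the current speech application. One way to sketch this, assuming characteristics are represented as string traits and the application supplies a weight per trait (both hypothetical), is to score each engine not yet tried and pick the best match:

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class EngineProfile:
    name: str
    traits: Set[str]  # hypothetical characteristics, e.g. {"fast", "natural"}


def choose_next(engines: List[EngineProfile], current: str,
                app_prefs: Dict[str, float], tried: Set[str]) -> str:
    """Pick the untried engine whose traits best match the application's weights."""
    candidates = [e for e in engines if e.name != current and e.name not in tried]
    if not candidates:
        raise LookupError("no alternative engine available")

    def score(e: EngineProfile) -> float:
        return sum(app_prefs.get(t, 0.0) for t in e.traits)

    return max(candidates, key=score).name
```

Tracking `tried` keeps the selector from cycling back to an engine that has already failed on the current input.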
10. A method of synthesizing speech with an apparatus arranged to process an input to produce corresponding speech-form utterances, the apparatus including plural speech synthesis engines for converting text-form utterances into speech form, different ones of the engines having different characteristics;
the method comprising (a) selecting one of the engines as the operative engine that produces the speech-form utterances for a speech application, (b) assessing the overall quality of the speech-form utterances produced by the current operative synthesis engine based on a confidence score to provide an action indicator, and (c) responding to the action indicator by selecting, without changing the input, another one of the engines as the operative engine in response to the selected engine producing a speech-form utterance having inadequate quality, the other engine being selected as the new current operative engine.
Specification