Text to speech synthesis
Abstract
An input linguistic description is converted into a speech waveform by deriving at least one target unit sequence corresponding to the linguistic description; selecting from a waveform unit database, for each target unit sequence, a plurality of alternative unit sequences approximating that target unit sequence; concatenating the alternative unit sequences to form alternative speech waveforms; and presenting the alternative speech waveforms to an operating person, who can choose one of them. Because there are no iterative cycles of manual modification and automatic selection, the operator can work quickly. The operator needs no knowledge of units, targets, or costs and simply chooses from a set of given alternatives. Fine-tuning of TTS prompts therefore becomes accessible to non-experts.
18 Claims
1. A method for converting an input linguistic description into a speech waveform comprising:
- deriving at least one target unit sequence corresponding to the input linguistic description;
- assigning in a waveform unit database one or more waveform units to each target unit of the at least one target unit sequence;
- selecting for the at least one target unit sequence a plurality of alternative waveform unit sequences approximating the at least one target unit sequence, using the one or more waveform units assigned to each target unit of the at least one target unit sequence;
- concatenating the alternative waveform unit sequences to form alternative speech waveforms; and
- presenting the alternative speech waveforms to an operating person and enabling the choice of one of the presented alternative speech waveforms.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
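The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how candidate sequences are scored, so the `Unit` type, the single `pitch` feature, the `target_cost` and `join_cost` functions, and the exhaustive enumeration are all hypothetical choices made for illustration. A practical system would presumably use an n-best lattice search rather than enumerating every combination.

```python
import heapq
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Unit:
    """A toy waveform unit (hypothetical structure, not from the patent)."""
    name: str       # label of the target unit this waveform unit can realize
    pitch: float    # single toy feature used by both cost functions
    samples: tuple  # stand-in for the stored waveform samples


def target_cost(target_pitch, unit):
    # Assumed cost: how far the unit is from the requested target.
    return abs(target_pitch - unit.pitch)


def join_cost(left, right):
    # Assumed cost: how badly two adjacent units fit together.
    return abs(left.pitch - right.pitch)


def sequence_cost(targets, seq):
    cost = sum(target_cost(pitch, unit) for (_, pitch), unit in zip(targets, seq))
    cost += sum(join_cost(a, b) for a, b in zip(seq, seq[1:]))
    return cost


def select_alternatives(targets, database, n):
    """Assign candidates to each target unit, then keep the n cheapest sequences."""
    candidates = [[u for u in database if u.name == name] for name, _ in targets]
    # Exhaustive enumeration: only viable for toy inputs.
    return heapq.nsmallest(
        n, product(*candidates), key=lambda seq: sequence_cost(targets, seq)
    )


def concatenate(seq):
    """Join the samples of a unit sequence into one waveform."""
    return tuple(sample for unit in seq for sample in unit.samples)


def choose(waveforms, pick):
    """Present the alternatives to an operator callback and return the chosen one."""
    return waveforms[pick(waveforms)]


# Toy data: two target units, two candidate waveform units for each.
DATABASE = [
    Unit("a", 100.0, (1, 2)),
    Unit("a", 120.0, (3,)),
    Unit("b", 110.0, (4,)),
    Unit("b", 130.0, (5, 6)),
]
TARGETS = [("a", 100.0), ("b", 110.0)]

alternatives = select_alternatives(TARGETS, DATABASE, n=2)
waveforms = [concatenate(seq) for seq in alternatives]
# The operator is simulated here by a callback that picks the second alternative.
chosen = choose(waveforms, pick=lambda options: 1)
```

In this sketch the operator sees only finished waveforms and returns an index, so no knowledge of units or costs is needed at the point of choice.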
17. A text to speech processor for converting an input linguistic description into a speech waveform, said processor comprising:
- a deriving unit for deriving at least one target unit sequence corresponding to the input linguistic description;
- an assigning unit for assigning in a waveform unit database one or more waveform units to each target unit of the at least one target unit sequence;
- a selection unit for selecting for the at least one target unit sequence a plurality of alternative unit sequences approximating the at least one target unit sequence, using the one or more waveform units assigned to each target unit of the at least one target unit sequence;
- a concatenating unit for concatenating the alternative waveform unit sequences to form alternative speech waveforms; and
- a presenting unit for presenting the alternative speech waveforms to an operating person and enabling the choice of one of the presented alternative speech waveforms.
Dependent claims: 18
Specification