Method for detecting the time sequences of a fundamental frequency of an audio response unit to be synthesized
First Claim
Patent Images
1. A method for determining the time characteristic of a fundamental frequency of speech to be synthesized, comprising:
- determining macrosegments of the fundamental frequency by a neural network, each macrosegment comprising a time sequence of the fundamental frequency of a phonetic linguistic unit of the speech, andselecting microsegments to reproduce each macrosegment by selecting fundamental-frequency sequences from a plurality of fundamental-frequency sequences stored in a database, each microsegment comprising a time sequence of the fundamental frequency of a subunit of the phonetic linguistic unit of the speech, the fundamental-frequency sequences being selected from the database in such a manner that each macrosegment is reproduced with the least possible deviation between successive microsegments.
1 Assignment
0 Petitions
Accused Products
Abstract
Predetermined macrosegments of the fundamental frequency are determined by a neural network, and these predefined macrosegments are reproduced by fundamental-frequency sequences stored in a database. The fundamental frequency is generated on the basis of a relatively large text section which is analyzed by the neural network. Microstructures from the database are received in the fundamental frequency. The fundamental frequency thus formed is thus optimized both with regard to its macrostructure and to its microstructure. As a result, an extremely natural sound is achieved.
14 Citations
24 Claims
-
1. A method for determining the time characteristic of a fundamental frequency of speech to be synthesized, comprising:
-
determining macrosegments of the fundamental frequency by a neural network, each macrosegment comprising a time sequence of the fundamental frequency of a phonetic linguistic unit of the speech, and selecting microsegments to reproduce each macrosegment by selecting fundamental-frequency sequences from a plurality of fundamental-frequency sequences stored in a database, each microsegment comprising a time sequence of the fundamental frequency of a subunit of the phonetic linguistic unit of the speech, the fundamental-frequency sequences being selected from the database in such a manner that each macrosegment is reproduced with the least possible deviation between successive microsegments. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
-
-
23. A method for synthesizing speech in which a text is converted to a sequence of acoustic signals, comprising
converting the text into a sequence of phonemes, generating a stressing structure, determining the duration of the individual phonemes, determining the time characteristic of a fundamental frequency by a method comprising: -
determining macrosegments of the fundamental frequency by a neural network, each macrosegment comprising a time sequence of the fundamental frequency of a phonetic linguistic unit of the speech, and selecting microsegments to reproduce each macrosegment by selecting fundamental-frequency sequences from a plurality of fundamental-frequency sequences stored in a database, each microsegment comprising a time sequence of the fundamental frequency of a subunit of the phonetic linguistic unit of the speech, the fundamental-frequency sequences being selected from the database in such a manner that each macrosegment is reproduced with the least possible deviation between successive microsegments, and generating the acoustic signals representing the speech on the basis of the sequence of phonemes determined and of the fundamental frequency determined.
-
-
24. A method for reproducing a speech synthesis macrosegment, comprising:
-
using a neural network, selecting microsegments by selecting a fundamental-frequency sequences from a plurality of fundamental frequency sequences stored in a database, each microsegment comprising a time sequence at the fundamental frequency of a subunit of the phonetic linguistic unit of the speech, the fundamental-frequency sequences being selected from the database to minimize deviations between successive microsegments; and assembling the microsegments with the selected fundamental-frequency sequences and thereby reproducing the macrosegment each macrosegment comprising a time sequence at the fundamental frequency of a phonetic linguistic unit of the speech.
-
Specification