Method for detecting the time sequences of a fundamental frequency of an audio response unit to be synthesized

US 7,219,061 B1
Filed: 10/24/2000
Issued: 05/15/2007
Est. Priority Date: 10/28/1999
Status: Expired due to Fees

Abstract

Predetermined macrosegments of the fundamental frequency are determined by a neural network, and these macrosegments are reproduced by fundamental-frequency sequences stored in a database. The fundamental frequency is generated on the basis of a relatively large text section which is analyzed by the neural network. Microstructures from the database are incorporated into the fundamental frequency. The fundamental frequency formed in this way is optimized with regard to both its macrostructure and its microstructure. As a result, an extremely natural sound is achieved.
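The two-level idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the target contour stands in for a macrosegment that the neural network would supply, the `database` values are invented, and the mean squared difference is only one possible way to measure deviation. The function names `assemble`, `deviation`, and `select_sequences` are hypothetical and do not come from the patent.

```python
from itertools import product


def assemble(sequences):
    """Concatenate per-phoneme F0 sequences into one reproduced macrosegment."""
    return [f0 for seq in sequences for f0 in seq]


def deviation(reproduced, target):
    """Mean squared difference (Hz^2) between two equal-length F0 contours.

    Assumption: the abstract does not prescribe a deviation measure.
    """
    return sum((r - t) ** 2 for r, t in zip(reproduced, target)) / len(target)


def select_sequences(target, candidates_per_phoneme):
    """Try every combination of stored sequences and keep the closest one."""
    best = min(
        product(*candidates_per_phoneme),
        key=lambda combo: deviation(assemble(combo), target),
    )
    return list(best)


# Hypothetical macrosegment for one syllable (stand-in for the network output).
target = [120.0, 125.0, 130.0, 128.0, 122.0, 118.0]

# Hypothetical database: two stored F0 sequences for each of three phonemes.
database = [
    [[118.0, 124.0], [130.0, 135.0]],
    [[131.0, 127.0], [110.0, 112.0]],
    [[121.0, 119.0], [140.0, 138.0]],
]

chosen = select_sequences(target, database)
reproduced = assemble(chosen)
```

Exhaustive search is used here only because the example is tiny; the number of combinations grows exponentially with the number of phonemes, so a practical system would need a more economical search.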

24 Claims

1. A method for determining the time characteristic of a fundamental frequency of speech to be synthesized, comprising:
- determining macrosegments of the fundamental frequency by a neural network, each macrosegment comprising a time sequence of the fundamental frequency of a phonetic linguistic unit of the speech, and
- selecting microsegments to reproduce each macrosegment by selecting fundamental-frequency sequences from a plurality of fundamental-frequency sequences stored in a database, each microsegment comprising a time sequence of the fundamental frequency of a subunit of the phonetic linguistic unit of the speech, the fundamental-frequency sequences being selected from the database in such a manner that each macrosegment is reproduced with the least possible deviation between successive microsegments.
  - 2. The method as claimed in claim 1, wherein the phonetic linguistic unit is selected from the group consisting of a phrase, a word, and a syllable.
  - 3. The method as claimed in claim 2, wherein the fundamental-frequency sequences of the microsegments each represent the fundamental frequencies of one phoneme.
  - 4. The method as claimed in claim 3, wherein the fundamental-frequency sequences of the microsegments which are located within a time range of one of the macrosegments are assembled to form one reproduced macrosegment, the deviation of the reproduced macrosegment from the respective macrosegment being determined and the fundamental-frequency sequences being optimized in such a manner that the deviation is as small as possible.
  - 5. The method as claimed in claim 4, wherein in each case a number of fundamental-frequency sequences can be selected for the individual microsegments, where the combinations of fundamental-frequency sequences resulting in the least deviation between the respective reproduced macrosegment and the respective macrosegment are selected.
  - 6. The method as claimed in claim 5, wherein the deviation between the reproduced macrosegment and the macrosegment is determined by a cost function which is weighted in such a manner that in the case of small deviations from the fundamental frequency of the macrosegment, only a small deviation is determined and when a predetermined limit frequency difference is exceeded, the deviations determined rise steeply until a saturation value is reached.
  - 7. The method as claimed in claim 6, wherein the deviation between the reproduced macrosegment and the macrosegment is determined by a cost function by which a multiplicity of deviations distributed over the macrosegments are weighted, and the closer the deviations are to the edge of a syllable, the less weighting is applied to them.
  - 8. The method as claimed in claim 7, wherein during the selecting of the fundamental-frequency sequences, the individual fundamental-frequency sequences are syntonized with the following or preceding fundamental-frequency sequences in accordance with predetermined criteria and only combinations of fundamental-frequency sequences meeting the criteria are admitted to be assembled to form a reproduced macrosegment.
  - 9. The method as claimed in claim 8, wherein adjacent fundamental-frequency sequences are assessed by means of a cost function which generates an output value, to be minimized, for a junction between fundamental-frequency sequences, and the greater the difference at the end of the preceding fundamental-frequency sequence from the frequency at the beginning of the subsequent fundamental-frequency sequence, the greater the output value.
  - 10. The method as claimed in claim 9, wherein the closer a junction is to an edge of a syllable, the less weighting is applied to the output value.
  - 11. The method as claimed in claim 10, wherein the macrosegments are concatenated with one another and the fundamental frequencies are matched to one another at the junctions of the macrosegments.
  - 12. The method as claimed in claim 11, wherein the neural network determines the macrosegments for a predetermined section of a text on the basis of this text section and of a text section preceding and/or following this text section.
  - 13. The method as claimed in claim 1, wherein the fundamental-frequency sequences of the microsegments each represent the fundamental frequencies of one phoneme.
  - 14. The method as claimed in claim 1, wherein the fundamental-frequency sequences of the microsegments which are located within a time range of one of the macrosegments are assembled to form one reproduced macrosegment, the deviation of the reproduced macrosegment from the respective macrosegment being determined and the fundamental-frequency sequences being optimized in such a manner that the deviation is as small as possible.
  - 15. The method as claimed in claim 14, wherein in each case a number of fundamental-frequency sequences can be selected for the individual microsegments, where the combinations of fundamental-frequency sequences resulting in the least deviation between the respective reproduced macrosegment and the respective macrosegment are selected.
  - 16. The method as claimed in claim 15, wherein the deviation between the reproduced macrosegment and the macrosegment is determined by a cost function which is weighted in such a manner that in the case of small deviations from the fundamental frequency of the macrosegment, only a small deviation is determined and when a predetermined limit frequency difference is exceeded, the deviations determined rise steeply until a saturation value is reached.
  - 17. The method as claimed in claim 15, wherein the deviation between the reproduced macrosegment and the macrosegment is determined by a cost function by which a multiplicity of deviations distributed over the macrosegments are weighted, and the closer the deviations are to the edge of a syllable, the less weighting is applied to them.
  - 18. The method as claimed in claim 15, wherein during the selecting of the fundamental-frequency sequences, the individual fundamental-frequency sequences are synchronized with the following or preceding fundamental-frequency sequences in accordance with predetermined criteria and only combinations of fundamental-frequency sequences meeting the criteria are admitted to be assembled to form a reproduced macrosegment.
  - 19. The method as claimed in claim 18, wherein adjacent fundamental-frequency sequences are assessed by means of a cost function which generates an output value, to be minimized, for a junction between fundamental-frequency sequences, and the greater the difference at the end of the preceding fundamental-frequency sequence from the frequency at the beginning of the subsequent fundamental-frequency sequence, the greater the output value.
  - 20. The method as claimed in claim 19, wherein the closer a junction is to an edge of a syllable, the less weighting is applied to the output value.
  - 21. The method as claimed in claim 1, wherein the macrosegments are concatenated with one another and the fundamental frequencies are matched to one another at the junctions of the macrosegments.
  - 22. The method as claimed in claim 1, wherein the neural network determines the macrosegments for a predetermined section of a text on the basis of this text section and of a text section preceding and/or following this text section.
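
The cost functions described in the dependent claims (a saturating deviation cost, reduced weighting near syllable edges, and a junction cost between adjacent sequences) might be sketched as follows. The claims state only the qualitative behavior, so the logistic curve, the triangular weight, and all parameter values (`limit_hz`, `steepness`, `saturation`, `min_weight`) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def saturating_cost(diff_hz, limit_hz=10.0, steepness=0.5, saturation=1.0):
    """Logistic cost of an F0 difference.

    Stays small for small |diff_hz|, rises steeply around limit_hz, and
    levels off at `saturation`. The logistic shape is an assumption.
    """
    return saturation / (1.0 + math.exp(-steepness * (abs(diff_hz) - limit_hz)))


def edge_weight(position, min_weight=0.2):
    """Weight for a relative position in a syllable (0.0 = start, 1.0 = end).

    Highest at the center and lowest at either edge; the triangular
    profile and the floor value are assumptions.
    """
    return min_weight + (1.0 - min_weight) * (1.0 - abs(2.0 * position - 1.0))


def macro_cost(reproduced, target):
    """Edge-weighted sum of saturating costs over one syllable-sized macrosegment."""
    n = len(target)
    total = 0.0
    for i, (r, t) in enumerate(zip(reproduced, target)):
        position = i / (n - 1) if n > 1 else 0.5
        total += edge_weight(position) * saturating_cost(r - t)
    return total


def junction_cost(preceding, following, position):
    """Edge-weighted F0 jump from the end of one sequence to the start of the next."""
    return edge_weight(position) * abs(preceding[-1] - following[0])
```

With these definitions, a 10 Hz jump in the middle of a syllable (`position=0.5`) costs five times as much as the same jump at a syllable edge (`position=0.0`), which mirrors the reduced edge weighting in the claims.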

23. A method for synthesizing speech in which a text is converted to a sequence of acoustic signals, comprising:
- converting the text into a sequence of phonemes,
- generating a stressing structure,
- determining the duration of the individual phonemes,
- determining the time characteristic of a fundamental frequency by a method comprising:
  - determining macrosegments of the fundamental frequency by a neural network, each macrosegment comprising a time sequence of the fundamental frequency of a phonetic linguistic unit of the speech, and
  - selecting microsegments to reproduce each macrosegment by selecting fundamental-frequency sequences from a plurality of fundamental-frequency sequences stored in a database, each microsegment comprising a time sequence of the fundamental frequency of a subunit of the phonetic linguistic unit of the speech, the fundamental-frequency sequences being selected from the database in such a manner that each macrosegment is reproduced with the least possible deviation between successive microsegments, and
- generating the acoustic signals representing the speech on the basis of the sequence of phonemes determined and of the fundamental frequency determined.

24. A method for reproducing a speech synthesis macrosegment, comprising:
- using a neural network, selecting microsegments by selecting fundamental-frequency sequences from a plurality of fundamental-frequency sequences stored in a database, each microsegment comprising a time sequence at the fundamental frequency of a subunit of the phonetic linguistic unit of the speech, the fundamental-frequency sequences being selected from the database to minimize deviations between successive microsegments; and
- assembling the microsegments with the selected fundamental-frequency sequences and thereby reproducing the macrosegment, each macrosegment comprising a time sequence at the fundamental frequency of a phonetic linguistic unit of the speech.
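
One way to select sequences so that deviations between successive microsegments stay small is a dynamic-programming (Viterbi-style) search. The claims do not name a search procedure, so this technique is swapped in purely as an illustration. The absolute-difference costs, the `junction_weight` parameter, and the function name `viterbi_select` are assumptions, and the numeric values are invented.

```python
def viterbi_select(candidates_per_unit, target_per_unit, junction_weight=1.0):
    """Pick one stored sequence per subunit, minimizing the summed target
    deviation plus the weighted F0 jumps at junctions (illustrative costs)."""

    def target_cost(seq, target):
        return sum(abs(s - t) for s, t in zip(seq, target))

    def join_cost(prev, nxt):
        return junction_weight * abs(prev[-1] - nxt[0])

    # best[i][k] = (lowest total cost of a path ending in candidate k of
    # unit i, index of the predecessor candidate on that path)
    best = [[(target_cost(c, target_per_unit[0]), None)
             for c in candidates_per_unit[0]]]
    for i in range(1, len(candidates_per_unit)):
        row = []
        for cand in candidates_per_unit[i]:
            local = target_cost(cand, target_per_unit[i])
            cost, back = min(
                (best[i - 1][j][0] + join_cost(prev, cand), j)
                for j, prev in enumerate(candidates_per_unit[i - 1])
            )
            row.append((cost + local, back))
        best.append(row)

    # Trace back from the cheapest final candidate.
    k = min(range(len(best[-1])), key=lambda idx: best[-1][idx][0])
    path = [k]
    for i in range(len(best) - 1, 0, -1):
        k = best[i][k][1]
        path.append(k)
    path.reverse()
    return [candidates_per_unit[i][k] for i, k in enumerate(path)]


# Hypothetical targets and stored sequences for two successive subunits.
targets = [[120.0, 124.0], [126.0, 128.0]]
candidates = [
    [[120.0, 124.0], [121.0, 126.0]],
    [[132.0, 128.0], [126.0, 130.0]],
]

smooth_enough = viterbi_select(candidates, targets)
smoother = viterbi_select(candidates, targets, junction_weight=3.0)
```

Raising `junction_weight` shifts the choice for the first subunit from the sequence that matches its target exactly to the one that joins the next sequence without a jump, which shows the trade-off between fidelity to the macrosegment and smoothness at junctions.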

Current Assignee: Siemens AG
Original Assignee: Siemens AG
Inventors: Holzapfel, Martin; Erdem, Caglayan
Primary Examiner(s): Hudspeth, David
Assistant Examiner(s): Jackson, Jakieda R.
Application Number: US10/111,695
Time in Patent Office: 2,394 Days
Field of Search: 704/268, 704/211, 704/207, 704/209, 704/202, 704/232, 704/259, 704/267
US Class Current: 704/268
CPC Class Codes:
G10L 2025/783   based on threshold decision
G10L 25/30   using neural networks
G10L 25/90   Pitch determination of spee...
