System for generating formant tracks by modifying formants synthesized from speech units
Abstract
Formants, corresponding to input speech units based either on a known text or the results of a speech recognition procedure, are generated from a formant synthesizer. A frequency response is generated based on the synthesized formants. A second frequency response is generated based on a speech signal which is received and which corresponds to utterances of speech units. The synthesized formants are modified based on a comparison of the frequency response corresponding to the synthesized formants and specific proportional characteristics of a frequency response of the input speech signal. In one illustrative embodiment, the comparison is then recalculated and further modifications are made accordingly to improve accuracy. In one illustrative embodiment, time aligning and frequency warping are utilized as modification functions.
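As an illustration of the second frequency response described in the abstract (the one derived from the received speech signal), the following Python sketch computes one magnitude spectrum per frame. The abstract does not say how this response is computed; the Hamming window, the direct DFT, the frame length and hop, and the function names are all assumptions made for this example.

```python
import cmath
import math


def frame_magnitude_spectrum(frame):
    """Magnitude spectrum of one frame (Hamming window, direct DFT)."""
    n = len(frame)
    windowed = [
        x * (0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)))
        for k, x in enumerate(frame)
    ]
    spectrum = []
    for b in range(n // 2 + 1):  # bins from 0 Hz up to the Nyquist frequency
        acc = sum(
            w * cmath.exp(-2j * math.pi * b * k / n)
            for k, w in enumerate(windowed)
        )
        spectrum.append(abs(acc))
    return spectrum


def speech_frequency_response(samples, frame_len=256, hop=128):
    """One magnitude spectrum per time instant (one per frame start)."""
    return [
        frame_magnitude_spectrum(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]
```

In this sketch, bin b of each spectrum corresponds to b * sample_rate / frame_len Hz, and each list entry corresponds to one time instant.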
11 Citations
32 Claims
1. A method of tracking formants corresponding to a speech signal, the method comprising:
obtaining a speech frequency response based on the speech signal;
providing speech units corresponding to the speech signal;
obtaining formants from a formant synthesizer, wherein the formants correspond to the speech units; and
modifying the formants based on specific proportional characteristics of the speech frequency response to obtain modified formants for formant tracks.
2. The method of claim 1 and further comprising:
obtaining a formant frequency response associated with the formants obtained from the formant synthesizer.
3. The method of claim 2 wherein modifying comprises:
comparing the speech frequency response with the formant frequency response; and
modifying the formants based on the comparison.
4. The method of claim 3 wherein comparing comprises:
comparing characteristics of the speech frequency response and the formant frequency response at a plurality of time instants; and
modifying the formant frequency response at a plurality of time instants based on the comparison.
5. The method of claim 4 wherein modifying the formant frequency response comprises:
time aligning the formant frequency response at the plurality of time instants with the speech frequency response at the plurality of time instants.
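Claim 5 does not name an alignment algorithm. One common choice, assumed here purely for illustration, is dynamic time warping (DTW), which pairs each time instant of the speech frequency response with a time instant of the formant frequency response so that the summed spectral distance is minimal. The function name, the squared-difference frame distance, and the representation of a frame as a list of magnitudes are hypothetical.

```python
def dtw_align(speech_frames, formant_frames):
    """Return (speech_index, formant_index) pairs on a minimum-cost path.

    Each frame is a list of spectral magnitudes; DTW is an assumed
    technique, not one named in the claims.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n, m = len(speech_frames), len(formant_frames)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(speech_frames[i - 1], formant_frames[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )

    # Walk back from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path
```

The returned index pairs can then be used to resample the formant frequency response onto the time instants of the speech frequency response.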
6. The method of claim 4 wherein comparing comprises:
comparing frequencies in the speech frequency response and the formant frequency response; and
modifying the formant frequency response based on the speech frequency response.
7. The method of claim 3 wherein providing speech units comprises:
performing speech recognition on the speech signal to obtain the speech units.
8. The method of claim 7 wherein performing speech recognition comprises:
providing a plurality of possible speech units corresponding to each of a plurality of intervals of the speech signal, and further comprising choosing one of the plurality of possible speech units based on the comparing step.
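The selection step in claim 8 can be pictured as scoring each candidate speech unit by how closely its formant frequency response matches the speech frequency response for the same interval. The sketch below is one possible reading, not the claimed implementation: the root-mean-square log-spectral distance, the function names, and the dictionary of candidate responses are all assumptions.

```python
import math


def log_spectral_distance(speech_mags, formant_mags):
    """RMS difference of log magnitudes (an assumed comparison measure)."""
    eps = 1e-12  # avoids log(0) for empty bins
    diffs = [
        math.log(s + eps) - math.log(f + eps)
        for s, f in zip(speech_mags, formant_mags)
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def choose_speech_unit(speech_mags, candidate_responses):
    """Pick the candidate whose response is closest to the speech response.

    candidate_responses maps a hypothetical unit label to the magnitudes
    of the formant frequency response synthesized for that unit.
    """
    return min(
        candidate_responses,
        key=lambda unit: log_spectral_distance(
            speech_mags, candidate_responses[unit]
        ),
    )
```

For example, with a speech response of `[1.0, 4.0, 1.0, 0.5]` and candidates `{"iy": [1.0, 1.0, 4.0, 0.5], "aa": [1.0, 3.5, 1.0, 0.5]}`, the function selects the candidate labelled `"aa"` because its peak falls in the same bin as the peak of the speech response.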
9. The method of claim 1 wherein the speech signal is generated based on a known text and wherein providing speech units comprises:
retrieving the speech units from a speech unit store based on the known text.
10. The method of claim 1 wherein obtaining formants from a formant synthesizer comprises:
having a formant synthesizer provide a set of frequencies and bandwidths indicative of the formants.
11. The method of claim 10 wherein modifying comprises:
modifying the frequencies and bandwidths indicative of the formants based on the speech frequency response.
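The claims do not state how a frequency response is derived from a set of formant frequencies and bandwidths. One conventional option, assumed here, models each formant as a second-order digital resonator of the kind used in Klatt-style cascade formant synthesizers and multiplies the resonator responses together. The function names and the 16 kHz default sampling rate are hypothetical.

```python
import cmath
import math


def resonator_response(freq_hz, formant_hz, bandwidth_hz, sample_rate_hz):
    """Complex response of one two-pole resonator at freq_hz.

    H(z) = a / (1 - b z^-1 - c z^-2), normalized to unity gain at 0 Hz.
    """
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * formant_hz * t))
    a = 1.0 - b - c
    z_inv = cmath.exp(-2j * math.pi * freq_hz * t)
    return a / (1.0 - b * z_inv - c * z_inv * z_inv)


def formant_frequency_response(freqs_hz, formants, sample_rate_hz=16000.0):
    """Magnitude response of a cascade of resonators.

    formants is a list of (frequency_hz, bandwidth_hz) pairs.
    """
    mags = []
    for f in freqs_hz:
        h = 1.0 + 0.0j
        for formant_hz, bandwidth_hz in formants:
            h *= resonator_response(f, formant_hz, bandwidth_hz, sample_rate_hz)
        mags.append(abs(h))
    return mags
```

Under this model, changing a formant frequency moves a spectral peak and changing its bandwidth changes the peak's sharpness, which is why both quantities are natural targets for modification.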
12. The method of claim 1 and further comprising:
modifying the formant synthesizer based on the modified formants.
13. A formant tracker, comprising:
a first frequency response generator configured to receive a speech signal and provide a speech frequency response based on the speech signal;
a formant synthesizer configured to receive speech units associated with the speech signal and to provide formants corresponding to the speech units;
a second frequency generator coupled to the formant synthesizer and configured to generate a formant frequency response based on the formants; and
a modification component coupled to the first and second frequency response generators and configured to modify the formants based on differences between specific proportional characteristics of the speech frequency response and the formant frequency response to provide modified formants.
14. The formant tracker of claim 13 wherein the modification component comprises:
a comparison component configured to compare the speech frequency response with the formant frequency response; and
a modifier configured to modify the formants based on the comparison.
15. The formant tracker of claim 14 wherein the comparison component comprises:
a timing comparison component configured to compare timing characteristics of the speech frequency response and the formant frequency response; and
wherein the modifier includes a timing modifier configured to modify the formant frequency response based on the comparison.
16. The formant tracker of claim 15 wherein the timing modifier is configured to time align the formant frequency response with the speech frequency response.
17. The formant tracker of claim 15 wherein the comparison component comprises:
a frequency comparison component configured to compare frequencies in the speech frequency response and the formant frequency response; and
wherein the modifier includes a frequency modifier configured to modify the formant frequency response based on the speech frequency response.
18. The formant tracker of claim 14 and further comprising:
a speech recognition engine configured to perform speech recognition on the speech signal to obtain the speech units.
19. The formant tracker of claim 18 wherein the speech recognition engine is configured to provide a plurality of possible speech units corresponding to each of a plurality of intervals of the speech signal, and wherein the comparison component is configured to choose one of the plurality of possible speech units based on the comparison of the speech frequency response and the formant frequency response.
20. The formant tracker of claim 13 wherein the speech signal is generated based on a known text and further comprising:
a speech unit store, coupled to the formant synthesizer, storing the speech units corresponding to the known text.
21. The formant tracker of claim 14 wherein the formant synthesizer is configured to provide a set of frequencies and bandwidths indicative of the formants of the speech units.
22. The formant tracker of claim 21 wherein the modifier is configured to modify the frequencies and bandwidths indicative of the formants based on the speech frequency response.
23. The formant tracker of claim 13 wherein the formant synthesizer comprises:
a synthesizer modifying component, coupled to the modification component, configured to modify the formant synthesizer based on the modified formants.
24. A formant tracker, comprising:
a first frequency response generator configured to receive a speech signal and provide a speech frequency response at a first plurality of time instants based on the speech signal;
a formant calculation component configured to receive speech units associated with the speech signal and to provide continuous proposed formant frequencies and bandwidths at a second plurality of time instants corresponding to the speech units;
a second frequency response generator coupled to the formant calculation component and configured to provide a formant frequency response at the second plurality of time instants based on the proposed formant frequencies and bandwidths; and
a modifier component, coupled to the first and second frequency response generators, configured to compare specific proportional characteristics of the speech frequency response and the formant frequency response and to proportionally modify the proposed formant frequencies and bandwidths based on differences between the speech frequency response and the formant frequency response obtained in the comparison.
25. The formant tracker of claim 24 wherein the speech signal is based on predefined speech and further comprising:
a speech unit store storing the speech units associated with the predefined speech such that the speech units are predefined speech units.
26. The formant tracker of claim 24 and further comprising:
a speech recognizer component configured to receive the speech signal and provide the speech units associated with the speech signal to the formant calculation component.
27. The formant tracker of claim 24 wherein the modifier component is configured to compare a first time evolution of the speech frequency response with a second time evolution of the formant frequency response and to adjust the second time evolution to more closely match the first time evolution.
28. The formant tracker of claim 27 wherein the modifier component is configured to adjust the second plurality of time instants to more closely match the first plurality of time instants.
29. The formant tracker of claim 27 wherein the modifier component is further configured to compare the speech frequency response with the formant frequency response after the second time evolution has been adjusted and to modify the proposed frequencies and bandwidths based on the comparison.
30. The formant tracker of claim 29 wherein the modifier component is configured to modify the proposed frequencies and bandwidths by applying a warping function to the proposed frequencies and bandwidths, the warping function being based on the comparison of the speech frequency response and the formant frequency response.
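Claim 30 leaves the form of the warping function open. As a minimal sketch under stated assumptions, the code below builds a piecewise-linear frequency warp from pairs of anchor frequencies (for instance, a synthesized formant frequency paired with a nearby peak in the speech frequency response) and applies it to the proposed frequencies. Scaling each bandwidth by the same ratio as its frequency is an assumption of this example, as are the function names and the fixed end anchors at 0 Hz and `max_hz`.

```python
def make_piecewise_linear_warp(source_hz, target_hz, max_hz):
    """Build a warp that maps each source_hz[i] to target_hz[i].

    Assumes both lists are strictly increasing and lie inside (0, max_hz);
    0 and max_hz are kept fixed as end anchors.
    """
    xs = [0.0] + list(source_hz) + [max_hz]
    ys = [0.0] + list(target_hz) + [max_hz]

    def warp(freq_hz):
        for k in range(len(xs) - 1):
            if xs[k] <= freq_hz <= xs[k + 1]:
                t = (freq_hz - xs[k]) / (xs[k + 1] - xs[k])
                return ys[k] + t * (ys[k + 1] - ys[k])
        raise ValueError("frequency outside [0, max_hz]")

    return warp


def warp_formants(formants, warp):
    """Apply the warp to a list of (frequency_hz, bandwidth_hz) pairs."""
    warped = []
    for freq_hz, bandwidth_hz in formants:
        new_freq = warp(freq_hz)
        # Assumption: bandwidth is scaled by the same ratio as the frequency.
        warped.append((new_freq, bandwidth_hz * new_freq / freq_hz))
    return warped
```

Formants that coincide with an anchor move exactly to the paired target frequency, while formants between anchors are shifted by linear interpolation.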
31. The formant tracker of claim 30 wherein the modifier component is configured to modify the proposed frequencies and bandwidths by modifying the proposed formant frequencies and bandwidths, recalculating the formant frequency response based on the modified frequencies and bandwidths, and comparing the recalculated formant frequency response to the speech frequency response.
32. The formant tracker of claim 31 wherein the modifier component is further configured to compare the recalculated formant frequency response with the speech frequency response and determines whether further modification of the proposed frequencies and bandwidths is desirable.
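The modify, recalculate, and compare cycle of claims 31 and 32 can be sketched as a simple loop that keeps a modification only when it reduces the mismatch and stops when no further modification helps. This is an illustrative greedy search, not the claimed procedure: the proportional scale steps, the tolerance, the iteration cap, and the caller-supplied `response_fn` and `distance_fn` are all hypothetical.

```python
def refine_formants(speech_response, formants, response_fn, distance_fn,
                    scale_steps=(0.98, 1.02), tolerance=1e-3,
                    max_iterations=50):
    """Iteratively adjust (frequency_hz, bandwidth_hz) pairs.

    response_fn(formants) recalculates the formant frequency response;
    distance_fn(a, b) compares two responses. Both are supplied by the
    caller and are assumptions of this sketch.
    """
    best = list(formants)
    best_error = distance_fn(speech_response, response_fn(best))
    for _ in range(max_iterations):
        improved = False
        for idx in range(len(best)):
            for scale in scale_steps:
                trial = list(best)
                freq_hz, bandwidth_hz = trial[idx]
                # Proportional modification of one formant.
                trial[idx] = (freq_hz * scale, bandwidth_hz * scale)
                # Recalculate the response and compare it again.
                error = distance_fn(speech_response, response_fn(trial))
                if error < best_error - tolerance:
                    best, best_error, improved = trial, error, True
        if not improved:
            break  # further modification is judged not worthwhile
    return best, best_error
```

The stopping test plays the role of the determination in claim 32: when no trial modification lowers the comparison error by more than the tolerance, the current frequencies and bandwidths are returned as the modified formants.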
Specification