Method and apparatus for using formant models in speech systems
Abstract
A model is provided for formants found in human speech. Under one aspect of the invention, the model is used in formant tracking by providing probabilities that describe the likelihood that a candidate formant is actually a formant in the speech signal. Other aspects of the invention use this formant tracking to improve the model by regenerating the model based on the formants detected by the formant tracker. Still other aspects of the invention use the formant tracking to compress a speech signal by removing some of the formants from the speech signal. A further aspect of the invention uses the formant model to synthesize speech. Under this aspect of the invention, the formant model is used to identify a most likely formant track for the synthesized speech. Based on this track, a series of resonators are used to introduce the formants into the speech signal.
25 Claims
-
1. A method of identifying a sequence of formant values for formants in a speech signal, the method comprising:
-
parsing the speech signal into a sequence of segments;
associating each segment with a formant model state;
identifying a set of candidate formants for each segment;
grouping the candidate formants in each segment into at least one group, each group in each segment having the same number of candidate formants;
determining a separate probability for each possible sequence of groups across the segments of the speech signal; and
selecting the sequence of groups with the highest probability.
-
-
2. The method of claim 1 wherein determining a separate probability for each possible sequence of groups comprises:
-
accessing sets of formant models where one set of formant models is designated for each state;
determining a probability for each candidate formant in each group based on at least one formant model from the set of formant models designated for the group, each formant model being used to determine the probability of only one candidate formant in a group;
combining the probabilities of each candidate formant in the sequence of groups to produce the probability for the sequence of groups.
-
-
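The steps recited above (grouping the candidates of each segment, scoring each candidate in a group against exactly one model of the segment's state, combining the scores, and keeping the best sequence) can be sketched as follows. This is an illustrative reading, not the patented implementation: the function names, the dictionary layout of the models, the use of log probabilities, and the choice of Gaussian frequency and bandwidth models are all assumptions made for the example.

```python
import math
from itertools import combinations, product

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian (assumed model form)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def groups_for_segment(candidates, size):
    """All groups of `size` candidates, ordered by ascending frequency.

    Each candidate is a hypothetical (frequency_hz, bandwidth_hz) tuple.
    """
    return list(combinations(sorted(candidates), size))

def score_group(group, state_models):
    """Sum of log probabilities; model k scores only the k-th formant."""
    total = 0.0
    for (freq, bw), model in zip(group, state_models):
        total += log_gauss(freq, *model["freq"]) + log_gauss(bw, *model["bw"])
    return total

def best_sequence(segments, states, models, size=2):
    """Score every possible sequence of groups; return the most probable."""
    per_segment = [groups_for_segment(c, size) for c in segments]
    best, best_score = None, -math.inf
    for seq in product(*per_segment):
        score = sum(score_group(g, models[s]) for g, s in zip(seq, states))
        if score > best_score:
            best, best_score = seq, score
    return best, best_score

# Hypothetical models: one (mean, variance) pair per formant and feature.
models = {
    "aa": [{"freq": (700.0, 100.0 ** 2), "bw": (80.0, 30.0 ** 2)},
           {"freq": (1200.0, 150.0 ** 2), "bw": (100.0, 40.0 ** 2)}],
    "iy": [{"freq": (300.0, 80.0 ** 2), "bw": (60.0, 30.0 ** 2)},
           {"freq": (2300.0, 200.0 ** 2), "bw": (120.0, 50.0 ** 2)}],
}
segments = [
    [(680.0, 90.0), (1250.0, 110.0), (2600.0, 150.0)],
    [(310.0, 60.0), (2250.0, 120.0), (3000.0, 200.0)],
]
best, score = best_sequence(segments, ["aa", "iy"], models)
```

Exhaustive enumeration grows exponentially with the number of segments, so it is shown here only because it mirrors the claim language directly.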
3. The method of claim 2 wherein accessing sets of formant models comprises accessing a frequency model and a bandwidth model for each candidate formant.
-
4. The method of claim 3 wherein accessing sets of formant models further comprises accessing a change-in-frequency model and a change-in-bandwidth model for each candidate formant, the change-in-frequency model describing changes in a formant's frequency between states and the change-in-bandwidth model describing changes in a formant's bandwidth between states.
-
5. The method of claim 4 wherein determining a probability for each candidate formant in each group comprises determining a change in frequency between a candidate formant in a group in a current segment and a candidate formant in a group in a neighboring segment.
-
6. The method of claim 4 wherein determining a probability for each candidate formant in each group comprises determining a change in bandwidth between a candidate formant in a group in a current segment and a candidate formant in a group in a neighboring segment.
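The change-in-frequency and change-in-bandwidth terms couple each segment only to its neighbor, so the best sequence can be found without listing every sequence. The sketch below uses a Viterbi-style dynamic-programming search, which is a substitution chosen for this example rather than something the claims recite; it returns the same maximum whenever the score decomposes into per-segment and neighbor-pair terms. Gaussian change models, log scores, and all names (`transition_score`, `viterbi`, `local`, `delta_models`) are assumptions.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian (assumed model form)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def transition_score(prev_group, group, delta_models):
    """Score the change in frequency and bandwidth of each formant slot."""
    total = 0.0
    for (f0, b0), (f1, b1), m in zip(prev_group, group, delta_models):
        total += log_gauss(f1 - f0, *m["dfreq"])
        total += log_gauss(b1 - b0, *m["dbw"])
    return total

def viterbi(groups, local, delta_models):
    """Return the highest-scoring sequence of groups.

    groups[t]       -- list of candidate groups for segment t
    local[t][j]     -- frequency/bandwidth log score of groups[t][j]
    delta_models[t] -- change models for the state of segment t
    """
    score = list(local[0])
    back = []
    for t in range(1, len(groups)):
        new_score, ptr = [], []
        for j, g in enumerate(groups[t]):
            cands = [score[i] + transition_score(p, g, delta_models[t])
                     for i, p in enumerate(groups[t - 1])]
            i_best = max(range(len(cands)), key=cands.__getitem__)
            new_score.append(cands[i_best] + local[t][j])
            ptr.append(i_best)
        score = new_score
        back.append(ptr)
    # Trace the best final group back to the first segment.
    j = max(range(len(score)), key=score.__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [groups[t][j] for t, j in enumerate(path)]
```

The cost is proportional to the number of segments times the square of the number of groups per segment, instead of growing exponentially.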
-
7. The method of claim 1 further comprising replacing the selected sequence of groups with an unobserved sequence of groups through steps comprising:
-
generating a probability function that describes the probability of unobserved group sequences and that is based on the sets of formant models and the selected sequence of groups; and
selecting an unobserved sequence of groups that maximizes the probability function to replace the selected sequence of groups.
-
-
8. The method of claim 7 wherein selecting the unobserved sequence of groups that maximizes the probability function comprises:
-
determining partial derivatives of the probability function;
setting the partial derivatives equal to zero to form a set of equations; and
simultaneously solving the equations in the set of equations.
-
-
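When every model is Gaussian, the logarithm of the probability function is quadratic in the unobserved formant values, so setting the partial derivatives to zero gives a set of linear equations that can be solved together. The sketch below assumes one particular form of the function for a single formant frequency track: a term tying each unobserved value to the selected (observed) value, a term tying it to the state's mean, and a zero-mean change term between neighbors. That form, the variance parameters, and the function names are assumptions for illustration only.

```python
def solve_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system (Thomas algorithm).

    a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def smooth_track(observed, means, var_obs, var_model, var_delta):
    """Maximize the assumed Gaussian log probability over the track x.

    Setting d/dx_t to zero for every t gives, with w = 1/variance:
      (w_o + w_m + n_t*w_d) x_t - w_d x_{t-1} - w_d x_{t+1}
          = w_o * observed[t] + w_m * means[t]
    where n_t is the number of neighbors of segment t (1 or 2).
    """
    n = len(observed)
    w_o, w_m, w_d = 1.0 / var_obs, 1.0 / var_model, 1.0 / var_delta
    a, b, c, d = [], [], [], []
    for t in range(n):
        neighbors = (t > 0) + (t < n - 1)
        a.append(-w_d if t > 0 else 0.0)
        c.append(-w_d if t < n - 1 else 0.0)
        b.append(w_o + w_m + neighbors * w_d)
        d.append(w_o * observed[t] + w_m * means[t])
    return solve_tridiagonal(a, b, c, d)

# An outlier in the middle segment is pulled toward its neighbors and the mean.
track = smooth_track([500.0, 900.0, 500.0], [500.0, 500.0, 500.0], 1.0, 1.0, 1.0)
```

Because each segment interacts only with its two neighbors, the system is tridiagonal and can be solved in time linear in the number of segments.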
9. The method of claim 1 wherein the method forms part of a method for revising each formant model in a set of formant models for each state, the method of revising a formant model for a state further comprising:
-
collecting the formants that are associated with the formant model and that were selected for each occurrence of the state in the speech signal;
generating a Gaussian distribution from the collected formants, the Gaussian distribution forming a new formant model; and
replacing the existing formant model with the new formant model.
-
-
10. The method of claim 9 wherein collecting the formants comprises collecting a first formant that was selected for each occurrence of the state.
-
11. The method of claim 9 wherein generating a Gaussian distribution comprises generating a Gaussian distribution from the frequencies of the collected formants and wherein the Gaussian distribution forms a new frequency model for a formant.
-
12. The method of claim 9 wherein generating a Gaussian distribution comprises generating a Gaussian distribution from the bandwidths of the collected formants and wherein the Gaussian distribution forms a new bandwidth model for a formant.
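The model revision described above amounts to collecting, for each state and each formant slot, the frequencies and bandwidths that the tracker selected, and fitting a new Gaussian to each collection. A minimal sketch follows, assuming the selected formants are available as hypothetical `(state, group)` pairs, that a Gaussian is summarized by a `(mean, variance)` tuple, and that a small variance floor (not mentioned in the claims) is applied so that a state seen only once does not produce a zero variance.

```python
from collections import defaultdict
from statistics import mean, pvariance

def reestimate(selected, n_formants, var_floor=1.0):
    """Build new frequency and bandwidth Gaussians for every state.

    selected   -- list of (state, group) pairs, where group is a tuple of
                  (frequency_hz, bandwidth_hz) formants chosen by the tracker
    n_formants -- number of formants per group
    var_floor  -- assumed lower bound on the variance
    """
    by_state = defaultdict(list)
    for state, group in selected:
        by_state[state].append(group)

    models = {}
    for state, groups in by_state.items():
        models[state] = []
        for k in range(n_formants):
            freqs = [g[k][0] for g in groups]
            bws = [g[k][1] for g in groups]
            models[state].append({
                "freq": (mean(freqs), max(pvariance(freqs), var_floor)),
                "bw": (mean(bws), max(pvariance(bws), var_floor)),
            })
    return models

selected = [
    ("aa", ((700.0, 80.0),)),
    ("aa", ((720.0, 100.0),)),
    ("iy", ((300.0, 60.0),)),
]
new_models = reestimate(selected, n_formants=1)
```

The population variance is used because it is the maximum-likelihood estimate for a Gaussian; the returned dictionary would replace the existing models before the next tracking pass.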
-
13. The method of claim 1 wherein the method forms part of a method for compressing speech, the method for compressing speech further comprising:
-
using the selected sequence of groups to adjust a set of formant filters to match the formants of the selected sequence of groups;
passing the sequence of segments through the set of formant filters to remove the formants from the segments thereby forming a residual signal; and
compressing the residual signal.
-
-
14. The method of claim 13 wherein using the selected sequence of groups to adjust a set of formant filters comprises adjusting a filter so that it removes a band of frequencies equal to the bandwidth of a formant of the selected sequence of groups and centered on a frequency of a formant of the selected sequence of groups.
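One common way to realize a filter that removes a band centered on a formant frequency with the formant's bandwidth is a second-order inverse (all-zero) filter whose zeros sit where a two-pole resonator for that formant would place its poles. The sketch below uses that textbook construction as an assumption; the claims do not specify the filter structure, and the function names and the 8 kHz sampling rate are chosen only for the example. The residual would then go to whatever compressor is used, which is not shown.

```python
import math

def formant_coeffs(freq_hz, bw_hz, fs):
    """Feedback coefficients of a two-pole resonator for one formant."""
    r = math.exp(-math.pi * bw_hz / fs)      # pole radius from bandwidth
    theta = 2.0 * math.pi * freq_hz / fs     # pole angle from center frequency
    return 2.0 * r * math.cos(theta), -r * r

def resonate(x, freq_hz, bw_hz, fs):
    """Introduce a formant: y[n] = x[n] + c1*y[n-1] + c2*y[n-2]."""
    c1, c2 = formant_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s + c1 * y1 + c2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def remove_formant(x, freq_hz, bw_hz, fs):
    """Remove a formant: e[n] = x[n] - c1*x[n-1] - c2*x[n-2]."""
    c1, c2 = formant_coeffs(freq_hz, bw_hz, fs)
    x1 = x2 = 0.0
    out = []
    for s in x:
        out.append(s - c1 * x1 - c2 * x2)
        x2, x1 = x1, s
    return out

# Round trip: a formant added to an impulse is removed again,
# leaving the impulse as the residual.
impulse = [1.0] + [0.0] * 49
voiced = resonate(impulse, 700.0, 80.0, 8000.0)
residual = remove_formant(voiced, 700.0, 80.0, 8000.0)
```

For several formants, the segment would be passed through one such section per formant in cascade, with the coefficients updated from the selected sequence of groups for each segment.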
-
15. A computer-readable medium having computer executable components for performing steps for identifying formants, the steps comprising:
-
receiving an input speech signal;
dividing the input speech signal into a set of segments; and
identifying at least one formant in each segment based on a formant model for a model state associated with the segment, the formant model comprising a change-in-frequency model.
-
-
16. The computer-readable medium of claim 15 wherein identifying at least one formant in each segment comprises:
-
identifying a set of candidate formants for each segment;
grouping the candidate formants in each segment to form formant groups;
determining the probabilities of sequences of formant groups across multiple segments; and
selecting a most probable sequence of formant groups to identify a formant in a segment.
-
-
17. The computer-readable medium of claim 16 wherein determining the probability of a sequence of formant groups comprises:
-
determining the probability of each candidate formant in each group using at least one aspect of the candidate formant and a formant model based on that one aspect;
combining the probabilities of each formant to produce a combined probability for the entire sequence of groups.
-
-
18. The computer-readable medium of claim 17 wherein determining the probability of each formant comprises using the frequency of the candidate formant and a formant model based on the frequency of a formant.
-
19. The computer-readable medium of claim 17 wherein determining the probability of each formant comprises using the bandwidth of the candidate formant and a formant model based on the bandwidth of a formant.
-
20. The computer-readable medium of claim 17 wherein determining the probability of each formant comprises using the change in frequency of the candidate formant between a current segment and a neighboring segment and a formant model based on the change in frequency of a formant.
-
21. The computer-readable medium of claim 17 wherein determining the probability of each formant comprises using the change in bandwidth of the candidate formant between the current segment and a neighboring segment and using a formant model based on the change in bandwidth of a formant.
-
22. The computer-readable medium of claim 16 having computer-executable components for performing further steps for identifying actual formants, the steps comprising:
-
generating a probability function that describes the probability of a sequence of actual formants, the probability function based in part on the selected most probable sequence of formant groups; and
identifying a sequence of actual formants that maximizes the probability function.
-
-
23. The computer-readable medium of claim 22 wherein identifying a sequence of actual formants that maximizes the probability function comprises:
-
determining a set of partial derivatives of the probability function;
setting each partial derivative equal to zero to form a set of equations; and
solving each equation in the set of equations to identify the sequence of actual formants.
-
-
24. The computer-readable medium of claim 16 having computer-executable components for performing further steps comprising:
-
combining the formant groups that were selected for each occurrence of a state to produce a new model for each formant in the state; and
replacing the formant model for the state with the new model.
-
-
25. The computer-readable medium of claim 15 having computer-executable components for performing further steps comprising:
-
adjusting a filter so that it removes frequencies associated with an identified formant for a segment; and
passing the segment through the filter to produce a residual signal.
-
Specification