Method and apparatus for using formant models in speech systems

US 6,505,152 B1
Filed: 09/03/1999
Issued: 01/07/2003
Est. Priority Date: 09/03/1999
Status: Expired due to Fees

Abstract

A model is provided for formants found in human speech. Under one aspect of the invention, the model is used in formant tracking by providing probabilities that describe the likelihood that a candidate formant is actually a formant in the speech signal. Other aspects of the invention use this formant tracking to improve the model by regenerating the model based on the formants detected by the formant tracker. Still other aspects of the invention use the formant tracking to compress a speech signal by removing some of the formants from the speech signal. A further aspect of the invention uses the formant model to synthesize speech. Under this aspect of the invention, the formant model is used to identify a most likely formant track for the synthesized speech. Based on this track, a series of resonators are used to introduce the formants into the speech signal.
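The synthesis aspect described in the abstract, in which resonators introduce formants into a signal, can be illustrated with a minimal sketch. It assumes a standard two-pole digital resonator and a fixed list of (frequency, bandwidth) pairs; the function names `resonator` and `synthesize`, the 8000 Hz sample rate, and the example formant values are hypothetical and do not come from the patent, which would instead follow a time-varying formant track.

```python
import math

def resonator(excitation, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator (assumed form): adds a spectral peak near freq_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius from bandwidth
    theta = 2.0 * math.pi * freq_hz / sample_rate         # pole angle from frequency
    b1, b2 = 2.0 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in excitation:
        y = x + b1 * y1 + b2 * y2  # y[n] = x[n] + b1*y[n-1] + b2*y[n-2]
        out.append(y)
        y1, y2 = y, y1
    return out

def synthesize(excitation, formants, sample_rate):
    """Pass the excitation through one resonator per (frequency, bandwidth) pair."""
    signal = list(excitation)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal

# Hypothetical usage: an impulse shaped by two illustrative formants.
impulse = [1.0] + [0.0] * 255
speech_like = synthesize(impulse, [(700.0, 90.0), (1200.0, 110.0)], 8000)
```

The cascade arrangement is a design choice of this sketch; a parallel bank of resonators would be an equally plausible reading of "a series of resonators".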

25 Claims

1. A method of identifying a sequence of formant values for formants in a speech signal, the method comprising:
- parsing the speech signal into a sequence of segments;
- associating each segment with a formant model state;
- identifying a set of candidate formants for each segment;
- grouping the candidate formants in each segment into at least one group, each group in each segment having the same number of candidate formants;
- determining a separate probability for each possible sequence of groups across the segments of the speech signal; and
- selecting the sequence of groups with the highest probability.
2. The method of claim 1 wherein determining a probability for a sequence of groups comprises:
3. The method of claim 2 wherein accessing sets of formant models comprises accessing a frequency model and a bandwidth model for each candidate formant.
4. The method of claim 3 wherein accessing sets of formant models further comprises accessing a change-in-frequency model and a change-in-bandwidth model for each candidate formant, the change-in-frequency model describing changes in a formant's frequency between states and the change-in-bandwidth model describing changes in a formant's bandwidth between states.
5. The method of claim 4 wherein determining a probability for each candidate formant in each group comprises determining a change in frequency between a candidate formant in a group in a current segment and a candidate formant in a group in a neighboring segment.
6. The method of claim 4 wherein determining a probability for each candidate formant in each group comprises determining a change in bandwidth between a candidate formant in a group in a current segment and a candidate formant in a group in a neighboring segment.
7. The method of claim 1 further comprising replacing the selected sequence of groups with an unobserved sequence of groups through steps comprising:
- generating a probability function that describes the probability of unobserved group sequences and that is based on the sets of formant models and the selected sequence of groups; and
- selecting an unobserved sequence of groups that maximizes the probability function to replace the selected sequence of groups.
8. The method of claim 7 wherein selecting the unobserved sequence of groups that maximizes the probability function comprises:
- determining partial derivatives of the probability function;
- setting the partial derivatives equal to zero to form a set of equations; and
- simultaneously solving the equations in the set of equations.
9. The method of claim 1 wherein the method forms part of a method for revising each formant model in a set of formant models for each state, the method of revising a formant model for a state further comprising:
- collecting the formants that are associated with the formant model and that were selected for each occurrence of the state in the speech signal;
- generating a Gaussian distribution from the collected formants, the Gaussian distribution forming a new formant model; and
- replacing the existing formant model with the new formant model.
10. The method of claim 9 wherein collecting the formants comprises collecting a first formant that was selected for each occurrence of the state.
11. The method of claim 9 wherein generating a Gaussian distribution comprises generating a Gaussian distribution from the frequencies of the collected formants and wherein the Gaussian distribution forms a new frequency model for a formant.
12. The method of claim 9 wherein generating a Gaussian distribution comprises generating a Gaussian distribution from the bandwidths of the collected formants and wherein the Gaussian distribution forms a new bandwidth model for a formant.
13. The method of claim 1 wherein the method forms part of a method for compressing speech, the method for compressing speech further comprising:
- using the selected sequence of groups to adjust a set of formant filters to match the formants of the selected sequence of groups;
- passing the sequence of segments through the set of formant filters to remove the formants from the segments thereby forming a residual signal; and
- compressing the residual signal.
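The selection step of claim 1, scored with per-state frequency models and a change-in-frequency model in the spirit of claims 3 to 5, can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: each group is reduced to a tuple of formant frequencies in Hz, every model is a Gaussian, the change-in-frequency model is a single zero-mean Gaussian shared by all states, and bandwidth models are left out for brevity. The names `log_gauss`, `sequence_log_prob`, `best_group_sequence`, the state labels, and all numeric values are hypothetical.

```python
import itertools
import math

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def sequence_log_prob(seq, states, freq_models, delta_var):
    """Sum the log probabilities of every formant in every group of one sequence."""
    total = 0.0
    for t, group in enumerate(seq):
        for k, freq in enumerate(group):
            mean, var = freq_models[states[t]][k]       # frequency model for formant k
            total += log_gauss(freq, mean, var)
            if t > 0:                                   # assumed zero-mean change model
                total += log_gauss(freq - seq[t - 1][k], 0.0, delta_var)
    return total

def best_group_sequence(groups_per_segment, states, freq_models, delta_var):
    """Score each possible sequence of groups and keep the most probable one."""
    return max(
        itertools.product(*groups_per_segment),
        key=lambda seq: sequence_log_prob(seq, states, freq_models, delta_var),
    )

# Hypothetical data: three segments, two candidate groups each, two formants per group.
states = ["ah", "ah", "iy"]
freq_models = {
    "ah": [(700.0, 100.0 ** 2), (1200.0, 150.0 ** 2)],
    "iy": [(300.0, 80.0 ** 2), (2300.0, 200.0 ** 2)],
}
groups_per_segment = [
    [(710.0, 1180.0), (710.0, 2500.0)],
    [(690.0, 1250.0), (1250.0, 2600.0)],
    [(320.0, 2250.0), (2250.0, 3000.0)],
]
best = best_group_sequence(groups_per_segment, states, freq_models, 300.0 ** 2)
```

Enumerating every sequence mirrors the wording of claim 1 but grows exponentially with the number of segments. Because this score only couples neighboring segments, a dynamic-programming (Viterbi-style) search would return the same sequence at far lower cost.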

14. The method of claim 13 wherein using the selected sequence of groups to adjust a set of formant filters comprises adjusting a filter so that it removes a band of frequencies equal to the bandwidth of a formant of the selected sequence of groups and centered on a frequency of a formant of the selected sequence of groups.
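One way to picture the filter adjustment of claim 14 is a two-zero inverse filter whose zeros sit at the formant frequency, with a radius set by the formant bandwidth. The sketch below assumes that filter form; the patent text here does not specify the filter structure, and the name `inverse_formant_filter` and its parameters are hypothetical.

```python
import math

def inverse_formant_filter(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-zero filter (assumed form) that suppresses a band centered on freq_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # zero radius from bandwidth
    theta = 2.0 * math.pi * freq_hz / sample_rate         # zero angle from frequency
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    residual, x1, x2 = [], 0.0, 0.0
    for x in signal:
        residual.append(x + a1 * x1 + a2 * x2)  # e[n] = x[n] + a1*x[n-1] + a2*x[n-2]
        x1, x2 = x, x1
    return residual
```

Applied once per formant in the selected group, a cascade of such filters would leave a flatter residual signal, which is the input to the compression step of claim 13.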

15. A computer-readable medium having computer executable components for performing steps for identifying formants, the steps comprising:
- receiving an input speech signal;
- dividing the input speech signal into a set of segments; and
- identifying at least one formant in each segment based on a formant model for a model state associated with the segment, the formant model comprising a change-in-frequency model.
16. The computer-readable medium of claim 15 wherein identifying at least one formant in each segment comprises:
17. The computer-readable medium of claim 16 wherein determining the probability of a sequence of formant groups comprises:
- determining the probability of each candidate formant in each group using at least one aspect of the candidate formant and a formant model based on that one aspect;
- combining the probabilities of each formant to produce a combined probability for the entire sequence of groups.
18. The computer-readable medium of claim 17 wherein determining the probability of each formant comprises using the frequency of the candidate formant and a formant model based on the frequency of a formant.
19. The computer-readable medium of claim 17 wherein determining the probability of each formant comprises using the bandwidth of the candidate formant and a formant model based on the bandwidth of a formant.
20. The computer-readable medium of claim 17 wherein determining the probability of each formant comprises using the change in frequency of the candidate formant between a current segment and a neighboring segment and a formant model based on the change in frequency of a formant.
21. The computer-readable medium of claim 17 wherein determining the probability of each formant comprises using the change in bandwidth of the candidate formant between the current segment and a neighboring segment and using a formant model based on the change in bandwidth of a formant.
22. The computer-readable medium of claim 16 having computer-executable components for performing further steps for identifying actual formants, the steps comprising:
- generating a probability function that describes the probability of a sequence of actual formants, the probability function based in part on the selected most probable sequence of formant groups; and
- identifying a sequence of actual formants that maximizes the probability function.
23. The computer-readable medium of claim 22 wherein identifying a sequence of actual formants that maximizes the probability function comprises:
- determining a set of partial derivatives of the probability function;
- setting each partial derivative equal to zero to form a set of equations; and
- solving each equation in the set of equations to identify the sequence of actual formants.
24. The computer-readable medium of claim 16 having computer-executable components for performing further steps comprising:
- combining the formant groups that were selected for each occurrence of a state to produce a new model for each formant in the state; and
- replacing the formant model for the state with the new model.
25. The computer-readable medium of claim 15 having computer-executable components for performing further steps comprising:
- adjusting a filter so that it removes frequencies associated with an identified formant for a segment; and
- passing the segment through the filter to produce a residual signal.
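The maximization described in claims 22 and 23 (and in claims 7 and 8) becomes a linear problem when every model is Gaussian. The sketch below assumes a log-probability made of two quadratic terms: one pulling each actual frequency f[t] toward the selected candidate x[t] with variance `s2`, and one penalizing the change f[t] - f[t-1] with a zero-mean variance `d2`. Setting each partial derivative to zero then gives a tridiagonal system of equations. This objective, and the names `solve_tridiagonal` and `smooth_track`, are assumptions for illustration; the probability function actually used in the patent is not reproduced in this record.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):                       # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def smooth_track(x, s2, d2):
    """Solve dL/df[t] = 0 for all t under the assumed Gaussian objective."""
    n = len(x)
    off = -1.0 / d2
    # Each row: (1/s2 + neighbors/d2) f[t] - f[t-1]/d2 - f[t+1]/d2 = x[t]/s2
    diag = [1.0 / s2 + ((t > 0) + (t < n - 1)) / d2 for t in range(n)]
    lower = [0.0] + [off] * (n - 1)
    upper = [off] * (n - 1) + [0.0]
    rhs = [xt / s2 for xt in x]
    return solve_tridiagonal(lower, diag, upper, rhs)

# Hypothetical usage: the outlier at 900 Hz is pulled toward its neighbors.
track = smooth_track([500.0, 520.0, 900.0, 540.0, 560.0], 100.0 ** 2, 50.0 ** 2)
```

Because each equation involves only a frequency and its two neighbors, the system can be solved in time linear in the number of segments rather than by general matrix inversion.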

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Acero, Alejandro
Primary Examiner(s): Dorvill, Richemond
Assistant Examiner(s): Nolan, Daniel A
Application Number: US09/389,898
Time in Patent Office: 1,222 Days
Field of Search: 704/231, 704/229, 704/206, 704/259, 704/270, 704/200.1, 704/201, 704/209
US Class Current: 704/209
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 25/15 the extracted parameters be...