Voice messaging system with unified pitch and voice tracking

US 4,696,038 A
Filed: 04/13/1983
Issued: 09/22/1987
Est. Priority Date: 04/13/1983
Status: Expired due to Term

Abstract

This voice messaging system provides an LPC analyzer in combination with a pitch extractor wherein LPC parameters and a residual signal organized in a sequence of speech data frames are provided by the LPC analyzer as an output representative of an analog speech signal. The pitch extractor is operably associated with the LPC analyzer and produces a plurality of pitch candidates for each of the speech data frames in the sequence thereof. Dynamic programming is performed on the plurality of pitch candidates for each speech data frame and also with respect to a voiced/unvoiced decision of the speech data for each frame by tracking both pitch and voicing from frame to frame to provide an optimal pitch value and also an optimal voicing decision. During dynamic programming, a cumulative penalty for a sequence of frame pitch/voicing decisions is accumulated by defining a transition error between each pitch candidate of a current speech data frame and each pitch candidate of the preceding frame, and defining a cumulative error for each pitch candidate of the current frame equal to the transition error between the pitch candidate of the current frame plus the cumulative error of an optimally identified pitch candidate in the preceding frame to locate the track providing optimal pitch and voicing decisions based upon the lowest cumulative penalty. An encoder then encodes the LPC parameters as generated by the LPC analyzer and the optimal pitch and voicing decisions for each speech data frame for subsequent use in providing an audible synthesized speech output substantially identical to the original speech input.
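The recursion described in the abstract can be sketched as a Viterbi-style search over per-frame candidates. The following is a minimal illustration, not the patented implementation: the function names, the use of `None` to stand for an unvoiced candidate, and the toy error function with its constants are all assumptions introduced here.

```python
from typing import Callable, List, Optional, Sequence

# Assumption: a candidate is a pitch period in samples, or None for "unvoiced".
Candidate = Optional[float]


def track_pitch_and_voicing(
    frames: Sequence[Sequence[Candidate]],
    transition_error: Callable[[Candidate, Candidate], float],
) -> List[Candidate]:
    """Pick one candidate per frame so that the summed transition error is minimal."""
    if not frames:
        return []
    # cumulative[k]: lowest cumulative error of any track ending at candidate k.
    cumulative = [0.0] * len(frames[0])
    back_pointers: List[List[int]] = []
    for prev, cur in zip(frames, frames[1:]):
        new_cumulative, pointers = [], []
        for c in cur:
            # Cumulative error = transition error + cumulative error of the
            # best predecessor in the preceding frame.
            costs = [cumulative[j] + transition_error(p, c) for j, p in enumerate(prev)]
            best = min(range(len(prev)), key=costs.__getitem__)
            new_cumulative.append(costs[best])
            pointers.append(best)
        cumulative = new_cumulative
        back_pointers.append(pointers)
    # Trace back from the candidate with the lowest cumulative error.
    k = min(range(len(cumulative)), key=cumulative.__getitem__)
    path = [k]
    for pointers in reversed(back_pointers):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [frames[i][k] for i, k in enumerate(path)]


def toy_transition_error(prev: Candidate, cur: Candidate) -> float:
    """Hypothetical error: constant if either frame is unvoiced, else scaled pitch difference."""
    if prev is None or cur is None:
        return 1.0
    return abs(cur - prev) / 10.0
```

With the toy error above, the candidate lists `[[80, 40, None], [82, 41, None], [160, 81, None]]` yield the smooth track `[80, 82, 81]`, because the pitch-doubled candidate 160 and the unvoiced options each accumulate a larger penalty.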

10 Claims

1. In a voice messaging system for receiving a human speech signal and reconstituting said human speech signal at a receiver which is spatially or temporally remote, the combination comprising:
- LPC analysis means for analyzing an analog speech signal provided as an input thereto in accordance with an LPC (Linear Predictive Coding) model, said LPC analysis means providing LPC parameters and a residual signal organized in a sequence of speech data frames and the respective residual signals corresponding thereto as an output representative of the analog speech signal;
  
  pitch extraction means operably associated with said LPC analysis means for determining a plurality of pitch candidates for each of the speech data frames in said sequence;
  
  optimization means operably associated with said LPC analysis means and said pitch extraction means for performing dynamic programming with respect both to said plurality of pitch candidates for each speech data frame and also to a voiced/unvoiced decision for each speech data frame to determine both an optimal pitch and an optimal voicing decision for each speech data frame in the context of sequence of speech data frames, said optimization means defining a transition error between each pitch candidate of the current frame and each pitch candidate of the preceding frame, and defining a cumulative error for each pitch candidate in the current frame which is equal to the transition error between said pitch candidate of said current frame plus the cumulative error of an optimally identified pitch candidate in the preceding frame, said optimally identified pitch candidate in the preceding frame being chosen from among the pitch candidates for said preceding frame such that the cumulative error of said corresponding pitch candidate in said current frame is at a minimum; and
  
  means operably associated with said LPC analysis means, said pitch extraction means and said optimization means for encoding said LPC parameters and said optimal pitch and optimal voicing decision for each speech data frame.
- Dependent claims (3, 4, 5, 6):
  - 3. The system of claim 1, wherein said transition error includes a pitch deviation error, said pitch deviation error corresponding to the difference in pitch between said pitch candidate in said current frame and said corresponding pitch candidate in said previous frame if both said frames are voiced.
  - 4. The system of claim 3, wherein said pitch deviation error is set at a constant if at least one of said frames is unvoiced.
  - 5. The system of claim 1, wherein said transition error also includes a voicing transition error component, said voicing transition error component being defined to be a small predetermined value when said current frame and said previous frame are both identically voiced or both identically unvoiced, and otherwise being defined to be a decreasing function of the spectral difference between said current frame and said previous frame.
  - 6. The system of claim 1, wherein said transition error further comprises a voicing state error, said voicing state error corresponding monotonically to the degree to which said speech data within said current frame is correlated at the period of said pitch candidate.
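Claims 3 through 6 name three components of the transition error without fixing their formulas. The sketch below shows one way they might be combined. Every constant, the relative-difference pitch measure, the `1 / (1 + d)` decreasing function, the direction of the voicing state error, and the `Candidate` fields are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constants; the claims only require "a constant" and
# "a small predetermined value".
UNVOICED_DEVIATION = 1.0
SAME_VOICING_COST = 0.05
TRANSITION_SCALE = 1.0


@dataclass(frozen=True)
class Candidate:
    period: Optional[float]  # pitch period in samples; None means unvoiced
    correlation: float       # assumed normalized correlation of the frame, 0..1

    @property
    def voiced(self) -> bool:
        return self.period is not None


def pitch_deviation_error(prev: Candidate, cur: Candidate) -> float:
    """Claims 3 and 4: pitch difference if both frames are voiced, else a constant."""
    if prev.voiced and cur.voiced:
        return abs(cur.period - prev.period) / prev.period  # assumed relative measure
    return UNVOICED_DEVIATION


def voicing_transition_error(prev: Candidate, cur: Candidate, spectral_difference: float) -> float:
    """Claim 5: small value if voicing is unchanged, else decreasing in spectral difference."""
    if prev.voiced == cur.voiced:
        return SAME_VOICING_COST
    return TRANSITION_SCALE / (1.0 + spectral_difference)


def voicing_state_error(cur: Candidate) -> float:
    """Claim 6: monotonic in the correlation at the candidate period (direction assumed)."""
    return 1.0 - cur.correlation if cur.voiced else cur.correlation


def transition_error(prev: Candidate, cur: Candidate, spectral_difference: float) -> float:
    """Sum of the three components; equal weighting is an assumption."""
    return (
        pitch_deviation_error(prev, cur)
        + voicing_transition_error(prev, cur, spectral_difference)
        + voicing_state_error(cur)
    )
```

Making the voicing transition error decrease with spectral difference means a voiced/unvoiced switch is cheap where the spectrum changes sharply and expensive where the spectrum is steady.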

2. A method for determining the pitch and voicing of human speech comprising the steps of:
- analyzing a speech signal input in accordance with an LPC (Linear Predictive Coding) model to provide LPC parameters and a residual signal organized into a sequence of speech data frames and the respective residual signals corresponding thereto;
  
  determining a plurality of pitch candidates for each of the speech data frames in said sequence;
  
  performing dynamic programming with respect both to said plurality of pitch candidates for each speech data frame and also to a voiced/unvoiced decision for each speech data frame by defining a transition error between each pitch candidate of the current frame and each pitch candidate of the preceding frame, defining a cumulative error for each pitch candidate of the current frame equal to the transition error between said pitch candidate of said current frame plus the cumulative error of an optimally identified pitch candidate in the preceding frame, and choosing said optimally identified pitch candidate in the preceding frame such that the cumulative error of said corresponding pitch candidate in said current frame is at a minimum; and
  
  determining both an optimal pitch and an optimal voicing decision for each speech data frame in the context of said sequence of speech data frames in response to the performance of said dynamic programming.
- Dependent claims (7, 8, 9, 10):
  - 7. The method of claim 2, wherein said transition error is defined to include a pitch deviation error, said pitch deviation error corresponding to the difference in pitch between said pitch candidate in said current frame and said corresponding pitch candidate in said previous frame when both said frames are voiced.
  - 8. The method of claim 7, further including setting said pitch deviation error at a constant if one of said frames is unvoiced.
  - 9. The method of claim 2, wherein said transition error is defined to include a voicing transition error component, said voicing transition error component being a small predetermined value when said current frame and said previous frame are both identically voiced or both identically unvoiced, and otherwise being a decreasing function of the spectral difference between said current frame and said previous frame.
  - 10. The method of claim 2, wherein said transition error further comprises a voicing state error, said voicing state error corresponding monotonically to the degree to which said speech data within said current frame is correlated at the period of said pitch candidate.
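The claims do not say how the plurality of pitch candidates is determined for each frame. Since claims 6 and 10 refer to correlation at the candidate period, one plausible approach is to take the strongest peaks of the normalized autocorrelation of the residual frame. The sketch below assumes that approach; the function names, the lag range parameters, and the peak-picking rule are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalized_correlation(x: Sequence[float], lag: int) -> float:
    """Correlation between the frame and itself shifted by `lag` samples (lag >= 1)."""
    a, b = x[:-lag], x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


def pitch_candidates(
    residual: Sequence[float], min_lag: int, max_lag: int, count: int = 3
) -> List[Tuple[int, float]]:
    """Return up to `count` (lag, correlation) pairs, strongest first.

    Assumes 1 <= min_lag <= max_lag < len(residual).
    """
    scores = {lag: normalized_correlation(residual, lag) for lag in range(min_lag, max_lag + 1)}
    # Keep positive local maxima of the correlation curve.
    peaks = [
        (lag, s)
        for lag, s in scores.items()
        if s > 0.0 and s > scores.get(lag - 1, -1.0) and s >= scores.get(lag + 1, -1.0)
    ]
    # Stable sort: among equal scores, the shorter lag stays first.
    peaks.sort(key=lambda peak: peak[1], reverse=True)
    return peaks[:count]
```

Each returned pair could then feed the dynamic programming step, with the correlation value available for a voicing state error of the kind recited in claim 10.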

Current Assignee: Texas Instruments, Inc.
Original Assignee: Texas Instruments, Inc.
Inventors: Doddington, George R.; Secrest, Bruce G.
Primary Examiner(s): Kemeny, E. S. Matt
Application Number: US06/484,718
Time in Patent Office: 1,623 Days
Field of Search: 381/36, 381/37, 381/38, 381/39, 381/40, 381/49
US Class Current: 704/219
CPC Class Codes: G10L 19/06 Determination or coding of ...; G10L 25/93 Discriminating between voic...
