Tone features for speech recognition

US 6,829,578 B1
Filed: 07/09/2001
Issued: 12/07/2004
Est. Priority Date: 11/11/1999
Status: Expired due to Fees

Abstract

Robust acoustic tone features are obtained first by introducing an on-line, look-ahead trace-back of the fundamental frequency (F0) contour with adaptive pruning; this F0 estimation serves as the signal preprocessing front-end. The F0 contour is then decomposed into a lexical tone effect, a phrase intonation effect, and a random effect by means of a time-variant, weighted moving average (MA) filter in conjunction with a weighted least squares fit of the F0 contour, where the weighting places more emphasis on vowels. The intonation effect is removed from the F0 contour by subtraction, under a superposition assumption. The acoustic tone features consist of two parts. The first part is the set of coefficients of a second-order weighted regression of the de-intonated F0 contour over neighbouring frames. The second part describes the degree of periodicity of the signal and consists of the coefficients of a second-order regression of the auto-correlation. The weights of the second-order weighted regression of the de-intonated F0 contour are designed to emphasize the voiced segments and de-emphasize the unvoiced segments of the pitch contour, so that the voiced pitch contour is preserved for semi-voiced consonants.
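
The de-intonation and regression steps described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a fixed-length (not time-variant) moving-average window, per-frame voicing weights supplied by the caller, and a plain weighted least squares quadratic fit. The function names `weighted_moving_average`, `deintonate`, and `weighted_quadratic_fit` are hypothetical.

```python
from typing import List, Sequence


def weighted_moving_average(f0: Sequence[float], w: Sequence[float], half: int) -> List[float]:
    """Estimate the slowly varying phrase intonation as a weighted moving average.

    Assumption: a fixed window of 2*half+1 frames; the abstract describes a
    time-variant filter, which this sketch does not model.
    """
    n = len(f0)
    trend = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        wsum = sum(w[lo:hi])
        if wsum > 0:
            trend.append(sum(w[i] * f0[i] for i in range(lo, hi)) / wsum)
        else:
            trend.append(f0[t])  # no weighted support in the window: keep the raw value
    return trend


def deintonate(f0: Sequence[float], w: Sequence[float], half: int) -> List[float]:
    """Subtract the estimated intonation trend from F0 (superposition assumption)."""
    trend = weighted_moving_average(f0, w, half)
    return [f - m for f, m in zip(f0, trend)]


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def weighted_quadratic_fit(y: Sequence[float], w: Sequence[float]) -> List[float]:
    """Weighted least squares fit y[k] ~ a + b*k + c*k^2, with k centred on the window.

    Returns [a, b, c]. Assumes at least three frames carry non-zero weight;
    otherwise the normal equations are singular.
    """
    half = len(y) // 2
    ks = [i - half for i in range(len(y))]
    s = [sum(wi * k ** p for wi, k in zip(w, ks)) for p in range(5)]
    r = [sum(wi * yi * k ** p for wi, yi, k in zip(w, y, ks)) for p in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    return _solve3(normal, r)
```

In this sketch, frames with a low voicing weight contribute little to both the intonation estimate and the regression, which is one simple way to realise the emphasis on voiced segments that the abstract describes.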

19 Claims

1. A speech recognition system for recognizing a time-sequential input signal representing speech spoken in a tonal language;
- the system including;
  
  an input for receiving the input signal;
  
  a speech analysis subsystem for representing a segment of the input signal as an observation feature vector; and
  
  a unit matching subsystem for matching the observation feature vector against an inventory of trained speech recognition units, each unit being represented by at least one reference feature vector;
  
  wherein the feature vector includes a component derived from an estimated degree of voicing of the speech segment represented by the feature vector and wherein unvoiced segments of speech are represented by a pseudo feature vector.
- Dependent claims (2 to 12):
  - 2. A speech recognition system as claimed in claim 1, wherein the derived component represents the estimated degree of voicing of the speech segment.
  - 3. A speech recognition system as claimed in claim 1, wherein the derived component represents a derivative of the estimated degree of voicing of the speech segment.
  - 4. A speech recognition system as claimed in claim 1, wherein the estimated degree of voicing is smoothed.
  - 5. A speech recognition system as claimed in claim 1, wherein the degree of voicing is a measure of a short-time auto-correlation of an estimated pitch contour.
  - 6. A speech recognition system as claimed in claim 5, wherein the measure is formed by the regression coefficients of the auto-correlation contour.
  - 7. A speech recognition system as claimed in claim 5, wherein the estimated pitch is obtained by removing a phrase intonation effect from an estimated pitch contour representing the speech segment.
  - 8. A speech recognition system as claimed in claim 7, wherein the phrase intonation effect is represented by a weighted moving average of the estimated pitch contour.
  - 9. A speech recognition system as claimed in claim 8, wherein a weight of the weighted moving average represents the degree of voicing in the segment.
  - 10. A speech recognition system as claimed in claim 1, wherein the feature vector includes a component representing a derivative of an estimated pitch of the speech segment.
  - 11. A speech recognition system as claimed in claim 1, wherein a segment is considered unvoiced if a sum of regression weights of an estimated pitch contour within a regression window.
  - 12. A speech recognition system as claimed in claim 1, wherein the pseudo feature vector includes pseudo features generated according to a least squares criterion.
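
One common way to quantify a degree of voicing is the peak of a normalized short-time auto-correlation. The sketch below illustrates that general idea only; it computes the auto-correlation on a frame of signal samples, which is an assumed reading of claim 5, and the names `normalized_autocorrelation` and `degree_of_voicing` as well as the lag search range are hypothetical.

```python
import math
from typing import Sequence


def normalized_autocorrelation(frame: Sequence[float], lag: int) -> float:
    """Normalized auto-correlation of one frame at a given lag, in [-1, 1]."""
    n = len(frame) - lag
    if lag <= 0 or n <= 0:
        raise ValueError("lag must be positive and shorter than the frame")
    num = sum(frame[i] * frame[i + lag] for i in range(n))
    e0 = sum(frame[i] ** 2 for i in range(n))
    e1 = sum(frame[i + lag] ** 2 for i in range(n))
    den = math.sqrt(e0 * e1)
    return num / den if den > 0 else 0.0  # silent frame: treat as unvoiced


def degree_of_voicing(frame: Sequence[float], min_lag: int, max_lag: int) -> float:
    """Peak normalized auto-correlation over a range of candidate pitch lags.

    Values near 1 indicate a strongly periodic (voiced) frame; values near 0
    indicate little periodicity.
    """
    return max(normalized_autocorrelation(frame, lag)
               for lag in range(min_lag, max_lag + 1))
```

A per-frame sequence of such values forms a contour that could then be smoothed or differentiated, in the spirit of claims 3 and 4.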

13. A method for recognizing a time-sequential input signal representing speech spoken in a tonal language;
- the method comprising the steps of;
  
  receiving the input signal;
  
  representing a segment of the input signal as an observation feature vector; and
  
  matching the observation feature vector against an inventory of trained speech recognition units, each unit being represented by at least one reference feature vector;
  
  wherein the feature vector includes a component derived from an estimated degree of voicing of the speech segment represented by the feature vector and wherein unvoiced segments of speech are represented by a pseudo feature vector.
- Dependent claims (14 to 19):
  - 14. A method as claimed in claim 13, wherein the degree of voicing is a measure of a short-time auto-correlation of an estimated pitch contour.
  - 15. A method as claimed in claim 14, wherein the estimated pitch is obtained by removing a phrase intonation effect from an estimated pitch contour representing the speech segment.
  - 16. A method as claimed in claim 15, wherein the phrase intonation effect is represented by a weighted moving average of the estimated pitch contour.
  - 17. A method as claimed in claim 16, wherein a weight of the weighted moving average represents the degree of voicing in the segment.
  - 18. A method as claimed in claim 13, wherein a segment is considered unvoiced if a sum of regression weights of an estimated pitch contour within a regression window.
  - 19. A method as claimed in claim 13, wherein the pseudo feature vector includes pseudo features generated according to a least squares criterion.
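
The matching step of claim 13 can be pictured with a deliberately simplified sketch. It uses nearest-neighbour matching by Euclidean distance, which is a stand-in chosen for brevity and not the matching technique of the patent; the function name `match_unit`, the unit labels, and the feature values are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def match_unit(observation: Sequence[float],
               inventory: Dict[str, List[Sequence[float]]]) -> Tuple[str, float]:
    """Return the unit whose closest reference vector is nearest to the observation.

    Each unit in the inventory is represented by at least one reference
    feature vector, mirroring the wording of the claim.
    """
    best_unit, best_dist = None, math.inf
    for unit, references in inventory.items():
        for ref in references:
            d = math.dist(observation, ref)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best_unit, best_dist = unit, d
    if best_unit is None:
        raise ValueError("inventory contains no reference vectors")
    return best_unit, best_dist


# Hypothetical inventory: two units, each with tone-related feature vectors.
inventory = {
    "unit_a": [[1.0, 0.0]],
    "unit_b": [[-1.0, 0.0], [-0.5, 0.2]],
}
best, distance = match_unit([0.9, 0.1], inventory)
```

In a full recognizer the observation vector would also carry spectral components alongside the tone and voicing components, and matching would typically be statistical instead of a single distance comparison.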

Current Assignee: Koninklijke Philips Electronics N.V. (Koninklijke Philips N.V.)
Original Assignee: Koninklijke Philips Electronics N.V. (Koninklijke Philips N.V.)
Inventors: Seide, Frank Torsten Bernd; Huang, Chang-Han
Primary Examiner(s): CHAWAN, VIJAY B
Application Number: US09/869,942
Time in Patent Office: 1,247 Days
Field of Search: 704/205, 704/207, 704/208, 704/211, 704/214, 704/216, 704/217
US Class Current: 704/211
CPC Class Codes:
G10L 15/1807   using prosody or stress
G10L 2025/935   Mixed voiced class; Transit...
G10L 25/15   the extracted parameters be...
