Tone features for speech recognition
Abstract
Robust acoustic tone features are achieved, first, by introducing an on-line, look-ahead trace-back of the fundamental frequency (F0) contour with adaptive pruning; this F0 estimate serves as the signal preprocessing front-end. The F0 contour is subsequently decomposed into a lexical tone effect, a phrase intonation effect, and a random effect by means of a time-variant, weighted moving average (MA) filter in conjunction with a weighted least squares fit of the F0 contour, where the weighting places more emphasis on vowels. The intonation effect is then removed from the F0 contour by subtraction, under a superposition assumption. The acoustic tone features are defined in two parts. The first part consists of the coefficients of the second-order weighted regression of the de-intoned F0 contour over neighbouring frames. The second part deals with the degree of periodicity of the signal and consists of the coefficients of the second-order regression of the auto-correlation. The weights of the second-order weighted regression of the de-intoned F0 contour are designed to emphasize the voiced segments and de-emphasize the unvoiced segments of the pitch contour, in order to preserve the voiced pitch contour for semi-voiced consonants.
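The de-intonation and regression steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the fixed-width window (the abstract describes a time-variant filter combined with weighted least squares), the function names, and the toy contour are all assumptions made for illustration. The weights `w` stand for a per-frame reliability such as a voicing estimate.

```python
from typing import List, Sequence, Tuple


def weighted_moving_average(f0: Sequence[float], w: Sequence[float],
                            half_width: int) -> List[float]:
    """Slow-moving trend of f0; frames with larger weight count more.

    Assumption: a fixed window of 2*half_width+1 frames, used here as a
    simple stand-in for the phrase intonation estimate.
    """
    n = len(f0)
    trend = []
    for t in range(n):
        lo, hi = max(0, t - half_width), min(n, t + half_width + 1)
        total_w = sum(w[lo:hi])
        if total_w > 0.0:
            trend.append(sum(w[i] * f0[i] for i in range(lo, hi)) / total_w)
        else:
            trend.append(f0[t])  # no reliable frames nearby: leave unchanged
    return trend


def remove_intonation(f0: Sequence[float], w: Sequence[float],
                      half_width: int) -> List[float]:
    """Subtract the trend from f0 (superposition assumption)."""
    trend = weighted_moving_average(f0, w, half_width)
    return [value - slow for value, slow in zip(f0, trend)]


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few weighted frames")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def weighted_quadratic_coeffs(y: Sequence[float],
                              w: Sequence[float]) -> Tuple[float, float, float]:
    """Weighted least-squares fit y[t] ~ a0 + a1*t + a2*t^2.

    The window is centred, so t runs over -k..k for a window of 2k+1 frames.
    """
    k = len(y) // 2
    ts = [i - k for i in range(len(y))]
    # Normal equations: sum_j (sum_t w*t^(i+j)) * a_j = sum_t w*y*t^i
    a = [[sum(wi * t ** (i + j) for wi, t in zip(w, ts)) for j in range(3)]
         for i in range(3)]
    b = [sum(wi * yi * t ** i for wi, yi, t in zip(w, y, ts)) for i in range(3)]
    a0, a1, a2 = solve3(a, b)
    return a0, a1, a2


if __name__ == "__main__":
    # Toy contour in Hz with made-up per-frame weights.
    f0 = [200.0, 204.0, 210.0, 215.0, 212.0, 206.0, 199.0, 195.0, 193.0]
    w = [1.0, 1.0, 1.0, 0.8, 0.8, 1.0, 1.0, 0.3, 0.3]
    residual = remove_intonation(f0, w, half_width=3)
    # Tone features for the centre frame from a 5-frame window around it.
    print(weighted_quadratic_coeffs(residual[2:7], w[2:7]))
```

The three coefficients returned for each frame would form the first part of the tone features; applying the same fit without weights to an auto-correlation track would give an analogue of the second part.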
19 Claims
1. A speech recognition system for recognizing a time-sequential input signal representing speech spoken in a tonal language;
- the system including:
an input for receiving the input signal;
a speech analysis subsystem for representing a segment of the input signal as an observation feature vector; and
a unit matching subsystem for matching the observation feature vector against an inventory of trained speech recognition units, each unit being represented by at least one reference feature vector;
wherein the feature vector includes a component derived from an estimated degree of voicing of the speech segment represented by the feature vector and wherein unvoiced segments of speech are represented by a pseudo feature vector.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
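As an informal illustration of the final clause of claim 1, the following Python sketch estimates a degree of voicing from the normalized auto-correlation of a frame and substitutes a placeholder for the tone part of the feature vector when the frame is judged unvoiced. The threshold value, the all-zero placeholder, and the function names are assumptions for illustration; the claim itself does not say how the pseudo feature vector is constructed.

```python
import math
from typing import List, Sequence

VOICING_THRESHOLD = 0.5          # assumed cut-off, not taken from the patent
PSEUDO_TONE = [0.0, 0.0, 0.0]    # assumed placeholder for unvoiced frames


def degree_of_voicing(frame: Sequence[float], min_lag: int, max_lag: int) -> float:
    """Peak of the normalized auto-correlation over the given lag range.

    Values near 1 indicate a strongly periodic (voiced) frame; values near 0
    indicate little periodicity.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


def build_feature_vector(spectral: Sequence[float], tone_coeffs: Sequence[float],
                         voicing: float,
                         threshold: float = VOICING_THRESHOLD) -> List[float]:
    """Concatenate spectral features, tone features, and the voicing value.

    Frames below the threshold get the placeholder instead of tone features.
    """
    tone = list(tone_coeffs) if voicing >= threshold else list(PSEUDO_TONE)
    return list(spectral) + tone + [voicing]


if __name__ == "__main__":
    # Synthetic periodic frame: a sine with a period of 20 samples.
    frame = [math.sin(2 * math.pi * i / 20) for i in range(200)]
    v = degree_of_voicing(frame, min_lag=10, max_lag=40)
    print(build_feature_vector([0.3, -1.2], [4.0, 0.5, -0.1], v))
```

Keeping the voicing value itself as a component lets a recognizer weigh the tone features by how reliable the pitch estimate is, rather than relying only on a hard voiced/unvoiced decision.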
13. A method for recognizing a time-sequential input signal representing speech spoken in a tonal language;
- the method comprising the steps of:
receiving the input signal;
representing a segment of the input signal as an observation feature vector; and
matching the observation feature vector against an inventory of trained speech recognition units, each unit being represented by at least one reference feature vector;
wherein the feature vector includes a component derived from an estimated degree of voicing of the speech segment represented by the feature vector and wherein unvoiced segments of speech are represented by a pseudo feature vector.
Dependent claims: 14, 15, 16, 17, 18, 19
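The matching step of claim 13 can be pictured with a deliberately simple Python sketch that picks the unit whose reference feature vector lies closest to the observation. Nearest-neighbour matching with Euclidean distance is a stand-in chosen for brevity, not the matching technique of the patent, and the unit labels `ma1` and `ma3` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_unit(observation: Sequence[float],
               inventory: Dict[str, List[Sequence[float]]]) -> str:
    """Return the label of the unit with the closest reference vector.

    Each unit in the inventory is represented by one or more reference
    feature vectors.
    """
    best_unit = None
    best_dist = math.inf
    for unit, references in inventory.items():
        for ref in references:
            dist = euclidean(observation, ref)
            if dist < best_dist:
                best_unit, best_dist = unit, dist
    if best_unit is None:
        raise ValueError("inventory contains no reference vectors")
    return best_unit


if __name__ == "__main__":
    # Hypothetical two-unit inventory with two-dimensional feature vectors.
    inventory = {
        "ma1": [[1.0, 0.0]],
        "ma3": [[0.0, 1.0], [0.2, 0.8]],
    }
    print(match_unit([0.9, 0.1], inventory))
```

A practical recognizer would normally score sequences of observation vectors against statistical models of each unit; the sketch only shows the shape of the data that the claim describes.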
Specification