Robust speech processing with affine transform replicated data
Abstract
The present invention relates to a robust speech processing method and system which models channel and noise variations with affine transforms to reduce mismatched conditions between training and testing. The affine transform relating the training vectors $c_k$ to the vectors for the testing condition $c'_k$ is represented by the form:
${c'_k}^{T} = A c_k^{T} + b$
for k = 1 to N, in which A is a matrix of predictor coefficients representing noise distortions and vector b represents channel distortions. Alternatively, an affine invariant cepstrum is generated during testing and training for modeling speech to account for noise and channel effects. The improved speech processing yields improved speaker recognition under channel and noise variations.
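The affine relation above can be estimated from paired cepstral vectors by ordinary least squares. The following is a minimal sketch of that idea, not the patent's own procedure: the function names `solve_linear` and `fit_affine` are hypothetical, and the normal-equation formulation is an assumption, since the solving equation is not reproduced in this text. Each row j of A and entry j of b is solved together as one augmented vector.

```python
def solve_linear(M, y):
    """Solve M x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [y[i]] for i, row in enumerate(M)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more varied vectors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_affine(clean, distorted):
    """Least-squares estimate of A (q x q) and b (q) with distorted ~ A*clean + b.

    clean and distorted are equal-length lists of q-dimensional vectors.
    """
    q = len(clean[0])
    ext = [list(c) + [1.0] for c in clean]  # augmented vectors [c_k, 1]
    # Gram matrix of the augmented clean vectors, shared by every row j.
    gram = [[sum(e[r] * e[s] for e in ext) for s in range(q + 1)]
            for r in range(q + 1)]
    A, b = [], []
    for j in range(q):
        rhs = [sum(e[r] * d[j] for e, d in zip(ext, distorted))
               for r in range(q + 1)]
        alpha_j = solve_linear(gram, rhs)
        A.append(alpha_j[:q])  # first q entries: row j of A
        b.append(alpha_j[q])   # last entry: element j of b
    return A, b
```

Calling `fit_affine(clean, distorted)` with at least q + 1 affinely independent clean vectors returns the pair `(A, b)`; with fewer, the system is singular and the sketch raises `ValueError`.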
7 Claims
-
1. A method for processing speech comprising the steps of:
determining affine transform parameters from clean speech and development speech samples;
generating an extended data set from a training speech sample and said affine transform parameters;
training a plurality of classifiers with said extended data set to provide trained classifiers; and
classifying a testing speech sample with said trained classifier for forming classified output,
wherein the affine transform has the form:
${c'_k}^{T} = A c_k^{T} + b$
wherein said affine transform parameters are represented by A and b and A is a matrix representing said deviations of said noise and b represents said deviations of said channel;
${c'_k}^{T}$ represents said cepstrum coefficients of said development speech sample and $c_k^{T}$ represents said cepstrum coefficients of said clean speech sample;
wherein said affine transform is solved by:
##EQU12## for j = 1 . . . q, and $\alpha_j$ is an augmented column vector having its first q entries from the jth row of matrix A and its last entry the jth element of vector b;
wherein said affine transform parameters correct for deviations of channel and noise in said training speech sample and said testing speech sample.
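The data-extension step of claim 1 can be pictured as replicating each training vector under every estimated transform. This is a sketch under stated assumptions rather than the claimed implementation: `apply_affine` and `extend_data_set` are hypothetical names, and keeping the untransformed vectors in the extended set is an assumption the claim does not spell out.

```python
def apply_affine(A, b, c):
    """Return A*c + b for one cepstral vector c (lists of floats)."""
    return [sum(a * x for a, x in zip(row, c)) + bj for row, bj in zip(A, b)]


def extend_data_set(training, transforms):
    """Replicate the training vectors under each (A, b) pair.

    The original vectors come first, followed by one transformed copy of the
    whole training set per transform, in the order the transforms are given.
    """
    extended = [list(c) for c in training]  # assumption: originals are kept
    for A, b in transforms:
        extended.extend(apply_affine(A, b, c) for c in training)
    return extended
```

With T transforms and N training vectors, the extended set holds N * (T + 1) vectors, which the classifiers are then trained on.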
-
-
2. A method for processing speech comprising the steps of:
modeling a clean speech sample with a cepstrum $C$;
modeling a plurality of training speech samples with cepstrum $C_i$ for each of said training speech samples i = 1, 2 . . . N;
determining a plurality of training affine transform parameters from said cepstrum $C$ and said cepstrum $C_i$;
storing said plurality of affine transform parameters in a database;
training a plurality of classifiers with each classifier being trained by one of said cepstrum $C_i$;
storing said plurality of classifiers in said database;
modeling a testing speech sample with a cepstrum $C_{test}$;
determining testing affine transform parameters from said cepstrum $C_{test}$ and said cepstrum $C$;
searching said affine transform parameters stored in said database for the closest match of said testing affine transform parameters and said testing transform parameters; and
classifying said testing speech sample with said trained classifiers,
wherein the affine transform has the form:
${c'_k}^{T} = A c_k^{T} + b$
wherein said affine transform parameters are represented by A and b and A is a matrix representing said deviations of said noise and b represents said deviations of said channel;
${c'_k}^{T}$ represents said cepstrum coefficients of said development speech sample and $c_k^{T}$ represents said cepstrum coefficients of said clean speech sample;
wherein said affine transform is solved by:
##EQU13## for j = 1 . . . q, and $\alpha_j$ is an augmented column vector having its first q entries from the jth row of matrix A and its last entry the jth element of vector b.
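The database search in claim 2 amounts to a nearest-neighbor lookup over stored transform parameters. The sketch below is illustrative only: the record layout (dictionaries with `"A"`, `"b"`, and `"classifier"` keys), the function names, and the use of Euclidean distance over the flattened parameters are all assumptions, since the claim does not name a distance measure.

```python
import math


def flatten(A, b):
    """Concatenate the rows of A and the entries of b into one flat list."""
    return [x for row in A for x in row] + list(b)


def closest_entry(database, A_test, b_test):
    """Return the stored record whose (A, b) is nearest to the test parameters.

    Distance is Euclidean over the flattened parameters (an assumed metric).
    """
    target = flatten(A_test, b_test)

    def distance(entry):
        return math.dist(flatten(entry["A"], entry["b"]), target)

    return min(database, key=distance)
```

The classifier stored in the returned record is the one trained under the condition closest to the testing condition, so it would be the one used to classify the testing speech sample.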
-
-
3. A method for processing speech comprising the steps of:
determining cepstral vectors $c_i$ from a speech sample;
determining an affine invariant cepstrum from said cepstral vectors by determining a centroid $c_o$ of said cepstral vectors $c_i$; and
determining a variance σ of said cepstral vectors $c_i$,
wherein said affine invariant cepstrum has the form ##EQU14##; and
modeling said speech sample with said affine invariant cepstrum for producing processed speech,
wherein said affine invariant cepstrum corrects for deviation of channel and noise in said speech sample.
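The exact form of the affine invariant cepstrum is not reproduced in this text, so the following only illustrates the general idea of normalizing by a centroid and a spread. It assumes a per-component normalization, (c_i - centroid) / standard deviation, which is a hypothetical stand-in and is invariant only to per-component positive scaling and shifting, not to an arbitrary matrix A. The name `normalize_cepstrum` is likewise hypothetical.

```python
import math


def normalize_cepstrum(vectors):
    """Centre each component on the centroid and divide by its standard deviation.

    Assumed form, not the patent's equation. Requires every component to vary
    across the vectors; a constant component would give a zero divisor.
    """
    n, q = len(vectors), len(vectors[0])
    centroid = [sum(v[j] for v in vectors) / n for j in range(q)]
    sigma = [math.sqrt(sum((v[j] - centroid[j]) ** 2 for v in vectors) / n)
             for j in range(q)]
    return [[(v[j] - centroid[j]) / sigma[j] for j in range(q)] for v in vectors]
```

Under this assumed form, a sample and a copy of it that has been scaled and shifted component by component normalize to the same vectors, which is the property the claim relies on to remove channel and noise deviations.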
-
-
4. A method for speaker verification comprising the steps of:
modeling a transmitted training speech sample from a speaker using cepstral training vectors $c_i$;
determining an affine invariant training cepstrum from said cepstral training vectors $c_i$ by;
storing said affine invariant training cepstrum;
modeling a transmitted testing speech sample from a speaker using cepstral training vectors $c_i$;
determining an affine invariant testing cepstrum from cepstral testing vectors $c_i$;
comparing said affine invariant testing cepstrum with said stored affine invariant training cepstrum, wherein a match of said affine invariant testing cepstrum with said affine invariant training cepstrum indicates a verified speaker; and
wherein said affine invariant cepstrum is determined from the steps of determining a centroid $c_o$ of said cepstral vectors $c_i$; and
determining a variance σ of said cepstral vectors $c_i$,
wherein said affine invariant cepstrum has the form ##EQU15##
-
-
5. A system for speaker verification comprising:
means for modeling a transmitted training speech sample from a speaker using cepstral training vectors $c_i$;
means for storing said affine invariant training cepstrum;
means for modeling a transmitted testing speech sample from a speaker using cepstral training vectors $c_i$;
means for determining an affine invariant testing cepstrum from cepstral testing vectors $c_i$;
means for comparing said affine invariant testing cepstrum from said stored affine invariant training cepstrum, wherein a match of said affine invariant testing cepstrum with said affine invariant training cepstrum indicates a verified speaker; and
wherein said affine invariant cepstrum is determined from the steps of:
determining a centroid $c_o$ of said cepstral vectors $c_i$; and
determining a variance σ of said cepstral vectors $c_i$,
wherein said affine invariant cepstrum has the form ##EQU16##
-
6. A method for speech recognition comprising:
modeling a transmitted training speech sample from a speaker using cepstral training vectors $c_i$;
determining an affine invariant training cepstrum from said cepstral training vectors $c_i$;
storing said affine invariant training cepstrum;
modeling a transmitted testing speech sample from a speaker using cepstral training vectors $c_i$;
determining an affine invariant testing cepstrum from cepstral testing vectors $c_i$;
comparing said affine invariant testing cepstrum with said stored affine invariant training cepstrum, wherein a match of said affine invariant testing cepstrum with said affine invariant training cepstrum indicates a verified speaker;
wherein said affine invariant cepstrum is determined from the steps of:
determining a centroid $c_o$ of said cepstral vectors $c_i$; and
determining a variance σ of said cepstral vectors $c_i$,
wherein said affine invariant cepstrum has the form ##EQU17##
-
7. A system for speech recognition comprising:
means for modeling a transmitted training speech sample of a pattern using cepstral training vectors $c_i$;
means for determining an affine invariant training cepstrum from said cepstral training vectors $c_i$;
means for storing said affine invariant training cepstrum;
means for modeling a transmitted testing speech sample of said pattern using cepstral training vectors $c_i$;
means for determining an affine invariant testing cepstrum from cepstral testing vectors $c_i$;
means for comparing said affine invariant testing cepstrum with said stored affine invariant training cepstrum, wherein a match of said affine invariant testing cepstrum with said affine invariant training cepstrum indicates a recognized speech pattern,
wherein said affine invariant cepstrum is determined from the steps of:
determining a centroid $c_o$ of said cepstral vectors $c_i$; and
determining a variance σ of said cepstral vectors $c_i$,
wherein said affine invariant cepstrum has the form ##EQU18##
Specification