Method and system for automatic text-independent grading of pronunciation for language instruction
Abstract
Pronunciation quality is automatically evaluated for an utterance of speech based on one or more pronunciation scores. One type of pronunciation score is based on duration of acoustic units. Examples of acoustic units include phones and syllables. Another type of pronunciation score is based on a posterior probability that a piece of input speech corresponds to a certain model such as an HMM, given the piece of input speech. Speech may be segmented into phones and syllables for evaluation with respect to the models. The utterance of speech may be an arbitrary utterance made up of a sequence of words which had not been encountered before. Pronunciation scores are converted into grades as would be assigned by human graders. Pronunciation quality may be evaluated in a client-server language instruction environment.
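As an illustration of the duration-based score mentioned in the abstract, the following is a minimal sketch, not the patented method itself. It assumes each phone's duration is scored against a Gaussian fitted to exemplary speech; the Gaussian form, the function names, and the statistics table are all hypothetical choices made for this example.

```python
import math

def gaussian_log_pdf(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def duration_score(segments, stats):
    """Average log-probability of the observed phone durations.

    segments: list of (phone, duration_in_seconds) pairs from a segmentation.
    stats:    phone -> (mean, std) of durations in exemplary speech
              (hypothetical values; assumed to be estimated beforehand).
    """
    total = sum(gaussian_log_pdf(d, *stats[p]) for p, d in segments)
    return total / len(segments)

# Hypothetical duration statistics for two phones.
stats = {"ae": (0.10, 0.02), "t": (0.05, 0.01)}
typical = duration_score([("ae", 0.10), ("t", 0.05)], stats)
stretched = duration_score([("ae", 0.20), ("t", 0.05)], stats)
print(typical > stretched)
```

Durations close to those of exemplary speech yield a higher average log-probability, so the score falls as a student's timing departs from the reference.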
11 Claims
-
1. In an automatic speech processing system, a method for grading the pronunciation of a student speech sample, the method comprising:
-
accepting said student speech sample which comprises a sequence of words spoken by a student speaker;
operating a set of trained speech models to compute at least one posterior probability from said speech sample, each of said posterior probabilities being a normalized probability, with respect to a set of models including competing models and the model corresponding to the speech sample, that a particular portion of said student speech sample corresponds to a particular known model given said particular portion of said speech sample; and
computing an evaluation score, herein referred to as the posterior-based evaluation score, of pronunciation quality for said student speech sample from said posterior probabilities.
said particular known model is a context-dependent model; and
individual models are context-dependent or context-independent models.
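The normalized probability recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions: each frame comes with a log-likelihood under every competing model, model priors are known, and the segment score is the average frame log-posterior. The averaging step, the function names, and the data are hypothetical and are not taken from the claims.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def frame_log_posterior(log_likelihoods, log_priors, target):
    """log P(target | frame): the target model's joint log-probability
    normalized by the sum over all competing models (target included)."""
    joint = {m: log_likelihoods[m] + log_priors[m] for m in log_likelihoods}
    return joint[target] - log_sum_exp(list(joint.values()))

def posterior_score(frames, log_priors, target):
    """Average frame log-posterior over one segment (an assumed aggregation)."""
    total = sum(frame_log_posterior(f, log_priors, target) for f in frames)
    return total / len(frames)

# Hypothetical segment of two frames scored against two phone models.
log_priors = {"ae": math.log(0.5), "eh": math.log(0.5)}
frames = [{"ae": -2.0, "eh": -4.0}, {"ae": -1.5, "eh": -3.0}]
print(posterior_score(frames, log_priors, "ae"))
```

Because the likelihood of the intended model is divided by the total over all models, the score depends on how well that model wins against its competitors rather than on the raw likelihood level.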
-
-
4. The method according to claim 2 further comprising:
-
mapping said posterior-based evaluation score to a grade as would be assigned by a human listener; and
presenting said grade to said student speaker.
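One simple way to picture the mapping step of claim 4 is a linear fit from machine scores to grades given by human listeners. The sketch below is an assumption made for illustration: the claim does not prescribe linear regression, and the 1 to 5 grade scale, the function names, and the calibration data are all hypothetical.

```python
def fit_linear(scores, grades):
    """Least-squares line grade = slope * score + intercept."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_g = sum(grades) / n
    var = sum((s - mean_s) ** 2 for s in scores)
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(scores, grades))
    slope = cov / var
    return slope, mean_g - slope * mean_s

def to_grade(score, slope, intercept, lo=1, hi=5):
    """Map a machine score to the nearest grade, clamped to the scale."""
    return min(hi, max(lo, round(slope * score + intercept)))

# Hypothetical calibration set: posterior-based scores with human grades.
slope, intercept = fit_linear([-3.0, -2.0, -1.0], [1, 3, 5])
print(to_grade(-2.0, slope, intercept))
```

The fit is made once on utterances that human listeners have already graded; afterwards only the slope and intercept are needed to grade a new speech sample.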
-
-
5. The method according to claim 2 wherein said student speech sample comprises an acoustic features sequence, the method further comprising the steps of:
-
computing a path through a set of trained hidden Markov models (HMMs) from among said trained speech models, said path being an allowable path through the HMMs that has maximum likelihood of generating said acoustic features sequence; and
identifying transitions between phones within said path, thereby defining phones.
-
-
6. The method according to claim 5 wherein the path computing step is performed using the Viterbi search technique.
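The maximum-likelihood path of claims 5 and 6 can be illustrated with a small Viterbi alignment. The sketch assumes a left-to-right chain with one state per phone, fixed stay and advance probabilities, and at least as many frames as phones; real phone HMMs usually have several states each. All names and numbers here are hypothetical.

```python
import math

NEG_INF = float("-inf")

def viterbi_align(emis, log_stay=math.log(0.5), log_next=math.log(0.5)):
    """Best left-to-right state path for a frame sequence.

    emis[t][s] is the log-likelihood of frame t under state s. The path
    must start in state 0 and end in the last state.
    """
    T, S = len(emis), len(emis[0])
    best = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    best[0][0] = emis[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = best[t - 1][s] + log_stay
            move = best[t - 1][s - 1] + log_next if s > 0 else NEG_INF
            if move > stay:
                best[t][s], back[t][s] = move + emis[t][s], s - 1
            else:
                best[t][s], back[t][s] = stay + emis[t][s], s
    # Trace back from the final state to recover the path.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

def phone_boundaries(path):
    """Frame indices at which the path moves to the next phone."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]

# Hypothetical scores: the first two frames fit phone 0, the last two phone 1.
emis = [[-1.0, -5.0], [-1.0, -5.0], [-5.0, -1.0], [-5.0, -1.0]]
path = viterbi_align(emis)
print(path, phone_boundaries(path))
```

The transitions found in the best path are what define the phone segments, which can then be scored individually.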
-
7. The method according to claim 5 wherein said spoken sequence of words is unknown, and the path computing step is performed using a computerized speech recognition system that determines said spoken sequence of words.
-
8. A system for assessing pronunciation of a student speech sample, said student speech sample comprising a sequence of words spoken by a student speaker, the system comprising:
-
trained speech acoustic models of exemplary speech; and
an acoustic scorer configured to compute at least one posterior probability from said speech sample using said trained speech models, said acoustic scorer also configured to compute an evaluation score of pronunciation quality for said student sample from said posterior probabilities, each of said posterior probabilities being a normalized probability, with respect to a set of models including competing models and the model corresponding to the speech sample, that a particular portion of said student speech sample corresponds to a particular known model given said particular portion of said speech sample.
-
-
9. A system for pronunciation training in a client/server environment wherein there exists a client process for presenting prompts to a student and for accepting student speech elicited by said prompts, the system comprising:
-
a server process for sending control information to said client process to specify a prompt to be presented to said student and for receiving a speech sample derived from said student speech elicited by said presented prompt; and
a pronunciation evaluator invocable by said server process for analyzing said student speech sample, wherein
said pronunciation evaluator is established using an acoustic model for computing a posterior probability-based evaluation score of pronunciation quality for said student speech sample.
-
Specification