Method and system for automatic text-independent grading of pronunciation for language instruction

US 6,226,611 B1
Filed: 01/26/2000
Issued: 05/01/2001
Est. Priority Date: 10/02/1996
Status: Expired due to Term

First Claim

1. In an automatic speech processing system, a method for grading the pronunciation of a student speech sample, the method comprising:

accepting said student speech sample which comprises a sequence of words spoken by a student speaker;

operating a set of trained speech models to compute at least one posterior probability from said speech sample, each of said posterior probabilities being a normalized probability, with respect to a set of models including competing models and the model corresponding to the speech sample, that a particular portion of said student speech sample corresponds to a particular known model given said particular portion of said speech sample; and

computing an evaluation score, herein referred to as the posterior-based evaluation score, of pronunciation quality for said student speech sample from said posterior probabilities.

0 Assignments

0 Petitions

Abstract

Pronunciation quality is automatically evaluated for an utterance of speech based on one or more pronunciation scores. One type of pronunciation score is based on duration of acoustic units. Examples of acoustic units include phones and syllables. Another type of pronunciation score is based on a posterior probability that a piece of input speech corresponds to a certain model such as an HMM, given the piece of input speech. Speech may be segmented into phones and syllables for evaluation with respect to the models. The utterance of speech may be an arbitrary utterance made up of a sequence of words which had not been encountered before. Pronunciation scores are converted into grades as would be assigned by human graders. Pronunciation quality may be evaluated in a client-server language instruction environment.
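
The abstract names a duration-based score but does not give its formula here. The sketch below shows one plausible form under stated assumptions, and it is not the patented method. Assume phone segments come from a prior alignment, with lengths counted in frames. Each duration is normalized by the utterance's rate of speech (phones per frame), and each phone has a hypothetical Gaussian model of normalized duration estimated from native speech. The score is then the average log-likelihood of the normalized durations. `PhoneSegment`, `duration_score`, and the `(mean, std)` model format are illustrative names, not taken from the source.

```python
import math
from dataclasses import dataclass


@dataclass
class PhoneSegment:
    phone: str   # phone label, e.g. "ae"
    frames: int  # segment length in frames (frame rate is an assumption)


def duration_score(segments, duration_models):
    """Average log-likelihood of rate-normalized phone durations.

    duration_models maps phone -> (mean, std) of the normalized duration.
    These are hypothetical Gaussian models assumed to be estimated from
    native speech.
    """
    total_frames = sum(s.frames for s in segments)
    if total_frames == 0:
        raise ValueError("empty utterance")
    # Rate of speech: phones per frame over the whole utterance, so the
    # normalized durations below always average to 1.0.
    ros = len(segments) / total_frames
    total = 0.0
    for s in segments:
        mean, std = duration_models[s.phone]
        x = s.frames * ros  # duration normalized by speaking rate
        # Gaussian log-density of the normalized duration.
        total += -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))
    return total / len(segments)
```

Under this normalization, uniformly slow or fast speech is not penalized by itself. Only phones that are long or short *relative to the speaker's own rate* lower the score.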

11 Claims

1. In an automatic speech processing system, a method for grading the pronunciation of a student speech sample, the method comprising:
- accepting said student speech sample which comprises a sequence of words spoken by a student speaker;
- operating a set of trained speech models to compute at least one posterior probability from said speech sample, each of said posterior probabilities being a normalized probability, with respect to a set of models including competing models and the model corresponding to the speech sample, that a particular portion of said student speech sample corresponds to a particular known model given said particular portion of said speech sample; and
- computing an evaluation score, herein referred to as the posterior-based evaluation score, of pronunciation quality for said student speech sample from said posterior probabilities.
2. The method according to claim 1 wherein each of said posterior probabilities is derived from a model likelihood by dividing the likelihood that said particular known model generated said particular portion of said student speech sample by the maximum one of the likelihoods that individual alternative models had generated said particular portion of said speech sample.
3. The method according to claim 2 wherein:
4. The method according to claim 2 further comprising:
- mapping said posterior-based evaluation score to a grade as would be assigned by a human listener; and
- presenting said grade to said student speaker.
5. The method according to claim 2 wherein said student speech sample comprises an acoustic features sequence, the method further comprising the steps of:
- computing a path through a set of trained hidden Markov models (HMMs) from among said trained speech models, said path being an allowable path through the HMMs that has maximum likelihood of generating said acoustic features sequence; and
- identifying transitions between phones within said path, thereby defining phones.
6. The method according to claim 5 wherein the path computing step is performed using the Viterbi search technique.
7. The method according to claim 5 wherein said spoken sequence of words is unknown, and the path computing step is performed using a computerized speech recognition system that determines said spoken sequence of words.
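
Claims 5 and 6 segment the speech into phones by finding the maximum-likelihood path through the HMMs with a Viterbi search and then reading phone boundaries off the state transitions. The following is a deliberately simplified sketch of that idea, not the patented implementation. It assumes the phone sequence is known (forced alignment, rather than the recognizer-determined sequence of claim 7). It also assumes each phone is a single HMM state that may repeat or advance, that transition probabilities are ignored, and that per-frame log-likelihoods are already computed. The function name `viterbi_segment` is hypothetical.

```python
import math
from typing import List, Tuple


def viterbi_segment(loglik: List[List[float]]) -> List[Tuple[int, int]]:
    """Forced alignment of T frames to a left-to-right chain of J phones.

    loglik[t][j] is the log-likelihood of frame t under phone j of the
    (assumed known) phone sequence. Each phone is a single state that can
    repeat or advance to the next phone; transition probabilities are
    ignored as a simplifying assumption.
    Returns (start_frame, end_frame_exclusive) for each phone.
    """
    T, J = len(loglik), len(loglik[0])
    if T < J:
        raise ValueError("need at least one frame per phone")
    NEG = -math.inf
    score = [[NEG] * J for _ in range(T)]
    back = [[0] * J for _ in range(T)]
    score[0][0] = loglik[0][0]  # the path must start in the first phone
    for t in range(1, T):
        for j in range(J):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else NEG
            # Keep the better predecessor and remember it for backtracing.
            if stay >= advance:
                score[t][j], back[t][j] = stay, j
            else:
                score[t][j], back[t][j] = advance, j - 1
            score[t][j] += loglik[t][j]
    # Backtrace from the last phone at the last frame.
    states = [J - 1]
    for t in range(T - 1, 0, -1):
        states.append(back[t][states[-1]])
    states.reverse()  # states[t] is now the phone index at frame t
    # Transitions between consecutive states mark phone boundaries.
    segments, start = [], 0
    for t in range(1, T):
        if states[t] != states[t - 1]:
            segments.append((start, t))
            start = t
    segments.append((start, T))
    return segments
```

In a real system, each phone would typically be a multi-state HMM with trained transition probabilities. The single-state chain keeps the dynamic-programming structure visible without that bookkeeping.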

8. A system for assessing pronunciation of a student speech sample, said student speech sample comprising a sequence of words spoken by a student speaker, the system comprising:
- trained speech acoustic models of exemplary speech; and
- an acoustic scorer configured to compute at least one posterior probability from said speech sample using said trained speech models, said acoustic scorer also configured to compute an evaluation score of pronunciation quality for said student sample from said posterior probabilities, each of said posterior probabilities being a normalized probability, with respect to a set of models including competing models and the model corresponding to the speech sample, that a particular portion of said student speech sample corresponds to a particular known model given said particular portion of said speech sample.
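
The acoustic scorer of claim 8 can be illustrated with the likelihood ratio of claim 2 and the grade mapping of claim 4. This is a minimal sketch under several assumptions, not the patented scorer. Per-frame log-likelihoods for every phone model are assumed to be available, and phone segments are assumed to come from a prior alignment. The "maximum over alternative models" is taken over all models including the aligned one, so each frame's log ratio is at most 0. Frame scores are averaged within each phone, and those phone scores are then averaged over the utterance. The grade mapping is a hypothetical linear map, clipped to an assumed 1 to 5 scale, whose `slope` and `intercept` would be fitted to human grades. All names here are illustrative.

```python
from typing import Dict, Sequence, Tuple


def posterior_score(
    frame_logliks: Sequence[Dict[str, float]],
    segments: Sequence[Tuple[str, int, int]],
) -> float:
    """Posterior-based pronunciation score (hypothetical sketch).

    frame_logliks[t] maps every phone model name to log p(o_t | model).
    segments lists (phone, start, end_exclusive) from a prior alignment.
    Each frame's log posterior is approximated, as in claim 2, by the
    aligned phone's log-likelihood minus the best log-likelihood among
    all models, so it is <= 0.
    """
    phone_scores = []
    for phone, start, end in segments:
        frames = frame_logliks[start:end]
        if not frames:
            raise ValueError(f"empty segment for {phone!r}")
        total = sum(f[phone] - max(f.values()) for f in frames)
        phone_scores.append(total / len(frames))  # average within the phone
    return sum(phone_scores) / len(phone_scores)  # average over phones


def to_grade(score: float, slope: float, intercept: float) -> float:
    """Map a machine score to a grade on an assumed 1-5 scale.

    slope and intercept are hypothetical parameters that would be fitted
    by regressing human grades on machine scores.
    """
    return min(5.0, max(1.0, slope * score + intercept))
```

Dividing by the best competing likelihood (a log-domain subtraction) is the max approximation described in claim 2. A fully normalized posterior would instead divide by the prior-weighted sum of the likelihoods over all competing models.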

9. A system for pronunciation training in a client/server environment wherein there exists a client process for presenting prompts to a student and for accepting student speech elicited by said prompts, the system comprising:
- a server process for sending control information to said client process to specify a prompt to be presented to said student and for receiving a speech sample derived from said student speech elicited by said presented prompt; and
- a pronunciation evaluator invocable by said server process for analyzing said student speech sample, wherein said pronunciation evaluator is established using an acoustic model for computing a posterior-probability-based evaluation score of pronunciation quality for said student speech sample.
10. The system according to claim 9 wherein said server process receives said speech sample over a speech channel that is separate from a communication channel through which said server process and said client process communicate.
11. The system according to claim 9 wherein said client process and said server process are located on two separate computer processors and communicate via a network.
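
The client/server arrangement of claim 9 can be pictured as two message types plus a server that invokes a pluggable evaluator. The sketch below is only an illustration of that control flow under assumed names. `PromptControl`, `SpeechSample`, and `PronunciationServer` are hypothetical, and the transport is left abstract. In the arrangement of claim 10, the speech sample would arrive over a channel separate from the one that carries the control messages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PromptControl:
    """Control message from server to client naming the prompt to present."""
    session_id: str
    prompt_id: str
    prompt_text: str


@dataclass
class SpeechSample:
    """Student speech elicited by a prompt, returned to the server."""
    session_id: str
    prompt_id: str
    samples: List[float]  # raw audio samples (format is an assumption)


class PronunciationServer:
    """Hypothetical server process for the client/server system of claim 9."""

    def __init__(self, prompts: Dict[str, str],
                 evaluator: Callable[[SpeechSample], float]) -> None:
        self.prompts = prompts
        self.evaluator = evaluator  # e.g. a posterior-based scorer
        self.pending: Dict[str, str] = {}  # session_id -> prompt_id sent

    def next_prompt(self, session_id: str, prompt_id: str) -> PromptControl:
        # Record which prompt this session was told to present.
        self.pending[session_id] = prompt_id
        return PromptControl(session_id, prompt_id, self.prompts[prompt_id])

    def receive_speech(self, sample: SpeechSample) -> float:
        # Accept only speech elicited by the prompt most recently sent.
        expected = self.pending.pop(sample.session_id, None)
        if expected != sample.prompt_id:
            raise ValueError("speech does not match the presented prompt")
        return self.evaluator(sample)
```

Keeping the evaluator as a callable separates the server's session bookkeeping from the acoustic scoring, which matches the claim's description of an evaluator "invocable by said server process".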

Current Assignee
SRI International, Inc.
Original Assignee
SRI International, Inc.
Inventors
Neumeyer, Leonardo; Franco, Horacio; Weintraub, Mitchel; Price, Patti; Digalakis, Vassilios
Primary Examiner(s)
Hudspeth, David
Assistant Examiner(s)
Storm, Donald L.

Application Number
US09/491,374
Time in Patent Office
461 Days
Field of Search
704/246, 704/249, 704/254, 704/276, 704/200, 434/185
US Class Current
704/246
CPC Class Codes

G09B 19/04   Speaking with audible prese...

G10L 15/04   Segmentation; Word boundary...

G10L 15/26   Speech to text systems G10L...

H04L 67/01   Protocols