Systems and Methods for Assessment of Non-Native Spontaneous Speech

US 20100145698A1
Filed: 12/01/2009
Published: 06/10/2010
Est. Priority Date: 12/01/2008
Status: Active Grant

Abstract

Computer-implemented systems and methods are provided for assessing non-native spontaneous speech pronunciation. Speech recognition on digitized speech is performed using a non-native acoustic model trained with non-native speech to generate word hypotheses for the digitized speech. Time alignment is performed between the digitized speech and the word hypotheses using a reference acoustic model trained with native-quality speech. Statistics are calculated regarding individual words and phonemes in the word hypotheses based on the alignment. A plurality of features for use in assessing pronunciation of the speech are calculated based on the statistics, an assessment score is calculated based on one or more of the calculated features, and the assessment score is stored in a computer-readable memory.
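The statistics and feature steps described in the abstract can be illustrated with a minimal sketch. It assumes the time alignment has already produced start and end times for each phoneme. The `AlignedPhoneme` and `AlignedWord` types, the `duration_features` function, and the sample timings are hypothetical and are not taken from the patent. Only two of the duration-based features named in the claims are shown.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class AlignedPhoneme:
    symbol: str
    start: float  # seconds, assumed output of the time-alignment step
    end: float


@dataclass
class AlignedWord:
    text: str
    phonemes: List[AlignedPhoneme]

    @property
    def duration(self) -> float:
        # A word spans from its first phoneme's start to its last phoneme's end.
        return self.phonemes[-1].end - self.phonemes[0].start


def duration_features(words: List[AlignedWord]) -> Dict[str, float]:
    """Compute average phoneme and word durations from an alignment."""
    phoneme_durations = [p.end - p.start for w in words for p in w.phonemes]
    word_durations = [w.duration for w in words]
    return {
        "avg_phoneme_duration": mean(phoneme_durations),
        "avg_word_duration": mean(word_durations),
    }


# Made-up alignment for two recognized words.
alignment = [
    AlignedWord("good", [
        AlignedPhoneme("g", 0.00, 0.08),
        AlignedPhoneme("uh", 0.08, 0.20),
        AlignedPhoneme("d", 0.20, 0.30),
    ]),
    AlignedWord("day", [
        AlignedPhoneme("d", 0.35, 0.45),
        AlignedPhoneme("ey", 0.45, 0.65),
    ]),
]
features = duration_features(alignment)
print(features)
```

In a fuller pipeline these values would be only two entries in the plurality of features passed to the scoring step.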

40 Citations

29 Claims

1. A computer-implemented method of assessing spontaneous speech pronunciation, comprising:
- performing speech recognition on digitized speech using a non-native acoustic model trained with non-native speech using a processing system to generate word hypotheses for the digitized speech;
  
  performing time alignment between the digitized speech and the word hypotheses utilizing a reference acoustic model trained with native-quality speech to associate word hypotheses with corresponding sounds of the digitized speech;
  
  calculating statistics regarding individual words and phonemes of the word hypotheses using the processing system based on said alignment;
  
  calculating a plurality of features for use in assessing pronunciation of the digitized speech based on the statistics using the processing system;
  
  calculating an assessment score based on one or more of the calculated features; and
  
  storing the assessment score in a computer-readable memory.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 24, 28):
- - 2. The method of claim 1 further comprising excluding words not reliably recognized in generating the word hypotheses from contributing to the assessment score.
  - 3. The method of claim 2, wherein a confidence level is associated with each word in the word hypotheses identifying a likelihood that a word in a word hypothesis was correctly recognized in generating the word hypotheses.
  - 4. The method of claim 3, wherein words having corresponding word hypotheses that do not meet a confidence threshold are not considered in calculating the assessment score.
  - 5. The method of claim 1 wherein the features are based on at least one of:
    - Hidden Markov Model probabilities, average phoneme duration, average word duration, phoneme duration distribution, word duration distribution, energy measurements, energy distributions, pitch measurements, pitch distributions, and pitch contours.
  - 6. The method of claim 1 wherein speech samples are scored by a human, and a statistical model is built using the features and human scores;
    - wherein the assessment score is based on the scoring model and one or more of the calculated features.
  - 7. The method of claim 6, wherein the statistical model is built using multiple regression.
  - 8. The method of claim 1, wherein the reference acoustic model is trained with only native speech.
  - 9. The method of claim 1, wherein generating word hypotheses utilizes the non-native acoustic model, a dictionary that maps pronunciations to words, and a language model that identifies a likelihood that a word will follow a sequence of already hypothesized words in a speech recording.
  - 10. The method of claim 9, wherein the language model is an n-gram language model.
  - 11. The method of claim 1, wherein the assessment score is based on one or more features selected from the group consisting of:
    - an average likelihood across all letters;
      
      L₁/m, where L₁ is a summation of likelihoods of all individual words;
  - 12. The method of claim 1, wherein one or more of the features are utilized with related stress, intonation, vocabulary, or grammar to generate the assessment score indicating communicative competence or other construct of speaking proficiency that includes pronunciation proficiency.
  - 13. The method of claim 1, wherein second statistics derived from native-quality speech are utilized in calculating the plurality of features.
  - 24. The system of claim 1, wherein the assessment score is based on one or more features selected from the group consisting of:
    - an average likelihood across all letters;
      
      L₁/m, where L₁ is a summation of likelihoods of all individual words;
  - 28. The method of claim 1, wherein the digitized speech is spontaneous, non-native speech of a non-native language speaker.
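
Claims 2 through 4 describe excluding unreliably recognized words by means of a confidence threshold. The following is a minimal sketch of that idea, assuming the recognizer attaches a confidence value between 0 and 1 to each word hypothesis. The `WordHypothesis` type, the `reliable_words` function, and the threshold of 0.5 are illustrative assumptions and do not come from the patent.

```python
from typing import List, NamedTuple


class WordHypothesis(NamedTuple):
    text: str
    confidence: float  # assumed to lie in [0, 1]; higher means more reliable


def reliable_words(hypotheses: List[WordHypothesis],
                   threshold: float = 0.5) -> List[WordHypothesis]:
    """Keep only hypotheses whose confidence meets the threshold."""
    return [h for h in hypotheses if h.confidence >= threshold]


# Made-up recognizer output; "whether" falls below the threshold.
hypotheses = [
    WordHypothesis("the", 0.92),
    WordHypothesis("whether", 0.31),
    WordHypothesis("is", 0.77),
]
kept = reliable_words(hypotheses, threshold=0.5)
print([h.text for h in kept])
```

Only the words in `kept` would then contribute statistics and features to the assessment score.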

14. A computer-implemented system for assessing spontaneous speech pronunciation, comprising:
- a processing system;
  
  a computer-readable memory programmed with instructions for causing the processing system to perform steps including:
  
  performing speech recognition on digitized speech using a non-native acoustic model trained with non-native speech using a processing system to generate word hypotheses for the digitized speech;
  
  performing time alignment between the digitized speech and the word hypotheses utilizing a reference acoustic model trained with native-quality speech to associate word hypotheses with corresponding sounds of the digitized speech;
  
  calculating statistics regarding individual words and phonemes of the word hypotheses using the processing system based on said alignment;
  
  calculating a plurality of features for use in assessing pronunciation of the speech based on the statistics using the processing system;
  
  calculating an assessment score based on one or more of the calculated features; and
  
  storing the assessment score in a computer-readable memory.
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 29):
- - 15. The system of claim 14 wherein the steps further comprise excluding words not reliably recognized in generating the word hypotheses from contributing to the assessment score.
  - 16. The system of claim 15, wherein a confidence level is associated with each word in the word hypotheses identifying a likelihood that a word in a word hypothesis was correctly recognized in generating the word hypotheses.
  - 17. The system of claim 16, wherein words having corresponding hypotheses that do not meet a confidence threshold are not considered in calculating the assessment score.
  - 18. The system of claim 14 wherein the features are based on at least one of:
    - Hidden Markov Model probabilities, average phoneme duration, average word duration, phoneme duration distribution, word duration distribution, energy measurements, energy distribution, pitch measurements, pitch distribution, and pitch contours.
  - 19. The system of claim 14 wherein speech samples are scored by a human, and a statistical model is built using the features and human scores;
    - wherein the assessment score is based on the scoring model and one or more of the calculated features.
  - 20. The system of claim 19, wherein the statistical model is built using multiple regression.
  - 21. The system of claim 14, wherein the reference acoustic model is trained with only native speech.
  - 22. The system of claim 14, wherein generating word hypotheses utilizes the non-native acoustic model, a dictionary that maps pronunciations to words, and a language model that identifies a likelihood that a word will follow a sequence of already hypothesized words in a speech recording.
  - 23. The system of claim 22, wherein the language model is an n-gram language model.
  - 25. The system of claim 14, wherein one or more of the features are utilized with related stress, intonation, vocabulary, or grammar to generate the assessment score indicating communicative competence or other construct of speaking proficiency that includes pronunciation proficiency.
  - 26. The system of claim 14, wherein second statistics derived from native-quality speech are utilized in calculating the plurality of features.
  - 29. The system of claim 14, wherein the digitized speech is spontaneous, non-native speech of a non-native language speaker.
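
Claims 19 and 20 (like claims 6 and 7) describe building a statistical model from calculated features and human scores using multiple regression. The sketch below shows one way such a model could be fitted, using ordinary least squares solved through the normal equations in plain Python. The function names, the two example features, and all numeric values are made up for illustration. The patent text here does not specify the fitting procedure beyond "multiple regression".

```python
from typing import List, Sequence


def fit_multiple_regression(X: Sequence[Sequence[float]],
                            y: Sequence[float]) -> List[float]:
    """Return [intercept, w1, ..., wk] minimizing squared error."""
    rows = [[1.0] + list(r) for r in X]  # prepend a constant term
    k = len(rows[0])
    # Augmented normal equations: [A^T A | A^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict(coefs: Sequence[float], features: Sequence[float]) -> float:
    """Apply the fitted model to one feature vector."""
    return coefs[0] + sum(w * f for w, f in zip(coefs[1:], features))


# Made-up training data: [avg phoneme duration, avg log-likelihood] per sample,
# paired with hypothetical human scores.
train_features = [[0.10, -4.0], [0.12, -5.5], [0.15, -7.0], [0.20, -9.5]]
human_scores = [4.0, 3.0, 3.0, 2.0]

coefs = fit_multiple_regression(train_features, human_scores)
print(predict(coefs, [0.13, -6.0]))
```

A production system would more likely rely on an established statistics library and would validate the model on held-out human-scored samples.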

27. A computer-readable memory comprising computer-readable instructions, which when executed cause a processing system to perform steps comprising:
- performing speech recognition on digitized speech using a non-native acoustic model trained with non-native speech using a processing system to generate word hypotheses for the digitized speech;
  
  performing time alignment between the digitized speech and the word hypotheses utilizing a reference acoustic model trained with native-quality speech;
  
  calculating statistics regarding individual words and phonemes of the word hypotheses using the processing system based on said alignment;
  
  calculating a plurality of features for use in assessing pronunciation of the speech based on the statistics using the processing system;
  
  calculating an assessment score based on one or more of the calculated features; and
  
  storing the assessment score in a computer-readable memory.

Current Assignee
Educational Testing Service
Original Assignee
Educational Testing Service
Inventors
Xi, Xiaoming; Chen, Lei; Zechner, Klaus

Granted Patent

US 8,392,190 B2
Field of Search
US Class Current

704/256.100
CPC Class Codes

G09B 19/06   Foreign languages with audi...

G09B 7/02   of the type wherein the stu...

G10L 15/08   Speech classification or se...

G10L 15/14   using statistical models, e...

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/26   Speech to text systems G10L...

G10L 17/26   Recognition of special voic...
