Systems and methods for assessment of non-native spontaneous speech

US 9,177,558 B2
Filed: 01/31/2013
Issued: 11/03/2015
Est. Priority Date: 12/01/2008
Status: Active Grant

Abstract

Computer-implemented systems and methods are provided for assessing non-native spontaneous speech pronunciation. Speech recognition on digitized speech is performed using a non-native acoustic model trained with non-native speech to generate word hypotheses for the digitized speech. Time alignment is performed between the digitized speech and the word hypotheses using a reference acoustic model trained with native-quality speech. Statistics are calculated regarding individual words and phonemes in the word hypotheses based on the alignment. A plurality of features for use in assessing pronunciation of the speech are calculated based on the statistics, an assessment score is calculated based on one or more of the calculated features, and the assessment score is stored in a computer-readable memory.
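
The statistics and feature steps described in the abstract can be illustrated with a minimal sketch. It assumes the time alignment has already produced per-phoneme start and end times and an acoustic log-likelihood for each phoneme; the names `AlignedPhoneme`, `AlignedWord`, and `pronunciation_features`, and the three features chosen, are hypothetical illustrations rather than the patent's own implementation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class AlignedPhoneme:
    label: str
    start: float           # seconds, taken from the time alignment
    end: float             # seconds
    log_likelihood: float  # assumed acoustic score from the reference model


@dataclass
class AlignedWord:
    text: str
    phonemes: List[AlignedPhoneme]

    @property
    def duration(self) -> float:
        # A word spans from its first phoneme's start to its last phoneme's end.
        return self.phonemes[-1].end - self.phonemes[0].start

    @property
    def log_likelihood(self) -> float:
        # Word-level score taken as the sum of its phoneme scores (an assumption).
        return sum(p.log_likelihood for p in self.phonemes)


def pronunciation_features(words: List[AlignedWord]) -> Dict[str, float]:
    """Derive a few duration and likelihood features from aligned words."""
    if not words:
        raise ValueError("at least one aligned word is required")
    phonemes = [p for w in words for p in w.phonemes]
    return {
        "avg_word_duration": mean(w.duration for w in words),
        "avg_phoneme_duration": mean(p.end - p.start for p in phonemes),
        "avg_word_log_likelihood": mean(w.log_likelihood for w in words),
    }
```

Features of this kind would then feed a scoring model; the particular feature set here is only one possible choice among those the claims enumerate.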

33 Claims

1. A computer-implemented method of assessing speech pronunciation, comprising:
- receiving speech for analysis via a computer-readable storage medium;
- performing automatic speech recognition on speech using a processor to generate word hypotheses for the speech, the word hypotheses identifying a set of words recognized by an automated speech recognizer in the speech using one or more data processors;
- performing time alignment between the speech and the word hypotheses using the automatic speech recognizer to associate the word hypotheses with corresponding sounds of the speech;
- calculating statistics regarding individual words and phonemes of the word hypotheses using the processor based on said alignment;
- calculating a plurality of features for use in assessing pronunciation of the speech based on the statistics using the processor; and
- calculating an assessment score based on one or more of the calculated features.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The method of claim 1 further comprising excluding words not reliably recognized in generating the word hypotheses from contributing to the assessment score.
  - 3. The method of claim 2, wherein a confidence level is associated with each word in the word hypotheses identifying a likelihood that a word in a word hypothesis was correctly recognized in generating the word hypotheses.
  - 4. The method of claim 3, wherein words having corresponding word hypotheses that do not meet a confidence threshold are not considered in calculating the assessment score.
  - 5. The method of claim 1, wherein the features are based on at least one of:
    - Hidden Markov Model probabilities, average phoneme duration, average word duration, phoneme duration distribution, word duration distribution, energy measurements, energy distributions, pitch measurements, pitch distributions, and pitch contours.
  - 6. The method of claim 1, wherein speech samples are scored by a human, and a statistical model is built using the features and human scores;
    - wherein the assessment score is based on the scoring model and one or more of the calculated features.
  - 7. The method of claim 6, wherein the statistical model is built using multiple regression.
  - 8. The method of claim 1, wherein the speech is spontaneous, non-native speech of a non-native language speaker.
  - 9. The method of claim 1, wherein the assessment score is based on one or more features selected from the group consisting of:
    - an average likelihood across all letters;
    - L₁/m, where L₁ is a summation of likelihoods of all individual words;
  - 10. The method of claim 1, wherein one or more of the features are utilized with related stress, intonation, vocabulary, or grammar to generate the assessment score indicating communicative competence or other construct of speaking proficiency that includes pronunciation proficiency.
  - 11. The method of claim 1, wherein second statistics derived from native-quality speech are utilized in calculating the plurality of features.
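
The exclusion of unreliably recognized words in claims 2 through 4 can be sketched as a simple filter. This is a minimal illustration assuming each word hypothesis carries a confidence value between 0 and 1; the `WordHypothesis` type, the `reliable_words` helper, and the default threshold of 0.5 are hypothetical and are not specified by the claims.

```python
from typing import List, NamedTuple


class WordHypothesis(NamedTuple):
    text: str
    confidence: float  # assumed likelihood that the word was correctly recognized


def reliable_words(hypotheses: List[WordHypothesis],
                   threshold: float = 0.5) -> List[WordHypothesis]:
    """Keep only hypotheses that meet the confidence threshold.

    Words below the threshold are dropped so that they do not
    contribute to the assessment score. The threshold value is an
    illustrative assumption.
    """
    return [h for h in hypotheses if h.confidence >= threshold]
```

For example, given hypotheses `("the", 0.95)`, `("uh", 0.20)`, and `("answer", 0.70)`, the filter with the default threshold keeps "the" and "answer" and discards "uh".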

12. A computer-implemented system for assessing speech pronunciation, comprising:
- a processor;
- a non-transitory computer-readable memory comprising instructions for causing the processor to perform steps including:
  - receiving speech for analysis via a computer-readable storage medium;
  - performing automatic speech recognition on speech using a processor to generate word hypotheses for the speech, the word hypotheses identifying a set of words recognized by an automated speech recognizer in the speech using one or more data processors;
  - performing time alignment between the speech and the word hypotheses using the automatic speech recognizer to associate the word hypotheses with corresponding sounds of the speech;
  - calculating statistics regarding individual words and phonemes of the word hypotheses using the processor based on said alignment;
  - calculating a plurality of features for use in assessing pronunciation of the speech based on the statistics using the processor; and
  - calculating an assessment score based on one or more of the calculated features.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22):
  - 13. The system of claim 12, wherein the steps further comprise excluding words not reliably recognized in generating the word hypotheses from contributing to the assessment score.
  - 14. The system of claim 13, wherein a confidence level is associated with each word in the word hypotheses identifying a likelihood that a word in a word hypothesis was correctly recognized in generating the word hypotheses.
  - 15. The system of claim 14, wherein words having corresponding hypotheses that do not meet a confidence threshold are not considered in calculating the assessment score.
  - 16. The system of claim 12, wherein the features are based on at least one of:
    - Hidden Markov Model probabilities, average phoneme duration, average word duration, phoneme duration distribution, word duration distribution, energy measurements, energy distribution, pitch measurements, pitch distribution, and pitch contours.
  - 17. The system of claim 12, wherein speech samples are scored by a human, and a statistical model is built using the features and human scores;
    - wherein the assessment score is based on the scoring model and one or more of the calculated features.
  - 18. The system of claim 17, wherein the statistical model is built using multiple regression.
  - 19. The system of claim 12, wherein the assessment score is based on one or more features selected from the group consisting of:
    - an average likelihood across all letters;
    - L₁/m, where L₁ is a summation of likelihoods of all individual words;
  - 20. The system of claim 12, wherein one or more of the features are utilized with related stress, intonation, vocabulary, or grammar to generate the assessment score indicating communicative competence or other construct of speaking proficiency that includes pronunciation proficiency.
  - 21. The system of claim 12, wherein second statistics derived from native-quality speech are utilized in calculating the plurality of features.
  - 22. The system of claim 12, wherein the speech is spontaneous, non-native speech of a non-native language speaker.
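
The scoring model of claims 17 and 18 (and the parallel claims 6 and 7) is built from human scores by multiple regression. The following is a rough sketch of that idea using ordinary least squares solved through the normal equations in plain Python. The function names `fit_multiple_regression` and `predict_score` are hypothetical, and the solver choice is an assumption; the claims do not say how the regression is computed.

```python
from typing import List, Sequence


def fit_multiple_regression(X: Sequence[Sequence[float]],
                            y: Sequence[float]) -> List[float]:
    """Return [intercept, w1, ..., wk] minimizing squared error.

    X holds one feature vector per speech sample; y holds the
    corresponding human scores.
    """
    rows = [[1.0, *x] for x in X]  # prepend a constant term for the intercept
    k = len(rows[0])
    # Normal equations: (R^T R) w = R^T y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (M[i][k] - tail) / M[i][i]
    return w


def predict_score(w: Sequence[float], features: Sequence[float]) -> float:
    """Apply the fitted weights to one sample's feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))
```

In practice a numerical library would normally be preferred to a hand-written solver; the plain-Python version is used here only to keep the sketch self-contained.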

23. A non-transitory computer-readable memory comprising computer-readable instructions, which when executed cause a processor to perform steps comprising:
- receiving speech for analysis via a computer-readable storage medium;
- performing automatic speech recognition on speech using a processor to generate word hypotheses for the speech, the word hypotheses identifying a set of words recognized by an automated speech recognizer in the speech using one or more data processors;
- performing time alignment between the speech and the word hypotheses using the automatic speech recognizer to associate the word hypotheses with corresponding sounds of the speech;
- calculating statistics regarding individual words and phonemes of the word hypotheses using the processor based on said alignment;
- calculating a plurality of features for use in assessing pronunciation of the speech based on the statistics using the processor; and
- calculating an assessment score based on one or more of the calculated features.
- Dependent claims (24, 25, 26, 27, 28, 29, 30, 31, 32, 33):
  - 24. The non-transitory computer-readable memory of claim 23, wherein the instructions cause the processor to perform steps comprising:
    - excluding words not reliably recognized in generating the word hypotheses from contributing to the assessment score.
  - 25. The non-transitory computer-readable memory of claim 23, wherein a confidence level is associated with each word in the word hypotheses identifying a likelihood that a word in a word hypothesis was correctly recognized in generating the word hypotheses.
  - 26. The non-transitory computer-readable memory of claim 25, wherein words having corresponding word hypotheses that do not meet a confidence threshold are not considered in calculating the assessment score.
  - 27. The non-transitory computer-readable memory of claim 23, wherein the features are based on at least one of:
    - Hidden Markov Model probabilities, average phoneme duration, average word duration, phoneme duration distribution, word duration distribution, energy measurements, energy distributions, pitch measurements, pitch distributions, and pitch contours.
  - 28. The non-transitory computer-readable memory of claim 23, wherein speech samples are scored by a human, and a statistical model is built using the features and human scores;
    - wherein the assessment score is based on the scoring model and one or more of the calculated features.
  - 29. The non-transitory computer-readable memory of claim 28, wherein the statistical model is built using multiple regression.
  - 30. The non-transitory computer-readable memory of claim 23, wherein the assessment score is based on one or more features selected from the group consisting of:
    - an average likelihood across all letters;
    - L₁/m, where L₁ is a summation of likelihoods of all individual words;
  - 31. The non-transitory computer-readable memory of claim 23, wherein one or more of the features are utilized with related stress, intonation, vocabulary, or grammar to generate the assessment score indicating communicative competence or other construct of speaking proficiency that includes pronunciation proficiency.
  - 32. The non-transitory computer-readable memory of claim 23, wherein second statistics derived from native-quality speech are utilized in calculating the plurality of features.
  - 33. The non-transitory computer-readable memory of claim 23, wherein the digitized speech is spontaneous, non-native speech of a non-native language speaker.

Current Assignee: Educational Testing Service
Original Assignee: Educational Testing Service
Inventors: Chen, Lei; Zechner, Klaus; Xi, Xiaoming
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US13/755,790
Publication Number: US 20130144621A1
Time in Patent Office: 1,006 Days
Field of Search: 704/246, 704/256.1
US Class Current: 1/1
CPC Class Codes:
- G09B 19/06   Foreign languages with audi...
- G09B 7/02   of the type wherein the stu...
- G10L 15/08   Speech classification or se...
- G10L 15/14   using statistical models, e...
- G10L 15/187   Phonemic context, e.g. pron...
- G10L 15/26   Speech to text systems G10L...
- G10L 17/26   Recognition of special voic...
