Computer-implemented systems and methods for determining an intelligibility score for speech

US 9,613,638 B2
Filed: 02/26/2015
Issued: 04/04/2017
Est. Priority Date: 02/28/2014
Status: Active Grant

Abstract

Systems and methods are provided for generating an intelligibility score for speech of a non-native speaker. Words in a speech recording are identified using an automated speech recognizer, where the automated speech recognizer provides a string of words identified in the speech recording, and where the automated speech recognizer further provides an acoustic model likelihood score for each word in the string of words. For a particular word in the string of words, a context metric value is determined based upon a usage of the particular word within the string of words. An acoustic score for the particular word is determined based on the acoustic model likelihood score for the particular word from the automated speech recognizer. An intelligibility score is determined for the particular word based on the acoustic score for the particular word and the context metric value for the particular word.
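
The word-level combination described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the acoustic score and the context metric value are combined, so the weighted sum, the `alpha` parameter, the `ScoredWord` type, and all numeric values below are hypothetical, and both inputs are assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class ScoredWord:
    """One recognized word with the two inputs named in the abstract."""
    text: str
    acoustic_score: float   # assumed normalized to [0, 1]
    context_metric: float   # assumed normalized to [0, 1]; higher = more predictable


def word_intelligibility(word: ScoredWord, alpha: float = 0.7) -> float:
    """Combine acoustic evidence and context into one score.

    The weighted sum and the default alpha are illustrative assumptions,
    not the combination used in the patent.
    """
    return alpha * word.acoustic_score + (1.0 - alpha) * word.context_metric


# Hypothetical recognizer output: same acoustic score, different context.
words = [
    ScoredWord("the", acoustic_score=0.5, context_metric=0.9),
    ScoredWord("zebra", acoustic_score=0.5, context_metric=0.2),
]
scores = {w.text: word_intelligibility(w) for w in words}
```

Under these assumptions the two words share an acoustic score but receive different intelligibility scores (roughly 0.62 and 0.41), because the context metric differs.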

39 Citations

20 Claims

1. A computer-implemented method of generating an intelligibility score for speech of a non-native speaker, comprising:
- receiving a recording of speech of a non-native speaker at a processing system;
  
  identifying words in the speech recording using a computerized automated speech recognizer, wherein the automated speech recognizer provides a string of words identified in the speech recording based on a computerized acoustic model, and wherein the automated speech recognizer further provides an acoustic model likelihood score for each word in the string of words;
  
  determining a context metric value for each word in the string of words, wherein determining a context metric value includes, for a particular word in the string of words, determining a context metric value with the processing system based upon a usage of the particular word within the string of words;
  
  determining an acoustic score with the processing system for the particular word based on the acoustic model likelihood score for the particular word from the automated speech recognizer, wherein determining the acoustic score for the particular word comprises:
    - determining a phone context for a phone of the particular word;
      
      determining a phone context weight for the phone based on the phone context, each phone of the particular word being assigned a single phone context weight;
      
      determining a phone acoustic score for the phone that identifies a likelihood that the non-native speaker was actually pronouncing the phone based on the phone context weight and an acoustic model likelihood score for the phone, wherein the acoustic score for the particular word is based on phone acoustic scores for multiple phones of the particular word;
  
  determining an intelligibility score with the processing system for the particular word based on the acoustic score for the particular word and the context metric value for the particular word; and
  
  determining an overall intelligibility score with the processing system for the string of words based on the intelligibility score for the particular word and intelligibility scores for other words in the string of words.
  - 2. The method of claim 1, wherein the context weight for a particular phone is based on one or more of a position of the particular phone in the word, whether the particular phone is a vowel phone, and whether the particular phone is stressed.
  - 3. The method of claim 1, wherein the acoustic score for the particular word is based on a sum of the acoustic model likelihood score multiplied by the adjusted context weight for each phone of the particular word.
  - 4. The method of claim 1, wherein the acoustic score for the particular word is calculated using a formula consisting of:
  - 5. The method of claim 1, wherein the context metric value is based on a probability of the particular word being used with the other words in the string of words, a part of speech of the particular word, and a frequency of occurrence of the particular word in a corpus of documents.
  - 6. The method of claim 1, further comprising:
    - comparing the intelligibility score for the particular word to an intelligibility threshold to determine whether the particular word meets an intelligibility criterion;
      
      wherein the overall intelligibility score for the string of words is based on a proportion of words in the string of words that meet the intelligibility criterion.
  - 7. The method of claim 6, wherein the intelligibility threshold is set based on a model training operation that utilizes training recordings of speech manually scored for intelligibility by one or more human scorers.
  - 8. The method of claim 7, wherein the model training operation comprises determining training intelligibility scores for words within the training recordings of speech and correlating the training intelligibility scores with the manual scores assigned by the one or more human scorers.
  - 9. The method of claim 8, wherein correlating comprises establishing a function which maps intelligibility scores to manual scores.
  - 10. The method of claim 1, wherein the overall intelligibility score is based on an average intelligibility score of all words in the string of words or a subset of words considered eligible for computing the intelligibility score.
  - 11. The method of claim 1, wherein the words in the speech recording are identified by the automated speech recognizer.
  - 12. The method of claim 1, wherein a second word having a same acoustic score as the acoustic score for the particular word is determined to have a different intelligibility score based on differing context metric values between the particular word and the second word.
  - 13. The method of claim 1, wherein the automated speech recognizer further provides an acoustic model likelihood score for each phone within each word, a language model likelihood score for each word, and a confidence score for each word.
  - 14. The method of claim 1, wherein the context metric value is further based upon a part of speech of the particular word or a lexical frequency of the particular word.
  - 15. The method of claim 1, further comprising providing a display that includes the identified words and the overall intelligibility score.
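
The phone-level weighting of claims 2 and 3 and the two aggregation options of claims 6 and 10 can be sketched as follows. This is a sketch under stated assumptions: the weight factors, the multiplicative way the context features are combined, the normalization by total weight, the threshold, and every name (`Phone`, `phone_context_weight`, and so on) are hypothetical. The formula of claim 4 is not reproduced in this text, so the code follows the weighted sum of claim 3.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phone:
    symbol: str
    likelihood: float   # acoustic model likelihood score for the phone (assumed in [0, 1])
    position: int       # index of the phone within the word
    is_vowel: bool
    is_stressed: bool


def phone_context_weight(phone: Phone) -> float:
    """Single weight per phone from its context (the claim 2 features).

    The numeric factors are made up for illustration.
    """
    weight = 1.0
    if phone.position == 0:
        weight *= 1.5   # assumption: word-initial phones matter more
    if phone.is_vowel:
        weight *= 1.2
    if phone.is_stressed:
        weight *= 1.3
    return weight


def word_acoustic_score(phones: List[Phone]) -> float:
    """Sum of likelihood x context weight over the phones (claim 3),
    divided by the total weight so the result stays in [0, 1] (an assumption)."""
    total_weight = sum(phone_context_weight(p) for p in phones)
    weighted = sum(p.likelihood * phone_context_weight(p) for p in phones)
    return weighted / total_weight if total_weight else 0.0


def overall_by_proportion(word_scores: List[float], threshold: float) -> float:
    """Share of words whose intelligibility score meets the threshold (claim 6)."""
    if not word_scores:
        return 0.0
    return sum(s >= threshold for s in word_scores) / len(word_scores)


def overall_by_average(word_scores: List[float]) -> float:
    """Mean word-level intelligibility score (claim 10)."""
    return sum(word_scores) / len(word_scores) if word_scores else 0.0


# Hypothetical phones for the word "cat".
cat = [
    Phone("k", 0.8, 0, False, False),
    Phone("ae", 0.4, 1, True, True),
    Phone("t", 0.9, 2, False, False),
]
acoustic = word_acoustic_score(cat)

# Hypothetical word-level intelligibility scores for a four-word string.
proportion = overall_by_proportion([0.9, 0.4, 0.7, 0.2], threshold=0.5)
average = overall_by_average([0.9, 0.4, 0.7, 0.2])
```

In this sketch the poorly matched stressed vowel pulls the word's acoustic score down more than an equally poor unstressed consonant would, which is the effect the phone context weight is meant to capture. In claims 7 to 9 the threshold is set by a training operation on human-scored recordings; here it is simply passed in as a parameter.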

16. A computer-implemented system for generating an intelligibility score for speech of a non-native speaker, comprising:
- a processing system comprising one or more data processors;
  
  a non-transitory computer-readable medium encoded to contain:
  
  a recording of speech of a non-native speaker;
  
  an intelligibility score data structure comprising records associated with each word of a string of words, wherein a record for a particular word in the string of words includes fields for storing an acoustic score for the particular word, a context metric value for the particular word, and an intelligibility score for the particular word;
  
  instructions for commanding the processing system to execute steps comprising:
  
  identifying words in the speech recording using an automated speech recognizer, wherein the automated speech recognizer provides a string of words identified in the speech recording, and wherein the automated speech recognizer further provides an acoustic model likelihood score for each word in the string of words;
  
  determining a context metric value for each word in the string of words, wherein determining a context metric value includes, for the particular word, determining a context metric value based upon a usage of the particular word within the string of words and storing the context metric value in the intelligibility score data structure record for the particular word;
  
  determining an acoustic score for the particular word based on the acoustic model likelihood score for the particular word from the automated speech recognizer and storing the acoustic score in the intelligibility score data structure record for the particular word, wherein determining the acoustic score for the particular word comprises:
    - determining a phone context for a phone of the particular word;
      
      determining a phone context weight for the phone based on the phone context, each phone of the particular word being assigned a single phone context weight;
      
      determining a phone acoustic score for the phone that identifies a likelihood that the non-native speaker was actually pronouncing the phone based on the phone context weight and an acoustic model likelihood score for the phone, wherein the acoustic score for the particular word is based on phone acoustic scores for multiple phones of the particular word;
  
  determining an intelligibility score for the particular word based on the acoustic score for the particular word and the context metric value for the particular word and storing the intelligibility score in the intelligibility score data structure record for the particular word; and
  
  determining an overall intelligibility score for the string of words based on the intelligibility score for the particular word and intelligibility scores for other words in the string of words.
  - 17. The system of claim 16, wherein the context weight for a particular phone is based on one or more of a position of the particular phone in the word, whether the particular phone is a vowel phone, and whether the particular phone is stressed.
  - 18. The system of claim 16, wherein the acoustic score for the particular word is based on a sum of the acoustic model likelihood score multiplied by the adjusted context weight for each phone of the particular word.
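
The intelligibility score data structure of claim 16 can be pictured as one record per recognized word with three score fields. The sketch below is only an illustration: the `IntelligibilityRecord` name, the use of word position as the key, the example values, and the equal-weight combination are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IntelligibilityRecord:
    """One record per recognized word, with the three fields named in claim 16."""
    word: str
    acoustic_score: Optional[float] = None
    context_metric: Optional[float] = None
    intelligibility_score: Optional[float] = None


# Keyed by word position so that repeated words get separate records.
records: Dict[int, IntelligibilityRecord] = {
    i: IntelligibilityRecord(word=w) for i, w in enumerate("i like the cat".split())
}

# Each processing step stores its result in the record for the word.
records[3].acoustic_score = 0.6    # hypothetical value
records[3].context_metric = 0.8    # hypothetical value
records[3].intelligibility_score = (
    0.5 * records[3].acoustic_score + 0.5 * records[3].context_metric  # assumed combination
)
```

Fields start as `None` so a record can exist before any score has been computed and be filled in step by step, mirroring the "determining ... and storing" wording of the claim.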

19. A non-transitory computer-readable medium encoded with instructions for commanding a processing system to execute a method of generating an intelligibility score for speech of a non-native speaker, comprising:
- receiving a recording of speech of a non-native speaker;
  
  identifying words in the speech recording using an automated speech recognizer, wherein the automated speech recognizer provides a string of words identified in the speech recording, and wherein the automated speech recognizer further provides an acoustic model likelihood score for each word in the string of words;
  
  determining a context metric value for each word in the string of words, wherein determining a context metric value includes, for a particular word in the string of words, determining a context metric value with the processing system based upon a usage of the particular word within the string of words;
  
  determining an acoustic score for the particular word based on the acoustic model likelihood score for the particular word from the automated speech recognizer, wherein determining the acoustic score for the particular word comprises:
    - determining a phone context for a phone of the particular word;
      
      determining a phone context weight for the phone based on the phone context, each phone of the particular word being assigned a single phone context weight;
      
      determining a phone acoustic score for the phone that identifies a likelihood that the non-native speaker was actually pronouncing the phone based on the phone context weight and an acoustic model likelihood score for the phone, wherein the acoustic score for the particular word is based on phone acoustic scores for multiple phones of the particular word;
  
  determining an intelligibility score for the particular word based on the acoustic score for the particular word and the context metric value for the particular word; and
  
  determining an overall intelligibility score for the string of words based on the intelligibility score for the particular word and intelligibility scores for other words in the string of words.

20. A computer-implemented system for generating an intelligibility score for speech of a non-native speaker, comprising:
- means for receiving a recording of speech of a non-native speaker;
  
  means for identifying words in the speech recording using a computerized automated speech recognizer, wherein the automated speech recognizer provides a string of words identified in the speech recording based on a computerized acoustic model, and wherein the automated speech recognizer further provides an acoustic model likelihood score for each word in the string of words;
  
  means for determining a context metric value for each word in the string of words, wherein determining a context metric value includes, for a particular word in the string of words, determining a context metric value with the processing system based upon a usage of the particular word within the string of words;
  
  means for determining an acoustic score with the processing system for the particular word based on the acoustic model likelihood score for the particular word from the automated speech recognizer, wherein determining the acoustic score for the particular word comprises:
    - determining a phone context for a phone of the particular word;
      
      determining a phone context weight for the phone based on the phone context, each phone of the particular word being assigned a single phone context weight;
      
      determining a phone acoustic score for the phone that identifies a likelihood that the non-native speaker was actually pronouncing the phone based on the phone context weight and an acoustic model likelihood score for the phone, wherein the acoustic score for the particular word is based on phone acoustic scores for multiple phones of the particular word;
  
  means for determining an intelligibility score with the processing system for the particular word based on the acoustic score for the particular word and the context metric value for the particular word; and
  
  means for determining an overall intelligibility score with the processing system for the string of words based on the intelligibility score for the particular word and intelligibility scores for other words in the string of words.

Current Assignee: Educational Testing Service
Original Assignee: Educational Testing Service
Inventors: Loukina, Anastassia; Evanini, Keelan
Primary Examiner(s): WOZNIAK, JAMES S
Application Number: US14/632,231
Publication Number: US 20150248898A1
Time in Patent Office: 768 Days
Field of Search: 704/251, 704/254, 704/270, 434/167, 434/178, 434/185
CPC Class Codes:
G09B 19/04   Speaking with audible prese...
G10L 15/1822   Parsing for meaning underst...
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/19   Grammatical context, e.g. d...
G10L 15/26   Speech to text systems G10L...
G10L 25/60   for measuring the quality o...
G10L 25/69   for evaluating synthetic or...
