Computer-implemented systems and methods for determining an intelligibility score for speech
First Claim
1. A computer-implemented method of generating an intelligibility score for speech of a non-native speaker, comprising:
receiving a recording of speech of a non-native speaker at a processing system;
identifying words in the speech recording using a computerized automated speech recognizer, wherein the automated speech recognizer provides a string of words identified in the speech recording based on a computerized acoustic model, and wherein the automated speech recognizer further provides an acoustic model likelihood score for each word in the string of words;
determining a context metric value for each word in the string of words, wherein determining a context metric value includes, for a particular word in the string of words, determining a context metric value with the processing system based upon a usage of the particular word within the string of words;
determining an acoustic score with the processing system for the particular word based on the acoustic model likelihood score for the particular word from the automated speech recognizer, wherein determining the acoustic score for the particular word comprises:
determining a phone context for a phone of the particular word;
determining a phone context weight for the phone based on the phone context, each phone of the particular word being assigned a single phone context weight;
determining a phone acoustic score for the phone that identifies a likelihood that the non-native speaker was actually pronouncing the phone based on the phone context weight and an acoustic model likelihood score for the phone, wherein the acoustic score for the particular word is based on phone acoustic scores for multiple phones of the particular word;
determining an intelligibility score with the processing system for the particular word based on the acoustic score for the particular word and the context metric value for the particular word; and
determining an overall intelligibility score with the processing system for the string of words based on the intelligibility score for the particular word and intelligibility scores for other words in the string of words.
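The acoustic-score steps of claim 1 (determine a phone context, assign each phone a single context weight, score each phone, and build the word score from several phone scores) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. The phone context is assumed to be the left and right neighbouring phones within the word, padded at word edges with a hypothetical "sil" label. The weight table CONTEXT_WEIGHTS and its values are invented. Each phone score is assumed to be its weight times the recognizer's log-likelihood, and the word score is assumed to be the weight-normalized sum of the phone scores.

```python
from dataclasses import dataclass

# Hypothetical phone-context weights keyed by (left, phone, right).
# Values are illustrative only; the claim does not specify them.
CONTEXT_WEIGHTS = {
    ("sil", "th", "ih"): 0.5,
    ("th", "ih", "s"): 1.0,
}
DEFAULT_WEIGHT = 1.0  # assumed weight for contexts missing from the table


@dataclass
class Phone:
    label: str
    log_likelihood: float  # acoustic model log-likelihood reported by the recognizer


def phone_context(phones, i):
    """Return (left, current, right) labels; word edges are padded with 'sil' (assumption)."""
    left = phones[i - 1].label if i > 0 else "sil"
    right = phones[i + 1].label if i + 1 < len(phones) else "sil"
    return (left, phones[i].label, right)


def phone_acoustic_scores(phones, weights=CONTEXT_WEIGHTS):
    """Return one (weight, weighted score) pair per phone; each phone gets exactly one weight."""
    scored = []
    for i, phone in enumerate(phones):
        weight = weights.get(phone_context(phones, i), DEFAULT_WEIGHT)
        scored.append((weight, weight * phone.log_likelihood))
    return scored


def word_acoustic_score(phones, weights=CONTEXT_WEIGHTS):
    """Weight-normalized sum of phone scores, i.e. a weighted mean log-likelihood."""
    scored = phone_acoustic_scores(phones, weights)
    total_weight = sum(w for w, _ in scored)
    if total_weight == 0:
        return 0.0
    return sum(s for _, s in scored) / total_weight


if __name__ == "__main__":
    this = [Phone("th", -4.0), Phone("ih", -1.0), Phone("s", -2.0)]
    print(word_acoustic_score(this))
```

Down-weighting a context (here the word-initial "th") reduces how much a poor likelihood in that position pulls down the word score; the choice of which contexts to down-weight is left open by the claim.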
Abstract
Systems and methods are provided for generating an intelligibility score for speech of a non-native speaker. Words in a speech recording are identified using an automated speech recognizer, where the automated speech recognizer provides a string of words identified in the speech recording, and where the automated speech recognizer further provides an acoustic model likelihood score for each word in the string of words. For a particular word in the string of words, a context metric value is determined based upon a usage of the particular word within the string of words. An acoustic score for the particular word is determined based on the acoustic model likelihood score for the particular word from the automated speech recognizer. An intelligibility score is determined for the particular word based on the acoustic score for the particular word and the context metric value for the particular word.
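The abstract leaves open what the context metric measures. One hypothetical choice, sketched below in Python, is how predictable the word is from the word before it: an add-one-smoothed bigram log-probability computed from invented counts (BIGRAMS, UNIGRAMS and VOCAB_SIZE are assumptions, not part of the source). The per-word intelligibility score is then an assumed linear blend of the acoustic score and the context metric, with alpha as a made-up tuning weight.

```python
import math

# Hypothetical reference-corpus counts; the source does not say how the
# context metric is computed, so this bigram model is only an illustration.
BIGRAMS = {("i", "think"): 8, ("think", "this"): 3}
UNIGRAMS = {"i": 10, "think": 9, "this": 6}
VOCAB_SIZE = 1000


def context_metric(words, i):
    """Add-one-smoothed log P(word_i | word_{i-1}); '<s>' stands in before the first word."""
    prev = words[i - 1] if i > 0 else "<s>"
    numerator = BIGRAMS.get((prev, words[i]), 0) + 1
    denominator = UNIGRAMS.get(prev, 0) + VOCAB_SIZE
    return math.log(numerator / denominator)


def word_intelligibility(acoustic, context, alpha=0.7):
    """Assumed linear blend: higher alpha trusts the acoustic evidence more."""
    return alpha * acoustic + (1 - alpha) * context


if __name__ == "__main__":
    words = ["i", "think", "this"]
    for i, w in enumerate(words):
        print(w, round(context_metric(words, i), 3))
```

Under this assumption, a word that is easy to predict from its neighbour gets a higher context metric, so a listener could plausibly recover it even when its acoustic score is weak.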
39 Citations
20 Claims
16. A computer-implemented system for generating an intelligibility score for speech of a non-native speaker, comprising:
a processing system comprising one or more data processors;
a non-transitory computer-readable medium encoded to contain:
a recording of speech of a non-native speaker;
an intelligibility score data structure comprising records associated with each word of a string of words, wherein a record for a particular word in the string of words includes fields for storing an acoustic score for the particular word, a context metric value for the particular word, and an intelligibility score for the particular word;
instructions for commanding the processing system to execute steps comprising:
identifying words in the speech recording using an automated speech recognizer, wherein the automated speech recognizer provides a string of words identified in the speech recording, and wherein the automated speech recognizer further provides an acoustic model likelihood score for each word in the string of words;
determining a context metric value for each word in the string of words, wherein determining a context metric value includes, for the particular word, determining a context metric value based upon a usage of the particular word within the string of words and storing the context metric value in the intelligibility score data structure record for the particular word;
determining an acoustic score for the particular word based on the acoustic model likelihood score for the particular word from the automated speech recognizer and storing the acoustic score in the intelligibility score data structure record for the particular word, wherein determining the acoustic score for the particular word comprises:
determining a phone context for a phone of the particular word;
determining a phone context weight for the phone based on the phone context, each phone of the particular word being assigned a single phone context weight;
determining a phone acoustic score for the phone that identifies a likelihood that the non-native speaker was actually pronouncing the phone based on the phone context weight and an acoustic model likelihood score for the phone, wherein the acoustic score for the particular word is based on phone acoustic scores for multiple phones of the particular word;
determining an intelligibility score for the particular word based on the acoustic score for the particular word and the context metric value for the particular word and storing the intelligibility score in the intelligibility score data structure record for the particular word; and
determining an overall intelligibility score for the string of words based on the intelligibility score for the particular word and intelligibility scores for other words in the string of words.
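Claim 16 recites a data structure with one record per recognized word, each holding an acoustic score, a context metric value, and an intelligibility score. A minimal Python sketch of such records follows. The names WordRecord, fill_records and overall_intelligibility are hypothetical; the three scoring functions are passed in as parameters rather than fixed, and the overall score is assumed to be the arithmetic mean of the per-word scores, since the claim does not specify the aggregation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class WordRecord:
    """One record per recognized word, mirroring the three fields named in claim 16."""
    word: str
    acoustic_score: Optional[float] = None
    context_metric: Optional[float] = None
    intelligibility: Optional[float] = None


def fill_records(
    words: List[str],
    acoustic_fn: Callable[[List[str], int], float],
    context_fn: Callable[[List[str], int], float],
    combine_fn: Callable[[float, float], float],
) -> List[WordRecord]:
    """Compute and store each word's scores in its record (scoring functions are placeholders)."""
    records = [WordRecord(w) for w in words]
    for i, rec in enumerate(records):
        rec.context_metric = context_fn(words, i)
        rec.acoustic_score = acoustic_fn(words, i)
        rec.intelligibility = combine_fn(rec.acoustic_score, rec.context_metric)
    return records


def overall_intelligibility(records: List[WordRecord]) -> float:
    """Assumed aggregation: arithmetic mean of the per-word intelligibility scores."""
    scores = [r.intelligibility for r in records if r.intelligibility is not None]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, with toy lambdas such as `acoustic_fn=lambda ws, i: -1.0 - i`, `context_fn=lambda ws, i: -2.0` and `combine_fn=lambda a, c: (a + c) / 2`, two words yield per-word scores of -1.5 and -2.0 and an overall score of -1.75. A weighted mean (for instance by word duration) would be an equally plausible aggregation.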
19. A non-transitory computer-readable medium encoded with instructions for commanding a processing system to execute a method of generating an intelligibility score for speech of a non-native speaker, comprising:
receiving a recording of speech of a non-native speaker;
identifying words in the speech recording using an automated speech recognizer, wherein the automated speech recognizer provides a string of words identified in the speech recording, and wherein the automated speech recognizer further provides an acoustic model likelihood score for each word in the string of words;
determining a context metric value for each word in the string of words, wherein determining a context metric value includes, for a particular word in the string of words, determining a context metric value with the processing system based upon a usage of the particular word within the string of words;
determining an acoustic score for the particular word based on the acoustic model likelihood score for the particular word from the automated speech recognizer, wherein determining the acoustic score for the particular word comprises:
determining a phone context for a phone of the particular word;
determining a phone context weight for the phone based on the phone context, each phone of the particular word being assigned a single phone context weight;
determining a phone acoustic score for the phone that identifies a likelihood that the non-native speaker was actually pronouncing the phone based on the phone context weight and an acoustic model likelihood score for the phone, wherein the acoustic score for the particular word is based on phone acoustic scores for multiple phones of the particular word;
determining an intelligibility score for the particular word based on the acoustic score for the particular word and the context metric value for the particular word; and
determining an overall intelligibility score for the string of words based on the intelligibility score for the particular word and intelligibility scores for other words in the string of words.
20. A computer-implemented system for generating an intelligibility score for speech of a non-native speaker, comprising:
means for receiving a recording of speech of a non-native speaker;
means for identifying words in the speech recording using a computerized automated speech recognizer, wherein the automated speech recognizer provides a string of words identified in the speech recording based on a computerized acoustic model, and wherein the automated speech recognizer further provides an acoustic model likelihood score for each word in the string of words;
means for determining a context metric value for each word in the string of words, wherein determining a context metric value includes, for a particular word in the string of words, determining a context metric value with the processing system based upon a usage of the particular word within the string of words;
means for determining an acoustic score with the processing system for the particular word based on the acoustic model likelihood score for the particular word from the automated speech recognizer, wherein determining the acoustic score for the particular word comprises:
determining a phone context for a phone of the particular word;
determining a phone context weight for the phone based on the phone context, each phone of the particular word being assigned a single phone context weight;
determining a phone acoustic score for the phone that identifies a likelihood that the non-native speaker was actually pronouncing the phone based on the phone context weight and an acoustic model likelihood score for the phone, wherein the acoustic score for the particular word is based on phone acoustic scores for multiple phones of the particular word;
means for determining an intelligibility score with the processing system for the particular word based on the acoustic score for the particular word and the context metric value for the particular word; and
means for determining an overall intelligibility score with the processing system for the string of words based on the intelligibility score for the particular word and intelligibility scores for other words in the string of words.
Specification