Machine training for native language and fluency identification
Abstract
A machine is trained by a machine learning technique to recognize speech utterances and determine the language fluency level of a user. Native speaker recorded data and a language-specific dictionary of heteronyms may be retrieved. The native speaker recorded data may be parsed and the heteronyms isolated from it. Linguistic features, including at least those associated with the heteronyms, may be extracted from the native speaker recorded data, and a language-dependent machine learning model is generated based on the linguistic features.
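The parsing and isolation steps summarized above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of the patent: the `HETERONYMS` dictionary, the `AlignedWord` type, the ARPAbet-style phoneme strings, and the idea that a forced aligner has already produced per-word phonemes are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical language-specific dictionary: spelling -> known pronunciations.
HETERONYMS = {
    "lead": {"L IY D", "L EH D"},
    "wind": {"W IH N D", "W AY N D"},
}


@dataclass(frozen=True)
class AlignedWord:
    text: str    # word as transcribed from the recording
    phones: str  # phoneme string, assumed to come from a forced aligner


def isolate_heteronyms(words, dictionary):
    """Keep only the words whose spelling appears in the heteronym dictionary."""
    return [w for w in words if w.text.lower() in dictionary]


def phonetic_features(words):
    """Group observed pronunciations by spelling (a simple stand-in for feature extraction)."""
    features = {}
    for w in words:
        features.setdefault(w.text.lower(), set()).add(w.phones)
    return features


# Made-up native speaker recording, already aligned word by word.
recording = [
    AlignedWord("The", "DH AH"),
    AlignedWord("lead", "L EH D"),
    AlignedWord("pipe", "P AY P"),
    AlignedWord("will", "W IH L"),
    AlignedWord("lead", "L IY D"),
]

isolated = isolate_heteronyms(recording, HETERONYMS)
features = phonetic_features(isolated)
```

In a real system the per-heteronym features would feed model training; here they are simply collected into a dictionary to show the data flow.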
16 Claims
1. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a device to cause a device to:
- train a machine by a machine learning technique for recognizing speech utterance to determine language fluency level of a user, the training comprising at least:
  receiving native speaker recorded data from a database of recorded speech of at least one native speaker,
  receiving a language specific dictionary of heteronyms,
  parsing the native speaker recorded data and isolating the heteronyms from the native speaker recorded data,
  extracting linguistic features from the native speaker recorded data including at least linguistic features associated with the heteronyms, the linguistic features associated with the heteronyms including at least phonetics, and
  generating a language dependent machine learning model based at least on the linguistic features, wherein the language dependent machine learning model is trained to output a score indicating language fluency;
- generate a test corpus of sentences, wherein each sentence in the test corpus includes at least one pair of heteronyms, wherein heteronyms are words spelled identically but having different pronunciations and meanings from one another;
- cause presenting of a sentence from the test corpus to the user on a user interface display;
- receive a test speech utterance of the user uttering the presented sentence;
- execute the language dependent machine learning model operating on the test speech utterance to obtain user pronunciation of the presented sentence including the at least two heteronyms;
- evaluate a language fluency level of the user based on the obtained user pronunciation; and
- output a score representing the language fluency level of the user.
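The test-corpus step of claim 1 can be illustrated with a small filter. This is a sketch under assumptions, not the claimed implementation: it reads "at least one pair of heteronyms" as one heteronym spelling occurring at least twice in a sentence, and the spelling list and candidate sentences are made up. It does not verify that the two occurrences are actually pronounced differently; that would need part-of-speech or sense information.

```python
import re
from collections import Counter

# Hypothetical set of heteronym spellings for one language.
HETERONYM_SPELLINGS = {"lead", "wind", "tear", "bass"}


def heteronym_pairs(sentence, spellings):
    """Return the heteronym spellings that occur at least twice in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    counts = Counter(t for t in tokens if t in spellings)
    return sorted(word for word, n in counts.items() if n >= 2)


def build_test_corpus(candidates, spellings):
    """Keep only candidate sentences containing at least one repeated heteronym spelling."""
    return [s for s in candidates if heteronym_pairs(s, spellings)]


candidates = [
    "The wind was too strong to wind the sail.",
    "She shed a single tear.",
    "He will lead the team to the lead mine.",
]

test_corpus = build_test_corpus(candidates, HETERONYM_SPELLINGS)
```

With these candidates, the second sentence is dropped because "tear" appears only once.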
8. A system of training a machine that recognizes native speech utterance, comprising:
- a hardware processor;
- a storage device communicatively coupled to the hardware processor and storing native speaker recorded data;
- the hardware processor executing a machine learning technique to train the hardware processor to recognize speech utterance to determine language fluency level of a user, the training comprising the hardware processor:
  receiving native speaker recorded data from the storage device;
  receiving language specific dictionary of heteronyms;
  parsing the native speaker recorded data and identifying the heteronyms from the native speaker recorded data;
  extracting linguistic features from the native speaker recorded data including at least linguistic features associated with the heteronyms, the linguistic features associated with the heteronyms including at least phonetics; and
  generating a language dependent machine learning model based on the linguistic features, wherein the language dependent machine learning model is trained to output a score indicating language fluency;
- the hardware processor further performing:
  generating a test corpus of sentences, wherein each sentence in the test corpus includes at least one pair of heteronyms, and wherein heteronyms are words that are spelled identically but having different pronunciations and meanings from one another;
  causing presenting of a sentence from the test corpus to the user;
  receiving a test speech utterance of the user uttering the presented sentence;
  executing the language dependent machine learning model operating on the test speech utterance to obtain user pronunciation of the presented sentence including the at least two heteronyms;
  evaluating a language fluency level of the user based on the obtained user pronunciation; and
  outputting a score representing the language fluency level of the user.
16. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a device to cause the device to:
- generate a test corpus of sentences in a single dialect of a single language, each sentence in the corpus including at least two heteronyms spelled identically but having different pronunciations and meanings from one another;
- cause displaying of a sentence from the test corpus to a user;
- receive, at a language dependent machine learning model, data representing a test speech utterance of the user uttering the displayed sentence;
- execute the language dependent machine learning model operating on the data representing the test speech utterance to obtain user pronunciation of the displayed sentence including the at least two heteronyms, wherein the language dependent machine learning model is trained using at least linguistic features extracted from native speaker recorded data uttering the heteronyms present in the sentences of the test corpus, the linguistic features including at least phonetics associated with heteronyms, wherein the language dependent machine learning model is trained to output a score indicating a language fluency level of a user by evaluating user pronunciation of the at least two heteronyms in at least one of the sentences of the test corpus based on feature parameters associated with said at least one sentence and indicating different pronunciations of the two heteronyms in the sentence; and
- output a score representing the language fluency level of the user based on the data representing the test speech utterance.
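The evaluation step in claim 16, scoring how the user pronounces the heteronyms in a displayed sentence, might be sketched as below. This is a deliberately simplified stand-in, not the trained model the claim describes: it assumes the heteronym pronunciations have already been recognized as phoneme strings, and it scores by exact match against expected pronunciations. The function name, the phoneme notation, and the 0-to-1 scale are all assumptions.

```python
def fluency_score(expected, observed):
    """Fraction of heteronym occurrences pronounced as expected (0.0 to 1.0).

    expected and observed are equal-length lists of phoneme strings,
    one entry per heteronym occurrence in the sentence, in order.
    """
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    if not expected:
        return 0.0
    matches = sum(1 for e, o in zip(expected, observed) if e == o)
    return matches / len(expected)


# "The wind was too strong to wind the sail."
expected = ["W IH N D", "W AY N D"]

# A speaker who distinguishes the two pronunciations.
native_like = fluency_score(expected, ["W IH N D", "W AY N D"])

# A speaker who pronounces both occurrences the same way.
learner = fluency_score(expected, ["W IH N D", "W IH N D"])
```

A trained model would presumably produce a graded score from acoustic features rather than an exact string comparison; the sketch only shows how distinguishing the two pronunciations in one sentence could drive the score.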
Specification