Automatic cognate detection in a computer-assisted language learning system
Abstract
According to an aspect, a first word in a first language and a second word in a second language in a bilingual corpus are stemmed to obtain a first stem and a second stem. The stems are normalized, and a probability for aligning the first stem and the second stem and a distance metric between the normalized first stem and the normalized second stem are calculated. The first word and the second word are identified as a cognate pair when the probability and the distance metric meet a threshold criterion, and the pair is stored in a set of cognates. A candidate sentence in the second language is retrieved from a corpus. The candidate sentence is filtered by the active vocabulary of a user in the second language and the set of cognates. A sentence quality score is calculated for the candidate sentence, and the candidate sentence is ranked for presentation to the user based on the sentence quality score.
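The abstract does not say how stemming, normalization, or the distance metric are implemented. As a minimal sketch, assuming a toy suffix-stripping stemmer (the suffix lists below are hypothetical), normalization as lowercasing plus diacritic removal, and normalized Levenshtein distance as the distance metric, the stem-comparison steps might look like this:

```python
import unicodedata

# Hypothetical suffix lists for illustration only; a real system would use a
# proper per-language stemmer.
SUFFIXES = {
    "en": ("ation", "ing", "ed", "s"),
    "es": ("ación", "ando", "ado", "s"),
}


def naive_stem(word: str, lang: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    w = word.lower()
    for suffix in SUFFIXES.get(lang, ()):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def normalize(stem: str) -> str:
    """Lowercase and drop diacritics by decomposing (NFKD) and removing combining marks."""
    decomposed = unicodedata.normalize("NFKD", stem.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer string."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    first = normalize(naive_stem("nation", "en"))   # "nation"
    second = normalize(naive_stem("nación", "es"))  # "nacion"
    print(normalized_distance(first, second))
```

Under these assumptions, English "nation" and Spanish "nación" normalize to "nation" and "nacion", which differ by a single substitution, so their normalized distance is 1/6.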
17 Claims
1. A computer-implemented method for automatic cognate detection, the method comprising:
stemming, by a processor, a first word in a first language in a bilingual corpus to obtain a first stem and a second word in a second language in the bilingual corpus to obtain a second stem;
calculating, by the processor, a probability for aligning the first stem and the second stem;
normalizing, by the processor, the first stem and the second stem;
calculating, by the processor, a distance metric between the normalized first stem and the normalized second stem;
identifying, by the processor, the first word and the second word as a cognate pair when the probability and the distance metric meet a threshold criterion;
storing the cognate pair in a set of cognates;
retrieving, by the processor, a candidate sentence in the second language from a corpus;
filtering, by the processor, the candidate sentence by an active vocabulary of a user in the second language and the set of cognates;
calculating, by the processor, a sentence quality score for the candidate sentence;
ranking, by the processor, the candidate sentence based on the sentence quality score; and
presenting the ranked candidate sentence as a pure or combined audio, graphic, textual, or video stimulus to the user.
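Claim 1 leaves open how the alignment probability is obtained and what the threshold criterion is. One common way to estimate word-alignment probabilities from a bilingual corpus is IBM Model 1. The sketch below uses a simplified version of it (no NULL word) over sentence pairs whose stems are assumed to be already normalized, measures distance as one minus `difflib.SequenceMatcher.ratio()`, and treats the threshold criterion as a hypothetical pair of cut-offs, `min_prob` and `max_dist`. It illustrates the idea under those assumptions and is not the claimed method itself.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def ibm1(pairs, iterations=10):
    """Estimate t(f | e) with a simplified IBM Model 1 (no NULL word).

    pairs: list of (first_language_stems, second_language_stems) sentence pairs.
    Returns a mapping from (f, e) to the probability that f aligns to e.
    """
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each second-language stem over the first-language stems.
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise expected counts per first-language stem e.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def detect_cognates(pairs, min_prob=0.5, max_dist=0.4):
    """Return (first_stem, second_stem) pairs meeting both hypothetical thresholds."""
    t = ibm1(pairs)
    cognates = set()
    for (f, e), prob in t.items():
        # Distance in [0, 1]: 1 minus difflib's similarity ratio.
        dist = 1.0 - SequenceMatcher(None, e, f).ratio()
        if prob >= min_prob and dist <= max_dist:
            cognates.add((e, f))
    return cognates
```

On a toy corpus such as `[(["the", "nation"], ["la", "nacion"]), (["the", "family"], ["la", "familia"]), (["nation"], ["nacion"])]`, the pair "nation"/"nacion" passes both tests. "the"/"la" may align strongly but share no characters, so the distance test rejects them. Requiring both conditions is what separates likely cognates from ordinary translations.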
9. A system for automatic cognate detection, the system comprising:
a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the one or more processors being configured to:
stem a first word in a first language in a bilingual corpus to obtain a first stem and a second word in a second language in the bilingual corpus to obtain a second stem;
calculate a probability for aligning the first stem and the second stem;
normalize the first stem and the second stem;
calculate a distance metric between the normalized first stem and the normalized second stem;
identify the first word and the second word as a cognate pair when the probability and the distance metric meet a threshold criterion;
store the cognate pair in a set of cognates;
retrieve a candidate sentence in the second language from a corpus;
filter the candidate sentence by an active vocabulary of a user in the second language and the set of cognates;
calculate a sentence quality score for the candidate sentence;
rank the candidate sentence based on the sentence quality score; and
present the ranked candidate sentence as a pure or combined audio, graphic, textual, or video stimulus to the user.
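The claims do not specify how the active vocabulary and the set of cognates are combined when filtering a candidate sentence. One plausible reading, sketched here as an assumption, is that a sentence is kept only if every token (or all but a hypothetical `max_unknown` tokens) is either in the user's active vocabulary or is the second-language member of a stored cognate pair. The sketch compares lowercased surface tokens. A system following the claims more closely might compare stems instead.

```python
import re


def tokenize(sentence):
    """Lowercased word tokens; Python's Unicode-aware word pattern keeps accented letters."""
    return re.findall(r"\w+", sentence.lower())


def passes_filter(sentence, active_vocab, cognates, max_unknown=0):
    """True if at most max_unknown tokens are neither known words nor cognates."""
    unknown = [tok for tok in tokenize(sentence)
               if tok not in active_vocab and tok not in cognates]
    return len(unknown) <= max_unknown


def filter_candidates(candidates, active_vocab, cognates, max_unknown=0):
    """Keep the candidate sentences that pass the vocabulary/cognate filter."""
    return [s for s in candidates
            if passes_filter(s, active_vocab, cognates, max_unknown)]


if __name__ == "__main__":
    active = {"la", "es", "grande"}   # hypothetical active vocabulary
    cognates = {"nación"}             # second-language members of cognate pairs
    print(filter_candidates(["La nación es grande.", "El perro es grande."],
                            active, cognates))
```

Here "La nación es grande." is kept because "nación" is a stored cognate. "El perro es grande." is dropped because "el" and "perro" are neither known nor cognates.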
17. A computer program product for automatic cognate detection, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform:
stemming a first word in a first language in a bilingual corpus to obtain a first stem and a second word in a second language in the bilingual corpus to obtain a second stem;
calculating a probability for aligning the first stem and the second stem;
normalizing the first stem and the second stem;
calculating a distance metric between the normalized first stem and the normalized second stem;
identifying the first word and the second word as a cognate pair when the probability and the distance metric meet a threshold criterion;
storing the cognate pair in a set of cognates;
retrieving a candidate sentence in the second language from a corpus;
filtering the candidate sentence by an active vocabulary of a user in the second language and the set of cognates;
calculating a sentence quality score for the candidate sentence;
ranking the candidate sentence based on the sentence quality score; and
presenting the ranked candidate sentence as a pure or combined audio, graphic, textual, or video stimulus to the user.
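The sentence quality score is not defined in this excerpt. The following sketch assumes a hypothetical score that rewards vocabulary coverage and adds a small bonus for cognates, on the assumption that they are easier for a learner to recognize. It also penalizes distance from a hypothetical target sentence length. The weights 0.5 and 0.25 and `target_len=8` are illustrative only. Ranking is then a descending sort on the score.

```python
import re


def tokenize(sentence):
    """Lowercased word tokens (Unicode-aware)."""
    return re.findall(r"\w+", sentence.lower())


def quality_score(sentence, active_vocab, cognates, target_len=8):
    """Hypothetical score: coverage + cognate bonus - length penalty."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    n = len(tokens)
    # Share of tokens the learner is assumed to understand.
    coverage = sum(t in active_vocab or t in cognates for t in tokens) / n
    # Share of tokens that are cognates.
    cognate_share = sum(t in cognates for t in tokens) / n
    # Relative distance from the target length, capped at 1.
    length_penalty = min(abs(n - target_len) / target_len, 1.0)
    return coverage + 0.5 * cognate_share - 0.25 * length_penalty


def rank_candidates(candidates, active_vocab, cognates):
    """Sort candidate sentences from highest to lowest quality score."""
    return sorted(candidates,
                  key=lambda s: quality_score(s, active_vocab, cognates),
                  reverse=True)
```

With an active vocabulary of {"la", "es", "grande", "casa"} and the cognate "nación", both "la nación es grande" and "la casa es grande" are fully covered. The first ranks higher only because of the hypothetical cognate bonus.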
Specification