Scalable neural network-based language identification from written text
Abstract
A method for language identification from written text, wherein a neural network-based language identification (NN-LID) system is used to identify the language of a string of alphabet characters among a plurality of languages. A standard set of alphabet characters is used for mapping the string into a mapped string of alphabet characters, so as to allow the NN-LID system to determine the likelihood of the mapped string being one of the languages based on the standard set. The characters of the standard set are selected from the alphabet characters of the language-dependent sets. A scoring system is also used to determine the likelihood of the string being each one of the languages based on the language-dependent sets.
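The mapping step described in the abstract can be illustrated with a minimal Python sketch. The standard set used here (lowercase a to z plus space), the `CHAR_MAP` table, and the function name `map_to_standard` are all assumptions made for illustration; the abstract does not specify which characters the standard set contains or how each character is mapped.

```python
# Hypothetical standard set: lowercase ASCII letters plus space.
STANDARD_SET = set("abcdefghijklmnopqrstuvwxyz ")

# Hypothetical table mapping language-specific characters onto the standard set.
CHAR_MAP = {"ä": "a", "å": "a", "ö": "o", "ü": "u", "é": "e", "ñ": "n", "ß": "s"}


def map_to_standard(text, fallback=" "):
    """Return text rewritten using only characters of STANDARD_SET."""
    mapped = []
    for ch in text.lower():
        if ch in STANDARD_SET:
            mapped.append(ch)  # already a standard character
        else:
            # Use the table when possible, otherwise a fallback symbol.
            mapped.append(CHAR_MAP.get(ch, fallback))
    return "".join(mapped)
```

With these assumed tables, `map_to_standard("Häkkinen")` gives `"hakkinen"`, so a single classifier trained on the standard set could score names from every supported language.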
25 Claims
1. A method of identifying a language of a string of alphabet characters among a plurality of languages based on an automatic language identification system, each said plurality of languages having an individual set of alphabet characters, said method characterized by
mapping the string of alphabet characters into a mapped string of alphabet characters selected from a reference set of alphabet characters,
obtaining a first value indicative of a probability of the mapped string of alphabet characters being each one of said plurality of languages,
obtaining a second value indicative of a match of the alphabet characters in the string in each individual set, and
deciding the language of the string based on the first value and the second value.
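The way claim 1 combines the two values can be sketched as follows, under stated assumptions. The first value is taken as a caller-supplied dictionary of per-language probabilities (a stand-in for the output of the language identification system), the second value is assumed to be the fraction of the string's characters found in each language's own alphabet, and the decision rule is assumed to be the product of the two. The alphabets and the names `alphabet_match` and `decide_language` are hypothetical; the claim only requires that the decision be based on both values.

```python
# Hypothetical language-dependent alphabets (the "individual sets").
ALPHABETS = {
    "english": set("abcdefghijklmnopqrstuvwxyz"),
    "finnish": set("abcdefghijklmnopqrstuvwxyzäöå"),
    "german": set("abcdefghijklmnopqrstuvwxyzäöüß"),
}


def alphabet_match(text, alphabet):
    """Second value: share of non-space characters present in the alphabet."""
    letters = [ch for ch in text.lower() if not ch.isspace()]
    if not letters:
        return 0.0
    return sum(ch in alphabet for ch in letters) / len(letters)


def decide_language(text, first_values, alphabets=ALPHABETS):
    """Combine first and second values (assumed: by product) and pick the best."""
    combined = {}
    for lang, alphabet in alphabets.items():
        second = alphabet_match(text, alphabet)
        combined[lang] = first_values.get(lang, 0.0) * second
    best = max(combined, key=combined.get)
    return best, combined


# Assumed classifier output for the mapped string "grusse" (illustrative numbers).
first = {"english": 0.4, "finnish": 0.2, "german": 0.4}
best, scores = decide_language("grüße", first)
```

In this example the first values alone leave English and German tied, and the alphabet match on the unmapped string breaks the tie in favour of German, since only the German set contains both "ü" and "ß".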
14. A method of identifying a language of a string of alphabet characters among a plurality of languages based on an automatic language identification system, said plurality of languages classified into a plurality of language groups, each group having an individual set of alphabet characters, said method characterized by
mapping the string of alphabet characters into a mapped string of alphabet characters selected from a reference set of alphabet characters,
by obtaining a first value indicative of a probability of the mapped string of alphabet characters being each one of said plurality of languages,
obtaining a second value indicative of a match of the alphabet characters in the string in each individual set, and
deciding the language of the string based on the first value and the second value.
17. A language identification system for identifying a language of a string of alphabet characters among a plurality of languages, each of said plurality of languages having an individual set of alphabet characters, said system characterized by:
a reference set of alphabet characters,
a mapping module for mapping the string of alphabet characters into a mapped string of alphabet characters selected from the reference set for providing a signal indicative of the mapped string,
a first language discrimination module, responsive to the signal, for determining the likelihood of the mapped string being each one of said plurality of languages based on the reference set for providing first information indicative of the likelihood,
a second language discrimination module, for determining the likelihood of the string being each one of said plurality of languages based on the individual sets of alphabet characters for providing second information indicative of the likelihood, and
a decision module, responsive to the first information and second information, for determining the combined likelihood of the string being one of said plurality of languages based on the first information and second information.
21. An electronic device, comprising:
a module for providing a signal indicative of a string of alphabet characters;
a language identification system, responsive to the signal, for identifying a language of the string among a plurality of languages, each of said plurality of languages having an individual set of alphabet characters, the system characterized by a reference set of alphabet characters;
a mapping module for mapping the string of alphabet characters into a mapped string of alphabet characters selected from the reference set for providing a further signal indicative of the mapped string;
a first language discrimination module, responsive to the further signal, for determining the likelihood of the mapped string being each one of said plurality of languages based on the reference set for providing first information indicative of the likelihood;
a second language discrimination module, responsive to the first signal, for determining the likelihood of the string being each one of said plurality of languages based on the individual sets of alphabet characters for providing second information indicative of the likelihood;
a decision module, responding to the first information and second information, for determining the combined likelihood of the string being one of said plurality of languages based on the first information and second information.
Specification