Systems and methods for language detection
First Claim
1. A computer-implemented method of identifying a language of a message, the method comprising:
performing a plurality of language detection tests on text, each language detection test determining a respective set of scores, each score in the set of scores representing a likelihood that the message is in a respective language of a plurality of different languages;
providing one or more combinations of the score sets as input to one or more distinct classifiers including a first classifier and a second classifier, wherein the first classifier was trained using outputs from a first combination of the language detection tests and the second classifier was trained using outputs from a different second combination of the language detection tests;
obtaining as output from each of the one or more classifiers a respective indication that the message is in one of the plurality of different languages, the indication comprising a confidence score; and
identifying the language of the message based on one of the confidence scores.
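The steps recited in the claim can be pictured with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the three-language set, the toy word and character lists, the two test functions, and the `WeightedClassifier` class, which uses fixed weights as a stand-in for a trained classifier. Picking the highest confidence score is likewise only one possible way to identify the language "based on one of the confidence scores".

```python
from typing import Dict, List, Tuple

LANGUAGES = ["en", "es", "fr"]  # hypothetical language set
Scores = Dict[str, float]

# Toy resources, for illustration only.
STOPWORDS = {
    "en": {"the", "is", "and", "you"},
    "es": {"el", "es", "y", "que"},
    "fr": {"le", "est", "et", "vous"},
}
MARKER_CHARS = {"en": set(), "es": set("ñ¿¡"), "fr": set("çèêà")}


def normalize(raw: Scores) -> Scores:
    """Scale scores to sum to 1; fall back to uniform when all are zero."""
    total = sum(raw.values())
    if total == 0:
        return {lang: 1.0 / len(raw) for lang in raw}
    return {lang: value / total for lang, value in raw.items()}


def stopword_test(text: str) -> Scores:
    """Detection test 1: score each language by stopword hits."""
    words = text.lower().split()
    return normalize(
        {lang: float(sum(w in STOPWORDS[lang] for w in words)) for lang in LANGUAGES}
    )


def char_test(text: str) -> Scores:
    """Detection test 2: score each language by marker-character hits."""
    chars = text.lower()
    return normalize(
        {lang: float(sum(c in MARKER_CHARS[lang] for c in chars)) for lang in LANGUAGES}
    )


TESTS = {"stopword": stopword_test, "char": char_test}


class WeightedClassifier:
    """Stand-in for a trained classifier: fixed per-test weights (assumed, not learned).

    The keys of `weights` name the combination of tests this classifier consumes.
    """

    def __init__(self, weights: Dict[str, float]) -> None:
        self.weights = weights

    def predict(self, score_sets: Dict[str, Scores]) -> Tuple[str, float]:
        combined = normalize(
            {
                lang: sum(w * score_sets[name][lang] for name, w in self.weights.items())
                for lang in LANGUAGES
            }
        )
        best = max(combined, key=combined.get)
        return best, combined[best]  # (indicated language, confidence score)


# Two classifiers built on different combinations of the tests.
CLASSIFIERS = [
    WeightedClassifier({"stopword": 1.0}),
    WeightedClassifier({"stopword": 0.5, "char": 0.5}),
]


def identify_language(text: str, classifiers: List[WeightedClassifier]) -> Tuple[str, float]:
    """Run every test, feed score sets to each classifier, keep the most confident output."""
    score_sets = {name: test(text) for name, test in TESTS.items()}
    indications = [clf.predict(score_sets) for clf in classifiers]
    return max(indications, key=lambda pair: pair[1])


if __name__ == "__main__":
    print(identify_language("¿que es el mañana?", CLASSIFIERS))
    print(identify_language("the cat is here and you know", CLASSIFIERS))
```

In a real system the fixed weights would be replaced by classifiers actually trained on the outputs of their respective test combinations, as the claim requires.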
Abstract
Implementations of the present disclosure are directed to a method, a system, and a computer program storage device for detecting a language in a text message. A plurality of different language detection tests are performed on a message associated with a user. Each language detection test determines a set of scores representing a likelihood that the message is in one of a plurality of different languages. One or more combinations of the score sets are provided as input to one or more distinct classifiers. Output from each of the classifiers includes a respective indication that the message is in one of the different languages. The language in the message may be identified as being the indicated language from one of the classifiers, based on a confidence score and/or an identified linguistic domain.
27 Claims
1. A computer-implemented method of identifying a language of a message, the method comprising:
performing a plurality of language detection tests on text, each language detection test determining a respective set of scores, each score in the set of scores representing a likelihood that the message is in a respective language of a plurality of different languages;
providing one or more combinations of the score sets as input to one or more distinct classifiers including a first classifier and a second classifier, wherein the first classifier was trained using outputs from a first combination of the language detection tests and the second classifier was trained using outputs from a different second combination of the language detection tests;
obtaining as output from each of the one or more classifiers a respective indication that the message is in one of the plurality of different languages, the indication comprising a confidence score; and
identifying the language of the message based on one of the confidence scores.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system comprising:
one or more computers programmed to perform operations comprising:
performing a plurality of language detection tests on text, each language detection test determining a respective set of scores, each score in the set of scores representing a likelihood that the message is in a respective language of a plurality of different languages;
providing one or more combinations of the score sets as input to one or more distinct classifiers including a first classifier and a second classifier, wherein the first classifier was trained using outputs from a first combination of the language detection tests and the second classifier was trained using outputs from a different second combination of the language detection tests;
obtaining as output from each of the one or more classifiers a respective indication that the message is in one of the plurality of different languages, the indication comprising a confidence score; and
identifying the language of the message based on one of the confidence scores.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. An article comprising a non-transitory computer-readable medium having instructions stored thereon that, when executed by a computer, perform operations comprising:
performing a plurality of language detection tests on text, each language detection test determining a respective set of scores, each score in the set of scores representing a likelihood that the message is in a respective language of a plurality of different languages;
providing one or more combinations of the score sets as input to one or more distinct classifiers including a first classifier and a second classifier, wherein the first classifier was trained using outputs from a first combination of the language detection tests and the second classifier was trained using outputs from a different second combination of the language detection tests;
obtaining as output from each of the one or more classifiers a respective indication that the message is in one of the plurality of different languages, the indication comprising a confidence score; and
identifying the language of the message based on one of the confidence scores.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27
Specification