Systems and methods for language detection

  • US 10,162,811 B2
  • Filed: 10/03/2016
  • Issued: 12/25/2018
  • Est. Priority Date: 10/17/2014
  • Status: Active Grant
First Claim

1. A computer-implemented method of identifying a language in a message, the method comprising:

    obtaining a text message generated by a user;

    removing non-language characters from the text message to generate a sanitized text message;

    detecting an alphabet and a script present in the sanitized text message, wherein (i) detecting the alphabet comprises performing an alphabet-based language detection test to determine a first set of scores, and wherein each score in the first set of scores represents a likelihood that the sanitized text message comprises the alphabet for one of a plurality of different languages, and (ii) detecting the script comprises performing a script-based language detection test to determine a second set of scores, and wherein each score in the second set of scores represents a likelihood that the sanitized text message comprises the script for one of the plurality of different languages;

    providing one or more combinations of the first and second sets of scores as input to one or more classifiers including a first classifier and a second classifier, wherein the first classifier was trained using outputs from a first combination of language detection tests and the second classifier was trained using outputs from a second combination of language detection tests;

    obtaining as output from at least one of the one or more classifiers a respective confidence score that the sanitized text message is in one of a plurality of different languages; and

    identifying the language in the sanitized text message based on the confidence score from at least one of the one or more classifiers.
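
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: the three-language table, the rule that "non-language characters" means anything that is not a letter or whitespace, the use of Unicode character names to infer a script, and the fixed-weight functions that stand in for the two trained classifiers (the claim only says each classifier was trained on outputs from a different combination of tests).

```python
import unicodedata

# Hypothetical toy data: a real system would cover many more languages.
ALPHABETS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
    "es": set("abcdefghijklmnopqrstuvwxyzáéíóúüñ"),
    "ru": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
}
SCRIPTS = {"en": "LATIN", "es": "LATIN", "ru": "CYRILLIC"}


def sanitize(text):
    """Assumed sanitizer: keep only letters and whitespace, then collapse spaces."""
    kept = "".join(ch for ch in text if ch.isalpha() or ch.isspace())
    return " ".join(kept.split())


def _letters(text):
    return [ch for ch in text.lower() if ch.isalpha()]


def alphabet_scores(text):
    """First set of scores: fraction of letters found in each language's alphabet."""
    letters = _letters(text)
    if not letters:
        return {lang: 0.0 for lang in ALPHABETS}
    return {
        lang: sum(ch in alphabet for ch in letters) / len(letters)
        for lang, alphabet in ALPHABETS.items()
    }


def script_scores(text):
    """Second set of scores: fraction of letters written in each language's script."""
    letters = _letters(text)
    if not letters:
        return {lang: 0.0 for lang in SCRIPTS}
    # The first word of a Unicode character name is usually its script.
    scripts = [unicodedata.name(ch, "").split(" ")[0] for ch in letters]
    return {
        lang: sum(s == script for s in scripts) / len(scripts)
        for lang, script in SCRIPTS.items()
    }


def fixed_weight_classifier(weights):
    """Stand-in for a trained classifier: weighted sum of test scores, normalized."""
    def classify(features):
        raw = {
            lang: sum(w * features[test][lang] for test, w in weights.items())
            for lang in ALPHABETS
        }
        total = sum(raw.values())
        return {lang: (v / total if total else 0.0) for lang, v in raw.items()}
    return classify


# Two classifiers, each consuming a different combination of test outputs.
CLASSIFIERS = [
    fixed_weight_classifier({"alphabet": 0.5, "script": 0.5}),
    fixed_weight_classifier({"alphabet": 1.0}),
]


def identify_language(message):
    """Run the pipeline and return (language, confidence) with the highest confidence."""
    clean = sanitize(message)
    features = {"alphabet": alphabet_scores(clean), "script": script_scores(clean)}
    best_lang, best_conf = None, 0.0
    for classify in CLASSIFIERS:
        for lang, conf in classify(features).items():
            if conf > best_conf:
                best_lang, best_conf = lang, conf
    return best_lang, best_conf


if __name__ == "__main__":
    print(identify_language("Привет, как дела? 123 :)"))
    print(identify_language("¿Dónde está el baño?"))
```

Taking the single highest confidence across classifiers is only one way to read "based on the confidence score from at least one of the one or more classifiers"; a real system could just as well select one classifier per message or combine their outputs.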
