Method, apparatus and computer program product for providing flexible text based language identification
Abstract
An apparatus for providing flexible text based language identification includes an alphabet scoring element, an n-gram frequency element and a processing element. The alphabet scoring element may be configured to receive an entry in a computer readable text format and to calculate an alphabet score of the entry for each of a plurality of languages. The n-gram frequency element may be configured to calculate an n-gram frequency score of the entry for each of the plurality of languages. The processing element may be in communication with the n-gram frequency element and the alphabet scoring element. The processing element may also be configured to determine a language associated with the entry based on a combination of the alphabet score and the n-gram frequency score.
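The three elements described above can be illustrated with a minimal sketch. It is not the patented implementation: the alphabets, the tiny training samples, the use of character bigrams, the averaging of relative frequencies, and the weighted sum used as the "combination" are all assumptions made for illustration, and every name in the code (ALPHABETS, SAMPLES, alphabet_score, ngram_score, identify, weight) is hypothetical.

```python
from collections import Counter

# Hypothetical toy data: per-language alphabets and tiny training samples.
ALPHABETS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
    "fi": set("abcdefghijklmnopqrstuvwxyzåäö"),
    "de": set("abcdefghijklmnopqrstuvwxyzäöüß"),
}
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the other thing",
    "fi": "hyvää päivää tämä on suomenkielinen lause jossa on paljon ääkkösiä",
    "de": "der schnelle braune fuchs springt über den faulen hund und die straße",
}


def ngrams(text, n=2):
    """Return the list of character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(sample, n=2):
    """Map each n-gram in the sample to its relative frequency."""
    counts = Counter(ngrams(sample, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


MODELS = {lang: train(sample) for lang, sample in SAMPLES.items()}


def alphabet_score(entry, lang):
    """Fraction of non-space characters that belong to the language's alphabet."""
    chars = [c for c in entry.lower() if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in ALPHABETS[lang] for c in chars) / len(chars)


def ngram_score(entry, lang, n=2):
    """Mean relative frequency of the entry's n-grams under the language model."""
    grams = ngrams(entry.lower(), n)
    if not grams:
        return 0.0
    model = MODELS[lang]
    return sum(model.get(g, 0.0) for g in grams) / len(grams)


def identify(entry, weight=0.5):
    """Pick the language with the highest composite score.

    The composite here is a weighted sum of the two scores; this particular
    combination is an assumption, not taken from the patent text.
    """
    scores = {
        lang: weight * alphabet_score(entry, lang)
        + (1 - weight) * ngram_score(entry, lang)
        for lang in ALPHABETS
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best, scores = identify("päivää")
    print(best, scores)
```

In this toy setup, identify("päivää") selects "fi": the entry fits both the Finnish and German alphabets equally well, so the alphabet score alone cannot separate them, and the n-gram frequency score breaks the tie. That is the reason for combining the two scores rather than relying on either one.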
33 Claims
1. A method comprising:
receiving an entry in a computer readable text format;
determining an alphabet score of the entry for each of a plurality of languages;
determining an n-gram frequency score of the entry for each of the plurality of languages; and
determining, via a processor, a language associated with the entry based on a composite score comprising a combination of the alphabet score and the n-gram frequency score.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer program product comprising at least one computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first executable portion for receiving an entry in a computer readable text format;
a second executable portion for determining an alphabet score of the entry for each of a plurality of languages;
a third executable portion for determining an n-gram frequency score of the entry for each of the plurality of languages; and
a fourth executable portion for determining a language associated with the entry based on a composite score comprising a combination of the alphabet score and the n-gram frequency score.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. An apparatus comprising a processor configured to:
receive an entry in a computer readable text format and calculate an alphabet score of the entry for each of a plurality of languages;
calculate an n-gram frequency score of the entry for each of the plurality of languages; and
determine a language associated with the entry based on a composite score comprising a combination of the alphabet score and the n-gram frequency score.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
32. An apparatus comprising:
means for receiving an entry in a computer readable text format;
means for determining an alphabet score of the entry for each of a plurality of languages;
means for determining an n-gram frequency score of the entry for each of the plurality of languages; and
means for determining a language associated with the entry based on a composite score comprising a combination of the alphabet score and the n-gram frequency score.
Dependent claims: 33
Specification