Apparatus, method and storage medium for identifying a combination of a language and its character code system
Abstract
The present invention is directed to identifying a language represented by a character code and its character code system. An occurrence probability table describing for each character the probability that a character code occurs is prepared for each combination of a language and a character code system. An entered character code string is divided for each character, and the occurrence probability table is referred to, to obtain the probability that the character code occurs. The product of the occurrence probabilities is calculated for each combination of the language and the character code system, to judge the combination of the language and the character code system with respect to the entered character code string on the basis of the obtained product.
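The procedure summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `COMBINATIONS`, `FLOOR`, `split_codes`, `evaluate` and `identify` are hypothetical, the probability values are invented, the lead-byte ranges are simplified, and selecting the largest product is an assumed decision rule (the abstract says only that the judgment is made on the basis of the product).

```python
from math import prod

# Hypothetical toy data. For each (language, character code system) combination:
#   lead_ranges - simplified byte ranges that start a 2-byte character code
#   table       - occurrence probability of each character code (invented values)
COMBINATIONS = {
    ("Japanese", "Shift_JIS"): {
        "lead_ranges": [(0x81, 0x9F), (0xE0, 0xFC)],
        "table": {b"\x82\xa0": 0.05, b"\x82\xa2": 0.04},
    },
    ("Japanese", "EUC-JP"): {
        "lead_ranges": [(0xA1, 0xFE)],
        "table": {b"\xa4\xa2": 0.05, b"\xa4\xa4": 0.04},
    },
    ("English", "ASCII"): {
        "lead_ranges": [],
        "table": {b"e": 0.10, b"t": 0.07, b"a": 0.06},
    },
}

FLOOR = 1e-6  # assumed probability for a code that is absent from a table


def split_codes(data, lead_ranges):
    """Divide a byte string into 1- or 2-byte character codes."""
    codes, i = [], 0
    while i < len(data):
        is_lead = any(lo <= data[i] <= hi for lo, hi in lead_ranges)
        width = 2 if is_lead and i + 1 < len(data) else 1
        codes.append(data[i:i + width])
        i += width
    return codes


def evaluate(data):
    """Return the product of occurrence probabilities for every combination."""
    scores = {}
    for combo, spec in COMBINATIONS.items():
        codes = split_codes(data, spec["lead_ranges"])
        scores[combo] = prod(spec["table"].get(code, FLOOR) for code in codes)
    return scores


def identify(data):
    """Pick the combination with the largest product (assumed decision rule)."""
    scores = evaluate(data)
    return max(scores, key=scores.get)


print(identify(b"\x82\xa0\x82\xa2"))  # two hiragana encoded in Shift_JIS
```

For long inputs a plain product can underflow to zero. Summing the logarithms of the probabilities gives the same ranking, but that is a substitution not described in the text above.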
82 Citations
8 Claims
1. A language identifying apparatus for identifying a combination of a language represented by encoded text data and its character code system, comprising:
a storage device storing for each combination of a language and a character code system a plurality of occurrence probability tables each describing the probability that a character code occurs in the combination;
means for respectively reading out the occurrence probabilities from said plurality of occurrence probability tables with respect to one or a plurality of character codes included in fed text data, to obtain evaluation data for each combination of the language and the character code system; and
means for identifying the combination of the language represented by the fed text data and the character code system based only on the obtained evaluation data.
2. The language identifying apparatus according to claim 1, wherein
said means for obtaining the evaluation data respectively calculates the product of the occurrence probabilities read out from the occurrence probability tables.
3. The language identifying apparatus according to claim 1, wherein
the means for judging the combination of the language represented by the fed text data and the character code system on the basis of the obtained evaluation data is missing.
4. A language identifying method for identifying a combination of a language represented by encoded text data and its character code system, comprising the steps of:
preparing, for each combination of a language and a character code system, occurrence probability tables each describing the probability that a character code occurs in the combination;
respectively reading out the occurrence probabilities from said plurality of occurrence probability tables with respect to one or a plurality of character codes included in fed text data, to obtain evaluation data for each combination of the language and the character code system; and
identifying the combination of the language represented by the fed text data and the character code system based only on the obtained evaluation data.
5. The language identifying method according to claim 4, further comprising the step of
calculating the product of the occurrence probabilities respectively read out from the occurrence probability tables, to obtain said evaluation data.
6. A storage medium storing a program for identifying a combination of a language represented by encoded text data and its character code system using occurrence probability tables each describing for each combination of a language and a character code system the probability that a character code occurs in the combination,
said program controlling a computer so as to respectively read out the occurrence probabilities from said plurality of occurrence probability tables with respect to one or a plurality of character codes included in fed text data, to obtain evaluation data for each combination of the language and the character code system, and to identify the combination of the language represented by the fed text data and the character code system based only on the obtained evaluation data.
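The claims require occurrence probability tables to be prepared for each combination but do not say how. One plausible way, shown here purely as an assumption, is to count character codes in sample text known to be in a given language and character code system and convert the counts to relative frequencies. The function name `build_table` and the additive smoothing are hypothetical choices, not taken from the claims.

```python
from collections import Counter


def build_table(sample_codes, smoothing=1.0):
    """Estimate occurrence probabilities from character codes in sample text.

    Assumed estimator: relative frequency with additive smoothing, reserving
    one extra share of probability for any code not seen in the sample.
    """
    counts = Counter(sample_codes)
    total = sum(counts.values()) + smoothing * (len(counts) + 1)
    table = {code: (n + smoothing) / total for code, n in counts.items()}
    unseen = smoothing / total  # probability assigned to an unseen code
    return table, unseen


# Usage: a tiny invented sample of single-byte codes.
table, unseen = build_table([b"a", b"a", b"b"])
print(table, unseen)
```

The reserved `unseen` value could serve as the fallback probability when a character code in the fed text data has no entry in a table, so that a single unknown code does not force the product to zero.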
Specification