System and method for evaluating character sets of a message containing a plurality of character sets
Abstract
An evaluator system accepts input textual messages in unknown languages and assesses which character sets, corresponding to languages, match each message. Textual messages whose individual characters are encoded in 16-bit Unicode or another universal format are parsed, the character sets which can express each character are identified, and the accumulated correspondence is logged. When the character sets against which the message is being tested provide only partial matches, the invention can determine which offers the best fit, including by way of a weighting function. The evaluation technology of the invention can be applied to multipart documents, and to search engines and indices.
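The abstract does not say how the best fit or the weighting function is computed. The following is only a minimal sketch of one possible reading, in which each candidate character set is scored by the weighted share of message characters it can express. The function name `best_fit`, the per-character `weights` mapping, and the use of Python's built-in codecs as stand-ins for the character sets are all assumptions made for illustration, not details taken from the patent.

```python
def best_fit(message, charsets, weights=None):
    """Return (best character set, coverage per set) for a message.

    Hypothetical scoring: each character contributes its weight (default 1.0)
    to every character set that can encode it. This is an illustrative
    assumption, not the weighting function of the patent.
    """
    weights = weights or {}
    scores = dict.fromkeys(charsets, 0.0)
    total = 0.0
    for ch in message:
        w = weights.get(ch, 1.0)
        total += w
        for name in charsets:
            try:
                ch.encode(name)  # raises if the set cannot express ch
            except UnicodeEncodeError:
                continue
            scores[name] += w
    if total == 0:
        return None, scores  # empty message: nothing to rank
    best = max(charsets, key=lambda name: scores[name])
    coverage = {name: score / total for name, score in scores.items()}
    return best, coverage


if __name__ == "__main__":
    print(best_fit("café привет", ["ascii", "latin-1", "koi8-r"]))
```

In this sketch a tie is resolved in favor of whichever character set is listed first, which is another arbitrary choice.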
83 Citations
32 Claims
1. A method of evaluating characters in a message containing a plurality of character sets, comprising the steps of:
creating a character table bank having a plurality of columns for each character row, wherein the columns correspond in number to the plurality of character sets and include an indication of whether or not an associated character set is represented in the corresponding character row;
creating a mask, the mask comprising a number of mask columns that correspond to an equivalent number of columns in the character table bank, wherein the mask columns contain an indication of the character sets against which the characters of the message are to be evaluated;
accepting an input of the characters of the message;
evaluating a portion of the message by accessing the corresponding character row of the character table bank for each of a predetermined number of the characters of the message and performing a logical AND operation between each of the corresponding character rows and the mask;
filling a character match list with an entry for each of the character sets that result in a non-zero result after the logical AND operation; and
returning the character match list.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A system for evaluating characters in a message containing a plurality of character sets, comprising:
a character table bank having a plurality of columns, wherein the columns correspond in number to the plurality of character sets and include an indication of whether or not an associated character set is represented in the corresponding character row;
a mask, the mask comprising a number of mask columns that correspond to an equivalent number of columns in the character table bank, wherein the mask columns contain an indication of the character sets against which the characters in the message are to be evaluated;
an input interface to accept an input of the characters of the message;
a processor unit, connected to the input interface, the processor unit evaluating a portion of the message by accessing the corresponding character row of the character table bank for each of a predetermined number of the characters of the message and performing a logical AND operation between each of the corresponding character rows and the mask, filling a character match list with an entry for each of the character sets that result in a non-zero result after the logical AND operation; and
returning the character match list.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
17. A system for evaluating characters in a message containing a plurality of character sets, comprising:
character table bank means, the character table bank means having a plurality of columns, wherein the columns correspond in number to the plurality of character sets and include an indication of whether or not an associated character set is represented in the corresponding character row;
mask means, the mask means comprising a number of mask columns that correspond to an equivalent number of columns in the character table bank, wherein the mask columns contain an indication of the character sets against which the characters in the message are to be evaluated;
input interface means to accept an input of the characters of the message;
processor means, connected to the input interface means, the processor means evaluating a portion of the message by accessing the corresponding character row of the character table bank for each of a predetermined number of the characters of the message and performing a logical AND operation between each of the corresponding character rows and the mask means, filling a character match list means with an entry for each of the character sets that result in a non-zero result after the logical AND operation; and
returning the character match list means.
Dependent claims: 18, 19, 20, 21, 22, 23.
24. The system of claim 23, wherein the multipart text comprises a MIME document.
25. A storage medium for storing machine readable code, the machine readable code being executable to evaluate characters in a message containing a plurality of character sets according to the steps of:
creating a character table bank having a plurality of columns, wherein the columns correspond in number to the plurality of character sets and include an indication of whether or not an associated character set is represented in the corresponding character row;
creating a mask, the mask comprising a number of mask columns that correspond to an equivalent number of columns in the character table bank, wherein the mask columns contain an indication of the character sets against which the characters of the message are to be evaluated;
accepting an input of the characters of the message;
evaluating a portion of the message by accessing the corresponding character row of the character table bank for each of a predetermined number of the characters of the message and performing a logical AND operation between each of the corresponding character rows and the mask;
filling a character match list with an entry for each of the character sets that result in a non-zero result after the logical AND operation; and
returning the character match list.
Dependent claims: 26, 27, 28, 29, 30, 31, 32.
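The steps recited in the independent claims (table bank, mask, logical AND, match list) can be sketched roughly as follows. This is one possible reading and not the patented implementation: each character row is held as an integer whose bits are the columns, Python's built-in codecs stand in for the character sets, and a set is reported only if its bit survives the AND across every evaluated character. The names `build_table_bank`, `make_mask`, `evaluate`, and the `limit` parameter (standing for the "predetermined number" of characters) are hypothetical.

```python
# Hypothetical character sets; bit (column) i corresponds to CHARSETS[i].
CHARSETS = ["ascii", "latin-1", "koi8-r"]


def build_table_bank(charsets, code_points):
    """Build one row per code point, with one bit per character set."""
    bank = {}
    for cp in code_points:
        row = 0
        for col, name in enumerate(charsets):
            try:
                chr(cp).encode(name)
                row |= 1 << col  # this set can represent the character
            except UnicodeEncodeError:
                pass
        bank[cp] = row
    return bank


def make_mask(charsets, wanted):
    """Set a mask bit for every character set to be evaluated against."""
    mask = 0
    for col, name in enumerate(charsets):
        if name in wanted:
            mask |= 1 << col
    return mask


def evaluate(message, bank, mask, charsets, limit=None):
    """AND the mask with the row of each of the first `limit` characters."""
    result = mask
    for ch in message[:limit]:
        result &= bank.get(ord(ch), 0)  # unknown characters clear all bits
    # Character match list: sets whose bit is still non-zero.
    return [name for col, name in enumerate(charsets) if result & (1 << col)]


if __name__ == "__main__":
    bank = build_table_bank(CHARSETS, range(0x500))
    mask = make_mask(CHARSETS, CHARSETS)
    print(evaluate("café", bank, mask, CHARSETS))
```

Packing a row into a single integer keeps the per-character work to one lookup and one AND, which is presumably the appeal of the mask approach. Note that in this sketch an empty message returns every masked character set.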
Specification