SYSTEM AND METHOD FOR EVALUATING CHARACTER SETS TO GENERATE A SEARCH INDEX
Abstract
An evaluator system accepts input textual messages in unknown languages and assesses which character sets, corresponding to languages, match each message. Textual messages whose individual characters are encoded in 16-bit Unicode or another universal format are parsed, the character sets that can express each character are identified, and the accumulated correspondence is logged. When the character sets against which the message is being tested provide only partial matches, the invention can determine which offers the best fit, including by means of a weighting function. The evaluation technology of the invention can be applied to multipart documents, and to search engines and indices. Documents can be indexed according to assigned character sets, and query strings matched to indices according to language.
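The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the candidate list uses Python codec names chosen purely for illustration, the score is assumed to be the fraction of characters a candidate can express multiplied by an optional per-candidate weight, and ties are assumed to go to the candidate listed first.

```python
# Hypothetical candidate character sets (Python codec names, for illustration only).
CANDIDATES = ["ascii", "latin_1", "iso8859_7", "koi8_r", "shift_jis"]


def can_express(ch, codec):
    """Return True if the given codec can encode the single character ch."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def tally_matches(message, candidates=CANDIDATES):
    """Log, per candidate character set, how many characters it can express."""
    counts = {codec: 0 for codec in candidates}
    for ch in message:
        for codec in candidates:
            if can_express(ch, codec):
                counts[codec] += 1
    return counts


def best_fit(message, candidates=CANDIDATES, weights=None):
    """Pick the candidate with the highest weighted coverage.

    Assumed weighting: (characters expressed / total characters) * weight,
    where a missing weight defaults to 1.0. Ties go to the earliest candidate.
    """
    if not message:
        return None, 0.0
    weights = weights or {}
    counts = tally_matches(message, candidates)
    best_codec, best_score = None, -1.0
    for codec in candidates:
        score = counts[codec] / len(message) * weights.get(codec, 1.0)
        if score > best_score:  # strict comparison keeps the earliest on ties
            best_codec, best_score = codec, score
    return best_codec, best_score
```

For a message that no candidate covers completely, `best_fit` still returns the candidate with the largest weighted share of expressible characters, which corresponds to the partial-match case the abstract mentions.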
32 Claims
1. A method of evaluating characters in a message to generate a search index, comprising the steps of:
a) accepting an input of the characters of the message;
b) evaluating the message by comparing the characters of the message to a predetermined set of candidate character sets to determine a match between the predetermined set of candidate character sets and the message; and
c) generating a search index based on the results of the evaluation of the message and candidate character sets.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for evaluating characters in a message to generate a search index, comprising:
an input interface to accept an input of the characters of the message; and
a processor unit, connected to the input interface, the processor unit evaluating the message by comparing the characters of the message to a predetermined set of candidate character sets to determine a match between the predetermined set of candidate character sets and the message, and generating a search index based on the results of the evaluation of the message and candidate character sets.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32
17. A system for evaluating characters in a message to generate a search index, comprising:
input interface means to accept an input of the characters of the message; and
processor means, connected to the input interface means, the processor means evaluating the message by comparing the characters of the message to a predetermined set of candidate character sets to determine a match between the predetermined set of candidate character sets and the message, and generating a search index based on the results of the evaluation of the message and candidate character sets.
25. A storage medium for storing machine readable code, the machine readable code being executable to evaluate characters in an electronic message according to the steps of:
a) accepting an input of the characters of the message;
b) evaluating the message by comparing the characters of the message to a predetermined set of candidate character sets to determine a match between the predetermined set of candidate character sets and the message; and
c) generating a search index based on the results of the evaluation of the message and the candidate character sets.
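Steps a) through c) recited in the independent claims can be sketched as a small indexing pipeline. This is an illustrative reading, not the claimed implementation: the candidate list, the whitespace tokenizer, the rule of assigning the first candidate that can encode the whole text, and the function names `assign_charset`, `build_index`, and `search` are all assumptions.

```python
from collections import defaultdict

# Hypothetical candidate character sets (Python codec names, for illustration only).
CANDIDATES = ["ascii", "iso8859_7", "shift_jis"]


def assign_charset(text, candidates=CANDIDATES):
    """Step b): return the first candidate that can encode every character."""
    for codec in candidates:
        try:
            text.encode(codec)
            return codec
        except UnicodeEncodeError:
            continue
    return None  # no candidate matched the whole text


def build_index(documents, candidates=CANDIDATES):
    """Step c): build an inverted index partitioned by assigned character set.

    documents maps a document id to its text (step a, the accepted input).
    The result maps charset -> token -> set of document ids.
    """
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in documents.items():
        charset = assign_charset(text, candidates)
        for token in text.lower().split():
            index[charset][token].add(doc_id)
    return index


def search(index, query, candidates=CANDIDATES):
    """Evaluate the query the same way, then search only the matching partition."""
    charset = assign_charset(query, candidates)
    partition = index.get(charset, {})
    results = None
    for token in query.lower().split():
        hits = partition.get(token, set())
        results = set(hits) if results is None else results & hits
    return results if results else set()
```

In this sketch a query is looked up only in the partition whose character set matches the query's own, so an ASCII query does not return documents that were assigned to a different character set even if they contain the same token.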
Specification