Forward and reverse Boyer-Moore string searching of multilingual text having a defined collation order
First Claim
1. A computer system for searching first text string for an occurrence of a second text string, the first text string and the second text string consisting of characters belonging to a language which has a predefined character precedence, the computer system comprising:
- (a) a processor;
(b) a memory connected to the processor and containing locations for storing information including the first text string and the second text string;
(c) a program stored in the memory and controlling the processor to retrieve information stored in the memory, to manipulate the retrieved information and to store the manipulated information in the memory;
(d) means controlled by the program for defining at least one ordering value for each character based on the predefined character precedence of the language and for storing ordering values for all characters in the memory;
(e) means controlled by the program for preprocessing the first text string to generate a first index table comprising ordering values for each character in the first text string and a second index table comprising minimum shift values for shifting the second text string relative to the first text string in order to obtain a match;
(f) means responsive to values in the second index table for shifting the second text string relative to the first text string in either of a first direction or a second direction opposite to the first direction by an mount indicated by the second index table;
(g) means for performing a comparison of the ordering values of the characters in the second string to the ordering values in the first index table to determine whether the second text string occurs in the first text string.
3 Assignments
0 Petitions
Accused Products
Abstract
A method and system for providing a language-sensitive text compare. An innovative system and method for performing the compare is presented that performs text comparison of any Unicode strings. For any language an ordering is defined based on features of the language. Then, an interactive compare function is performed to determine the relationship of a pair of strings. The string is examined and a compare is performed one character at a time based on a predefined character precedence.
423 Citations
4 Claims
-
1. A computer system for searching first text string for an occurrence of a second text string, the first text string and the second text string consisting of characters belonging to a language which has a predefined character precedence, the computer system comprising:
-
(a) a processor; (b) a memory connected to the processor and containing locations for storing information including the first text string and the second text string; (c) a program stored in the memory and controlling the processor to retrieve information stored in the memory, to manipulate the retrieved information and to store the manipulated information in the memory; (d) means controlled by the program for defining at least one ordering value for each character based on the predefined character precedence of the language and for storing ordering values for all characters in the memory; (e) means controlled by the program for preprocessing the first text string to generate a first index table comprising ordering values for each character in the first text string and a second index table comprising minimum shift values for shifting the second text string relative to the first text string in order to obtain a match; (f) means responsive to values in the second index table for shifting the second text string relative to the first text string in either of a first direction or a second direction opposite to the first direction by an mount indicated by the second index table; (g) means for performing a comparison of the ordering values of the characters in the second string to the ordering values in the first index table to determine whether the second text string occurs in the first text string. - View Dependent Claims (2)
-
-
3. A method for searching first text string for an occurrence of a second text string, the first text string and the second text string consisting of characters belonging to a language which has a predefined character precedence, the method operating in a computer system having a processor, a memory connected to the processor and containing locations for storing information including the first text string and the second text string and a program stored in the memory and controlling the processor to retrieve information stored in the memory, to manipulate the retrieved information and to store the manipulated information in the memory and comprising the steps of:
-
(a) defining at least one ordering value for each character based on the predefined character precedence of the language; (b) storing ordering values for all characters in the memory; (c) preprocessing the first text string to generate a first index table comprising ordering values for each character in the first text string and a second index table comprising minimum shift values for shifting the second text string relative to the first text string in order to obtain a match; (d) shifting the second text string relative to the first text string in either of a first direction or a second direction opposite to the first direction by an amount indicated by the second index table; and (e) performing a comparison of the ordering values of the characters in the second string to the ordering values in the first index table to determine whether the second text string occurs in the first text string. - View Dependent Claims (4)
-
Specification