Language-sensitive text searching system with modified Boyer-Moore process
First Claim
1. A method for establishing a collating order between a first text string and a second text string, the first text string and the second text string consisting of characters belonging to a language which has a predefined character precedence, the method being operable in a computer system having a processor and a memory connected to the processor and containing locations for storing information including the first text string and the second text string and comprising the steps of:
- (a) defining at least one ordering value for each character based on the predefined character precedence the language;
(b) storing ordering values for all characters in the memory;
(c) retrieving from the memory pairs of characters including a first character from the first text string and a second character from the second text string;
(d) retrieving an ordering value for the first character and an ordering value for the second character;
(e) performing a comparison of the retrieved ordering values of the characters to determine a difference between the first and the second text string; and
(f) computing a minimum trailing match length value for each position in the second text string.
3 Assignments
0 Petitions
Accused Products
Abstract
A method and system for providing a language-sensitive text search that performs text comparison of any Unicode strings. For any language an ordering is defined based on features of the language. Then, an interactive compare function is performed to determine the relationship of a pair of strings. The string is examined and a compare is performed one or more characters at a time based on a predefined character precedence.
458 Citations
24 Claims
-
1. A method for establishing a collating order between a first text string and a second text string, the first text string and the second text string consisting of characters belonging to a language which has a predefined character precedence, the method being operable in a computer system having a processor and a memory connected to the processor and containing locations for storing information including the first text string and the second text string and comprising the steps of:
-
(a) defining at least one ordering value for each character based on the predefined character precedence the language; (b) storing ordering values for all characters in the memory; (c) retrieving from the memory pairs of characters including a first character from the first text string and a second character from the second text string; (d) retrieving an ordering value for the first character and an ordering value for the second character; (e) performing a comparison of the retrieved ordering values of the characters to determine a difference between the first and the second text string; and (f) computing a minimum trailing match length value for each position in the second text string. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. A computer system for searching a first text string for an occurrence of a second text string, the first text string and the second text string consisting of characters belonging to a language which has a predefined character precedence, the computer system comprising:
-
(a) a processor; (b) a memory connected to the processor and containing locations for storing information including the first text string and the second text string; (c) a program stored in the memory and controlling the processor to retrieve information stored in the memory, to manipulate the retrieved information and to store the manipulated information in the memory; (d) means controlled by the program for defining at least one ordering value for each character based on the predefined character precedence of the language and for storing ordering values for all characters in the memory; (e) means controlled by the program for preprocessing the first text string to generate a first index table comprising ordering values for each character in the first text string; (f) means controlled by the program for computing a minimum trailing match length for each position in the first text string, wherein the minimum trailing match length value corresponds to a minimum length of the second text string when the second text string matches trailing elements at the end of the first text string; (d) means responsive to the minimum trailing match length values for shifting the second text string relative to the first text string in a first direction by an amount indicated by the minimum trailing match length values; and (h) means for performing a comparison of the ordering values of the characters in the second string to the ordering values in the first index table to determine whether the second text string occurs in the first text string. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
-
Specification