Hybrid comparison for unicode text strings consisting primarily of ASCII characters
Abstract
A method compares text strings having Unicode encoding. The method receives a first string S=s1 s2 . . . sn and a second string T=t1 t2 . . . tm, where s1, s2, . . . , sn and t1, t2, . . . , tm are Unicode characters. The method computes a first string weight for the first string S according to a weight function ƒ. When S consists of ASCII characters, ƒ(S)=S. When S consists of ASCII characters and some accented ASCII characters that are replaceable by ASCII characters, ƒ(S)=g(s1) g(s2) . . . g(sn), where g(si)=si when si is an ASCII character and g(si)=si′ when si is an accented ASCII character that is replaceable by the corresponding ASCII character si′. When S includes one or more non-replaceable non-ASCII characters, the first string weight concatenates an ASCII weight prefix ƒA(S) and a Unicode weight suffix ƒU(S). The method also computes a second string weight for the second text string T. Equality of the strings is tested using the string weights.
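The three cases of the weight function can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `replace_accented`, `weight`, and `strings_equal` are hypothetical, and several choices are assumptions made only for illustration: a character counts as "replaceable" when its canonical decomposition is an ASCII base character followed by combining marks, the ASCII weight prefix covers the characters before the first non-replaceable character, and the Unicode weight suffix is simply the NFC-normalized remainder, standing in for a real collation weight.

```python
import unicodedata


def replace_accented(ch):
    """Hypothetical g: map a character to its ASCII replacement, or None.

    Assumption: a character is "replaceable" when its canonical (NFD)
    decomposition is an ASCII base character followed only by combining marks.
    """
    if ord(ch) < 128:
        return ch  # ASCII characters map to themselves
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    if ord(base) < 128 and marks and all(unicodedata.combining(m) for m in marks):
        return base
    return None  # non-replaceable non-ASCII character


def weight(s):
    """Hypothetical weight function f following the three cases in the abstract."""
    # Case 1: S consists entirely of ASCII characters, so f(S) = S.
    if s.isascii():
        return s

    mapped = [replace_accented(ch) for ch in s]

    # Case 2: every non-ASCII character is replaceable, so f(S) = g(s1)...g(sn).
    if all(m is not None for m in mapped):
        return "".join(mapped)

    # Case 3: at least one non-replaceable character.
    # Assumption: the prefix covers everything before the first such character,
    # and the suffix is the NFC-normalized remainder (a placeholder weight).
    first_bad = mapped.index(None)
    ascii_prefix = "".join(mapped[:first_bad])
    unicode_suffix = unicodedata.normalize("NFC", s[first_bad:])
    return ascii_prefix + unicode_suffix


def strings_equal(s, t):
    """Test equality of two strings by comparing their weights."""
    return weight(s) == weight(t)
```

Under these assumptions, `weight("caf\u00e9")` is `"cafe"`, so `strings_equal("caf\u00e9", "cafe")` is true, while a pure-ASCII string is returned unchanged and never touches the Unicode path. A production system would presumably replace the placeholder suffix with a proper collation key.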
20 Claims
1. A method of comparing text strings having Unicode encoding, comprising:
at a computer having one or more processors, and memory storing one or more programs configured for execution by the one or more processors;
- receiving a first text string S=s1 s2 . . . sn having Unicode encoding and a second text string T=t1 t2 . . . tm having Unicode encoding, wherein n and m are positive integers, and s1, s2, . . . , sn and t1, t2, . . . , tm are Unicode characters;
- computing, for the first text string S, a first string weight ƒ(S) according to a weight function ƒ, computed according to:
  - when it is determined that S consists entirely of ASCII characters, ƒ(S)=S;
  - when it is determined that S consists of ASCII characters and one or more accented ASCII characters that are replaceable by corresponding ASCII characters, ƒ(S)=g(s1) g(s2) . . . g(sn), wherein g(si)=si when si is an ASCII character and g(si)=si′ when si is an accented ASCII character that is replaceable by the corresponding ASCII character si′; and
  - when S includes one or more non-replaceable non-ASCII characters, the first string weight ƒ(S) is a concatenation of an ASCII weight prefix ƒA(S) and a Unicode weight suffix ƒU(S);
- computing, a second string weight ƒ(T), for the second text string T, according to the weight function ƒ; and
- determining whether the first text string and the second text string are equal by comparing the first string weight to the second string weight.
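For readability, the weight function recited in claim 1 can be restated as a piecewise definition. This is only a notational summary of the claim language: the symbol \Vert for concatenation is an assumption of this restatement, and the claim itself does not say how ƒA and ƒU are computed.

```latex
\[
f(S) =
\begin{cases}
S, & \text{if every } s_i \text{ is an ASCII character},\\[4pt]
g(s_1)\,g(s_2)\cdots g(s_n), & \text{if } S \text{ consists of ASCII characters and replaceable accented characters},\\[4pt]
f_A(S)\,\Vert\,f_U(S), & \text{if } S \text{ includes one or more non-replaceable non-ASCII characters},
\end{cases}
\]
\[
g(s_i) =
\begin{cases}
s_i, & \text{if } s_i \text{ is an ASCII character},\\
s_i', & \text{if } s_i \text{ is an accented character replaceable by the ASCII character } s_i'.
\end{cases}
\]
```

The strings S and T are then treated as equal exactly when ƒ(S) and ƒ(T) compare equal.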
14. A computing device, comprising:
- one or more processors;
- memory; and
- one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs comprising instructions for:
  - receiving a first text string S=s1 s2 . . . sn having Unicode encoding and a second text string T=t1 t2 . . . tm having Unicode encoding, wherein n and m are positive integers, and s1, s2, . . . , sn and t1, t2, . . . , tm are Unicode characters;
  - computing, for the first text string S, a first string weight ƒ(S) according to a weight function ƒ, computed according to:
    - when it is determined that S consists entirely of ASCII characters, ƒ(S)=S;
    - when it is determined that S consists of ASCII characters and one or more accented ASCII characters that are replaceable by corresponding ASCII characters, ƒ(S)=g(s1) g(s2) . . . g(sn), wherein g(si)=si when si is an ASCII character and g(si)=si′ when si is an accented ASCII character that is replaceable by the corresponding ASCII character si′; and
    - when S includes one or more non-replaceable non-ASCII characters, the first string weight ƒ(S) is a concatenation of an ASCII weight prefix ƒA(S) and a Unicode weight suffix ƒU(S);
  - computing, a second string weight ƒ(T), for the second text string T, according to the weight function ƒ; and
  - determining whether the first text string and the second text string are equal by comparing the first string weight to the second string weight.
20. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computing device having one or more processors and memory, the one or more programs comprising instructions for:
- receiving a first text string S=s1 s2 . . . sn having Unicode encoding and a second text string T=t1 t2 . . . tm having Unicode encoding, wherein n and m are positive integers, and s1, s2, . . . , sn and t1, t2, . . . , tm are Unicode characters;
- computing, for the first text string S, a first string weight ƒ(S) according to a weight function ƒ, computed according to:
  - when it is determined that S consists entirely of ASCII characters, ƒ(S)=S;
  - when it is determined that S consists of ASCII characters and one or more accented ASCII characters that are replaceable by corresponding ASCII characters, ƒ(S)=g(s1) g(s2) . . . g(sn), wherein g(si)=si when si is an ASCII character and g(si)=si′ when si is an accented ASCII character that is replaceable by the corresponding ASCII character si′; and
  - when S includes one or more non-replaceable non-ASCII characters, the first string weight ƒ(S) is a concatenation of an ASCII weight prefix ƒA(S) and a Unicode weight suffix ƒU(S);
- computing, a second string weight ƒ(T), for the second text string T, according to the weight function ƒ; and
- determining whether the first text string and the second text string are equal by comparing the first string weight to the second string weight.
Specification