Hybrid comparison for unicode text strings consisting primarily of ASCII characters
First Claim
1. A method of comparing text strings with Unicode encoding, comprising:
- at a computer having one or more processors, and memory storing one or more programs configured for execution by the one or more processors:
receiving a first text string S1 with Unicode encoding and a second text string S2 with Unicode encoding;
computing, for the first text string S1, a first string weight according to a weight function ƒ that computes an ASCII prefix ƒA(S1), computes a Unicode weight suffix ƒU(S1), and concatenates the ASCII prefix to the Unicode weight suffix to form the first string weight ƒ(S1)=ƒA(S1)+ƒU(S1), wherein:
computing the ASCII prefix for the first text string comprises applying bitwise operations to n-byte contiguous blocks of the first text string to determine whether each block contains only ASCII characters, and replacing accented Unicode characters in the first text string with equivalent unaccented ASCII characters when comparison is designated as accent-insensitive, wherein n is a predefined integer greater than or equal to 4; and
computing the Unicode weight suffix comprises, when there is a first block containing a non-replaceable non-ASCII character, performing a character-by-character Unicode weight lookup beginning with the first block containing a non-replaceable non-ASCII character;
computing, for the second text string S2, a second string weight according to the weight function ƒ; and
determining whether the first text string and the second text string are equal by comparing the first string weight to the second string weight.
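The block test recited in the claim can be illustrated with a minimal sketch. It assumes UTF-8 input and n = 8 (the claim only requires n ≥ 4); the names `BLOCK_SIZE`, `HIGH_BITS`, `block_is_ascii`, and `ascii_prefix_length` are hypothetical and do not come from the patent. In UTF-8, every byte of a non-ASCII character has its high bit set, so one AND against a mask of high bits tests a whole block at once.

```python
BLOCK_SIZE = 8  # assumed value of n; the claim requires n >= 4
# Mask with the high bit (0x80) set in every byte of a block.
HIGH_BITS = int.from_bytes(b"\x80" * BLOCK_SIZE, "little")


def block_is_ascii(block: bytes) -> bool:
    """Return True if no byte in the block has its high bit set."""
    word = int.from_bytes(block, "little")
    return (word & HIGH_BITS) == 0


def ascii_prefix_length(data: bytes) -> int:
    """Count the leading bytes covered by full blocks that are all ASCII.

    Trailing bytes that do not fill a whole block are not examined here;
    a fuller implementation would need to handle that tail separately.
    """
    offset = 0
    while offset + BLOCK_SIZE <= len(data):
        if not block_is_ascii(data[offset:offset + BLOCK_SIZE]):
            break
        offset += BLOCK_SIZE
    return offset
```

For a 16-byte all-ASCII input, `ascii_prefix_length` returns 16; if the second block contains an accented character, it returns 8.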
Abstract
Comparing text strings with Unicode encoding includes receiving two text strings S1 and S2. The process computes, for the first text string S1, a first weight according to a weight function ƒ that computes an ASCII prefix ƒA(S1), computes a Unicode weight suffix ƒU(S1), and concatenates the two to form the first weight ƒ(S1)=ƒA(S1)+ƒU(S1). Computing the ASCII prefix for the first string applies bitwise operations to n-byte contiguous blocks of the first string to determine whether each block contains only ASCII characters, and replaces accented Unicode characters with equivalent unaccented ASCII characters when comparison is designated as accent-insensitive. When there is a first block containing a non-replaceable non-ASCII character, the Unicode weight suffix is computed by performing a character-by-character Unicode weight lookup beginning with that block. The same process is applied to the second string. The text strings are then compared by comparing their computed weights.
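The weight function described in the abstract might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: input is encoded as UTF-8, n = 8, accent replacement is approximated with `unicodedata` decomposition and applied up front rather than inside the block scan, and `unicode_weight` is a hypothetical stand-in for a real collation-table lookup. The `0xFF` marker byte is also an assumption, added so that suffix weights cannot be confused with ASCII prefix bytes.

```python
import unicodedata

BLOCK = 8  # assumed n; the claims require n >= 4
HIGH_BITS = int.from_bytes(b"\x80" * BLOCK, "little")


def strip_accent(ch: str) -> str:
    """Return an unaccented ASCII equivalent of ch if one exists, else ch."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base if base and base.isascii() else ch


def unicode_weight(ch: str) -> bytes:
    """Hypothetical per-character weight: marker byte plus the code point."""
    return b"\xff" + ord(ch).to_bytes(3, "big")


def weight(s: str, accent_insensitive: bool = False) -> bytes:
    """Sketch of f(S) = fA(S) + fU(S)."""
    if accent_insensitive:
        s = "".join(strip_accent(c) for c in s)
    data = s.encode("utf-8")
    # fA: consume whole blocks while the bitwise test finds only ASCII bytes.
    offset = 0
    while offset + BLOCK <= len(data):
        word = int.from_bytes(data[offset:offset + BLOCK], "little")
        if word & HIGH_BITS:
            break
        offset += BLOCK
    ascii_prefix = data[:offset]
    # fU: character-by-character lookup from the first non-ASCII block on.
    # Every byte before offset is ASCII, so offset is a character boundary.
    rest = data[offset:].decode("utf-8")
    return ascii_prefix + b"".join(unicode_weight(c) for c in rest)


def strings_equal(s1: str, s2: str, accent_insensitive: bool = False) -> bool:
    """Compare two strings by comparing their computed weights."""
    return weight(s1, accent_insensitive) == weight(s2, accent_insensitive)
```

Under these assumptions, `strings_equal("résumé and more", "resume and more", accent_insensitive=True)` is true, while the same call without `accent_insensitive` is false. For strings that are mostly ASCII, most of the work happens in the block loop, and the slower per-character lookup runs only on the remainder.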
20 Claims
1. A method of comparing text strings with Unicode encoding, comprising:
at a computer having one or more processors, and memory storing one or more programs configured for execution by the one or more processors:
receiving a first text string S1 with Unicode encoding and a second text string S2 with Unicode encoding;
computing, for the first text string S1, a first string weight according to a weight function ƒ that computes an ASCII prefix ƒA(S1), computes a Unicode weight suffix ƒU(S1), and concatenates the ASCII prefix to the Unicode weight suffix to form the first string weight ƒ(S1)=ƒA(S1)+ƒU(S1), wherein:
computing the ASCII prefix for the first text string comprises applying bitwise operations to n-byte contiguous blocks of the first text string to determine whether each block contains only ASCII characters, and replacing accented Unicode characters in the first text string with equivalent unaccented ASCII characters when comparison is designated as accent-insensitive, wherein n is a predefined integer greater than or equal to 4; and
computing the Unicode weight suffix comprises, when there is a first block containing a non-replaceable non-ASCII character, performing a character-by-character Unicode weight lookup beginning with the first block containing a non-replaceable non-ASCII character;
computing, for the second text string S2, a second string weight according to the weight function ƒ; and
determining whether the first text string and the second text string are equal by comparing the first string weight to the second string weight.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
14. A computing device, comprising:
one or more processors;
memory; and
one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs comprising instructions for:
receiving a first text string S1 with Unicode encoding and a second text string S2 with Unicode encoding;
computing, for the first text string S1, a first string weight according to a weight function ƒ that computes an ASCII prefix ƒA(S1), computes a Unicode weight suffix ƒU(S1), and concatenates the ASCII prefix to the Unicode weight suffix to form the first string weight ƒ(S1)=ƒA(S1)+ƒU(S1), wherein:
computing the ASCII prefix for the first text string comprises applying bitwise operations to n-byte contiguous blocks of the first text string to determine whether each block contains only ASCII characters, and replacing accented Unicode characters in the first text string with equivalent unaccented ASCII characters when comparison is designated as accent-insensitive, wherein n is a predefined integer greater than or equal to 4; and
computing the Unicode weight suffix comprises, when there is a first block containing a non-replaceable non-ASCII character, performing a character-by-character Unicode weight lookup beginning with the first block containing a non-replaceable non-ASCII character;
computing, for the second text string S2, a second string weight according to the weight function ƒ; and
determining whether the first text string and the second text string are equal by comparing the first string weight to the second string weight.
Dependent claims: 15, 16, 17, 18, 19.
20. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computing device having one or more processors and memory, the one or more programs comprising instructions for:
receiving a first text string S1 with Unicode encoding and a second text string S2 with Unicode encoding;
computing, for the first text string S1, a first string weight according to a weight function ƒ that computes an ASCII prefix ƒA(S1), computes a Unicode weight suffix ƒU(S1), and concatenates the ASCII prefix to the Unicode weight suffix to form the first string weight ƒ(S1)=ƒA(S1)+ƒU(S1), wherein:
computing the ASCII prefix for the first text string comprises applying bitwise operations to n-byte contiguous blocks of the first text string to determine whether each block contains only ASCII characters, and replacing accented Unicode characters in the first text string with equivalent unaccented ASCII characters when comparison is designated as accent-insensitive, wherein n is a predefined integer greater than or equal to 4; and
computing the Unicode weight suffix comprises, when there is a first block containing a non-replaceable non-ASCII character, performing a character-by-character Unicode weight lookup beginning with the first block containing a non-replaceable non-ASCII character;
computing, for the second text string S2, a second string weight according to the weight function ƒ; and
determining whether the first text string and the second text string are equal by comparing the first string weight to the second string weight.
Specification