Hybrid comparison for unicode text strings consisting primarily of ASCII characters

  • US 10,089,281 B1
  • Filed: 09/28/2017
  • Issued: 10/02/2018
  • Est. Priority Date: 11/06/2016
  • Status: Active Grant
First Claim

1. A method of comparing text strings with Unicode encoding, comprising:

  • at a computer having one or more processors, and memory storing one or more programs configured for execution by the one or more processors;

    receiving a first text string S1 with Unicode encoding and a second text string S2 with Unicode encoding;

    computing, for the first text string S1, a first string weight according to a weight function ƒ that computes an ASCII prefix ƒ_A(S1), computes a Unicode weight suffix ƒ_U(S1), and concatenates the ASCII prefix to the Unicode weight suffix to form the first string weight ƒ(S1)=ƒ_A(S1)+ƒ_U(S1), wherein;

    computing the ASCII prefix for the first text string comprises applying bitwise operations to n-byte contiguous blocks of the first text string to determine whether each block contains only ASCII characters, and replacing accented Unicode characters in the first text string with equivalent unaccented ASCII characters when comparison is designated as accent-insensitive, wherein n is a predefined integer greater than or equal to 4; and

    computing the Unicode weight suffix comprises, when there is a first block containing a non-replaceable non-ASCII character, performing a character-by-character Unicode weight lookup beginning with the first block containing a non-replaceable non-ASCII character;

    computing, for the second text string S2, a second string weight according to the weight function ƒ; and

    determining whether the first text string and the second text string are equal by comparing the first string weight to the second string weight.
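
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes UTF-8 input, a block size of n = 8, and a toy weight table in which an ASCII character weighs its own byte and any other character weighs a marker byte plus its code point. The names `N`, `is_ascii_block`, `unaccent`, `unicode_weight`, `string_weight` and `strings_equal` are all hypothetical, and accent replacement is approximated with NFD decomposition from the standard `unicodedata` module.

```python
import unicodedata

N = 8  # block size in bytes (assumed; the claim only requires n >= 4)
HIGH_BITS = int.from_bytes(b"\x80" * N, "little")


def is_ascii_block(block: bytes) -> bool:
    """Test a whole block with one AND: ASCII bytes never set the high bit."""
    return int.from_bytes(block.ljust(N, b"\x00"), "little") & HIGH_BITS == 0


def unaccent(ch: str):
    """Return the ASCII base of an accented character, or None if there is none."""
    base, *marks = unicodedata.normalize("NFD", ch)
    if marks and base.isascii() and all(unicodedata.combining(m) for m in marks):
        return base
    return None


def unicode_weight(ch: str, accent_insensitive: bool) -> bytes:
    """Toy stand-in for a collation weight lookup (hypothetical table)."""
    if accent_insensitive and not ch.isascii():
        ch = unaccent(ch) or ch
    if ch.isascii():
        return ch.encode("ascii")
    return b"\xff" + ord(ch).to_bytes(3, "big")


def string_weight(s: str, accent_insensitive: bool = False) -> bytes:
    """Build f(S) = f_A(S) + f_U(S): ASCII prefix, then per-character weights."""
    data = s.encode("utf-8")
    prefix = bytearray()
    pos = 0  # byte offset into data, always on a character boundary
    ci = 0   # matching character offset into s
    while pos < len(data):
        block = data[pos:pos + N]
        if is_ascii_block(block):  # fast path: copy the block unchanged
            prefix += block
            pos += len(block)
            ci += len(block)
            continue
        # Slow path: collect the characters that start inside this block.
        chars, used = [], 0
        while used < len(block):
            ch = s[ci + len(chars)]
            chars.append(ch)
            used += len(ch.encode("utf-8"))
        replaced = [
            c if c.isascii() else (unaccent(c) if accent_insensitive else None)
            for c in chars
        ]
        if None in replaced:
            # Non-replaceable character: the suffix starts at this block.
            suffix = b"".join(unicode_weight(c, accent_insensitive) for c in s[ci:])
            return bytes(prefix) + suffix
        prefix += "".join(replaced).encode("ascii")
        pos += used
        ci += len(chars)
    return bytes(prefix)


def strings_equal(s1: str, s2: str, accent_insensitive: bool = False) -> bool:
    """Two strings are equal when their string weights are equal."""
    return string_weight(s1, accent_insensitive) == string_weight(s2, accent_insensitive)
```

For example, `strings_equal("résumé", "resume", accent_insensitive=True)` takes the replacement path for both accented characters and compares two pure ASCII prefixes. Because an ASCII character weighs the same in the prefix and in the suffix in this sketch, the resulting weight does not depend on where block boundaries happen to fall.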
