Method and apparatus for comparison of data strings
Abstract
The present invention is a method and apparatus that measures the similarity of two images. Any information that can be discretely symbolized can be transformed into an image through so-called "image projection". This process defines otherwise discrete entities as part of a linear space, making it possible to calculate distances among those entities. A mechanism called a cluster allows association of otherwise discrete symbols, improving the matching abilities of the invention. Initially, the sequence of symbols is normalized. Then, a projection of the normalized sequence is created. The projection may optionally be generated with a cluster that assigns weights to the neighbors of a core symbol and/or with position weights that assign a weight to each position in the normalized image. Projection matching is then performed to determine match candidates for the string of symbols.
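The sequence described in the abstract (normalize, project, compare) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function names, the nearest-position resampling used for normalization, the fixed three-position spreading weights, and the use of cosine similarity as the degree of similarity.

```python
import math


def normalize(text: str, n: int = 16) -> list[str]:
    """Stretch or shrink `text` to exactly n symbols (assumed resampling rule)."""
    chars = [c for c in text.lower() if c.isalnum()]
    if not chars:
        return []
    s = len(chars)
    # Position i of the normalized string samples position i*s//n of the input.
    return [chars[i * s // n] for i in range(n)]


def project(symbols: list[str]) -> dict[tuple[str, int], float]:
    """Turn a normalized string into a sparse vector keyed by (symbol, position)."""
    image: dict[tuple[str, int], float] = {}
    for pos, sym in enumerate(symbols):
        # Spread each symbol over its neighbouring positions so that small
        # shifts between two strings still overlap (weights are hypothetical).
        for offset, weight in ((-1, 0.5), (0, 1.0), (1, 0.5)):
            key = (sym, pos + offset)
            image[key] = image.get(key, 0.0) + weight
    return image


def similarity(a: dict, b: dict) -> float:
    """Cosine similarity of two projections; 0.0 if either is empty."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compare(first: str, second: str) -> float:
    """Normalize both strings, project them, and compare the projections."""
    return similarity(project(normalize(first)), project(normalize(second)))
```

Because both strings are brought to the same length before projection, a misspelling such as "databse" still shares most (symbol, position) cells with "database", while an unrelated word shares few or none.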
62 Claims
1. In a computer system having a processing means coupled to a storage means and a display means, a computer-implemented method of comparing, in a database application, a first string of digitally represented characters with a second string of digitally represented characters, each character of said first and second string of digitally represented characters being a member of a set of digitally represented characters, said computer-implemented method comprising the steps of:
using said processing means, normalizing said first string of S known digitally represented characters to create a first normalized string of N normalized symbols and normalizing said second string of known digitally represented characters to create a second normalized string of N normalized symbols;
storing said first normalized string and said second normalized string in said storage means;
using said processing means, generating a first projection from said first normalized string and a second projection from said second normalized string;
storing said first projection and said second projection in said storage means; and
using said processing means, retrieving said first and second projections from said storage means and comparing said first projection and said second projection to determine a degree of similarity of said first and second projections.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
11. The computer-implemented method of claim 6 wherein said step of generating said projection of said first string and said second string using cluster tables and weight tables is accomplished by:
$$C_{s_i,\,M(s_i)+|D|+j} = d_j \cdot W_{M(s_i)} \cdot u_{s_i^n}$$
$$C_{s_i,\,M(s_i)+|D|-j} = d_j \cdot W_{M(s_i)} \cdot u_{s_i^n} \qquad (j = 0, 1, 2, \ldots, |D|-1)$$
where $D$ is a distributing series, $|D|$ is distribution size, $d_j$ is the j-th item in distribution series $D$, $C_{s_i,k}$ is the k-th item in symbol $s_i$'s closure, $u_{s_i^n}$ is a cluster weight, and $W_{M(s_i)}$ is a position weight.
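One possible reading of the two equations in claim 11 is sketched below. The assumptions are mine and not confirmed by the claim text: the index of a symbol in the normalized string is taken as M(s_i); the cluster table is a hypothetical dictionary mapping a core symbol to its members and their cluster weights u; each member gets its own closure row; and overlapping contributions are accumulated with `+=`, whereas the claim writes a plain `=`.

```python
def closure_rows(symbols, D, W, clusters):
    """Illustrative sketch of the claim 11 equations (not the patented code).

    symbols  : normalized string; the index m of a symbol stands in for M(s_i)
    D        : distribution series, D[j] = d_j
    W        : position weights, W[m] = W_{M(s_i)}
    clusters : hypothetical cluster table, clusters[core] = {member: u}
    """
    size = len(D)                      # |D|
    width = len(symbols) + 2 * size    # room for the offsets +|D|+j and +|D|-j
    rows = {}
    for m, core in enumerate(symbols):
        # A symbol with no cluster entry is treated as its own only member.
        for member, u in clusters.get(core, {core: 1.0}).items():
            row = rows.setdefault(member, [0.0] * width)
            centre = m + size          # M(s_i) + |D|
            for j, d_j in enumerate(D):
                value = d_j * W[m] * u
                row[centre + j] += value
                if j:                  # for j = 0 both equations hit one cell
                    row[centre - j] += value
    return rows
```

With D = [1.0, 0.5], each symbol places its full weight at its own position and half that weight one position to either side, scaled by the position weight and the cluster weight.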
12. A computer apparatus for comparing, in a database application, a first string of digitally represented characters with a second string of digitally represented characters, each character of said first and second string of digitally represented characters being a member of a set of digitally represented characters, said computer apparatus comprising:
a storage means;
a processing means coupled to said storage means, said processing means including:
a means for normalizing a first string of S known digitally represented characters to create a first normalized string of N normalized symbols and for normalizing a second string of known digitally represented characters to create a second normalized string of N normalized symbols;
a means for storing said first normalized string and said second normalized string;
a means for generating a first projection from said first normalized string and a second projection from said second normalized string;
a means for storing said first projection and said second projection in said storage means;
a means for retrieving said first and second projections from said storage means; and
a means for comparing said first projection and said second projection to determine a degree of similarity of said first and second projections.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A computer-readable medium having stored thereon a plurality of sequences of instructions for a database application, said plurality of sequences of instructions including sequences of instructions which, when executed by a processor, cause the processor to perform the steps of:
normalizing a first string of S known digitally represented characters to create a first normalized string of N normalized symbols and normalizing a second string of known digitally represented characters to create a second normalized string of N normalized symbols;
storing said first normalized string and said second normalized string;
generating a first projection from said first normalized string and a second projection from said second normalized string;
causing said processing means to store said first projection and said second projection;
retrieving said first and second projections from said storage means; and
comparing said first projection and said second projection to determine a degree of similarity of said first and second projections.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31
32. In a computer system having a processing means coupled to a storage means and a display means, a computer-implemented method of comparing, in a spell checking application, a first string of digitally represented characters with a second string of digitally represented characters, each character of said first and second string of digitally represented characters being a member of a set of digitally represented characters, said computer-implemented method comprising the steps of:
using said processing means, normalizing said first string of S known digitally represented characters to create a first normalized string of N normalized symbols and normalizing said second string of known digitally represented characters to create a second normalized string of N normalized symbols;
storing said first normalized string and said second normalized string in said storage means;
using said processing means, generating a first projection from said first normalized string and a second projection from said second normalized string;
storing said first projection and said second projection in said storage means; and
using said processing means, retrieving said first and second projections from said storage means and comparing said first projection and said second projection to determine a degree of similarity of said first and second projections.
Dependent claims: 33, 34, 35, 36, 37, 38, 39, 40, 41, 42
42. The computer-implemented method of claim 37 wherein said step of generating said projection of said first string and said second string using cluster tables and weight tables is accomplished by:
$$C_{s_i,\,M(s_i)+|D|+j} = d_j \cdot W_{M(s_i)} \cdot u_{s_i^n}$$
$$C_{s_i,\,M(s_i)+|D|-j} = d_j \cdot W_{M(s_i)} \cdot u_{s_i^n} \qquad (j = 0, 1, 2, \ldots, |D|-1)$$
where $D$ is a distributing series, $|D|$ is distribution size, $d_j$ is the j-th item in distribution series $D$, $C_{s_i,k}$ is the k-th item in symbol $s_i$'s closure, $u_{s_i^n}$ is a cluster weight, and $W_{M(s_i)}$ is a position weight.
43. A computer apparatus for comparing, in a spell checking application, a first string of digitally represented characters with a second string of digitally represented characters, each character of said first and second string of digitally represented characters being a member of a set of digitally represented characters, said computer apparatus comprising:
a storage means;
a processing means coupled to said storage means, said processing means including:
a means for normalizing a first string of S known digitally represented characters to create a first normalized string of N normalized symbols and for normalizing a second string of known digitally represented characters to create a second normalized string of N normalized symbols;
a means for storing said first normalized string and said second normalized string;
a means for generating a first projection from said first normalized string and a second projection from said second normalized string;
a means for storing said first projection and said second projection in said storage means;
a means for retrieving said first and second projections from said storage means; and
a means for comparing said first projection and said second projection to determine a degree of similarity of said first and second projections.
Dependent claims: 44, 45, 46, 47, 48, 49, 50, 51, 52, 53
54. A computer-readable medium having stored thereon a plurality of sequences of instructions for a spell checking application, said plurality of sequences of instructions including sequences of instructions which, when executed by a processor, cause the processor to perform the steps of:
normalizing a first string of S known digitally represented characters to create a first normalized string of N normalized symbols and normalizing a second string of known digitally represented characters to create a second normalized string of N normalized symbols;
storing said first normalized string and said second normalized string;
generating a first projection from said first normalized string and a second projection from said second normalized string;
causing said processing means to store said first projection and said second projection;
retrieving said first and second projections from said storage means; and
comparing said first projection and said second projection to determine a degree of similarity of said first and second projections.
Dependent claims: 55, 56, 57, 58, 59, 60, 61, 62
Specification