System and method for utilizing multiple encodings to identify similar language characters
Abstract
Described herein are systems and methods for identifying the similarity between language characters. A pair of language characters is received at a language character match engine. The engine receives encoding configuration information from each of a plurality of encoding components and, for each encoding component, encodes both characters based on the unique structure of each character, producing a pair of strings of identification characters. The two strings in each pair are then compared with each other to generate a similarity score, and the similarity scores from all encoding components are combined into a composite similarity score. The composite similarity score represents the similarity between the pair of language characters and is used to identify that similarity.
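The encode, compare, and combine flow described in the abstract can be sketched in Python for a single pair of characters. This is a minimal sketch under stated assumptions, not the patented implementation. The three lookup tables stand in for the encoding components and hold made-up codes. The comparison uses the standard library's `difflib.SequenceMatcher.ratio()`, and the scores are combined with an unweighted mean. The source does not specify any of these choices.

```python
from difflib import SequenceMatcher

# Hypothetical toy encodings: each table maps a character to a string of
# identification characters. The values are illustrative, not real data.
FIRST_ENCODING = {"日": "2511", "曰": "2511", "目": "25111", "木": "1234"}
SECOND_ENCODING = {"日": "A", "曰": "A", "目": "BU", "木": "D"}
THIRD_ENCODING = {"日": "6010", "曰": "6010", "目": "6010", "木": "4090"}


def encoding_similarity(table: dict[str, str], c1: str, c2: str) -> float:
    """Encode both characters with one table and compare the two strings.

    SequenceMatcher.ratio() returns 2*M/T in [0, 1], where M is the number
    of matched characters and T is the combined length of both strings.
    """
    return SequenceMatcher(None, table[c1], table[c2]).ratio()


def composite_similarity(c1: str, c2: str) -> float:
    """Combine the three per-encoding scores with an unweighted mean (assumed)."""
    scores = [encoding_similarity(t, c1, c2)
              for t in (FIRST_ENCODING, SECOND_ENCODING, THIRD_ENCODING)]
    return sum(scores) / len(scores)


print(round(composite_similarity("日", "目"), 4))  # 0.6296
```

Because 日 and 曰 receive identical toy codes in every table, they score 1.0. The characters 日 and 木 share little structure in these tables and score lower.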
20 Claims
1. A method for improving accuracy of data matching in a middleware machine environment by identifying a similarity between language characters of a character set of a language, wherein each language character has a unique structure, the method comprising:
providing a language character match engine, wherein the language character match engine executes on one or more microprocessors, wherein the language character match engine comprises a plurality of encoding components, including at least a first encoding component and a second encoding component and a third encoding component;
using the language character match engine to generate a composite similarity score set for the character set of the language wherein said similarity index comprises a composite similarity score for each of a plurality of pairs of language characters of the character set of the language;
wherein the composite similarity score for each of the plurality of pairs of language characters is prepared by,
receiving the pair of language characters with the language character match engine,
using the first encoding component to encode each language character of the pair of language characters based on the unique structure of each language character and generate, for each language character, a first-encoded string of identification characters representing the unique structure of the language character,
comparing the first-encoded strings of identification characters for each of the pair of language characters to one another to generate a first-encoding similarity score for the pair of language characters,
using the second encoding component to encode each language character of the pair of language characters based on the unique structure of each language character and generate, for each language character, a second-encoded string of identification characters representing the unique structure of the language character,
comparing the second-encoded strings of identification characters for each of the pair of language characters to one another to generate a second-encoding similarity score for the pair of language characters,
using the third encoding component to encode each language character of the pair of language characters based on the unique structure of each language character and generate, for each language character, a third-encoded string of identification characters representing the unique structure of the language character,
comparing the third-encoded strings of identification characters for each of the pair of language characters to one another to generate a third-encoding similarity score for the pair of language characters,
and combining the first-encoding similarity score, the second-encoding similarity score, and the third-encoding similarity score for the pair of language characters to generate a composite similarity score for the pair of language characters.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
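Claim 1 builds a composite similarity score set that covers many pairs drawn from a character set. The following is a hedged sketch of that step, not the claimed implementation. It assumes the same kind of toy tables (invented values), a `SequenceMatcher` comparison, and an unweighted mean. Every unordered pair is scored with `itertools.combinations`, and results are keyed by `frozenset` so that lookups ignore pair order. The `most_similar` helper is hypothetical and only illustrates how such a score set might support data matching.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy encodings (illustrative values only), one table per
# encoding component.
ENCODINGS = [
    {"日": "2511", "曰": "2511", "目": "25111", "木": "1234"},
    {"日": "A", "曰": "A", "目": "BU", "木": "D"},
    {"日": "6010", "曰": "6010", "目": "6010", "木": "4090"},
]


def composite(c1: str, c2: str) -> float:
    """Unweighted mean of the per-encoding string similarities (assumed)."""
    scores = [SequenceMatcher(None, t[c1], t[c2]).ratio() for t in ENCODINGS]
    return sum(scores) / len(scores)


def build_score_set(charset: str) -> dict[frozenset, float]:
    """Composite score for every unordered pair of characters in charset."""
    return {frozenset(pair): composite(*pair)
            for pair in combinations(charset, 2)}


def most_similar(score_set: dict[frozenset, float], ch: str,
                 k: int = 2) -> list[tuple[str, float]]:
    """Hypothetical lookup: the k characters scored most similar to ch."""
    hits = [(next(iter(pair - {ch})), score)
            for pair, score in score_set.items() if ch in pair]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:k]


index = build_score_set("日曰目木")
print(len(index))  # 6 pairs for 4 characters
print(most_similar(index, "日"))
```

Precomputing the full set costs one comparison per pair. A set of n characters therefore needs n(n-1)/2 composite scores, but later lookups during data matching then need no re-encoding.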
11. A non-transitory computer readable storage medium storing instructions thereon for improving accuracy of data matching in a middleware machine environment by identifying a similarity between language characters of a language, wherein each language character has a unique structure, which instructions, when processed in a middleware machine of said middleware machine environment, cause the middleware machine to perform steps comprising:
using the language character match engine to generate a composite similarity score set for the character set of the language wherein said similarity index comprises a composite similarity score for each of a plurality of pairs of language characters of the character set of the language, and wherein the composite similarity score for each of the plurality of pairs of language characters is prepared by,
receiving the pair of language characters with a character match engine,
using a first encoding component of the character match engine to encode each language character of the pair of language characters based on the unique structure of each language character and generate, for each language character, a first-encoded string of identification characters representing the unique structure of the language character,
comparing the first-encoded strings of identification characters for each of the pair of language characters to one another to generate a first-encoding similarity score for the pair of language characters,
using a second encoding component of the character match engine to encode each language character of the pair of language characters based on the unique structure of each language character and generate, for each language character, a second-encoded string of identification characters representing the unique structure of the language character,
comparing the second-encoded strings of identification characters for each of the pair of language characters to one another to generate a second-encoding similarity score for the pair of language characters,
using a third encoding component of the character match engine to encode each language character of the pair of language characters based on the unique structure of each language character and generate, for each language character, a third-encoded string of identification characters representing the unique structure of the language character,
comparing the third-encoded strings of identification characters for each of the pair of language characters to one another to generate a third-encoding similarity score for the pair of language characters,
and combining the first-encoding similarity score, the second-encoding similarity score, and the third-encoding similarity score for the pair of language characters to generate a composite similarity score for the pair of language characters.
Dependent claims: 12, 13, 14, 15, 16, 17.
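The claims do not say how two encoded strings are compared. One common choice, assumed here rather than taken from the source, is a normalized Levenshtein (edit) distance. It counts the single-character insertions, deletions, and substitutions needed to turn one code into the other, then divides by the length of the longer code so that 1.0 means identical codes. The stroke-style codes in the usage line are illustrative values.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def encoding_similarity(code1: str, code2: str) -> float:
    """Normalize the distance to [0, 1]; 1.0 means identical encodings."""
    longest = max(len(code1), len(code2))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(code1, code2) / longest


# Hypothetical stroke-style codes for two characters (illustrative only).
print(round(encoding_similarity("2511", "25111"), 4))  # 0.8
```

Normalizing by the longer string keeps scores from different encodings on the same 0-to-1 scale. That common scale makes the scores straightforward to combine into a composite similarity score.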
18. A system for generating a similarity index identifying a similarity between language characters of a language, wherein each language character has a unique structure, the system comprising:
a computer system comprising a microprocessor and a memory and a language character match engine, wherein said language character match engine comprises a plurality of encoding components for encoding a plurality of pairs of language characters of the language based on the unique structure of each language character;
a first encoding component of the language character match engine which is configured to encode each language character of each of said plurality of pairs of language characters based on the unique structure of each language character, generate a first-encoded string of identification characters representing the unique structure of each language character, and compare the first-encoded strings of identification characters generated for each language character to one another to generate a first-encoding similarity score for each of the plurality of pairs of language characters;
a second encoding component of the language character match engine which is configured to encode each language character of each of said plurality of pairs of language characters based on the unique structure of each language character, generate a second-encoded string of identification characters representing the unique structure of each language character, and compare the second-encoded strings of identification characters generated for each language character to one another to generate a second-encoding similarity score for each of the plurality of pairs of language characters;
a third encoding component of the language character match engine which is configured to encode each language character of each of said plurality of pairs of language characters based on the unique structure of each language character, generate a third-encoded string of identification characters representing the unique structure of each language character, and compare the third-encoded strings of identification characters generated for each language character to one another to generate a third-encoding similarity score for each of the plurality of pairs of language characters;
wherein said language character match engine is configured to create a composite similarity score set for the character set of the language by receiving each of said plurality of pairs of language characters, and combining the first-encoding similarity score, the second-encoding similarity score and the third-encoding similarity score for each of the plurality of pairs of language characters to compute a composite similarity score for each of the plurality of pairs of language characters.
Dependent claims: 19, 20.
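Claim 18 describes a system in which each encoding component both encodes and compares, while the match engine combines the results. One way to mirror that structure is sketched below. The class names are hypothetical, the code tables are made up, and the weighted mean is only an assumed combination rule. Each component has its own `encode` and `score` methods, and the engine aggregates them over every pair of a character set.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from itertools import combinations


@dataclass
class EncodingComponent:
    """Hypothetical encoding component: a lookup table that encodes a
    character's structure as a string, plus a comparison of two codes."""
    name: str
    table: dict[str, str]

    def encode(self, ch: str) -> str:
        return self.table[ch]

    def score(self, c1: str, c2: str) -> float:
        # Similarity of the two encoded strings, in [0, 1].
        return SequenceMatcher(None, self.encode(c1), self.encode(c2)).ratio()


@dataclass
class LanguageCharacterMatchEngine:
    """Holds several encoding components and combines their scores.

    The weighted mean is an assumption; the claims only say that the
    per-encoding scores are combined. Unlisted components get weight 1.0.
    """
    components: list[EncodingComponent]
    weights: dict[str, float] = field(default_factory=dict)

    def composite(self, c1: str, c2: str) -> float:
        total = weighted = 0.0
        for comp in self.components:
            w = self.weights.get(comp.name, 1.0)
            weighted += w * comp.score(c1, c2)
            total += w
        return weighted / total

    def score_set(self, charset: str) -> dict[tuple[str, str], float]:
        # One composite score per unordered pair of characters.
        return {(a, b): self.composite(a, b)
                for a, b in combinations(charset, 2)}


engine = LanguageCharacterMatchEngine(
    components=[
        EncodingComponent("stroke", {"日": "2511", "目": "25111", "木": "1234"}),
        EncodingComponent("component", {"日": "A", "目": "BU", "木": "D"}),
        EncodingComponent("shape", {"日": "6010", "目": "6010", "木": "4090"}),
    ],
    weights={"stroke": 2.0},  # hypothetical: trust the stroke codes more
)
print(engine.score_set("日目木"))
```

Because each component is self-contained, another encoding can be appended to the `components` list without changing the engine. The `weights` dictionary is one assumed way to let some encodings count more than others.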
Specification