System and method for utilizing multiple encodings to identify similar language characters

  • US 9,128,915 B2
  • Filed: 08/03/2012
  • Issued: 09/08/2015
  • Est. Priority Date: 08/03/2012
  • Status: Active Grant
First Claim

1. A method for improving accuracy of data matching in a middleware machine environment by identifying a similarity between language characters of a character set of a language, wherein each language character has a unique structure, the method comprising:

  • providing a language character match engine, wherein the language character match engine executes on one or more microprocessor, wherein the language character match engine comprises a plurality of encoding components, including at least a first encoding component and a second encoding component and a third encoding component;

    using the language character match engine to generate a composite similarity score set for the character set of the language wherein said similarity index comprises a composite similarity score for each of a plurality of pairs of language characters of the character set of the language;

    wherein the composite similarity score for each of the plurality of pairs of language characters is prepared by,

      receiving the pair of language characters with the language character match engine,

      using the first encoding component to encode each language character of the pair of language characters based on the unique structure of each language character and generate, for each language character, a first-encoded string of identification characters representing the unique structure of the language character,

      comparing the first-encoded strings of identification characters for each of the pair of language characters to one another to generate a first-encoding similarity score for the pair of language characters,

      using the second encoding component to encode each language character of the pair of language characters based on the unique structure of each language character and generate, for each language character, a second-encoded string of identification characters representing the unique structure of the language character,

      comparing the second-encoded strings of identification characters for each of the pair of language characters to one another to generate a second-encoding similarity score for the pair of language characters,

      using the third encoding component to encode each language character of the pair of language characters based on the unique structure of each language character and generate, for each language character, a third-encoded string of identification characters representing the unique structure of the language character,

      comparing the third-encoded strings of identification characters for each of the pair of language characters to one another to generate a third-encoding similarity score for the pair of language characters, and

      combining the first-encoding similarity score, the second-encoding similarity score, and the third-encoding similarity score for the pair of language characters to generate a composite similarity score for the pair of language characters.
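
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say which encodings are used, how encoded strings are compared, or how the three scores are combined, so everything specific here is an assumption: the three lookup tables hold invented placeholder codes, `difflib.SequenceMatcher` stands in for the comparison step, and an unweighted mean stands in for the combining step. All function and variable names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical lookup tables standing in for the three encoding components.
# The codes are invented placeholders, not taken from the patent or from
# any real structural encoding of these characters.
FIRST_ENCODING = {"未": "ABCD", "末": "ABCE", "人": "XY"}
SECOND_ENCODING = {"未": "1234", "末": "1235", "人": "90"}
THIRD_ENCODING = {"未": "pqrs", "末": "pqrt", "人": "uv"}

ENCODERS = (FIRST_ENCODING, SECOND_ENCODING, THIRD_ENCODING)


def string_similarity(a: str, b: str) -> float:
    """Compare two encoded strings; returns a value in [0, 1].

    Assumption: SequenceMatcher.ratio() is used as the comparison;
    the claim does not name a specific string metric.
    """
    return SequenceMatcher(None, a, b).ratio()


def composite_score(char_a: str, char_b: str) -> float:
    """Encode the pair with each component, score each encoding, combine.

    Assumption: the per-encoding scores are combined by an unweighted mean.
    """
    scores = [
        string_similarity(table[char_a], table[char_b]) for table in ENCODERS
    ]
    return sum(scores) / len(scores)


def composite_score_set(chars):
    """Build a composite score for every unordered pair of characters."""
    return {
        (a, b): composite_score(a, b) for a, b in combinations(chars, 2)
    }


if __name__ == "__main__":
    for pair, score in composite_score_set(["未", "末", "人"]).items():
        print(pair, round(score, 3))
```

A weighted combination, or a different string metric such as edit distance, would fit the same structure. Only `string_similarity` and the last line of `composite_score` would change.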

  • 2 Assignments