Methods for obtaining improved text similarity measures which replace similar characters with a string pattern representation by using a semantic data tree

  • US 7,945,525 B2
  • Filed: 11/09/2007
  • Issued: 05/17/2011
  • Est. Priority Date: 11/09/2007
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method for data integration between electronic documents, said method comprising:

  • identifying, by a computer, first similar terms between said electronic documents, each of said first similar terms referring to a same kind of entity and comprising an alphanumeric string, including a first number of letter characters followed by a second number of numeric characters;

    replacing, by said computer, each of said first similar terms with a single equivalent term that replaces one or more of said second number of numeric characters with a common string pattern representation to create transformed electronic documents;

    identifying, by said computer, second similar terms between said two documents, each of said second similar terms belonging to a different level of a same hierarchical semantic data tree, wherein a lower node of said hierarchical semantic data tree represents a specific term and a higher node represents a general term;

    replacing, by said computer, said specific term of said lower node of said two documents with said general term of said higher node to create transformed electronic documents;

    performing, by said computer, a similarity comparison on said transformed electronic documents; and

    outputting, by said computer, a unified view to a user, such that said unified view comprises said transformed electronic documents.
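
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the regular expression, the `#` placeholder, the child-to-parent dictionary `PARENT` standing in for the semantic data tree, and the use of Jaccard similarity are all assumptions chosen for illustration, since the claim does not name a pattern syntax, a tree representation, or a similarity measure.

```python
import re

# Assumption: a "similar term" is a run of letters followed by a run of digits,
# e.g. "SRV101". The claim does not fix a concrete syntax.
ALNUM_TERM = re.compile(r"\b([A-Za-z]+)(\d+)\b")

# Assumption: the hierarchical semantic data tree is modeled as a
# child -> parent map (lower, specific node -> higher, general node).
PARENT = {"poodle": "dog", "beagle": "dog"}


def replace_numeric_suffix(text, placeholder="#"):
    """Replace the digit run of each letters+digits term with one placeholder."""
    return ALNUM_TERM.sub(lambda m: m.group(1) + placeholder, text)


def generalize_terms(text):
    """Replace each specific term with its more general parent term, if any."""
    return " ".join(PARENT.get(tok.lower(), tok) for tok in text.split())


def transform(text):
    """Apply both replacement steps to produce a transformed document."""
    return generalize_terms(replace_numeric_suffix(text))


def jaccard(a, b):
    """Token-set Jaccard similarity (one possible similarity comparison)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


doc_a = "server SRV101 hosts poodle records"
doc_b = "server SRV102 hosts beagle records"

print(jaccard(doc_a, doc_b))                        # similarity before transformation
print(transform(doc_a))                             # server SRV# hosts dog records
print(jaccard(transform(doc_a), transform(doc_b)))  # 1.0
```

In this toy example the two documents share only three of seven distinct tokens before transformation; after both replacement steps they become identical, which is the kind of improvement in measured similarity the title describes.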
