METHODS FOR OBTAINING IMPROVED TEXT SIMILARITY MEASURES
First Claim
1. A method for data integration between at least two electronic documents, said method comprising:
identifying similar terms between said electronic documents, said identifying comprising basing similarity between said similar terms on patterns;
replacing said similar terms with an equivalent term to create transformed electronic documents;
performing a similarity comparison on said transformed electronic documents, comprising identifying transformed electronic documents comprising said equivalent term; and
outputting a unified view to a user, such that said unified view comprises said transformed electronic documents comprising said equivalent term.
Abstract
The embodiments of the invention provide methods for obtaining improved text similarity measures. More specifically, a method of measuring similarity between at least two electronic documents begins by identifying similar terms between the electronic documents. This includes basing similarity between the similar terms on patterns, wherein the patterns can include word patterns, letter patterns, numeric patterns, and/or alphanumeric patterns. The identifying of the similar terms also includes identifying multiple pattern types between the electronic documents. Moreover, the basing of the similarity on patterns identifies terms within the electronic documents that are within a category of a hierarchy. Specifically, the identifying of the terms reviews a hierarchical data tree, wherein nodes of the tree represent terms within the electronic documents. Lower nodes of the tree have specific terms, and higher nodes of the tree have general terms.
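The hierarchical data tree described in the abstract can be illustrated with a minimal sketch. The tree contents (`PARENT`) and the helper names `ancestors` and `common_category` are hypothetical and do not come from the patent; the sketch only shows how two specific terms in lower nodes could be recognized as belonging to the same, more general category in a higher node.

```python
# Hypothetical hierarchy: each specific term points to its more general parent.
# Lower nodes hold specific terms; higher nodes hold general terms.
PARENT = {
    "thinkpad t60": "laptop",
    "thinkpad x41": "laptop",
    "laptop": "computer",
    "desktop": "computer",
}


def ancestors(term):
    """Return the chain from `term` up to the root, most specific first."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_category(a, b):
    """Return the lowest node that generalizes both terms, or None."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None


print(common_category("thinkpad t60", "thinkpad x41"))
print(common_category("thinkpad t60", "desktop"))
```

Under these assumptions, two terms count as similar when `common_category` returns a node, and that node is a natural candidate for the general term that stands in for both.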
20 Claims
1. A method for data integration between at least two electronic documents, said method comprising:
identifying similar terms between said electronic documents, said identifying comprising basing similarity between said similar terms on patterns;
replacing said similar terms with an equivalent term to create transformed electronic documents;
performing a similarity comparison on said transformed electronic documents, comprising identifying transformed electronic documents comprising said equivalent term; and
outputting a unified view to a user, such that said unified view comprises said transformed electronic documents comprising said equivalent term.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method for data integration between at least two electronic documents, said method comprising:
identifying similar terms between said electronic documents, said identifying comprising basing similarity between said similar terms on patterns, wherein said patterns comprise at least one of word patterns, letter patterns, numeric patterns, and alphanumeric patterns;
replacing said similar terms with an equivalent term to create transformed electronic documents;
performing a similarity comparison on said transformed electronic documents, comprising identifying transformed electronic documents comprising said equivalent term; and
outputting a unified view to a user, such that said unified view comprises said transformed electronic documents comprising said equivalent terms.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A method for data integration between at least two electronic documents, said method comprising:
identifying similar terms between said electronic documents, said identifying comprising basing similarity between said similar terms on patterns, said basing of said similarity on patterns comprising identifying terms within said electronic documents that are within a category of a hierarchy, wherein said patterns comprise at least one of word patterns, letter patterns, numeric patterns, and alphanumeric patterns;
replacing said similar terms with equivalent terms to create transformed electronic documents, said replacing of said similar terms comprising replacing said terms with a generic domain name describing said category;
performing a similarity comparison on said transformed electronic documents, comprising identifying transformed electronic documents comprising said equivalent terms; and
outputting a unified view to a user, such that said unified view comprises said transformed electronic documents comprising said equivalent terms.
Dependent claims: 16, 17, 18, 19
20. A computer program product comprising computer readable program code stored on computer readable storage medium embodied therein for performing a method of measuring similarity between at least two electronic documents, said method comprising:
identifying similar terms between said electronic documents, said identifying comprising basing similarity between said similar terms on patterns;
replacing said similar terms with an equivalent term to create transformed electronic documents;
performing a similarity comparison on said transformed electronic documents, comprising identifying transformed electronic documents comprising said equivalent term; and
outputting a unified view to a user, such that said unified view comprises said transformed electronic documents comprising said equivalent term.
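The four steps recited in the independent claims (identify by pattern, replace, compare, output a unified view) can be sketched end to end as follows. This is an illustration under stated assumptions, not the patented implementation: the regular expressions, the placeholder tokens `<PHONE>` and `<PART_NO>`, the function names, and the choice of Jaccard similarity over word sets are all hypothetical, since the claims do not specify a particular pattern syntax or similarity measure.

```python
import re

# Hypothetical patterns mapped to equivalent terms; the claims do not fix these.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "<PHONE>"),  # numeric pattern
    (re.compile(r"\b[A-Z]{2}\d{4}\b"), "<PART_NO>"),    # alphanumeric pattern
]


def transform(text):
    """Replace every pattern match with its equivalent term."""
    for regex, equivalent in PATTERNS:
        text = regex.sub(equivalent, text)
    return text


def jaccard(a, b):
    """Jaccard similarity over lowercase word sets (an assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def unified_view(docs):
    """Map each equivalent term to the transformed documents containing it."""
    transformed = [transform(d) for d in docs]
    view = {}
    for _, equivalent in PATTERNS:
        hits = [t for t in transformed if equivalent in t]
        if hits:
            view[equivalent] = hits
    return view


docs = [
    "call 555-123-4567 about part AB1234",
    "call 555-987-6543 about part CD5678",
]
print(jaccard(docs[0], docs[1]))                        # before replacement
print(jaccard(transform(docs[0]), transform(docs[1])))  # after replacement
print(unified_view(docs))
```

In this sketch the two sample documents share only three of seven distinct words before replacement, but become identical once the phone numbers and part numbers are replaced by their equivalent terms, which is the effect on the similarity comparison that the claims describe.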
Specification