Apparatus, system and method for application-specific and customizable semantic similarity measurement
First Claim
1. A method for application-specific and customizable text similarity measurement, the method comprising the steps of:
determining, by a computer, a string similarity score of at least two texts based upon a string similarity database stored on said computer, said at least two texts comprising at least one input text and at least one target text;
determining, by said computer, a semantic similarity score of the at least two texts based upon a semantic similarity database stored on said computer, the semantic similarity score being determined as the sum of a distance between at least one term of each said at least two texts;
mapping, by said computer, said at least one target text and its respective canonical representations in a mappings database stored on said computer;
combining, by said computer, the string similarity score and the semantic similarity score of the at least two texts where the combined score is a weighted sum of the string similarity score and the semantic similarity score and where said at least two texts are ranked for similarity by sorting by their respective combined string and semantic similarity scores and where said texts that are included in the mappings database are also scored by similarity of their canonical forms; and
producing, by the computer, an output signal defining a list of best-scoring semantic mappings with associated scores, said computer adapted to transform said output signal into a display signal viewable on a monitor.
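The steps recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `difflib.SequenceMatcher` stands in for the string similarity measure, the hand-written `SEMANTIC_DISTANCE` table stands in for the semantic similarity database, `CANONICAL` stands in for the mappings database, and the weights are arbitrary. The claim does not say how a summed distance becomes a similarity or how the canonical-form score is folded in, so the sketch assumes `1 / (1 + d)` for the former and takes the maximum of the two scores for the latter.

```python
from difflib import SequenceMatcher

# Hypothetical semantic similarity database: distance between term pairs
# (0.0 = same meaning). Unlisted pairs fall back to DEFAULT_DISTANCE.
SEMANTIC_DISTANCE = {
    frozenset(["bill", "invoice"]): 0.1,
    frozenset(["pay", "settle"]): 0.2,
}
DEFAULT_DISTANCE = 1.0

# Hypothetical mappings database: target text -> canonical representation.
CANONICAL = {
    "settle my invoice": "pay bill",
    "check balance": "check balance",
}


def string_similarity(a, b):
    """Character-level similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def term_distance(t1, t2):
    """Look up the distance between two terms; identical terms are 0.0."""
    if t1 == t2:
        return 0.0
    return SEMANTIC_DISTANCE.get(frozenset([t1, t2]), DEFAULT_DISTANCE)


def semantic_similarity(a, b):
    """Sum each term's distance to its closest term in the other text,
    then convert the summed distance to a similarity (assumed: 1/(1+d))."""
    terms_a, terms_b = a.lower().split(), b.lower().split()
    if not terms_a or not terms_b:
        return 0.0
    total = sum(min(term_distance(x, y) for y in terms_b) for x in terms_a)
    return 1.0 / (1.0 + total)


def combined_score(a, b, w_string=0.4, w_semantic=0.6):
    """Weighted sum of the string and semantic similarity scores."""
    return (w_string * string_similarity(a, b)
            + w_semantic * semantic_similarity(a, b))


def rank_targets(input_text, targets):
    """Return (target, canonical form, score) tuples, best score first."""
    results = []
    for target in targets:
        score = combined_score(input_text, target)
        canonical = CANONICAL.get(target)
        if canonical is not None:
            # Targets in the mappings database are also scored against
            # their canonical form; keeping the maximum is an assumption.
            score = max(score, combined_score(input_text, canonical))
        results.append((target, canonical, score))
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    for row in rank_targets("pay my bill",
                            ["settle my invoice", "check balance", "open account"]):
        print(row)
```

Keeping the weights as parameters of `combined_score` mirrors the claim's weighted sum while leaving the balance between surface form and meaning open to tuning per application.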
Abstract
The present invention relates to an apparatus, system, and method for creating a customizable and application-specific semantic similarity utility that uses a single similarity-measuring algorithm with data from broad-coverage structured lexical knowledge bases (dictionaries and thesauri) and corpora (document collections). More specifically, the invention includes the use of data from custom or application-specific structured lexical knowledge bases and corpora, and of semantic mappings from variant expressions to their canonical forms. The invention uses a combination of technologies to simplify the development of a generic semantic similarity utility and to minimize the effort and complexity of customizing the generic utility for a domain- or topic-dependent application. The invention makes customization modular and data-driven, allowing developers to create implementations at varying degrees of customization (e.g., generic, domain-level, company-level, application-level) and to update them as changes occur over time (e.g., when product and service mixes change).
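One way to picture the modular, data-driven customization described in the abstract is as a stack of lexical data layers in which a more specific layer overrides a more general one. The sketch below is an assumption for illustration only: the layer names, term pairs, and distance values are hypothetical, and `collections.ChainMap` is simply a convenient standard-library way to express the override order, not something the patent specifies.

```python
from collections import ChainMap

# Hypothetical lexical layers: (term, term) -> semantic distance.
generic = {("car", "automobile"): 0.1, ("plan", "scheme"): 0.3}
domain = {("plan", "tariff"): 0.1, ("plan", "scheme"): 0.8}  # telecom domain
application = {("plan", "unlimited bundle"): 0.05}           # one product line


def build_lexicon(*layers):
    """Combine layers; ChainMap consults earlier layers first, so pass the
    most specific layer first and the generic layer last."""
    return ChainMap(*layers)


def distance(lexicon, a, b, default=1.0):
    """Look up a term pair in either order; unknown pairs get `default`."""
    if a == b:
        return 0.0
    return lexicon.get((a, b), lexicon.get((b, a), default))


# Application-level utility: application data over domain data over generic.
lexicon = build_lexicon(application, domain, generic)

# Generic utility: the same algorithm with only the broad-coverage layer.
generic_only = build_lexicon(generic)
```

Because the lookup function never changes, moving from a generic utility to a domain- or application-level one, or reflecting a change in the product mix, only involves adding or replacing a data layer.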
29 Citations
58 Claims
1. A method for application-specific and customizable text similarity measurement, as set out in full under First Claim above. Dependent claims: 2 through 56.
57. A data processing system, comprising a processor, a memory coupled to the processor, and a non-transitory computer readable storage device coupled to the processor, said storage device containing program code executable by the processor via the memory to implement a method for application-specific and customizable text similarity measurement, the processing system comprising:
first logic executable by the processor for determining a string similarity score of at least two texts based upon a string similarity database, said at least two texts comprising at least one input text and at least one target text;
second logic executable by the processor for determining a semantic similarity score of the at least two texts based upon a semantic similarity database, the semantic similarity score being determined as the sum of a distance between at least one term of each said at least two texts;
third logic executable by the processor for mapping said at least one target text and its respective canonical representations in a mappings database stored in said memory;
fourth logic executable by the processor for combining the string similarity score and the semantic similarity score of the at least two texts where the combined score is a weighted sum of the string similarity score and the semantic similarity score and where said at least two texts are ranked for similarity by sorting by their respective combined string and semantic similarity scores and where said texts that are included in the mappings database are also scored by similarity of their canonical forms; and
an output signal defining a list of best-scoring semantic mappings with associated scores, said program code executable to transform said output signal into a display signal viewable on a monitor.
58. A system, comprising a processor, a memory coupled to the processor, and a non-transitory computer readable storage device coupled to the processor, said storage device containing program code executable by the processor via the memory to implement a method useful for application-specific and customizable text similarity measurement, the system comprising:
a first computer store containing at least two texts defining string similarity of each of said texts, said at least two texts comprising at least one input text and at least one target text, where said defined string similarity is stored as a string similarity score in said first computer store;
a second computer store interconnected to the first computer store and housing a semantic similarity score of the at least two texts based upon the first computer store, where the semantic similarity score is determined as the sum of a distance between at least one term of each said at least two texts;
a third computer store interconnected to the first and second computer stores where said at least one target text and its respective canonical representations are mapped and housed in said third computer store; and
an output signal defining a list of best-scoring semantic mappings with associated scores, said program code executable to transform said output signal into a display signal viewable on a monitor,
wherein the string similarity score and the semantic similarity score of the at least two texts where the combined score is a weighted sum of the string similarity score and the semantic similarity score and where said at least two texts are ranked for similarity by sorting by their respective combined string and semantic similarity scores and where said texts that are included in the mappings database are also scored by similarity of their canonical forms.
Specification