Method and System for Determining Word Senses by Latent Semantic Distance
Abstract
The invention relates to methods and systems for semantic disambiguation of a plurality of words. A representative method comprises providing a dataset of words associated by meaning into sets of synonyms; locating said sets at respective vertices of a graph according to semantic similarity and semantic relationship; transforming the graph into a Euclidean vector space comprising vectors indicative of respective locations of said sets; identifying a first group of said sets which include a first of said pair of words; identifying a second group of said sets which include a second of said pair of words; determining a closest pair in said vector space of said sets taken from said first and second groups of sets respectively; and outputting a meaning of said plurality of words based on said closest pair of said sets and at least one of said semantic relationships between said closest pair of said sets.
67 Claims
1-34. (canceled)
35. A computer implemented method of semantic disambiguation of a plurality of words, the method comprising:
providing a dataset of words associated by meaning into sets of synonyms;
locating said sets at respective vertices of a graph, at least some pairs of said sets being spaced according to semantic similarity and categorised according to semantic relationship;
transforming the graph into a Euclidean vector space comprising vectors indicative of respective locations of said sets in said vector space;
identifying a first group of said sets comprising those of said sets that include a first of said pair of words;
identifying a second group of said sets comprising those of said sets that include a second of said pair of words;
determining a closest pair in said vector space of said sets taken from said first and second groups of sets respectively; and
outputting a meaning of said plurality of words based on said closest pair of said sets and at least one of said semantic relationships between said closest pair of said sets.
Dependent claims: 36, 37, 38, 39, 40, 47, 65
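The closest-pair step recited in claim 35 can be illustrated with a minimal sketch. It assumes the vector space has already been built; the synset identifiers, coordinates, and the names `SYNSET_VECTORS`, `SENSES`, and `closest_sense_pair` are hypothetical and do not come from the patent. The sketch only selects the nearest pair of senses and does not model the semantic relationship between them.

```python
import math
from itertools import product

# Hypothetical toy data: synset id -> vector in an already-built Euclidean space.
SYNSET_VECTORS = {
    "bank.river": (0.0, 1.0),
    "bank.finance": (5.0, 5.0),
    "deposit.sediment": (1.0, 2.0),
    "deposit.money": (5.0, 4.5),
}

# Hypothetical sense inventory: word -> the synsets that include that word.
SENSES = {
    "bank": ["bank.river", "bank.finance"],
    "deposit": ["deposit.sediment", "deposit.money"],
}


def closest_sense_pair(word_a, word_b, senses=SENSES, vectors=SYNSET_VECTORS):
    """Return (distance, sense_a, sense_b) for the nearest pair of synsets,
    taking one synset from each word's group."""
    candidates = product(senses[word_a], senses[word_b])
    return min((math.dist(vectors[a], vectors[b]), a, b) for a, b in candidates)


distance, sense_a, sense_b = closest_sense_pair("bank", "deposit")
print(sense_a, sense_b, distance)
```

With this toy data the financial senses of both words are selected because their vectors lie closest together. An exhaustive search over all sense pairs is used here only because the groups are small.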
41. The method of claim 40, wherein progressively locating said set as a vertex to the graph further comprises:
determining a hypernym of said seed word;
locating said hypernym as a vertex Vh to the graph; and
linking vertices Vh and Vs and assigning a weight to said link.
Dependent claims: 42, 43, 44, 45, 46
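The hypernym-linking step of claim 41 might look like the following sketch. The table `HYPERNYMS` stands in for a lexical database and is hypothetical, as is the function name `add_with_hypernym`. The uniform weight of 1.0 is an assumption, since the claim text here does not say how the weight is chosen.

```python
# Hypothetical hypernym table standing in for a lexical database.
HYPERNYMS = {"dog": "canine", "canine": "mammal", "mammal": "animal"}


def add_with_hypernym(graph, seed, weight=1.0, hypernyms=HYPERNYMS):
    """Add vertex Vs for `seed`, vertex Vh for its hypernym, and a weighted
    link between them. Returns the hypernym, or None if none is known."""
    graph.setdefault(seed, {})        # vertex Vs
    hyper = hypernyms.get(seed)
    if hyper is None:                 # top of the chain: nothing to link
        return None
    graph.setdefault(hyper, {})       # vertex Vh
    graph[seed][hyper] = weight       # undirected weighted link Vs-Vh
    graph[hyper][seed] = weight
    return hyper


graph = {}
word = "dog"
while word is not None:               # walk up the hypernym chain from the seed
    word = add_with_hypernym(graph, word)

print(sorted(graph))
```

The graph is stored as a dictionary of dictionaries so that each link carries its weight and later steps can run a shortest-path search over it directly.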
48. A computer implemented method of determining a latent distance between a pair of vertices of a graph, the method comprising:
providing a dataset comprising data points, wherein each of said data points is associated with at least one other of said data points, and a degree of association between respective pairs of said data points is represented by a weighted measure;
locating said data points at respective vertices of a graph with said respective pairs of said data points spaced according to said weighted measures;
transforming the graph into a Euclidean vector space comprising vectors to create said vector space; and
using said vector space to determine said latent distance between said pair of vertices, said latent distance being a distance between said pair of vertices in said vector space.
Dependent claims: 49, 50, 51, 52, 53, 54
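Claim 48 does not state how the graph is transformed into a Euclidean vector space. As an illustrative stand-in only, the sketch below represents each vertex by its vector of shortest-path distances to every vertex (a simple distance-vector embedding) and takes the Euclidean distance between two such vectors as the latent distance. The function names and the toy graph are hypothetical, and the graph is assumed to be connected with non-negative weights.

```python
import heapq
import math


def shortest_paths(graph, source):
    """Dijkstra over a dict-of-dicts weighted graph: vertex -> path length."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue                          # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def embed(graph):
    """Map each vertex to its vector of shortest-path distances to all
    vertices, in sorted vertex order. Raises KeyError if not connected."""
    order = sorted(graph)
    vectors = {}
    for v in order:
        d = shortest_paths(graph, v)
        vectors[v] = tuple(d[u] for u in order)
    return vectors


def latent_distance(vectors, a, b):
    """Euclidean distance between two vertices in the embedded space."""
    return math.dist(vectors[a], vectors[b])


# Hypothetical weighted graph: a -1- b -2- c
GRAPH = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"b": 2.0},
}
vectors = embed(GRAPH)
print(latent_distance(vectors, "a", "b"), latent_distance(vectors, "a", "c"))
```

Other transformations, such as multidimensional scaling or a spectral embedding, would fit the same claim language. This one was chosen only because it needs nothing beyond the standard library.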
55. A computer implemented method of forming a graph structure, the computer implemented method comprising:
at a server, providing a dataset comprising data points, said data points representing seed words and seed pairs, wherein each of said data points is associated with at least one other of said data points using hypernym and hyponym relations from contents of an electronic lexical database, and wherein a degree of association between respective pairs of said data points is represented by a weighted measure; and
locating said data points at respective vertices of a graph with said respective pairs of said data points spaced according to said weighted measures.
Dependent claims: 56, 57, 58, 59, 60, 61, 62
63. A method to enable disambiguation of word senses, the method comprising:
accessing an electronic lexical database;
sourcing data points representing seed words and seed pairs;
using the electronic lexical database and the data points to generate a graph, wherein the data points are located at respective vertices of the graph, with respective ones of pairs of data points being spaced in the graph according to a weighted measure of a degree of association between the ones of pairs of data points;
generating a vector space based on the graph, wherein a distance between a pair of vertices in the vector space corresponds to a latent distance between the pair of vertices in the graph, and wherein the distance is usable for disambiguation of word senses.
Dependent claims: 64
66. A system to enable disambiguation of word senses, the system comprising:
at least one processor; and
memory accessible to the at least one processor and storing program code executable to implement a vector space generator, the vector space generator having access to an electronic lexical database and receiving data points representing seed words and seed pairs, the vector space generator configured to:
generate a graph by locating the data points at respective vertices of a graph, with respective ones of pairs of data points being spaced in the graph according to a weighted measure of a degree of association between the ones of pairs of data points, and
generate a vector space based on the graph;
wherein the vector space is usable to determine a latent distance between a pair of vertices in the graph by determining a distance between the pair of vertices in the vector space and the latent distance is usable for disambiguation of word senses.
Dependent claims: 67
Specification