Determining similarity between words
Abstract
The present invention provides a facility for determining the similarity between two input words by using the frequencies with which the path patterns found between the input words also occur between words known to be synonyms. A preferred embodiment of the facility uses a training phase and a similarity determination phase. In the training phase, the facility first identifies, for a number of pairs of synonyms, the most salient semantic relation paths between the words of each pair. The facility then extracts from these semantic relation paths their path patterns, each of which comprises a series of directional relation types. The number of times each path pattern occurs between pairs of synonyms, called the frequency of the path pattern, is counted. In the similarity determination phase, the facility identifies the most salient semantic relation paths between the input words and extracts their path patterns. The facility then averages the frequencies counted in the training phase for the path patterns extracted for the input words to obtain a quantitative measure of the similarity between the input words.
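The two phases can be summarized as a pair of formulas. The symbols are introduced here only for illustration and are not the patent's own notation: $\Sigma$ is the set of training synonym pairs, $\mathrm{paths}(a,b)$ is the set of most salient semantic relation paths between $a$ and $b$, and $\pi(p)$ is the path pattern of path $p$.

$$f(t) = \sum_{(a,b)\in\Sigma} \bigl|\{\, p \in \mathrm{paths}(a,b) : \pi(p) = t \,\}\bigr|$$

$$\mathrm{sim}(w_1,w_2) = \frac{1}{|P|}\sum_{p\in P} f\bigl(\pi(p)\bigr), \qquad P = \mathrm{paths}(w_1,w_2)$$

For instance, with hypothetical training frequencies of 12, 0 and 3 for the patterns of three salient paths between the input words, the measure is $(12 + 0 + 3)/3 = 5$. The average is undefined when no path connects the input words, so an implementation has to choose a default for that case.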
27 Claims

1. A method in a computer system for determining the similarity of a pair of input words, the method comprising:
(a) selecting a multiplicity of pairs of words known to be synonyms;
(b) for each selected pair of synonyms:
(1) identifying the most salient semantic relation paths connecting the words of the pair of synonyms, each identified semantic relation path comprising an ordered series of semantic relations, each semantic relation having a relation type; and
(2) for each identified path:
(A) extracting from the path a path pattern comprising the relation types of the relations of the path; and
(B) augmenting a path pattern frequency indicating the likelihood that an arbitrary pair of words that are connected by a path having the extracted path pattern have similar meanings;
(c) identifying the most salient semantic relation paths connecting the input words; and
(d) obtaining from the path pattern frequencies for the path patterns of the identified paths a quantitative measure of the similarity of the input words.
Dependent claims: 2, 3.

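Step (A) of claim 1, and the pattern extraction needed for step (c), turn a path into its path pattern. A minimal sketch, assuming a path is given as a list of hypothetical `Relation` triples. Following the abstract's "directional relation types", each step is tagged with whether the relation was traversed forward (`>`) or backward (`<`); this encoding is an assumption, and the tags could be dropped to match the undirected wording of claim 1:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Relation:
    """One labeled edge in a hypothetical semantic network."""
    source: str
    rel_type: str
    target: str


def extract_path_pattern(path: Sequence[Relation], start: str) -> tuple[str, ...]:
    """Return the ordered relation types of `path`, walked from `start`.

    Each relation type is suffixed with '>' when the relation is traversed
    source->target and '<' when traversed target->source (assumed encoding).
    """
    pattern = []
    current = start
    for rel in path:
        if rel.source == current:
            pattern.append(rel.rel_type + ">")
            current = rel.target
        elif rel.target == current:
            pattern.append(rel.rel_type + "<")
            current = rel.source
        else:
            raise ValueError(f"relation {rel} is not connected to {current!r}")
    return tuple(pattern)
```

For a path car → Hypernym → vehicle ← Hypernym ← automobile, `extract_path_pattern(path, "car")` yields `("Hypernym>", "Hypernym<")`. Two different word pairs joined through a shared hypernym therefore produce the same pattern, which is what lets frequencies be pooled across pairs.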
4. A method in a computer system for determining the similarity of a pair of input words using a list of path pattern weights, each path pattern weight indicating quantitatively the likelihood that an arbitrary pair of words connected by a path having the path pattern have similar meanings, the method comprising:
identifying a plurality of semantic relation paths connecting the input words, each identified path comprising an ordered series of semantic relations each having a relation type;
combining the path pattern weights for the path patterns of the identified paths to obtain a quantitative measure of the similarity of the input words; and
if the obtained quantitative measure of the similarity of the input words exceeds a minimum similarity measure, indicating that the input words are similar.

5. A computer-readable medium whose contents cause a computer system to determine the similarity of a pair of input words using a list of path pattern weights, each path pattern weight indicating quantitatively the likelihood that an arbitrary pair of words connected by a path having the path pattern have similar meanings, by:
identifying a plurality of semantic relation paths, the identified paths connecting the input words and each comprising an ordered series of semantic relations, each semantic relation having a relation type;
averaging the path pattern weights for the path patterns of the identified paths to obtain a quantitative measure of the similarity of the input words; and
if the obtained quantitative measure of the similarity of the input words exceeds a minimum similarity measure, indicating that the input words are similar.

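Claims 4 and 5 combine stored path pattern weights and compare the result with a minimum similarity measure. A minimal sketch, assuming the weights live in a plain dictionary keyed by path pattern tuples, that averaging is the combination (claim 5 names it; claim 4 leaves the combination open), and that a pattern missing from the list contributes a weight of 0:

```python
from statistics import fmean
from typing import Iterable, Mapping

Pattern = tuple[str, ...]


def similarity_score(patterns: Iterable[Pattern], weights: Mapping[Pattern, float]) -> float:
    """Average the weights of the path patterns found between the input words.

    Patterns absent from `weights` count as 0 (an assumption); with no
    paths at all the score is 0.0 rather than undefined.
    """
    values = [weights.get(p, 0.0) for p in patterns]
    return fmean(values) if values else 0.0


def are_similar(
    patterns: Iterable[Pattern], weights: Mapping[Pattern, float], min_similarity: float
) -> bool:
    """Indicate similarity only when the score strictly exceeds the threshold."""
    return similarity_score(patterns, weights) > min_similarity
```

With hypothetical weights of 12 and 30 for two of three patterns found between the input words and no entry for the third, the score is (12 + 30 + 0) / 3 = 14, so `are_similar` returns `True` for a threshold of 10. Counting unseen patterns as 0 pulls the average down, which penalizes word pairs connected mostly by patterns never seen between synonyms.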
6. A method in a computer system for identifying path patterns indicating similarity between word pairs connected by these path patterns, the method comprising:
(a) for each of a multiplicity of pairs of words known to be synonyms:
(1) identifying the most salient semantic relation paths connecting the words of the pair, each identified semantic relation path comprising an ordered series of semantic relations, each semantic relation having a relation type; and
(2) for each identified path:
(A) extracting from the path a relation type path pattern comprising the relation types of the relations of the path; and
(B) augmenting an indication of the likelihood that an arbitrary pair of words connected by a path having the extracted relation type path pattern have similar meanings,
such that, after the performance of the method, the likelihood indications reflect the likelihood that an arbitrary pair of words connected by a path having the relation type path pattern have similar meanings.
Dependent claims: 7, 8.

9. A computer-readable medium whose contents cause a computer system to identify path patterns indicating similarity between word pairs connected by these path patterns by:
(a) for each of a multiplicity of pairs of words known to be synonyms:
(1) identifying the most salient semantic relation paths connecting the words of the pair, each identified semantic relation path comprising an ordered series of semantic relations, each semantic relation having a relation type; and
(2) for each identified path:
(A) extracting from the path a relation type path pattern comprising the relation types of the relations of the path; and
(B) incrementing a relation type path pattern frequency for the extracted relation type path pattern indicating the likelihood that an arbitrary pair of words connected by a path having the extracted relation type path pattern have similar meanings,
such that, after the performance of the steps, the relation type path pattern frequencies of the relation type path patterns reflect the likelihood that an arbitrary pair of words connected by a path having the relation type path pattern have similar meanings.
Dependent claim: 10.

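The training loop of claims 6 and 9 amounts to counting how often each path pattern occurs between known synonym pairs. A sketch, assuming a hypothetical `salient_patterns(a, b)` callable that already performs path finding and pattern extraction; the patent does not specify an interface for that step, and the toy lookup in the usage below stands in for a real semantic network:

```python
from collections import Counter
from typing import Callable, Iterable

Pattern = tuple[str, ...]


def train_pattern_frequencies(
    synonym_pairs: Iterable[tuple[str, str]],
    salient_patterns: Callable[[str, str], list[Pattern]],
) -> Counter:
    """Count how often each path pattern links known synonym pairs.

    `salient_patterns(a, b)` is a hypothetical stand-in that returns the
    pattern of every most-salient path between a and b.
    """
    frequencies: Counter = Counter()
    for a, b in synonym_pairs:
        for pattern in salient_patterns(a, b):
            frequencies[pattern] += 1  # step (B): increment the pattern frequency
    return frequencies
```

A `Counter` reports 0 for patterns never seen, which matches the intuition that an unobserved pattern carries no evidence of similarity. Feeding in pairs known to stand in some other strong relationship instead of synonyms gives the generalization of claims 21 and 24, with no change to the counting itself.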
11. A method in a computer system for determining the level of similarity of a pair of input words, the method comprising:
for each of a plurality of semantic relation type path patterns comprising an ordered series of semantic relation types, determining a weight for the semantic relation type path pattern characterizing the extent to which semantic relation paths having that semantic relation type path pattern and occurring between arbitrary pairs of words indicate that the arbitrary pairs of words have similar meanings; and
for paths between the input words including the most salient paths between the input words, combining the weights determined for the semantic relation type path patterns corresponding to the paths between the input words in order to obtain an indication of the level of similarity of the pair of input words.
Dependent claims: 12, 13, 14.

15. A computer system for determining the level of similarity of a pair of input words, the system comprising:
a semantic relation type path pattern weighting subsystem that, for each of a plurality of semantic relation type path patterns comprising an ordered series of semantic relation types, determines a weight for the semantic relation type path pattern characterizing the extent to which semantic relation paths having the semantic relation type path pattern and occurring between arbitrary pairs of words indicate that the arbitrary pairs of words have similar meanings; and
a weight combination subsystem that, for paths between the input words including the most salient paths between the input words, combines the weights determined for the semantic relation type path patterns corresponding to the paths between the input words in order to obtain an indication of the level of similarity of the pair of input words.
Dependent claim: 16.

17. A method in a computer system for determining the level of similarity of a pair of input words, the method comprising:
identifying a plurality of semantic relation type path patterns, each comprising an ordered series of semantic relation types, whose occurrence between arbitrary pairs of words indicates that the arbitrary pairs of words have similar meanings; and
if the path patterns occurring between the input words include semantic relation type path patterns among the identified plurality, indicating that the input words are similar in meaning.
Dependent claims: 18, 19.

20. A computer-readable medium whose contents cause a computer system to determine the level of similarity of a pair of input words by:
identifying a plurality of semantic relation type path patterns, each comprising an ordered series of semantic relation types, whose occurrence between arbitrary pairs of words indicates that the arbitrary pairs of words have similar meanings; and
if the path patterns occurring between the input words include semantic relation type path patterns among the identified plurality, indicating that the input words are similar in meaning.

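Claims 17 and 20 reduce the decision to set membership: the input words are indicated as similar when any path pattern between them belongs to an identified set of indicative patterns. The claims do not say how that set is chosen; the minimum-count cutoff (`min_count`) applied to training frequencies below is an assumption made for illustration, and both function names are hypothetical:

```python
from typing import Iterable

Pattern = tuple[str, ...]


def select_indicative_patterns(frequencies: dict[Pattern, int], min_count: int) -> frozenset[Pattern]:
    """Keep patterns seen between synonyms at least `min_count` times (assumed rule)."""
    return frozenset(p for p, n in frequencies.items() if n >= min_count)


def similar_by_patterns(input_patterns: Iterable[Pattern], indicative: frozenset[Pattern]) -> bool:
    """Indicate similarity if any path pattern between the input words is indicative."""
    return any(p in indicative for p in input_patterns)
```

Unlike the averaged score of claim 5, this test yields a yes/no answer and ignores how many non-indicative paths also connect the words.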
21. A method in a computer system for determining the strength of a selected relationship between a pair of input words, the method comprising:
(a) selecting a multiplicity of pairs of words between which the selected relationship is known to be strong;
(b) for each selected pair of words:
(1) identifying the most salient semantic relation paths connecting the words of the selected pair, each identified semantic relation path comprising an ordered series of semantic relations, each semantic relation having a relation type; and
(2) for each identified path:
(A) extracting from the path a path pattern comprising the relation types of the relations of the path; and
(B) augmenting a path pattern frequency indicating the likelihood that the selected relationship is strong between an arbitrary pair of words that are connected by a path having the extracted path pattern;
(c) identifying the most salient semantic relation paths connecting the input words; and
(d) averaging the path pattern frequencies for the path patterns of the identified paths to obtain a quantitative measure of the strength of the selected relationship between the input words.
Dependent claims: 22, 23.

24. A computer-readable medium whose contents cause a computer system to determine the strength of a selected relationship between a pair of input words by:
(a) selecting a multiplicity of pairs of words between which the selected relationship is known to be strong;
(b) for each selected pair of words:
(1) identifying the most salient semantic relation paths connecting the words of the selected pair, each identified semantic relation path comprising an ordered series of semantic relations, each semantic relation having a relation type; and
(2) for each identified path:
(A) extracting from the path a path pattern comprising the relation types of the relations of the path; and
(B) augmenting a path pattern frequency indicating the likelihood that the selected relationship is strong between an arbitrary pair of words that are connected by a path having the extracted path pattern;
(c) identifying the most salient semantic relation paths connecting the input words; and
(d) obtaining from the path pattern frequencies for the path patterns of the identified paths a quantitative measure of the strength of the selected relationship between the input words.

25. A computer memory containing a word similarity data structure for determining the similarity of a pair of input words, the word similarity data structure comprising entries, each entry identifying:
a semantic relation type path pattern comprising an ordered series of semantic relation types; and
a weight for the semantic relation type path pattern characterizing the extent to which a semantic relation path of the semantic relation type path pattern occurring between an arbitrary pair of words indicates that the words of the arbitrary pair have similar meanings,
the word similarity data structure being usable to determine the similarity of the input words by combining the weights for semantic relation type path patterns that occur between the input words.
Dependent claims: 26, 27.

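Claim 25 describes the weights as a table of entries, each pairing a relation type path pattern with a weight. One possible in-memory layout, sketched with hypothetical class names; averaging is used as the combination only because claim 5 names it, and unseen patterns default to a weight of 0 by assumption:

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Iterable

Pattern = tuple[str, ...]


@dataclass(frozen=True)
class PatternWeightEntry:
    """One entry: a relation type path pattern and its weight."""
    pattern: Pattern
    weight: float


class WordSimilarityTable:
    """Hypothetical in-memory form of the word similarity data structure."""

    def __init__(self, entries: Iterable[PatternWeightEntry]):
        # Index entries by pattern for constant-time lookup; a later entry
        # for the same pattern replaces an earlier one.
        self._weights = {e.pattern: e.weight for e in entries}

    def weight(self, pattern: Pattern) -> float:
        # Patterns with no entry are assumed to weigh 0.
        return self._weights.get(pattern, 0.0)

    def similarity(self, patterns_between_inputs: Iterable[Pattern]) -> float:
        # Combine by averaging; returns 0.0 when no patterns are supplied.
        values = [self.weight(p) for p in patterns_between_inputs]
        return fmean(values) if values else 0.0
```

Keying the table by immutable pattern tuples keeps lookup independent of table size, so scoring a word pair costs time proportional to the number of paths between its words.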
Specification