Identifying salient semantic relation paths between two words
Abstract
The present invention identifies salient semantic relation paths between two words using a knowledge base. For a group of semantic relations occurring in the knowledge base, the facility models with a mathematical function the relation between a frequency of occurrence of unique semantic relations and the number of unique semantic relations that occur at that frequency. This mathematical function has a vertex frequency identifying a transition point in the mathematical function. The facility then determines the level of salience of unique semantic relations of the group such that the level of salience of a unique semantic relation increases as its frequency of occurrence approaches the vertex frequency of that mathematical function. The facility is then able to determine the level of salience of a particular path between two words by combining the levels of salience determined for the semantic relations in the path.
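The modeling step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: a semantic relation is represented as a hypothetical `(word1, relation_type, word2)` tuple, the "mathematical function" is taken to be a power curve `count = a * frequency ** b` (the form named in claim 1), and the fit is an ordinary least-squares line in log-log space, which the source does not specify. The function names are invented for the example.

```python
from collections import Counter
import math


def frequency_counts(relations):
    """Map each frequency to the number of unique relations occurring that often."""
    freq = Counter(relations)        # unique relation -> frequency
    return Counter(freq.values())    # frequency -> frequency-count


def fit_power_curve(freq_count):
    """Fit count = a * frequency ** b by least squares on log-log values.

    Assumes at least two distinct frequencies are present.
    Returns the pair (a, b).
    """
    xs = [math.log(f) for f in freq_count]
    ys = [math.log(c) for c in freq_count.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Toy data for one relation type: 8 relations seen once, 4 seen twice,
# 2 seen four times, 1 seen eight times.
toy = []
for freq, n_unique in [(1, 8), (2, 4), (4, 2), (8, 1)]:
    for i in range(n_unique):
        toy += [(f"w{freq}_{i}", "Hypernym", "thing")] * freq

a, b = fit_power_curve(frequency_counts(toy))
```

In the claims the curve is generated separately for each relation type, so in practice the relations would be grouped by type before calling these helpers.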
31 Claims
1. A method in a computer system for identifying and evaluating the saliency of semantic relation paths between a pair of words using a dictionary that contains word entries each for a word, the word entries in turn containing segments of natural language that characterize the word of the word entry, the method comprising the steps of:
for each of the natural language segments of the word entries, constructing a semantic relation structure comprised of semantic relations occurring between words in the natural language segment, each semantic relation having a relation type and relating two of the words in the natural language segment;
for each relation type:
  determining the number of occurrences in the constructed semantic relation structures of each unique semantic relation having the relation type, called frequency,
  determining, for each frequency, the number of unique semantic relations having that frequency, called frequency-count, and
  generating a power curve approximating the distribution of frequency-count over frequency for the relation type;
for at least one word of the pair, collecting the semantic relation structures that relate the word of the pair to other words;
selecting, among the collected semantic relation structures, paths within the collected semantic relation structures that connect the words of the pair; and
for each selected path between the words of the pair, determining a measure of the saliency of the path by:
  for the first semantic relation in the path, determining a semantic relation weight by dividing (the smaller of the frequency of the semantic relation and the value of the power curve for the relation type of the semantic relation at the frequency of the semantic relation) by (the total number of constructed semantic relations),
  for subsequent semantic relations in the path, determining a semantic relation weight by dividing (the smaller of the frequency of the semantic relation and the value of the power curve for the relation type of the semantic relation at the frequency of the semantic relation) by (the total number of constructed semantic relations beginning with the same word as the semantic relation), and
  multiplying the weight determined for each semantic relation in the path to obtain a measure of the saliency of the path.
Dependent claims: 2
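The weighting rule in the last step of claim 1 can be sketched as below. The sketch makes several assumptions that are not in the source: relations are hypothetical `(word1, relation_type, word2)` tuples, each power curve is stored as a pair `(a, b)` meaning `a * frequency ** b`, and "total number of constructed semantic relations" is read as the total number of occurrences (the sum of all frequencies). The function name is invented for illustration.

```python
from collections import Counter


def path_saliency(path, freq, curves):
    """Multiply per-relation weights along a path.

    path   -- list of (word1, rel_type, word2) tuples, all present in freq
    freq   -- Counter mapping each unique relation to its frequency
    curves -- dict mapping rel_type to (a, b) for the curve a * f ** b
    """
    total = sum(freq.values())          # all constructed relations (assumed: occurrences)
    starts = Counter()                  # occurrences beginning with each word
    for (w1, _, _), n in freq.items():
        starts[w1] += n

    saliency = 1.0
    for i, rel in enumerate(path):
        f = freq[rel]
        a, b = curves[rel[1]]
        numerator = min(f, a * f ** b)  # smaller of frequency and curve value
        # First relation: divide by the overall total; later ones: divide by
        # the total for relations beginning with the same word.
        denominator = total if i == 0 else starts[rel[0]]
        saliency *= numerator / denominator
    return saliency


freq = Counter({
    ("car", "Hypernym", "vehicle"): 4,
    ("vehicle", "Purpose", "transport"): 2,
    ("car", "Part", "wheel"): 2,
    ("vehicle", "Part", "engine"): 2,
})
curves = {t: (8.0, -1.0) for t in ("Hypernym", "Purpose", "Part")}
path = [("car", "Hypernym", "vehicle"), ("vehicle", "Purpose", "transport")]
score = path_saliency(path, freq, curves)   # (2 / 10) * (2 / 4)
```

With a decreasing curve, the numerator `min(f, a * f ** b)` grows with frequency for rare relations and shrinks for very common ones, so neither extreme dominates the path score.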

3. A method in a computer system for identifying and evaluating the saliency of semantic relation paths between a pair of words using a dictionary that contains word entries each for a word, the word entries in turn containing segments of natural language that characterize the word of the word entry, the method comprising the steps of:
for each of the natural language segments of the word entries, constructing a semantic relation structure comprised of semantic relations occurring between words in the natural language segment, each semantic relation having a relation type and relating two of the words in the natural language segment;
for each relation type:
  determining the number of occurrences in the constructed semantic relation structures of each unique semantic relation having the relation type, called frequency,
  determining, for each frequency, the number of unique semantic relations having that frequency, called frequency-count, and
  generating a power curve approximating the distribution of frequency-count over frequency for the relation type;
for each word of the pair, collecting the semantic relation structures that relate the word of the pair to other words;
identifying an intersection word that occurs both in the semantic relation structures collected for the first word of the pair and in the semantic relation structures collected for the second word of the pair;
identifying in the semantic relation structures collected for the first word of the pair each path from the first word of the pair to the intersection word;
identifying in the semantic relation structures collected for the second word of the pair each path from the second word of the pair to the intersection word;
for each identified path from the first word of the pair to the intersection word or from the second word of the pair to the intersection word:
  for the first semantic relation in the path, determining a semantic relation probability by dividing (the smaller of the frequency of the semantic relation and the value of the power curve for the relation type of the semantic relation at the frequency of the semantic relation) by (the total number of constructed semantic relations),
  for subsequent semantic relations in the path, determining a semantic relation probability by dividing (the smaller of the frequency of the semantic relation and the value of the power curve for the relation type of the semantic relation at the frequency of the semantic relation) by (the total number of constructed semantic relations beginning with the same word as the semantic relation), and
  multiplying the probability determined for each semantic relation in the path to obtain a measure of the saliency of the path; and
for each combination of an identified path from the first word of the pair to the intersection word and an identified path from the second word of the pair to the intersection word:
  concatenating the combination of paths to form an extended path between the words of the pair, and
  determining a measure of the saliency of the extended path by multiplying together the measures of the saliency of both paths of the combination, then multiplying by a join probability reflecting the likely degree to which the meaning of the intersection word is the same in the different semantic relation structures containing the paths of the combination.
Dependent claims: 4
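The final step of claim 3, joining two paths at an intersection word, might look like the sketch below. It assumes the per-path saliencies have already been computed, that paths are lists of hypothetical `(word1, relation_type, word2)` tuples, and that the join probability is supplied by a caller-provided function; the claim does not say how that probability is obtained, so `join_probability` here is purely a placeholder.

```python
def extended_paths(paths_from_first, paths_from_second, join_probability):
    """Combine paths that meet at a shared intersection word.

    paths_from_first / paths_from_second -- dict mapping an intersection word
        to a list of (path, saliency) pairs ending at that word
    join_probability -- hypothetical callable: intersection word -> float
    Returns (extended_path, saliency) pairs, most salient first.
    """
    results = []
    shared = paths_from_first.keys() & paths_from_second.keys()
    for word in shared:
        for p1, s1 in paths_from_first[word]:
            for p2, s2 in paths_from_second[word]:
                # Walk the second path backwards so the extended path runs
                # from the first word, through the intersection word, to the
                # second word. Relation tuples themselves are left unchanged.
                extended = p1 + p2[::-1]
                results.append((extended, s1 * s2 * join_probability(word)))
    return sorted(results, key=lambda r: r[1], reverse=True)


from_car = {"vehicle": [([("car", "Hypernym", "vehicle")], 0.2)]}
from_truck = {"vehicle": [([("truck", "Hypernym", "vehicle")], 0.3)]}
ranked = extended_paths(from_car, from_truck, lambda word: 0.5)
```

Sorting by saliency is a convenience added for the example; the claim only requires that each extended path receive a measure.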

5. A method in a computer system for identifying and weighting semantic relation paths between a pair of words using a corpus containing natural language incorporating the words of the pair, the method comprising the steps of:
identifying semantic relations occurring between words in the corpus, each semantic relation comprising a first word, a second word, and a relation type that relates the meaning of the first word to the meaning of the second word;
identifying paths among the identified semantic relations that each relate the pair of words; and
for each identified path:
  generating a weight for each semantic relation in the path signifying the proximity of the frequency with which the semantic relation occurs in the corpus to an intermediate frequency with which semantic relations having the same relation type as the semantic relations occur in the corpus,
  conditioning the weights generated for semantic relations in the path other than the first one on the occurrence of the first word of the semantic relation, and
  multiplying the weight of the first semantic relation in the path and the conditioned weights of the other semantic relations in the path to obtain a weight for the entire path.

6. A computer-readable medium whose contents cause a computer system to weight semantic relation paths between a pair of words using a corpus containing natural language incorporating the words of the pair by performing the steps of:
identifying semantic relations occurring between words in the corpus, each semantic relation comprising a first word, a second word, and a relation type that relates the meaning of the first word to the meaning of the second word;
identifying paths among the identified semantic relations that each relate the pair of words; and
for each identified path:
  generating a weight for each semantic relation in the path signifying the proximity of the frequency with which the semantic relation occurs in the corpus to an intermediate frequency with which semantic relations having the same relation type as the semantic relations occur in the corpus,
  conditioning the weights generated for semantic relations in the path other than the first one on the occurrence of the first word of the semantic relation, and
  multiplying the weight of the first semantic relation in the path and the conditioned weights of the other semantic relations in the path to obtain a weight for the entire path.

7. A method in a computer system for determining the relevancy of semantic relations between words occurring in a knowledge base, the method comprising the steps of:
for a group of semantic relations occurring in the knowledge base, modeling with a mathematical function the relation between a frequency of occurrence of unique semantic relations and the number of unique semantic relations that occur at that frequency, the mathematical function having a vertex frequency; and
determining the level of relevancy of unique semantic relations of the group such that the level of relevancy of unique semantic relations increases as the frequency of occurrence of the unique semantic relations approaches the vertex frequency of the mathematical function from either direction.
Dependent claims: 8, 9, 10, 11, 12

13. A computer-readable medium whose contents cause a computer system to determine the saliency of semantic relations between words occurring within a knowledge base by performing the steps of:
for a group of semantic relations occurring in the knowledge base, modeling with a mathematical function the relation between a frequency of occurrence of unique semantic relations and the number of unique semantic relations that occur at that frequency, the mathematical function having a vertex frequency; and
determining the level of saliency of unique semantic relations of the group such that the level of saliency of unique semantic relations increases as the frequency of occurrence of the unique semantic relations approaches the vertex frequency of the mathematical function from either direction.
Dependent claims: 14, 15

16. A method in a computer system for measuring the saliency of a semantic relation that occurs infrequently in a corpus, the semantic relation having a first part comprising a first word and a relation type and a second part comprising the relation type and a second word, the method comprising the steps of:
determining the frequency with which the first part of the semantic relation occurs in the corpus;
determining the frequency with which the second part of the semantic relation occurs in the corpus in connection with semantic relations of the relation type; and
combining the determined frequencies to obtain a measure of the saliency of the semantic relation.
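Claim 16 leaves the combining rule open. The sketch below counts the two parts of a relation and, purely as an assumption for illustration, combines them as the product of their relative frequencies within the relation type. Relations are hypothetical `(word1, relation_type, word2)` tuples and both function names are invented.

```python
from collections import Counter


def part_frequencies(relations):
    """Count (word1, type) parts, (type, word2) parts, and relations per type."""
    first = Counter((w1, t) for w1, t, _ in relations)
    second = Counter((t, w2) for _, t, w2 in relations)
    per_type = Counter(t for _, t, _ in relations)
    return first, second, per_type


def part_based_saliency(relation, first, second, per_type):
    """Combine the two part frequencies (assumed rule: product of shares)."""
    w1, t, w2 = relation
    n = per_type[t]
    if n == 0:
        return 0.0  # relation type never observed
    return (first[(w1, t)] / n) * (second[(t, w2)] / n)


corpus = (
    [("car", "Part", "wheel")] * 1
    + [("car", "Part", "engine")] * 3
    + [("bike", "Part", "wheel")] * 4
)
first, second, per_type = part_frequencies(corpus)
score = part_based_saliency(("car", "Part", "wheel"), first, second, per_type)
# ("car", "Part") occurs 4 times and ("Part", "wheel") 5 times out of 8,
# so the rare relation still receives a usable score: (4/8) * (5/8).
```

The point of the example is only that a relation seen once can borrow evidence from how often each of its halves appears; any other monotone combination would fit the claim language equally well.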

17. A method in a computer system for determining the relevancy of semantic relations between words occurring within a knowledge base, the method comprising the steps of:
for a group of semantic relations occurring in the semantic knowledge base, modeling with a mathematical function the statistical relation between a frequency of occurrence of unique semantic relations and the number of unique semantic relations that occur at that frequency, the mathematical function having a vertex frequency identifying a transition point in the mathematical function; and
determining the level of relevancy of unique semantic relations of the group such that the level of relevancy of unique semantic relations increases as the frequency of occurrence of the unique semantic relations approaches the vertex frequency of the mathematical function.

18. A computer-readable medium whose contents cause a computer system to weight the relevancy of semantic relations between words occurring within a knowledge base by performing the steps of:
for a group of semantic relations occurring in the semantic knowledge base, modeling with a mathematical function the relation between a frequency of occurrence of unique semantic relations and the number of unique semantic relations that occur at that frequency, the mathematical function having a vertex frequency identifying a transition point in the mathematical function; and
determining the level of relevancy of unique semantic relations of the group such that the level of relevancy of unique semantic relations increases as the frequency of occurrence of the unique semantic relations approaches the vertex frequency of the mathematical function.

19. A method in a computer system for determining the relevancy of a selected one of a plurality of semantic relations occurring within a knowledge base, each semantic relation comprising a first word, a second word, and a relation type relating the meanings of the first and second words, the method comprising the steps of:
for unique semantic relations occurring in the knowledge base having the same relation type as the selected semantic relation, modeling with a mathematical function having a vertex each of three statistical distributions:
  the distribution of the number of unique semantic relations that occur in the corpus at each frequency,
  the distribution of the number of unique first words contained in relations of the relation type of the selected semantic relation that occur in the corpus at each frequency, and
  the distribution of the number of unique second words contained in relations of the relation type of the selected semantic relation that occur in the corpus at each frequency;
determining a first relevancy component for the selected semantic relation based on the proximity of the frequency of the selected semantic relation to the vertex of the mathematical function modeling the distribution of the number of unique semantic relations that occur in the corpus at each frequency;
determining a second relevancy component for the selected semantic relation based on the proximity of the frequency of the first word and relation type of the selected semantic relation to the vertex of the mathematical function modeling the distribution of the number of unique first words contained in relations of the relation type of the selected semantic relation that occur in the corpus at each frequency, and based on the proximity of the frequency of the relation type and second word of the selected semantic relation to the vertex of the mathematical function modeling the distribution of the number of unique second words contained in relations of the relation type of the selected semantic relation that occur in the corpus at each frequency; and
combining the first and second relevancy components to obtain a measure of the relevancy of the selected semantic relation.
Dependent claims: 20

21. A method in a computer system for determining the relevance of an extended semantic relation path between a pair of words, the extended path being comprised of a plurality of subpaths each constituting a series of one or more semantic relations that, when concatenated, constitute a path between the words of the pair, the subpaths not being known to derive from the same segment of natural language, the method comprising the steps of:
for each subpath, determining a measure of the relevance of the subpath;
for each transition in the extended path between two subpaths, determining a measure of the relevance of the transition between the subpaths; and
determining a measure of the relevance of the extended path by combining the measures of relevance of the subpaths with the measure of relevance of the transition.
Dependent claims: 22, 23

24. A computer-readable medium whose contents cause a computer system to weight the relevance of an extended semantic relation path between a pair of words, the extended path being comprised of a plurality of subpaths each constituting a series of one or more semantic relations that, when concatenated, constitute a path between the words of the pair, the subpaths not being known to derive from the same segment of natural language, by performing the steps of:
for each subpath, determining a weight characterizing the relevance of the subpath;
for each transition in the extended path between two subpaths, determining a weight characterizing the relevance of the transition between the subpaths; and
determining a weight characterizing the relevance of the extended path by combining the weights characterizing the relevance of the subpaths with the weights characterizing the relevance of the transitions.
Dependent claims: 25, 26

27. A method in a computer system for determining the relevancy of semantic relations between words occurring in a knowledge base, the method comprising the steps of:
for each unique semantic relation, determining the frequency of occurrences of that semantic relation in the knowledge base;
identifying a most-salient frequency of occurrence of semantic relations based on the determined frequencies, wherein any unique semantic relation with a frequency of occurrence that is near the most-salient frequency is a salient semantic relation; and
assigning a saliency weight to each unique semantic relation that decreases as the difference between the determined frequency for the unique semantic relation and the identified most-salient frequency increases.
Dependent claims: 28, 29
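The last step of claim 27 only requires a weight that falls as a relation's frequency moves away from the most-salient frequency. A minimal sketch follows, assuming the most-salient frequency has already been identified and is passed in, and using `1 / (1 + distance)` as an arbitrary decreasing function chosen for illustration; neither the function name nor the formula comes from the source.

```python
from collections import Counter


def saliency_weights(relations, most_salient_frequency):
    """Weight each unique relation by closeness to the most-salient frequency.

    relations -- iterable of hashable relation records, one per occurrence
    most_salient_frequency -- assumed to be identified beforehand
    """
    freq = Counter(relations)  # unique relation -> frequency of occurrence
    return {
        rel: 1.0 / (1.0 + abs(f - most_salient_frequency))
        for rel, f in freq.items()
    }


occurrences = ["rare"] * 1 + ["typical"] * 3 + ["common"] * 6
weights = saliency_weights(occurrences, most_salient_frequency=3)
# "typical" sits exactly at the most-salient frequency and gets weight 1.0;
# "rare" (distance 2) and "common" (distance 3) get smaller weights.
```

Any function that decreases monotonically with the absolute difference would satisfy the wording equally well; the reciprocal form is used here only because it stays within (0, 1].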

30. A computer memory containing a data structure for use in assessing the saliency of semantic relations occurring in a natural language corpus, the semantic relations each having one of a plurality of relation types, the data structure comprising:
for each of the plurality of relation types:
  for each unique semantic relation occurring in the corpus that has the relation type, an indication of the frequency at which the unique semantic relation occurs in the corpus, and
  information describing a power curve fitted to the relation of frequencies at which unique semantic relations of the relation type occur in the corpus and the number of relation types having each frequency of occurrence, the power curve having a vertex,
such that, for a selected semantic relation of a selected relation type, the saliency of the selected semantic relation may be assessed by using the data structure to determine the distance between the vertex of the power curve for the selected relation type and the point on the power curve for the selected relation type corresponding to the frequency of occurrence of the selected semantic relation.

31. A computer memory containing a data structure for use in assessing the saliency of semantic relations occurring in a natural language corpus, the data structure comprising a plurality of semantic relation structures, each semantic relation structure relating the meaning of a head word of the semantic relation structure to a plurality of other words, each semantic relation structure having stored in conjunction with each such other word:
a nonextended path weight value providing a quantitative measure of the saliency of the semantic relation path occurring between the head word and the other word; and
an extended path weight value providing a quantitative measure of the saliency of the semantic relation path occurring between the other word and the head word given the existence of another path to the other word,
such that a quantitative measure of the saliency of the semantic relation path from a first word to a second word occurring in the natural language corpus can be determined by selecting the nonextended path weight value stored in conjunction with the second word in a semantic relation structure for which the first word is the head word,
and such that a quantitative measure of the saliency of the semantic relation path from a third word to a fourth word occurring in the natural language corpus can be determined by identifying a fifth word in semantic relation structures having as their head word the third and fourth words, and multiplying (the nonextended path weight value stored in conjunction with the fifth word in the semantic relation structure in which the third word is the head word) by (the extended path weight value stored in conjunction with the fifth word in the semantic relation structure in which the fourth word is the head word).
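The data structure of claim 31 and its two lookups can be sketched roughly as follows. The class and function names are hypothetical, the structures are held in a plain dictionary keyed by head word, and the weight values in the example are made up.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PathWeights:
    nonextended: float  # saliency of the path between head word and other word
    extended: float     # saliency given that another path reaches the other word


@dataclass
class RelationStructure:
    head: str
    others: Dict[str, PathWeights] = field(default_factory=dict)


def direct_saliency(structures, first, second):
    """Nonextended weight stored for `second` under head word `first`."""
    return structures[first].others[second].nonextended


def joined_saliency(structures, third, fourth):
    """Score each shared 'fifth' word reachable from both head words."""
    shared = structures[third].others.keys() & structures[fourth].others.keys()
    return {
        fifth: structures[third].others[fifth].nonextended
        * structures[fourth].others[fifth].extended
        for fifth in shared
    }


structures = {
    "car": RelationStructure("car", {"vehicle": PathWeights(0.2, 0.1)}),
    "truck": RelationStructure(
        "truck",
        {"vehicle": PathWeights(0.3, 0.15), "cargo": PathWeights(0.4, 0.2)},
    ),
}
direct = direct_saliency(structures, "car", "vehicle")
joined = joined_saliency(structures, "car", "truck")  # only "vehicle" is shared
```

Storing both weights with each word means the join in the second lookup is a single multiplication per shared word, with no need to revisit the underlying relations.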
Specification