Identifying salient semantic relation paths between two words

US 6,070,134 A
Filed: 07/31/1997
Issued: 05/30/2000
Est. Priority Date: 07/31/1997
Status: Expired due to Term

Abstract

The present invention identifies salient semantic relation paths between two words using a knowledge base. For a group of semantic relations occurring in the knowledge base, the facility models with a mathematical function the relation between the frequency of occurrence of unique semantic relations and the number of unique semantic relations that occur at that frequency. This mathematical function has a vertex frequency identifying a transition point in the function. The facility then determines the level of salience of unique semantic relations of the group such that the level of salience increases as the frequency of occurrence of a unique semantic relation approaches the vertex frequency of the modeling function. The facility can then determine the level of salience of a particular path between two words by combining the levels of salience determined for the semantic relations in the path.

31 Claims

1. A method in a computer system for identifying and evaluating the saliency of semantic relation paths between a pair of words using a dictionary that contains word entries each for a word, the word entries in turn containing segments of natural language that characterize the word of the word entry, the method comprising the steps of:
- for each of the natural language segments of the word entries, constructing a semantic relation structure comprised of semantic relations occurring between words in the natural language segment, each semantic relation having a relation type and relating two of the words in the natural language segment;
  
  for each relation type:
  
  determining the number of occurrences in the constructed semantic relation structures of each unique semantic relation having the relation type, called frequency,
  
  determining, for each frequency, the number of unique semantic relations having that frequency, called frequency-count, and
  
  generating a power curve approximating the distribution of frequency-count over frequency for the relation type;
  
  for at least one word of the pair, collecting the semantic relation structures that relate the word of the pair to other words;
  
  selecting, among the collected semantic relation structures, paths within the collected semantic relation structures that connect the words of the pair; and
  
  for each selected path between the words of the pair, determining a measure of the saliency of the path by:
  
  for the first semantic relation in the path, determining a semantic relation weight by dividing (the smaller of the frequency of the semantic relation and the value of the power curve for the relation type of the semantic relation at the frequency of the semantic relation) by (the total number of constructed semantic relations),
  
  for subsequent semantic relations in the path, determining a semantic relation weight by dividing (the smaller of the frequency of the semantic relation and the value of the power curve for the relation type of the semantic relation at the frequency of the semantic relation) by (the total number of constructed semantic relations beginning with the same word as the semantic relation), and
  
  multiplying the weight determined for each semantic relation in the path to obtain a measure of the saliency of the path.
- Dependent claims (2)
- - 2. The method of claim 1 wherein each identified semantic relation contains a first semantic relation part, comprising the first word and relation type of the semantic relation, and a second semantic relation part, comprising the relation type and second word of the semantic relation, further including the steps of:
    - for each of the first and second semantic relation parts:
      
      for each relation type:
      
      determining the frequency of each unique semantic relation part,
      
      determining, for each frequency, the number of unique semantic relation parts having that frequency, called frequency-count, and
      
      generating a power curve approximating the distribution of frequency-count over frequency for the relation type;
      
      for each selected path between the words of the pair:
      
      for the first semantic relation in the path:
      
      determining a first semantic relation part weight by dividing (the smaller of the frequency of the first semantic relation part and the value of the power curve for the first semantic relation part of the relation type of the semantic relation at the frequency of the first semantic relation part) by (the total number of semantic relation parts), and
      
      determining a second semantic relation part weight by dividing (the smaller of the frequency of the second semantic relation part and the value of the power curve for the second semantic relation part of the relation type of the semantic relation at the frequency of the second semantic relation part) by (the sum of the frequencies of second semantic relation parts having the relation type of the semantic relation); and
      
      for subsequent semantic relations in the path:
      
      determining a first semantic relation part weight by dividing (the smaller of the frequency of the first semantic relation part and the value of the power curve for the first semantic relation part of the relation type of the semantic relation at the frequency of the first semantic relation part) by (the sum of the frequencies of first semantic relation parts having the first word of the semantic relation), and
      
      determining a second semantic relation part weight by dividing (the smaller of the frequency of the second semantic relation part and the value of the power curve for the second semantic relation part of the relation type of the semantic relation at the frequency of the second semantic relation part) by (the sum of the frequencies of second semantic relation parts having the relation type of the semantic relation); and
      
      before multiplying the weight determined for each semantic relation in the path to obtain a measure of the saliency of the path, changing the semantic relation weight for each semantic relation to a weighted average of the semantic relation weight for the semantic relation and the product of the probabilities for the semantic relation parts making up the semantic relation, the average being weighted such that the relative weight of the semantic relation weight for the semantic relation increases with the frequency of the semantic relation.
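The weighting steps recited in claim 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `(first_word, relation_type, second_word)` triple representation, the helper names (`fit_power_curve`, `build_model`, `relation_weight`, `path_saliency`), the log-log least-squares fit, and the flat-curve fallback for a relation type with a single observed frequency are all hypothetical choices. The claim itself only requires "a power curve approximating the distribution of frequency-count over frequency".

```python
import math
from collections import Counter, defaultdict


def fit_power_curve(freq_counts):
    """Fit count ~ a * freq**(-b) by least squares in log-log space.

    The fitting method is an assumption; the claim does not specify one.
    """
    pts = [(math.log(f), math.log(c)) for f, c in freq_counts.items()]
    n = len(pts)
    if n < 2:
        # Assumed fallback: with one data point, use a flat curve.
        c = next(iter(freq_counts.values()), 1)
        return float(c), 0.0
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    return math.exp(my - slope * mx), -slope


def build_model(relations):
    """relations: list of (first_word, relation_type, second_word) triples."""
    freq = Counter(relations)           # frequency of each unique relation
    by_type = defaultdict(Counter)      # type -> {frequency: frequency-count}
    for (_, rtype, _), f in freq.items():
        by_type[rtype][f] += 1
    curves = {t: fit_power_curve(fc) for t, fc in by_type.items()}
    starts = Counter(w1 for w1, _, _ in relations)  # relations per first word
    return freq, curves, starts, len(relations)


def relation_weight(rel, model, first):
    freq, curves, starts, total = model
    a, b = curves[rel[1]]
    f = freq[rel]
    capped = min(f, a * f ** (-b))      # smaller of frequency and curve value
    # First relation: divide by all relations; later ones: by relations
    # beginning with the same word.
    return capped / (total if first else starts[rel[0]])


def path_saliency(path, model):
    weight = 1.0
    for i, rel in enumerate(path):
        weight *= relation_weight(rel, model, first=(i == 0))
    return weight
```

A caller would build the model once from all extracted relations and then score each candidate path, for example `path_saliency([("car", "Hyp", "vehicle"), ("vehicle", "Part", "wheel")], model)`, where the relation type labels are placeholders.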

3. A method in a computer system for identifying and evaluating the saliency of semantic relation paths between a pair of words using a dictionary that contains word entries each for a word, the word entries in turn containing segments of natural language that characterize the word of the word entry, the method comprising the steps of:
- for each of the natural language segments of the word entries, constructing a semantic relation structure comprised of semantic relations occurring between words in the natural language segment, each semantic relation having a relation type and relating two of the words in the natural language segment;
  
  for each relation type:
  
  determining the number of occurrences in the constructed semantic relation structures of each unique semantic relation having the relation type, called frequency,
  
  determining, for each frequency, the number of unique semantic relations having that frequency, called frequency-count, and
  
  generating a power curve approximating the distribution of frequency-count over frequency for the relation type;
  
  for each word of the pair, collecting the semantic relation structures that relate the word of the pair to other words;
  
  identifying an intersection word that occurs both in the semantic relation structures collected for the first word of the pair and in the semantic relation structures collected for the second word of the pair;
  
  identifying in the semantic relation structures collected for the first word of the pair each path from the first word of the pair to the intersection word;
  
  identifying in the semantic relation structures collected for the second word of the pair each path from the second word of the pair to the intersection word;
  
  for each identified path from the first word of the pair to the intersection word or from the second word of the pair to the intersection word:
  
  for the first semantic relation in the path, determining a semantic relation probability by dividing (the smaller of the frequency of the semantic relation and the value of the power curve for the relation type of the semantic relation at the frequency of the semantic relation) by (the total number of constructed semantic relations),
  
  for subsequent semantic relations in the path, determining a semantic relation probability by dividing (the smaller of the frequency of the semantic relation and the value of the power curve for the relation type of the semantic relation at the frequency of the semantic relation) by (the total number of constructed semantic relations beginning with the same word as the semantic relation), and
  
  multiplying the probability determined for each semantic relation in the path to obtain a measure of the saliency of the path; and
  
  for each combination of an identified path from the first word of the pair to the intersection word and an identified path from the second word of the pair to the intersection word:
  
  concatenating the combination of paths to form an extended path between the words of the pair, and
  
  determining a measure of the saliency of the extended path by multiplying together the measures of the saliency of both paths of the combination, then multiplying by a join probability reflecting the likely degree to which the meaning of the intersection word is the same in the different semantic relation structures containing the paths of the combination.
- Dependent claims (4)
- - 4. The method of claim 3, further comprising the steps of:
    - determining a level of similarity between the meanings of the intersection word in the different semantic relation structures containing the paths of the combination; and
      
      determining the join probability based on the determined level of similarity.
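The extended-path step of claims 3 and 4 might look like the sketch below. It assumes each side's paths have already been scored and end at the same intersection word; the function name `extended_paths`, the `(path, saliency)` tuple layout, and the `^-1` suffix used to mark a reversed relation are illustrative assumptions. The join probability is passed in as a plain number, since the claims leave open how the similarity between the two senses of the intersection word is measured.

```python
from itertools import product


def extended_paths(paths_from_first, paths_from_second, join_probability):
    """Join every path pair that meets at a shared intersection word.

    Each argument is a list of (path, saliency) tuples, where path is a list
    of (first_word, relation_type, second_word) triples ending at the
    intersection word. Returns (extended_path, saliency) tuples, best first.
    """
    results = []
    for (p1, s1), (p2, s2) in product(paths_from_first, paths_from_second):
        # Reverse the second path so the result runs from word 1 to word 2.
        # The "^-1" marker for an inverted relation is a hypothetical notation.
        reversed_p2 = [(w2, rtype + "^-1", w1) for (w1, rtype, w2) in reversed(p2)]
        results.append((p1 + reversed_p2, s1 * s2 * join_probability))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Multiplying by a join probability below 1 penalizes paths stitched together from different dictionary segments, where the intersection word may not carry the same meaning on both sides.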

5. A method in a computer system for identifying and weighting semantic relation paths between a pair of words using a corpus containing natural language incorporating the words of the pair, the method comprising the steps of:
- identifying semantic relations occurring between words in the corpus, each semantic relation comprising a first word, a second word, and a relation type that relates the meaning of the first word to the meaning of the second word;
  
  identifying paths among the identified semantic relations that each relate the pair of words; and
  
  for each identified path:
  
  generating a weight for each semantic relation in the path signifying the proximity of the frequency with which the semantic relation occurs in the corpus to an intermediate frequency with which semantic relations having the same relation type as the semantic relations occur in the corpus,
  
  conditioning the weights generated for semantic relations in the path other than the first one on the occurrence of the first word of the semantic relation, and
  
  multiplying the weight of the first semantic relation in the path and the conditioned weights of the other semantic relations in the path to obtain a weight for the entire path.

6. A computer-readable medium whose contents cause a computer system to weight semantic relation paths between a pair of words using a corpus containing natural language incorporating the words of the pair by performing the steps of:
- identifying semantic relations occurring between words in the corpus, each semantic relation comprising a first word, a second word, and a relation type that relates the meaning of the first word to the meaning of the second word;
  
  identifying paths among the identified semantic relations that each relate the pair of words; and
  
  for each identified path:
  
  generating a weight for each semantic relation in the path signifying the proximity of the frequency with which the semantic relation occurs in the corpus to an intermediate frequency with which semantic relations having the same relation type as the semantic relations occur in the corpus,
  
  conditioning the weights generated for semantic relations in the path other than the first one on the occurrence of the first word of the semantic relation, and
  
  multiplying the weight of the first semantic relation in the path and the conditioned weights of the other semantic relations in the path to obtain a weight for the entire path.

7. A method in a computer system for determining the relevancy of semantic relations between words occurring in a knowledge base, the method comprising the steps of:
- for a group of semantic relations occurring in the knowledge base, modeling with a mathematical function the relation between a frequency of occurrence of unique semantic relations and the number of unique semantic relations that occur at that frequency, the mathematical function having a vertex frequency; and
  
  determining the level of relevancy of unique semantic relations of the group such that the level of relevancy of unique semantic relations increases as the frequency of occurrence of the unique semantic relations approaches the vertex frequency of the mathematical function from either direction.
- Dependent claims (8, 9, 10, 11, 12)
- - 8. The method of claim 7 wherein each semantic relation in the knowledge base has one of a plurality of relation types, the method further including the steps of:
    - selecting one of the plurality of relation types; and
      
      selecting as the modeled group of semantic relations those semantic relations in the knowledge base having the selected relation type.
  - 9. The method of claim 7 wherein the modeling step models the relation between a frequency of occurrence of unique semantic relations and the number of unique semantic relations that occur at that frequency as a power curve.
  - 10. The method of claim 7 wherein the vertex frequency of the mathematical function is the frequency at which the value of the mathematical function is equal to the frequency.
  - 11. The method of claim 7 wherein the vertex frequency of the mathematical function is the frequency at which the value of the first derivative of the mathematical function is equal to -1.
  - 12. The method of claim 7 wherein the vertex frequency of the mathematical function is the frequency at which a graph of the mathematical function is nearest to the origin.
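Claims 10 through 12 give three alternative definitions of the vertex frequency. For a power curve of the assumed form y = a * f**(-b) with a > 0 and b > 0 (the parameterization and the function names are assumptions; the claims do not fix them), each definition has a closed form, sketched here:

```python
def vertex_value_equals_frequency(a, b):
    """Claim 10: frequency f where the curve value equals f.

    Solves a * f**(-b) = f, giving f = a**(1 / (b + 1)).
    """
    return a ** (1.0 / (b + 1.0))


def vertex_slope_minus_one(a, b):
    """Claim 11: frequency f where the first derivative equals -1.

    Solves -a * b * f**(-b - 1) = -1, giving f = (a * b)**(1 / (b + 1)).
    """
    return (a * b) ** (1.0 / (b + 1.0))


def vertex_nearest_origin(a, b):
    """Claim 12: frequency f where the curve is nearest the origin.

    Minimizes f**2 + a**2 * f**(-2 * b); setting the derivative to zero
    gives f**(2 * b + 2) = b * a**2.
    """
    return (b * a * a) ** (1.0 / (2.0 * b + 2.0))
```

When b = 1 the three definitions coincide at f = sqrt(a); for other exponents they generally give different frequencies, so the choice among them affects which relations are treated as most salient.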

13. A computer-readable medium whose contents cause a computer system to determine the saliency of semantic relations between words occurring within a knowledge base by performing the steps of:
- for a group of semantic relations occurring in the knowledge base, modeling with a mathematical function the relation between a frequency of occurrence of unique semantic relations and the number of unique semantic relations that occur at that frequency, the mathematical function having a vertex frequency; and
  
  determining the level of saliency of unique semantic relations of the group such that the level of saliency of unique semantic relations increases as the frequency of occurrence of the unique semantic relations approaches the vertex frequency of the mathematical function from either direction.
- Dependent claims (14, 15)
- - 14. The computer-readable medium of claim 13 where each semantic relation in the knowledge base has one of a plurality of relation types, the contents of the computer-readable medium further causing the computer system to perform the steps of:
    - selecting one of the plurality of relation types; and
      
      selecting as the modeled group of semantic relations those semantic relations in the knowledge base having the selected relation type.
  - 15. The computer-readable medium of claim 13 wherein the modeling step models the relation between a frequency of occurrence of unique semantic relations and the number of unique semantic relations that occur at that frequency as a power curve.

16. A method in a computer system for measuring the saliency of a semantic relation that occurs infrequently in a corpus, the semantic relation having a first part comprising a first word and a relation type and a second part comprising the relation type and a second word, the method comprising the steps of:
- determining the frequency with which the first part of the semantic relation occurs in the corpus;
  
  determining the frequency with which the second part of the semantic relation occurs in the corpus in connection with semantic relations of the relation type; and
  
  combining the determined frequencies to obtain a measure of the saliency of the semantic relation.

17. A method in a computer system for determining the relevancy of semantic relations between words occurring within a knowledge base, the method comprising the steps of:
- for a group of semantic relations occurring in the semantic knowledge base, modeling with a mathematical function the statistical relation between a frequency of occurrence of unique semantic relations and the number of unique semantic relations that occur at that frequency, the mathematical function having a vertex frequency identifying a transition point in the mathematical function; and
  
  determining the level of relevancy of unique semantic relations of the group such that the level of relevancy of unique semantic relations increases as the frequency of occurrence of the unique semantic relations approaches the vertex frequency of the mathematical function.

18. A computer-readable medium whose contents cause a computer system to weight the relevancy of semantic relations between words occurring within a knowledge base by performing the steps of:
- for a group of semantic relations occurring in the semantic knowledge base, modeling with a mathematical function the relation between a frequency of occurrence of unique semantic relations and the number of unique semantic relations that occur at that frequency, the mathematical function having a vertex frequency identifying a transition point in the mathematical function; and
  
  determining the level of relevancy of unique semantic relations of the group such that the level of relevancy of unique semantic relations increases as the frequency of occurrence of the unique semantic relations approaches the vertex frequency of the mathematical function.

19. A method in a computer system for determining the relevancy of a selected one of a plurality of semantic relations occurring within a knowledge base, each semantic relation comprising a first word, a second word, and a relation type relating the meanings of the first and second words, the method comprising the steps of:
- for unique semantic relations occurring in the knowledge base having the same relation type as the selected semantic relation, modeling with a mathematical function having a vertex each of three statistical distributions:
  
  the distribution of the number of unique semantic relations that occur in the corpus at each frequency,
  
  the distribution of the number of unique first words contained in relations of the relation type of the selected semantic relation that occur in the corpus at each frequency, and
  
  the distribution of the number of unique second words contained in relations of the relation type of the selected semantic relation that occur in the corpus at each frequency;
  
  determining a first relevancy component for the selected semantic relation based on the proximity of the frequency of the selected semantic relation to the vertex of the mathematical function modeling the distribution of the number of unique semantic relations that occur in the corpus at each frequency;
  
  determining a second relevancy component for the selected semantic relation based on the proximity of the frequency of the first word and relation type of the selected semantic relation to the vertex of the mathematical function modeling the distribution of the number of unique first words contained in relations of the relation type of the selected semantic relation that occur in the corpus at each frequency, and based on the proximity of the frequency of the relation type and second word of the selected semantic relation to the vertex of the mathematical function modeling the distribution of the number of unique second words contained in relations of the relation type of the selected semantic relation that occur in the corpus at each frequency; and
  
  combining the first and second relevancy components to obtain a measure of the relevancy of the selected semantic relation.
- Dependent claims (20)
- - 20. The method of claim 19 wherein the combining step computes a weighted average of the first and second relevancy components, the average being weighted such that the relative weight of the first relevancy component varies directly with the frequency of the selected semantic relation.
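The weighted average of claim 20 can be illustrated with a short sketch. The interpolation weight `frequency / (frequency + k)` and the smoothing constant `k` are assumptions chosen only because they grow with frequency; the claim states that the relative weight of the first component varies directly with the frequency but does not give a formula.

```python
def combined_relevancy(first_component, second_component, frequency, k=1.0):
    """Blend whole-relation and part-based relevancy components.

    lam rises toward 1 as the relation's frequency grows, so frequently
    observed relations rely mostly on their own statistics, while rare
    relations lean on the statistics of their parts. The form of lam and
    the constant k are hypothetical.
    """
    lam = frequency / (frequency + k)
    return lam * first_component + (1.0 - lam) * second_component
```

With this choice, a relation seen once weights the two components equally when `k=1.0`, and a relation seen nine times gives the first component a weight of 0.9.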

21. A method in a computer system for determining the relevance of an extended semantic relation path between a pair of words, the extended path being comprised of a plurality of subpaths each constituting a series of one or more semantic relations that, when concatenated, constitute a path between the words of the pair, the subpaths not being known to derive from the same segment of natural language, the method comprising the steps of:
- for each subpath, determining a measure of the relevance of the subpath;
  
  for each transition in the extended path between two subpaths, determining a measure of the relevance of the transition between the subpaths; and
  
  determining a measure of the relevance of the extended path by combining the measures of relevance of the subpaths with the measure of relevance of the transition.
- Dependent claims (22, 23)
- - 22. The method of claim 21 wherein each transition between subpaths occurs at an intersection word, and wherein the step of determining a measure of the relevance of the transition between the subpaths includes the step of inverting a frequency characterizing the commonness of occurrences of the intersection word at which the transition occurs.
  - 23. The method of claim 21 wherein each transition between subpaths occurs at an intersection word, and wherein the step of determining a measure of the relevance of the transition between the subpaths determines the relevance measure based upon the level of similarity between the meanings of the occurrences of the intersection word in each of the subpaths concatenated at the transition.

24. A computer-readable medium whose contents cause a computer system to weight the relevance of an extended semantic relation path between a pair of words, the extended path being comprised of a plurality of subpaths each constituting a series of one or more semantic relations that, when concatenated, constitute a path between the words of the pair, the subpaths not being known to derive from the same segment of natural language, by performing the steps of:
- for each subpath, determining a weight characterizing the relevance of the subpath;
  
  for each transition in the extended path between two subpaths, determining a weight characterizing the relevance of the transition between the subpaths; and
  
  determining a weight characterizing the relevance of the extended path by combining the weights characterizing the relevance of the subpaths with the weights characterizing the relevance of the transitions.
- Dependent claims (25, 26)
- - 25. The computer-readable medium of claim 24 wherein each transition between subpaths occurs at an intersection word, and wherein the step of determining a measure of the relevance of the transition between the subpaths determines a measure that is inversely related to a frequency characterizing the commonness of occurrence of the intersection word at which the transition occurs.
  - 26. The computer-readable medium of claim 24 wherein each transition between subpaths occurs at an intersection word, and wherein the step of determining a measure of the relevance of the transition between the subpaths determines the relevance measure based upon the level of similarity between the meanings of the occurrences of the intersection word in each of the subpaths concatenated at the transition.

27. A method in a computer system for determining the relevancy of semantic relations between words occurring in a knowledge base, the method comprising the steps of:
- for each unique semantic relation, determining the frequency of occurrences of that semantic relation in the knowledge base;
  
  identifying a most-salient frequency of occurrence of semantic relations based on the determined frequencies, wherein any unique semantic relation with a frequency of occurrence that is near the most-salient frequency is a salient semantic relation; and
  
  assigning a saliency weight to each unique semantic relation that decreases as the difference between the determined frequency for the unique semantic relation and the identified most-salient frequency increases.
- Dependent claims (28, 29)
- - 28. The method of claim 27, further comprising the step of compiling the knowledge base by deriving the semantic relations between words occurring in the knowledge base from the text of a natural language corpus.
  - 29. The method of claim 27, further comprising the step of compiling the knowledge base by deriving the semantic relations between words occurring in the knowledge base from the text of a dictionary.

30. A computer memory containing a data structure for use in assessing the saliency of semantic relations occurring in a natural language corpus, the semantic relations each having one of a plurality of relation types, the data structure comprising:
- for each of the plurality of relation types:
  
  for each unique semantic relation occurring in the corpus that has the relation type, an indication of the frequency at which the unique semantic relation occurs in the corpus, and
  
  information describing a power curve fitted to the relation of frequencies at which unique semantic relations of the relation type occur in the corpus and the number of relation types having each frequency of occurrence, the power curve having a vertex, such that, for a selected semantic relation of a selected relation type, the saliency of the selected semantic relation may be assessed by using the data structure to determine the distance between the vertex of the power curve for the selected relation type and the point on the power curve for the selected relation type corresponding to the frequency of occurrence of the selected semantic relation.

31. A computer memory containing a data structure for use in assessing the saliency of semantic relations occurring in a natural language corpus, the data structure comprising a plurality of semantic relation structures, each semantic relation structure relating the meaning of a head word of the semantic relation structure to a plurality of other words, each semantic relation structure having stored in conjunction with each such other word:
- a nonextended path weight value providing a quantitative measure of the saliency of the semantic relation path occurring between the head word and the other word; and
  
  an extended path weight value providing a quantitative measure of the saliency of the semantic relation path occurring between the other word and the head word given the existence of another path to the other word, such that a quantitative measure of the saliency of the semantic relation path from a first word to a second word occurring in the natural language corpus can be determined by selecting the nonextended path weight value stored in conjunction with the second word in a semantic relation structure for which the first word is the head word, and such that a quantitative measure of the saliency of the semantic relation path from a third word to a fourth word occurring in the natural language corpus can be determined by identifying a fifth word in semantic relation structures having as their head word the third and fourth words, and multiplying (the nonextended path weight value stored in conjunction with the fifth word in the semantic relation structure in which the third word is the head word) by (the extended path weight value stored in conjunction with the fifth word in the semantic relation structure in which the fourth word is the head word).
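The data structure of claim 31 could be laid out as in the sketch below. The class and function names (`PathWeights`, `RelationStructure`, `direct_saliency`, `joined_saliency`) are hypothetical, and taking the maximum over all shared words is an assumption: the claim only speaks of identifying a fifth word common to both structures.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PathWeights:
    nonextended: float  # saliency of the path from the head word to this word
    extended: float     # saliency when joined with another path at this word


@dataclass
class RelationStructure:
    head: str
    others: Dict[str, PathWeights] = field(default_factory=dict)


def direct_saliency(structures, first, second) -> Optional[float]:
    """Look up the nonextended weight stored for `second` under head `first`."""
    entry = structures[first].others.get(second)
    return entry.nonextended if entry else None


def joined_saliency(structures, third, fourth) -> Optional[float]:
    """Join two structures at a shared (fifth) word and multiply the weights."""
    a = structures[third].others
    b = structures[fourth].others
    shared = set(a) & set(b)
    if not shared:
        return None
    # Assumed policy: report the best-scoring intersection word.
    return max(a[w].nonextended * b[w].extended for w in shared)
```

Storing both weights with each word lets a lookup score a direct path with a single read and an extended path with one multiplication, without recomputing per-relation weights at query time.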

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Dolan, William B.; Richardson, Stephen D.
Primary Examiner(s): Chang, Vivian
Assistant Examiner(s): EDOUARD, PATRICK NESTOR
Application Number: US08/904,418
Time in Patent Office: 1,034 Days
Field of Search: 704/1, 704/9, 704/10, 707/530, 707/531, 707/532
US Class Current: 704/9
CPC Class Codes: G06F 16/367 Ontology
