Semantic matching using predicate-argument structure

US 8,260,817 B2
Filed: 01/24/2011
Issued: 09/04/2012
Est. Priority Date: 10/10/2007
Status: Active Grant

Abstract

The invention relates to topic classification systems in which text intervals are represented as proposition trees. Free-text queries and candidate responses are transformed into proposition trees, and a particular candidate response can be matched to a free-text query by transforming the proposition trees of the free-text query into the proposition trees of the candidate responses. Because proposition trees are able to capture semantic information of text intervals, the topic classification system accounts for the relative importance of topic words, for paraphrases and re-wordings, and for omissions and additions. Redundancy of two text intervals can also be identified.
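
As a rough illustration of the idea in the abstract, the sketch below represents a text interval as a small tree whose edges carry role labels, and scores a candidate by how many of the query's labeled edges it also contains. The `Node` class, the `coverage` function, and the role names (`subject`, `object`, `in`) are assumptions made for this example; they are not taken from the patent, which does not prescribe a particular data structure or scoring formula here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a proposition tree: a word plus role-labeled children."""
    word: str
    # Each edge is a (role, child) pair; the role names the semantic relationship.
    edges: list = field(default_factory=list)

    def add(self, role, child):
        self.edges.append((role, child))
        return self


def triples(node):
    """Flatten a tree into (parent word, role, child word) triples."""
    out = []
    for role, child in node.edges:
        out.append((node.word, role, child.word))
        out.extend(triples(child))
    return out


def coverage(query, candidate):
    """Fraction of the query's labeled edges that also occur in the candidate.

    This is an illustrative score only, not the patent's similarity value.
    """
    q = triples(query)
    c = set(triples(candidate))
    if not q:
        return 0.0
    return sum(1 for t in q if t in c) / len(q)


# "Smith visited Paris" versus "Smith visited Paris in May"
query = Node("visited").add("subject", Node("Smith")).add("object", Node("Paris"))
candidate = (
    Node("visited")
    .add("subject", Node("Smith"))
    .add("object", Node("Paris"))
    .add("in", Node("May"))
)

forward = coverage(query, candidate)    # everything in the query is covered
backward = coverage(candidate, query)   # the candidate's extra edge is not
```

The score is asymmetric on purpose: an addition in the candidate does not hurt the query-to-candidate direction, while it lowers the reverse direction, which is one simple way to account for omissions and additions.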

27 Claims

1. A system that processes text intervals, comprising:
- a memory; and
- a processor configured to execute a plurality of modules stored in the memory;
- the modules including:
  - a preprocessing module configured to:
    - extract a first proposition from a first text interval;
  - a generation module configured to:
    - provide a plurality of semantic roles, wherein each role of the plurality of roles defines a different semantic relationship between at least two words,
    - generate a first proposition tree from the first proposition, wherein the first proposition tree comprises at least one node connected to other nodes by at least one edge, wherein each node is respectively associated with at least one word from the first proposition,
    - assign at least one of the plurality of roles to the at least one edge; and
  - a matching module configured to:
    - determine a first similarity value between the first text interval and a second text interval based on a comparison of the first proposition tree and a second proposition tree corresponding to the second text interval, wherein the second text interval is different from the first text interval and at least one of the first text interval and the second text interval comprises natural language, and
    - selectively output the second text interval based on the first similarity value.
- Dependent claims (2-18):
  - 2. The system of claim 1, wherein the different semantic relationships defined by respective roles of the plurality of roles include at least one of a preposition-based relationship, a grammar-based relationship, and a conjunction-based relationship.
  - 3. The system of claim 1, wherein the first text interval is a query and the second text interval is a candidate response and the matching module is further configured to output the second text interval if the first similarity value exceeds a threshold.
  - 4. The system of claim 1, wherein the matching module is further configured to:
    - determine a second similarity value between the second text interval and the first text interval,
    - find the second text interval redundant to the first text interval in response to the first and second similarity values each exceeding respective thresholds, and
    - refrain from outputting redundant text intervals.
  - 5. The system of claim 4, wherein determining the first similarity value and the second similarity value comprises performing a two-way comparison between the first and second intervals.
  - 6. The system of claim 1, further comprising:
    - an augmentation module configured to, for at least one node in the first proposition tree, associate, with the at least one node, a word having a relationship to the at least one node to form a first augmented proposition tree.
  - 7. The system of claim 6, wherein the relationship is a co-reference relationship.
  - 8. The system of claim 6, wherein associating the word having the relationship comprises:
    - identifying a real-world object, concept, or event included in the at least one node,
    - identifying alternative words in a document in which the first text interval is included that correspond to the same real-world object, concept, or event, and
    - augmenting the at least one node in the first proposition tree with the identified alternative words to create the first augmented proposition tree.
  - 9. The system of claim 6, wherein the relationship is a synonym, hypernym, hyponym, or substitutable label relationship.
  - 10. The system of claim 6, wherein the matching module is configured to determine the first similarity value by determining a number of nodes and a number of edges that match between the first augmented proposition tree and the second proposition tree.
  - 11. The system of claim 10, wherein a first node in the first augmented proposition tree matches a second node in the second proposition tree in response to at least one word in, or associated with, the first node matching a word in, or associated with, the second node.
  - 12. The system of claim 11, wherein in response to the first node matching the second node as a result of a word associated with the first node matching the second node, decreasing the first similarity value based on the relationship between the associated word and the first node.
  - 13. The system of claim 10, wherein a first edge in the first augmented proposition tree matches a second edge in the second proposition tree in response to a semantic relationship associated with the first edge being substitutable for the semantic relationship associated with the second edge.
  - 14. The system of claim 13, wherein in response to the first edge matching the second edge as a result based on a substitute semantic relationship, decreasing the first similarity value based on the substitution.
  - 15. The system of claim 6, wherein the matching module is configured to determine the first similarity value by calculating a transformation score based on costs associated with transforming the first augmented proposition tree to the second proposition tree.
  - 16. The system of claim 6, wherein the second proposition tree comprises an augmented proposition tree.
  - 17. The system of claim 1, wherein the generation module is further configured to:
    - generate a first plurality of proposition subtrees and a second plurality of proposition subtrees; and
      
      the matching module is further configured to:
      
      determine a second similarity value by matching the first plurality of proposition subtrees to the second plurality of proposition subtrees.
  - 18. The system of claim 1, wherein the generation module is further configured to:
    - generate a first bag of nodes from the first proposition tree and a second bag of nodes from the second proposition tree; and
      
      the matching module is further configured to:
      
      determine a second similarity value by matching the first bag of nodes to the second bag of nodes.
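
One possible reading of claims 4 to 14 can be sketched as follows: nodes carry associated words added by augmentation, a match through an associated word or a substituted role lowers the score, and two intervals count as redundant only when the similarity exceeds a threshold in both directions. The penalty values, the `ROLE_SUBSTITUTES` table, the normalization by tree size, and the 0.8 threshold are all assumptions chosen for illustration; the claims do not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    # Augmentation (claim 6): associated words mapped to a hypothetical penalty.
    related: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (role, child Node) pairs


# Hypothetical table of substitutable roles and the cost of each swap (claims 13-14).
ROLE_SUBSTITUTES = {("object", "of"): 0.2, ("of", "object"): 0.2}


def node_score(a, b):
    """1.0 for an exact word match, less when only an associated word matches."""
    if a.word == b.word:
        return 1.0
    words_a = {a.word: 0.0, **a.related}
    words_b = {b.word: 0.0, **b.related}
    shared = set(words_a) & set(words_b)
    if not shared:
        return 0.0
    return max(0.0, max(1.0 - words_a[w] - words_b[w] for w in shared))


def role_score(r1, r2):
    """1.0 for identical roles, reduced for a listed substitute, 0.0 otherwise."""
    if r1 == r2:
        return 1.0
    penalty = ROLE_SUBSTITUTES.get((r1, r2))
    return 0.0 if penalty is None else 1.0 - penalty


def size(a):
    """Number of nodes plus number of edges in the tree rooted at a."""
    return 1 + sum(1 + size(child) for _, child in a.edges)


def matched(a, b):
    """Summed score of the nodes and edges of a that find a counterpart in b."""
    root = node_score(a, b)
    if root == 0.0:
        return 0.0
    total = root
    for role_a, child_a in a.edges:
        best = 0.0
        for role_b, child_b in b.edges:
            r = role_score(role_a, role_b)
            sub = matched(child_a, child_b) if r > 0.0 else 0.0
            if sub > 0.0:
                best = max(best, r + sub)
        total += best
    return total


def similarity(a, b):
    return matched(a, b) / size(a)


def redundant(a, b, threshold=0.8):
    """Two-way comparison: both directions must exceed the threshold."""
    return similarity(a, b) > threshold and similarity(b, a) > threshold


a = Node("destroyed", edges=[
    ("subject", Node("army", related={"troops": 0.1})),
    ("object", Node("bridge")),
])
b = Node("destroyed", edges=[("subject", Node("troops")), ("object", Node("bridge"))])
c = Node("destroyed", edges=[("subject", Node("troops"))])
```

Here `a` and `b` are redundant under the assumed threshold, while `a` and `c` are not: `c` is well covered by `a`, but `a` contains an edge that `c` omits, so the second direction falls below the threshold.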

19. A method of processing text intervals, comprising:
- extracting a first proposition from a first text interval;
  
  providing a plurality of semantic roles, wherein each role of the plurality of roles defines a different semantic relationship between at least two words;
  
  generating a first proposition tree from the first proposition, wherein the first proposition tree comprises at least one node connected to other nodes by at least one edge, wherein each node is respectively associated with at least one word from the first proposition;
  
  assigning at least one of the plurality of roles to the at least one edge;
  
  determining a first similarity value between the first text interval and a second text interval based on a comparison of the first proposition tree and a second proposition tree corresponding to the second text interval, wherein the second text interval is different from the first text interval and at least one of the first text interval and the second text interval comprises natural language; and
  
  selectively outputting, using a processor, the second text interval based on the first similarity value.
- Dependent claims (20-27):
  - 20. The method of claim 19, wherein the different semantic relationships defined by respective roles of the plurality of roles include at least one of a preposition-based relationship, a grammar-based relationship, and a conjunction-based relationship.
  - 21. The method of claim 19, comprising outputting the second text interval if the first similarity value exceeds a threshold, wherein the first text interval is a query and the second text interval is a candidate response.
  - 22. The method of claim 19, comprising:
    - determining a second similarity value between the second text interval and the first text interval;
      
      finding the second text interval redundant to the first text interval in response to the first and second similarity values each exceeding respective thresholds; and
      
      refraining from outputting redundant text intervals.
  - 23. The method of claim 19, comprising associating, for at least one node in the first proposition tree, a word having a relationship to the at least one node to form a first augmented proposition tree.
  - 24. The method of claim 23, wherein the relationship is a co-reference relationship.
  - 25. The method of claim 23, wherein the relationship is a synonym, hypernym, hyponym, or substitutable label relationship.
  - 26. The method of claim 19, comprising:
    - generating a first plurality of proposition subtrees and a second plurality of proposition subtrees; and
      
      determining a second similarity value by matching the first plurality of proposition subtrees to the second plurality of proposition subtrees.
  - 27. The method of claim 19, comprising:
    - generating a first bag of nodes from the first proposition tree and a second bag of nodes from the second proposition tree; and
      
      determining a second similarity value by matching the first bag of nodes to the second bag of nodes.
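
The subtree and bag-of-nodes comparisons in claims 17, 18, 26, and 27 can be pictured as looser fallbacks that still give partial credit when the full trees do not line up. The sketch below is a minimal interpretation under stated assumptions: trees are nested tuples of the form `(word, ((role, subtree), ...))`, a "subtree" means the tree rooted at any node, and both scores are simple overlap fractions. None of these choices comes from the patent text.

```python
from collections import Counter


def bag_of_nodes(tree):
    """Multiset of the words in a tree, ignoring structure and roles."""
    word, edges = tree
    bag = Counter([word])
    for _, child in edges:
        bag += bag_of_nodes(child)
    return bag


def subtrees(tree):
    """The tree rooted at every node, root first (an assumed definition)."""
    _, edges = tree
    out = [tree]
    for _, child in edges:
        out.extend(subtrees(child))
    return out


def bag_similarity(t1, t2):
    """Fraction of t1's words that also appear in t2."""
    b1, b2 = bag_of_nodes(t1), bag_of_nodes(t2)
    return sum((b1 & b2).values()) / sum(b1.values())


def subtree_similarity(t1, t2):
    """Fraction of t1's subtrees that occur unchanged somewhere in t2."""
    s1, s2 = subtrees(t1), subtrees(t2)
    return sum(1 for s in s1 if s in s2) / len(s1)


leaf = lambda w: (w, ())
t1 = ("visited", (("subject", leaf("Smith")), ("object", leaf("Paris"))))
t2 = ("toured", (("subject", leaf("Smith")), ("object", leaf("Paris"))))

bag_score = bag_similarity(t1, t2)
subtree_score = subtree_similarity(t1, t2)
```

In this example the root words differ, so a strict root-to-root tree comparison would fail, yet both fallback scores still credit the shared `Smith` and `Paris` material.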

Current Assignee: Raytheon BBN Technologies Corp. (Rtx Corporation)
Original Assignee: Raytheon BBN Technologies Corp. (Rtx Corporation)
Inventors: Boschee, Elizabeth Megan; Levit, Michael; Freedman, Marjorie Ruth
Primary Examiner(s): HO, BINH VAN
Application Number: US13/012,225
Publication Number: US 20110153673A1
Time in Patent Office: 589 Days
Field of Search: 707/794
US Class Current: 707/794
CPC Class Codes:
G06F 16/334 Query execution G06F16/335 ...
G06F 16/35 Clustering; Classification
