Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications

US 8,527,262 B2
Filed: 06/22/2007
Issued: 09/03/2013
Est. Priority Date: 06/22/2007
Status: Active Grant

Abstract

Systems and methods are provided for automated semantic role labeling for languages having complex morphology. In one aspect, a method for processing natural language text includes receiving as input a natural language text sentence comprising a sequence of white-space delimited words including inflected words that are formed of morphemes including a stem and one or more affixes, identifying a target verb as a stem of an inflected word in the text sentence, grouping morphemes from one or more inflected words with the same syntactic role into constituents, and predicting a semantic role of a constituent for the target verb.
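As an illustration of the segmentation step the abstract describes, the following is a minimal sketch that splits white-space delimited words into a stem and affixes. It is not the patented method: the claims refer to trained classifiers, whereas this sketch substitutes a tiny hand-written affix list, and the names `Morpheme`, `PREFIXES`, `SUFFIXES`, `segment_word`, and `segment_sentence` are all hypothetical. English toy words are used only for readability.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Morpheme:
    text: str
    kind: str        # "prefix", "stem", or "suffix"
    word_index: int  # index of the white-space delimited word it came from


# Hypothetical toy affix inventory (an assumption, not taken from the patent).
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ed", "s")


def segment_word(word: str, word_index: int) -> List[Morpheme]:
    """Split one word into at most one prefix, a stem, and at most one suffix."""
    morphemes: List[Morpheme] = []
    stem = word
    for p in PREFIXES:
        # Require a few characters to remain so short words are left intact.
        if stem.startswith(p) and len(stem) > len(p) + 2:
            morphemes.append(Morpheme(p, "prefix", word_index))
            stem = stem[len(p):]
            break
    suffix: Optional[Morpheme] = None
    for s in SUFFIXES:
        if stem.endswith(s) and len(stem) > len(s) + 2:
            suffix = Morpheme(s, "suffix", word_index)
            stem = stem[:-len(s)]
            break
    morphemes.append(Morpheme(stem, "stem", word_index))
    if suffix is not None:
        morphemes.append(suffix)
    return morphemes


def segment_sentence(sentence: str) -> List[Morpheme]:
    """Segment every white-space delimited word of the sentence."""
    result: List[Morpheme] = []
    for i, word in enumerate(sentence.split()):
        result.extend(segment_word(word, i))
    return result


morphemes = segment_sentence("she reopened doors")
```

Keeping `word_index` on each morpheme preserves the link back to the original word, which matters later when morphemes of the same word end up in different constituents.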

17 Claims

1. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform a method for processing natural language text, comprising:
- receiving as input a natural language text sentence comprising a sequence of white-space delimited words including inflected words that are formed of morphemes including a stem and one or more affixes;
  
  automatically parsing the inflected words into their constituent morphemes;
  
  grouping the parsed morphemes of the inflected words with the same syntactic role into constituents;
  
  identifying a plurality of verb-constituent pairs in the text sentence;
  
  predicting potential arguments for each constituent of the grouped morphemes, wherein the constituents are associated with a verb by the verb-constituent pairs and each prediction is weighted for a respective argument and grouped morpheme being considered;
  
  assigning a probability to each of the potential arguments, wherein the probability indicates a probability that the potential argument applies to a respective constituent; and
  
  outputting a plurality of semantic roles for a given verb/constituent pair as the potential arguments with corresponding probabilities, wherein predicting potential arguments for each constituent of the grouped morphemes and assigning the probability to each of the potential arguments includes:
  
  performing lexical/surface analysis;
  
  performing morphological analysis;
  
  performing semantic analysis;
  
  performing syntactic analysis; and
  
  integrating results of the lexical/surface analysis, the morphological analysis, the semantic analysis, and the syntactic analysis into a statistical model based on Maximum Entropy to produce a probability model for predicting potential arguments for each constituent of the grouped morphemes and assigning the probability to each of the potential arguments.
  - 2. The program storage device of claim 1, wherein automatically parsing the inflected words using the trained classifiers comprises performing an automated morphological analysis using the one or more trained classifiers to segment the text sentence into a sequence of morphemes by separating stems and affixes of inflected words.
  - 3. The program storage device of claim 1, wherein predicting potential arguments is performed automatically using a semantic role labeling model that predicts semantic roles using a plurality of syntactic and lexical features extracted from the input text sentence.
  - 4. The program storage device of claim 1, wherein identifying a target verb and grouping morphemes is performed using an automated syntactic parsing process to build a parse tree where each node in the parse tree is a constituent.
  - 5. The program storage device of claim 4, wherein the automated syntactic parsing process is implemented using a parsing model trained on an annotated corpus of verb-argument structures.
  - 6. The program storage device of claim 1, wherein the input text sentence is Arabic text.
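Claims 1 and 4 describe a parse tree in which each node is a constituent, and verb-constituent pairs drawn from it. The following is a minimal sketch of that idea under stated assumptions: the tree is built by hand rather than by a trained parser, and `Node`, `walk`, `verb_constituent_pairs`, and the `"VERB"` label are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    label: str                                   # e.g. "S", "NP", "VP", "VERB"
    children: List["Node"] = field(default_factory=list)
    morpheme: str = ""                           # set on leaf nodes only

    def leaves(self) -> List[str]:
        """Morphemes covered by this constituent, left to right."""
        if not self.children:
            return [self.morpheme]
        return [m for child in self.children for m in child.leaves()]


def walk(node: Node) -> Iterator[Node]:
    """Yield every node of the tree in pre-order; each node is a constituent."""
    yield node
    for child in node.children:
        yield from walk(child)


def verb_constituent_pairs(root: Node, verb_label: str = "VERB") -> List[Tuple[str, Node]]:
    """Pair every verb leaf with every other node (candidate constituent)."""
    nodes = list(walk(root))
    verbs = [n for n in nodes if n.label == verb_label and not n.children]
    return [(v.morpheme, c) for v in verbs for c in nodes if c is not v]


# Hand-built tree over the morphemes "she", "open", "door", "s".
tree = Node("S", [
    Node("NP", [Node("PRON", morpheme="she")]),
    Node("VP", [
        Node("VERB", morpheme="open"),
        Node("NP", [Node("NOUN", morpheme="door"), Node("PL", morpheme="s")]),
    ]),
])
pairs = verb_constituent_pairs(tree)
```

Each pair is a candidate for role prediction; most of them would normally be scored as non-arguments by the downstream model.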

7. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for processing natural language text, the method steps comprising:
- receiving as input a natural language text sentence comprising a sequence of white-space delimited words including inflected words that are formed of morphemes including a stem and one or more affixes;
  
  automatically parsing the inflected words into their constituent morphemes;
  
  grouping the parsed morphemes of the inflected words with the same syntactic role into constituents;
  
  identifying a plurality of verb-constituent pairs in the text sentence, wherein at least one verb-constituent pair is formed of respective morphemes of a same white-space delimited word;
  
  predicting potential arguments for each constituent of the grouped morphemes, wherein the constituents are associated with a verb by the verb-constituent pairs; and
  
  predicting a semantic role of each constituent of the grouped morphemes according to probabilities of the potential arguments, wherein predicting potential arguments for each constituent of the grouped morphemes and predicting the semantic role of each constituent of the grouped morphemes according to the probabilities of each of the potential arguments includes:
  
  performing lexical/surface analysis;
  
  performing morphological analysis;
  
  performing semantic analysis;
  
  performing syntactic analysis; and
  
  integrating results of the lexical/surface analysis, the morphological analysis, the semantic analysis, and the syntactic analysis into a statistical model based on Maximum Entropy to produce a probability model for predicting potential arguments for each constituent of the grouped morphemes and predicting the semantic role of each constituent of the grouped morphemes according to the probabilities of each of the potential arguments.
  - 8. The program storage device of claim 7, further comprising instructions for performing an automated morphological analysis using the one or more trained classifiers to segment the text sentence into a sequence of morphemes by separating stems and affixes of inflected words.
  - 9. The program storage device of claim 7, wherein the instructions for predicting the semantic role of each constituent include instructions for automatically predicting semantic roles using a semantic role labeling model using a plurality of syntactic and lexical features extracted from the input text sentence.
  - 10. The program storage device of claim 7, wherein the instructions for identifying each verb-constituent pair and grouping the arguments comprise instructions for using an automated syntactic parsing process to build a parse tree where each node in the parse tree is a constituent.
  - 11. The program storage device of claim 10, wherein the instructions for automated syntactic parsing process comprise instructions for using a parsing model trained on an annotated corpus of verb-argument structures.
  - 12. The program storage device of claim 7, wherein the instructions for predicting potential arguments for each constituent comprise instructions for:
    - identifying a constituent that is an argument of the target verb; and
      
      assigning a semantic role to a constituent that is identified as an argument of the target verb.
  - 13. The program storage device of claim 7, wherein the instructions for predicting potential arguments for each constituent comprise instructions for:
    - determining a likelihood of a semantic role of a constituent for the target verb over a set of possible semantic roles given the identified target verb; and
      
      assigning the constituent a semantic role label having the highest likelihood over the set of possible semantic roles.
  - 14. The program storage device of claim 7, wherein the instructions for predicting potential arguments for each constituent comprise instructions for using a statistical model trained to process a plurality of lexical and syntactic features.
  - 15. The program storage device of claim 7, wherein the input text sentence is Arabic text.
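Claims 12 and 13 describe identifying which constituents are arguments of the target verb and assigning each the semantic role label with the highest likelihood. The sketch below illustrates one way this could look, assuming the role probabilities have already been produced by some model. Folding argument identification into a `"NONE"` label is an assumption made here for brevity, and the role names and probability values are invented for the example.

```python
from typing import Dict, Optional, Tuple

NON_ARGUMENT = "NONE"  # hypothetical label meaning "not an argument of this verb"


def assign_role(role_probs: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Return (role, probability) for the most likely label.

    The role is None when the most likely label is NON_ARGUMENT, i.e. the
    constituent is not treated as an argument of the target verb.
    """
    best_role = max(role_probs, key=role_probs.get)
    best_prob = role_probs[best_role]
    if best_role == NON_ARGUMENT:
        return None, best_prob
    return best_role, best_prob


# Invented probabilities for three (verb, constituent) pairs.
candidates = {
    ("open", "she"): {"ARG0": 0.80, "ARG1": 0.05, "NONE": 0.15},
    ("open", "door s"): {"ARG0": 0.10, "ARG1": 0.75, "NONE": 0.15},
    ("open", "re"): {"ARG0": 0.05, "ARG1": 0.05, "NONE": 0.90},
}
labels = {pair: assign_role(probs) for pair, probs in candidates.items()}
```

A two-stage variant, with a separate binary argument classifier followed by a role classifier, would match the wording of claim 12 more literally; the single-pass form is shown only because it is shorter.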

16. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform a method for processing natural language text, comprising:
- receiving as input a natural language text sentence comprising a sequence of white-space delimited words including at least one inflected word comprising a stem and one or more affixes;
  
  automatically segmenting the white-space delimited words into separate morphemes by parsing the inflected words into their constituent morphemes;
  
  automatically grouping the parsed morphemes into constituents and identifying morphemes that are target verbs; and
  
  automatically predicting a semantic role of each constituent for each target verb using a trained statistical model, wherein each prediction is associated with a probability; and
  
  selecting one of the constituents as an argument for each target verb according to the probabilities, wherein predicting the semantic role of each constituent and selecting one of the constituents includes:
  
  performing lexical/surface analysis;
  
  performing morphological analysis;
  
  performing semantic analysis;
  
  performing syntactic analysis; and
  
  integrating results of the lexical/surface analysis, the morphological analysis, the semantic analysis, and the syntactic analysis into a statistical model based on Maximum Entropy to produce a probability model for predicting the semantic role of each constituent and selecting one of the constituents.
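The integration step names a statistical model based on Maximum Entropy. In the standard conditional form of such a model, the probability of a role given the active features is the exponential of the summed feature weights for that role, normalized over all roles. The following is a minimal sketch of that computation only; training is omitted, and the feature strings, role labels, and weight values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def maxent_distribution(
    active_features: List[str],
    weights: Dict[Tuple[str, str], float],
    roles: List[str],
) -> Dict[str, float]:
    """Compute p(role | features) = exp(sum of weights) / Z for each role."""
    scores = {
        role: sum(weights.get((role, f), 0.0) for f in active_features)
        for role in roles
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {role: math.exp(s - top) for role, s in scores.items()}
    z = sum(exp_scores.values())
    return {role: e / z for role, e in exp_scores.items()}


# One feature from each analysis type (all values invented for illustration).
features = [
    "lex:head=door",
    "morph:number=plural",
    "sem:type=object",
    "syn:position=after_verb",
]
weights = {
    ("ARG1", "syn:position=after_verb"): 1.2,
    ("ARG1", "sem:type=object"): 0.8,
    ("ARG0", "syn:position=after_verb"): -0.5,
    ("NONE", "lex:head=door"): -0.3,
}
probs = maxent_distribution(features, weights, ["ARG0", "ARG1", "NONE"])
best = max(probs, key=probs.get)
```

Because every feature type contributes to the same weighted sum, lexical, morphological, semantic, and syntactic evidence can be combined in one model without assuming the features are independent.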

17. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform a method for processing natural language text, comprising:
- receiving as input a natural language text sentence comprising a sequence of white-space delimited words including at least one inflected word comprising a stem and one or more affixes;
  
  automatically parsing the inflected words into their constituent morphemes;
  
  automatically detecting stems of inflected words that are target verbs and grouping stems and affixes of different words into constituents, using morphological information derived from the automatic parsing;
  
  automatically predicting a semantic role of each constituent for a target verb using a trained statistical model using a plurality of feature data including morphological features extracted during morphological analysis, wherein each prediction is associated with a probability; and
  
  selecting one of the constituents as an argument for each target verb according to the probabilities, wherein predicting the semantic role of each constituent and selecting one of the constituents includes:
  
  performing lexical/surface analysis;
  
  performing morphological analysis;
  
  performing semantic analysis;
  
  performing syntactic analysis; and
  
  integrating results of the lexical/surface analysis, the morphological analysis, the semantic analysis, and the syntactic analysis into a statistical model based on Maximum Entropy to produce a probability model for predicting the semantic role of each constituent and selecting one of the constituents.
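Claim 17 refers to feature data that includes morphological features alongside the other analyses. As a rough sketch of what a combined feature set for one verb-constituent pair might look like, the code below emits string features with a prefix per analysis type. The `Constituent` fields, the feature names, and the prefixes `lex:`, `morph:`, `sem:`, and `syn:` are all assumptions made for illustration; the patent text shown here does not specify a feature inventory.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Constituent:
    morphemes: List[Tuple[str, str]]  # (text, kind), kind is "prefix", "stem", or "suffix"
    phrase_label: str                 # from the syntactic parse, e.g. "NP"
    entity_type: str                  # hypothetical semantic tag
    position: str                     # "before_verb" or "after_verb"


def extract_features(verb_stem: str, c: Constituent) -> List[str]:
    """Build one flat feature list covering all four analysis types."""
    stems = [text for text, kind in c.morphemes if kind == "stem"]
    affixes = [text for text, kind in c.morphemes if kind != "stem"]
    feats = [f"lex:verb={verb_stem}"]
    feats += [f"lex:stem={s}" for s in stems]
    feats += [f"morph:affix={a}" for a in affixes]
    feats.append(f"morph:n_morphemes={len(c.morphemes)}")
    feats.append(f"sem:type={c.entity_type}")
    feats.append(f"syn:phrase={c.phrase_label}")
    feats.append(f"syn:position={c.position}")
    return feats


doors = Constituent(
    morphemes=[("door", "stem"), ("s", "suffix")],
    phrase_label="NP",
    entity_type="object",
    position="after_verb",
)
feature_list = extract_features("open", doors)
```

A flat list of string features like this is a common input format for log-linear classifiers, since each distinct string can be mapped to its own weight per role.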

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Kambhatla, Nandakishore; Zitouni, Imed
Primary Examiner(s): COLUCCI, MICHAEL C
Application Number: US11/767,104
Publication Number: US 20080319735A1
Time in Patent Office: 2,265 Days
Field of Search: 704/9, 704/260, 704/6, 704/5, 704/270, 704/257, 704/2, 704/10, 704/1, 434/322, 715/255, 706/50, 706/45, 706/12
US Class Current: 704/9
CPC Class Codes:
G06F 40/284 Lexical analysis, e.g. toke...
G06F 40/30 Semantic analysis
