Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications
First Claim
1. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform a method for processing natural language text, comprising:
receiving as input a natural language text sentence comprising a sequence of white-space delimited words including inflected words that are formed of morphemes including a stem and one or more affixes;
automatically parsing the inflected words into their constituent morphemes;
grouping the parsed morphemes of the inflected words with the same syntactic role into constituents;
identifying a plurality of verb-constituent pairs in the text sentence;
predicting potential arguments for each constituent of the grouped morphemes, wherein the constituents are associated with a verb by the verb-constituent pairs and each prediction is weighted for a respective argument and grouped morpheme being considered;
assigning a probability to each of the potential arguments, wherein the probability indicates a probability that the potential argument applies to a respective constituent; and
outputting a plurality of semantic roles for a given verb/constituent pair as the potential arguments with corresponding probabilities, wherein predicting potential arguments for each constituent of the grouped morphemes and assigning the probability to each of the potential arguments includes:
performing lexical/surface analysis;
performing morphological analysis;
performing semantic analysis;
performing syntactic analysis; and
integrating results of the lexical/surface analysis, the morphological analysis, the semantic analysis, and the syntactic analysis into a statistical model based on Maximum Entropy to produce a probability model for predicting potential arguments for each constituent of the grouped morphemes and assigning the probability to each of the potential arguments.
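The integration step above names a statistical model based on Maximum Entropy. As a minimal sketch of how such a model turns combined features into role probabilities, the following assumes a conditional form in which each (role, feature) pair has a weight and the probabilities are a normalized exponential of the summed weights. The feature names, role labels, and weight values are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
import math

# Hypothetical (role, feature) weights; a trained model would learn these.
WEIGHTS = {
    ("ARG0", "morph:case=nom"): 1.2,
    ("ARG1", "morph:case=acc"): 1.5,
    ("ARG0", "syn:position=before_verb"): 0.4,
    ("ARG1", "syn:position=after_verb"): 0.6,
    ("ARGM-LOC", "lex:prefix=in"): 1.1,
}
# Hypothetical role inventory.
ROLES = ["ARG0", "ARG1", "ARGM-LOC"]


def role_probabilities(features, weights=WEIGHTS, roles=ROLES):
    """Return p(role | features) for one verb-constituent pair.

    The features may mix lexical/surface, morphological, semantic,
    and syntactic indicators; each contributes its weight to the
    score of every role it is paired with.
    """
    scores = {
        role: sum(weights.get((role, f), 0.0) for f in features)
        for role in roles
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {role: math.exp(s - top) for role, s in scores.items()}
    normalizer = sum(exp_scores.values())
    return {role: v / normalizer for role, v in exp_scores.items()}


probs = role_probabilities(["morph:case=acc", "syn:position=after_verb"])
```

Each constituent receives a full distribution over roles, which matches the claim language of outputting potential arguments with corresponding probabilities rather than a single label.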
Abstract
Systems and methods are provided for automated semantic role labeling for languages having complex morphology. In one aspect, a method for processing natural language text includes receiving as input a natural language text sentence comprising a sequence of white-space delimited words including inflected words that are formed of morphemes including a stem and one or more affixes, identifying a target verb as a stem of an inflected word in the text sentence, grouping morphemes from one or more inflected words with the same syntactic role into constituents, and predicting a semantic role of a constituent for the target verb.
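To make the first step of the abstract concrete, here is a minimal sketch of splitting white-space delimited words into a stem and affixes. It assumes a simple affix-stripping rule with a tiny, invented affix inventory; the affix lists, the sample tokens, and the `Morpheme` type are hypothetical, and a real morphological analyzer would be far more elaborate. Grouping morphemes into constituents is not shown.

```python
from typing import List, NamedTuple


class Morpheme(NamedTuple):
    text: str
    kind: str        # "prefix", "stem", or "suffix"
    word_index: int  # position of the source word in the sentence


# Invented affix inventory, for illustration only.
PREFIXES = ("wa", "bi")
SUFFIXES = ("hu", "ha")


def segment(word: str, index: int) -> List[Morpheme]:
    """Strip at most one known prefix and one known suffix from a word."""
    stem = word
    before: List[Morpheme] = []
    after: List[Morpheme] = []
    for prefix in PREFIXES:
        # Require a remaining stem of at least two characters.
        if stem.startswith(prefix) and len(stem) > len(prefix) + 1:
            before.append(Morpheme(prefix, "prefix", index))
            stem = stem[len(prefix):]
            break
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) > len(suffix) + 1:
            after.append(Morpheme(suffix, "suffix", index))
            stem = stem[:-len(suffix)]
            break
    return before + [Morpheme(stem, "stem", index)] + after


def segment_sentence(sentence: str) -> List[Morpheme]:
    """Segment every white-space delimited word of a sentence."""
    return [m for i, w in enumerate(sentence.split()) for m in segment(w, i)]


morphemes = segment_sentence("wakatabhu kitab")
```

Keeping the source word index on each morpheme lets a later step group morphemes from different words, or pair a verb stem with an affix of the same word.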
78 Citations
17 Claims
1. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform a method for processing natural language text, comprising:
receiving as input a natural language text sentence comprising a sequence of white-space delimited words including inflected words that are formed of morphemes including a stem and one or more affixes;
automatically parsing the inflected words into their constituent morphemes;
grouping the parsed morphemes of the inflected words with the same syntactic role into constituents;
identifying a plurality of verb-constituent pairs in the text sentence;
predicting potential arguments for each constituent of the grouped morphemes, wherein the constituents are associated with a verb by the verb-constituent pairs and each prediction is weighted for a respective argument and grouped morpheme being considered;
assigning a probability to each of the potential arguments, wherein the probability indicates a probability that the potential argument applies to a respective constituent; and
outputting a plurality of semantic roles for a given verb/constituent pair as the potential arguments with corresponding probabilities, wherein predicting potential arguments for each constituent of the grouped morphemes and assigning the probability to each of the potential arguments includes:
performing lexical/surface analysis;
performing morphological analysis;
performing semantic analysis;
performing syntactic analysis; and
integrating results of the lexical/surface analysis, the morphological analysis, the semantic analysis, and the syntactic analysis into a statistical model based on Maximum Entropy to produce a probability model for predicting potential arguments for each constituent of the grouped morphemes and assigning the probability to each of the potential arguments.
Dependent claims: 2, 3, 4, 5, 6
7. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for processing natural language text, the method steps comprising:
receiving as input a natural language text sentence comprising a sequence of white-space delimited words including inflected words that are formed of morphemes including a stem and one or more affixes;
automatically parsing the inflected words into their constituent morphemes;
grouping the parsed morphemes of the inflected words with the same syntactic role into constituents;
identifying a plurality of verb-constituent pairs in the text sentence, wherein at least one verb-constituent pair is formed of respective morphemes of a same white-space delimited word;
predicting potential arguments for each constituent of the grouped morphemes, wherein the constituents are associated with a verb by the verb-constituent pairs; and
predicting a semantic role of each constituent of the grouped morphemes according to probabilities of the potential arguments, wherein predicting potential arguments for each constituent of the grouped morphemes and predicting the semantic role of each constituent of the grouped morphemes according to the probabilities of each of the potential arguments includes:
performing lexical/surface analysis;
performing morphological analysis;
performing semantic analysis;
performing syntactic analysis; and
integrating results of the lexical/surface analysis, the morphological analysis, the semantic analysis, and the syntactic analysis into a statistical model based on Maximum Entropy to produce a probability model for predicting potential arguments for each constituent of the grouped morphemes and predicting the semantic role of each constituent of the grouped morphemes according to the probabilities of each of the potential arguments.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15
16. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform a method for processing natural language text, comprising:
receiving as input a natural language text sentence comprising a sequence of white-space delimited words including at least one inflected word comprising a stem and one or more affixes;
automatically segmenting the white-space delimited words into separate morphemes by parsing the inflected words into their constituent morphemes;
automatically grouping the parsed morphemes into constituents and identifying morphemes that are target verbs; and
automatically predicting a semantic role of each constituent for each target verb using a trained statistical model, wherein each prediction is associated with a probability; and
selecting one of the constituents as an argument for each target verb according to the probabilities, wherein predicting the semantic role of each constituent and selecting one of the constituents includes:
performing lexical/surface analysis;
performing morphological analysis;
performing semantic analysis;
performing syntactic analysis; and
integrating results of the lexical/surface analysis, the morphological analysis, the semantic analysis, and the syntactic analysis into a statistical model based on Maximum Entropy to produce a probability model for predicting the semantic role of each constituent and selecting one of the constituents.
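The selection step in claim 16 can be sketched as follows. The claim only says that one constituent is selected as an argument for each target verb according to the probabilities; this sketch assumes the simplest reading, namely picking the single highest-probability (constituent, role) prediction per verb. The function name, the nested-dictionary layout, and the sample values are hypothetical.

```python
def select_arguments(predictions):
    """Pick the most probable (constituent, role) prediction for each verb.

    predictions maps verb -> constituent -> role -> probability.
    Returns verb -> (constituent, role, probability), or None for a
    verb that has no candidate constituents.
    """
    selected = {}
    for verb, by_constituent in predictions.items():
        candidates = (
            (constituent, role, prob)
            for constituent, roles in by_constituent.items()
            for role, prob in roles.items()
        )
        # Assumption: "according to the probabilities" means arg-max.
        selected[verb] = max(candidates, key=lambda c: c[2], default=None)
    return selected


# Invented example values for illustration.
example = {
    "write": {
        "the letter": {"ARG0": 0.1, "ARG1": 0.8},
        "the student": {"ARG0": 0.7, "ARG1": 0.2},
    },
    "sleep": {},
}
chosen = select_arguments(example)
```

Other selection policies, such as choosing one constituent per role, would fit the same data layout; the arg-max rule is used here only because it is the shortest to state.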
17. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform a method for processing natural language text, comprising:
receiving as input a natural language text sentence comprising a sequence of white-space delimited words including at least one inflected word comprising a stem and one or more affixes;
automatically parsing the inflected words into their constituent morphemes;
automatically detecting stems of inflected words that are target verbs and grouping stems and affixes of different words into constituents, using morphological information derived from the automatic parsing;
automatically predicting a semantic role of each constituent for a target verb using a trained statistical model using a plurality of feature data including morphological features extracted during morphological analysis, wherein each prediction is associated with a probability; and
selecting one of the constituents as an argument for each target verb according to the probabilities, wherein predicting the semantic role of each constituent and selecting one of the constituents includes:
performing lexical/surface analysis;
performing morphological analysis;
performing semantic analysis;
performing syntactic analysis; and
integrating results of the lexical/surface analysis, the morphological analysis, the semantic analysis, and the syntactic analysis into a statistical model based on Maximum Entropy to produce a probability model for predicting the semantic role of each constituent and selecting one of the constituents.
Specification