Acquisition and application of contextual role knowledge for coreference resolution
Abstract
Coreference resolution is the process of identifying when two noun phrases (NPs) refer to the same entity. This work makes two main contributions to computational coreference resolution. First, it contributes a new method for recognizing when an NP is anaphoric. Second, whereas traditional approaches to coreference resolution typically select the most appropriate antecedent based on word similarity, proximity, and agreement in number, gender, and semantic class, this work contributes a new source of evidence that focuses on the roles that an anaphor and its antecedent play in particular events or relationships. I show that using contextual role knowledge as part of the coreference resolution process increases the number of anaphors that can be resolved, and I demonstrate an unsupervised method for acquiring contextual role knowledge that does not require an annotated training corpus. A probabilistic model based on the Dempster-Shafer theory of evidence is used to combine contextual role knowledge with traditional evidence sources.
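The abstract names Dempster-Shafer as the model for combining evidence but gives no formula. The sketch below shows the standard Dempster's rule of combination, assuming the frame of discernment is the set of candidate antecedents for one anaphor and that each knowledge source yields a mass function over subsets of that frame (mass on the whole frame meaning "don't know"). The function names `combine` and `belief` and the candidates `A1`/`A2` are hypothetical illustrations, not the patent's implementation.

```python
from itertools import product

# A mass function maps frozensets of hypotheses (candidate antecedents)
# to masses in [0, 1] that sum to 1. Mass on the whole frame means "don't know".


def combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule of combination for two mass functions."""
    combined: dict = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + ma * mb
        else:
            conflict += ma * mb  # the two sources contradict each other here
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by 1 - K so the surviving masses sum to 1 again.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m: dict, hypothesis: frozenset) -> float:
    """Bel(A): total mass committed to A or to any of its subsets."""
    return sum(v for s, v in m.items() if s <= hypothesis)


# Hypothetical example: two candidate antecedents for one anaphor.
theta = frozenset({"A1", "A2"})
m1 = {frozenset({"A1"}): 0.6, theta: 0.4}  # source 1 favors A1
m2 = {frozenset({"A2"}): 0.3, theta: 0.7}  # source 2 weakly favors A2
m12 = combine(m1, m2)
```

Here the conflicting mass is 0.6 × 0.3 = 0.18, so the surviving products are divided by 0.82; `{A1}` ends up with 0.42 / 0.82 of the mass, while `{A2}` gets only 0.12 / 0.82.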
19 Claims
1. A method for associating anaphors with antecedents in a written work, the method comprising:
processing a training corpus containing textual documents that are topically related to the written work, said processing producing interpretive information useful to categorize noun phrases of the training corpus as independent or potentially anaphoric;
identifying noun phrases within the written work;
using the interpretive information, filtering those identified noun phrases to exclude noun phrases that can be identified to be independent in nature;
identifying a set of potentially anaphoric noun phrases occurring in the written work;
following said identifying a set of potentially anaphoric noun phrases, recognizing cases of unambiguous coreferences in the set of potentially anaphoric noun phrases, said recognizing associating a noun phrase with an antecedent for each case;
following said recognizing, identifying coreference combinations for unrecognized noun phrases from the set of potentially anaphoric noun phrases, each coreference combination including an unrecognized noun phrase and a potential antecedent;
applying a plurality of general knowledge sources and contextual role knowledge sources to the coreference combinations, wherein the contextual role knowledge sources include events and a manner of participation in the events to identify relatedness for the coreference combination at a thematic role level, the contextual role knowledge sources including lexical expectations and semantic expectations to resolve the coreference combination by comparing the lexical expectations and the semantic expectations of the coreference combination to the antecedent, said applying producing evidentiary values for the coreference combinations;
applying a factor to each of the produced evidentiary values to favor more credible knowledge sources;
for each unrecognized noun phrase, applying a probabilistic model to the produced evidentiary values associated with the noun phrase; and
for each application of the probabilistic model to an unrecognized noun phrase, selecting either an antecedent for that unrecognized noun phrase, if the coreference of that antecedent has a corresponding evidentiary value above a selected threshold value, or no antecedent otherwise.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
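The last three steps of claim 1 (weight each evidentiary value by a credibility factor, apply a probabilistic model per unrecognized NP, then accept an antecedent only above a threshold) can be illustrated with the sketch below. It is one plausible reading, not the claimed implementation: it assumes each knowledge source votes for a single candidate with a support value in [0, 1], that the "factor" is Shafer-style discounting (scale the vote by a per-source reliability and leave the rest as "don't know"), and that the model is Dempster's rule restricted to single candidates. The source names, factors, scores, and the helpers `discounted_support`, `combine_simple`, and `select_antecedent` are all hypothetical.

```python
THETA = "__THETA__"  # key for uncommitted mass (the whole frame: "don't know")


def discounted_support(candidate: str, support: float, reliability: float) -> dict:
    """One knowledge source's vote, scaled by that source's reliability factor."""
    mass = reliability * support
    return {candidate: mass, THETA: 1.0 - mass}


def combine_simple(m1: dict, m2: dict) -> dict:
    """Dempster's rule, specialized to masses on single candidates plus THETA."""
    t1, t2 = m1.get(THETA, 0.0), m2.get(THETA, 0.0)
    out = {}
    for c in (set(m1) | set(m2)) - {THETA}:
        a, b = m1.get(c, 0.0), m2.get(c, 0.0)
        # {c} survives when both agree on c, or one says c and the other is unsure.
        out[c] = a * b + a * t2 + t1 * b
    out[THETA] = t1 * t2
    norm = sum(out.values())  # equals 1 - K, where K is the conflicting mass
    if norm == 0.0:
        raise ValueError("sources are in total conflict")
    return {k: v / norm for k, v in out.items()}


def select_antecedent(evidence, reliability, threshold):
    """Pick the best-supported antecedent, or None if none clears the threshold.

    evidence:    (source, candidate, support) triples for one anaphor
    reliability: per-source factor in [0, 1]; more credible sources get larger values
    """
    m = {THETA: 1.0}  # vacuous belief: no evidence yet
    for source, candidate, support in evidence:
        m = combine_simple(m, discounted_support(candidate, support, reliability[source]))
    scores = {c: v for c, v in m.items() if c != THETA}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None


# Hypothetical source names, factors, and scores, for illustration only.
reliability = {"gender_agreement": 0.9, "caseframe_network": 0.7}
evidence = [
    ("gender_agreement", "the pilot", 0.8),
    ("caseframe_network", "the pilot", 0.6),
    ("caseframe_network", "the plane", 0.5),
]
```

With these numbers the combined mass for "the pilot" comes out at about 0.77, so `select_antecedent(evidence, reliability, 0.5)` returns `"the pilot"`, while a stricter threshold of 0.8 yields `None`, matching the claim's "no antecedent otherwise" branch.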
10. A method for associating anaphors with antecedents in a written work, comprising the steps of:
processing a training corpus containing textual documents that are topically related to the written work, said processing producing interpretive information useful to categorize noun phrases of the training corpus as independent or potentially anaphoric, said processing producing a definite-only list of definite noun phrases, said processing optionally producing S1 extractions of noun phrases from the first sentences of texts comprising the training corpus, said processing further optionally producing existential head patterns, wherein said interpretive information includes the definite-only list of definite noun phrases and optionally the produced S1 extractions or the existential head patterns;
identifying noun phrases within the written work;
using the interpretive information, filtering those identified noun phrases to exclude noun phrases that can be identified to be independent in nature, said filtering using the produced definite-only list to exclude definite noun phrases;
identifying a set of potentially anaphoric noun phrases occurring in the written work;
following said identifying a set of potentially anaphoric noun phrases, recognizing cases of unambiguous coreferences in the set of potentially anaphoric noun phrases, said recognizing associating a noun phrase with an antecedent for each case;
following said recognizing, identifying coreference combinations for unrecognized noun phrases from the set of potentially anaphoric noun phrases, each coreference combination including an unrecognized noun phrase and a potential antecedent;
applying a plurality of general knowledge sources and contextual role knowledge sources to the coreference combinations, said applying producing evidentiary values for the coreference combinations, wherein the contextual role knowledge sources include events and a manner of participation in the events that identifies relatedness for the coreference combinations at a thematic role level and wherein the events and manner of participation in the events of the contextual role knowledge sources include lexical expectations and semantic expectations to resolve the coreference combination by comparing the lexical expectations and the semantic expectations of the coreference combination to the antecedent;
applying a factor to each of the produced evidentiary values to favor more credible knowledge sources;
for each unrecognized noun phrase, applying a Dempster-Shafer model to the produced evidentiary values associated with the noun phrase; and
for each application of the model to an unrecognized noun phrase, selecting either an antecedent for that unrecognized noun phrase, if the coreference of that antecedent has a corresponding evidentiary value above a selected threshold value, or no antecedent otherwise.
Dependent claims: 11, 12, 13
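Claim 10 says the definite-only list (and optionally the S1 extractions) is used to filter out independent definite NPs, but it does not define how the list is built. The sketch below assumes one simple reading: an NP (lowercased, leading determiner removed) joins the definite-only list if it occurs at least `min_count` times in the training corpus and always with "the"; S1 extractions are assumed to be normalized definite NPs taken from the first sentences of training texts. The threshold, the determiner set, and the helpers `np_key`, `is_definite`, `build_definite_only_list`, and `filter_independent` are hypothetical; existential head patterns are not modeled.

```python
from collections import Counter

DETERMINERS = {"the", "a", "an", "this", "that", "these", "those"}


def np_key(phrase: str) -> tuple:
    """Normalize an NP: lowercase it and drop a leading determiner."""
    words = phrase.lower().split()
    if words and words[0] in DETERMINERS:
        words = words[1:]
    return tuple(words)


def is_definite(phrase: str) -> bool:
    return phrase.lower().split()[:1] == ["the"]


def build_definite_only_list(corpus_nps, min_count=5):
    """NPs seen at least min_count times in the corpus, always with 'the'."""
    total, definite = Counter(), Counter()
    for phrase in corpus_nps:
        key = np_key(phrase)
        if not key:
            continue
        total[key] += 1
        if is_definite(phrase):
            definite[key] += 1
    return {k for k, n in total.items() if n >= min_count and definite[k] == n}


def filter_independent(work_nps, definite_only, s1_extractions=frozenset()):
    """Drop definite NPs judged independent; keep the rest as potentially anaphoric."""
    kept = []
    for phrase in work_nps:
        key = np_key(phrase)
        if is_definite(phrase) and (key in definite_only or key in s1_extractions):
            continue  # e.g. "the FBI": needs no antecedent to be understood
        kept.append(phrase)
    return kept


corpus_nps = ["the FBI"] * 6 + ["the suspect"] * 4 + ["a suspect"] * 3
definite_only = build_definite_only_list(corpus_nps)
# Hypothetical S1 extraction: a definite NP seen in the first sentence of a training text.
s1_extractions = {np_key("the Medellin cartel")}
work_nps = ["The FBI", "the suspect", "a man", "the Medellin cartel"]
candidates = filter_independent(work_nps, definite_only, s1_extractions)
```

"suspect" stays off the list because it also appears indefinitely ("a suspect"), so "the suspect" survives the filter as potentially anaphoric.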
14. A method for associating anaphors with antecedents in a written work, comprising the steps of:
processing a training corpus containing textual documents that are topically related to the written work, said processing producing interpretive information useful to categorize noun phrases of the training corpus as independent or potentially anaphoric;
processing the training corpus to extract a set of thematic caseframes;
identifying noun phrases within the written work;
using the interpretive information, filtering those identified noun phrases to exclude noun phrases that can be identified to be independent in nature;
identifying a set of potentially anaphoric noun phrases occurring in the written work;
following said identifying a set of potentially anaphoric noun phrases, recognizing cases of unambiguous coreferences in the set of potentially anaphoric noun phrases, said recognizing associating a noun phrase with an antecedent for each case;
following said recognizing, identifying coreference combinations for unrecognized noun phrases from the set of potentially anaphoric noun phrases, each coreference combination including an unrecognized noun phrase and a potential antecedent;
applying a plurality of general knowledge sources and a contextual role knowledge source to the coreference combinations, said applying producing evidentiary values for the coreference combinations, wherein the contextual knowledge sources apply extracted thematic caseframes that identify relatedness between the noun phrase and potential antecedent in each coreference combination at a thematic role level and wherein the contextual role knowledge sources include lexical expectations and semantic expectations to resolve the coreference combination by comparing the lexical expectations and the semantic expectations of the coreference combination to the antecedent;
applying a factor to each of the produced evidentiary values to favor more credible knowledge sources;
for each unrecognized noun phrase, applying a probabilistic model to the produced evidentiary values associated with the noun phrase; and
for each application of the probabilistic model to an unrecognized noun phrase, selecting either an antecedent for that unrecognized noun phrase, if the coreference of that antecedent has a corresponding evidentiary value above a selected threshold value, or no antecedent otherwise.
Dependent claims: 15, 16, 17, 18
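Claim 14 compares the lexical and semantic expectations of an anaphor's thematic caseframe against a candidate antecedent. The sketch below is a minimal illustration under stated assumptions: the training corpus has already been reduced to (caseframe, head noun, semantic class) triples; a caseframe's lexical expectations are the head nouns that filled it at least `min_freq` times; its semantic expectations are the classes of its observed fillers. The `<agent> kidnapped` caseframe notation, the semantic class labels, and the helpers `learn_expectations` and `role_evidence` are hypothetical.

```python
from collections import Counter, defaultdict


def learn_expectations(extractions, min_freq=2):
    """Learn per-caseframe expectations from (caseframe, head, semantic_class) triples."""
    heads = defaultdict(Counter)
    classes = defaultdict(set)
    for caseframe, head, sem_class in extractions:
        heads[caseframe][head] += 1
        classes[caseframe].add(sem_class)
    # Lexical expectations: head nouns that filled the caseframe often enough.
    lexical = {cf: {h for h, n in c.items() if n >= min_freq} for cf, c in heads.items()}
    return lexical, dict(classes)


def role_evidence(anaphor_caseframe, antecedent_head, antecedent_class, lexical, semantic):
    """Check the antecedent against the expectations of the anaphor's caseframe."""
    return {
        "lexical": antecedent_head in lexical.get(anaphor_caseframe, set()),
        "semantic": antecedent_class in semantic.get(anaphor_caseframe, set()),
    }


# Hypothetical extractions from a training corpus.
extractions = [
    ("<agent> kidnapped", "guerrillas", "HUMAN"),
    ("<agent> kidnapped", "guerrillas", "HUMAN"),
    ("<agent> kidnapped", "men", "HUMAN"),
    ("<patient> was released", "hostage", "HUMAN"),
]
lexical, semantic = learn_expectations(extractions)
```

For an anaphor such as "they" in "they kidnapped ...", a candidate antecedent "the guerrillas" meets both expectations, "the men" meets only the semantic one (seen once, below `min_freq`), and "the bomb" (class `WEAPON`) meets neither. Each boolean could then feed an evidentiary value for the coreference combination.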
19. A set of computer readable media containing computer instructions for operating an anaphor-antecedent associator, the set of computer readable media comprising at least one medium upon which is stored the computer instructions executable by a computing system to achieve the functions of:
(i) processing a training corpus containing textual documents that are topically related to the written work, said processing producing interpretive information useful to categorize noun phrases of the training corpus as independent or potentially anaphoric;
(ii) identifying noun phrases within the written work;
(iii) using the interpretive information, filtering those identified noun phrases to exclude noun phrases that can be identified to be independent in nature;
(iv) identifying a set of potentially anaphoric noun phrases occurring in the written work;
(v) following said identifying a set of potentially anaphoric noun phrases, recognizing cases of unambiguous coreferences in the set of potentially anaphoric noun phrases, said recognizing associating a noun phrase with an antecedent for each case;
(vi) following said recognizing, identifying coreference combinations for unrecognized noun phrases from the set of potentially anaphoric noun phrases, each coreference combination including an unrecognized noun phrase and a potential antecedent;
(vii) applying knowledge sources including a plurality of general knowledge sources and contextual role knowledge sources to the coreference combinations, wherein the contextual role knowledge sources include events and a manner of participation in the events to identify relatedness for the coreference combination at a thematic role level, the contextual role knowledge sources including lexical expectations and semantic expectations to resolve the coreference combination by comparing the lexical expectations and the semantic expectations of the coreference combination to the antecedent, said applying producing evidentiary values for the coreference combinations;
(viii) applying a factor to each of the produced evidentiary values to favor more credible knowledge sources;
(ix) for each unrecognized noun phrase, applying a probabilistic model to the produced evidentiary values associated with the noun phrase; and
(x) for each application of the probabilistic model to an unrecognized noun phrase, selecting either an antecedent for that unrecognized noun phrase, if the coreference of that antecedent has a corresponding evidentiary value above a selected threshold value, or no antecedent otherwise.
Specification