Post-processing for identifying nonsense passages in a question answering system
First Claim
1. A method, in a data processing system, for identifying nonsense passages, the method comprising:
annotating, by an annotator in a nonsense identification component within a natural language processing pipeline configured to execute in the data processing system, an input passage with linguistic features to form an annotated passage;
counting, by a metric counters component in the nonsense identification component, a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts;
determining, by the metric counters component, a value for a metric based on the set of feature counts;
comparing, by a comparator component of the nonsense identification component, the value for the metric to a predetermined model threshold;
determining, by a filter component of the nonsense identification component, whether the input passage is a nonsense passage based on a result of the comparison;
responsive to the filter component determining the given evidence passage is a nonsense passage, sending, by the filter component of the nonsense identification component, the input passage to a semi-structured data pipeline configured to execute in the data processing system and preventing the input passage from proceeding in the natural language processing pipeline; and
responsive to the filter component not determining that the input passage is a nonsense passage, passing, by the filter component, the input passage to the natural language processing pipeline.
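The steps of claim 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the feature types (`word`, `number`, `symbol`), the regex-based annotator, the `word_ratio` metric, the 0.5 threshold, the direction of the comparison, and the route names are all hypothetical choices made for the example. The claim itself leaves them unspecified.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, numbers (optionally decimal), or single other characters.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?|[^\sA-Za-z\d]")


def annotate(passage):
    """Tag each token with a hypothetical feature type (stand-in for a real annotator)."""
    annotated = []
    for token in TOKEN_PATTERN.findall(passage):
        if token.isalpha():
            kind = "word"
        elif token[0].isdigit():
            kind = "number"
        else:
            kind = "symbol"
        annotated.append((token, kind))
    return annotated


def count_features(annotated):
    """Count the instances of each feature type in the annotated passage."""
    return Counter(kind for _, kind in annotated)


def word_ratio(counts):
    """Assumed metric: fraction of tokens that are words (0.0 for an empty passage)."""
    total = sum(counts.values())
    return counts["word"] / total if total else 0.0


def route(passage, threshold=0.5):
    """Compare the metric to the threshold and pick a pipeline for the passage."""
    value = word_ratio(count_features(annotate(passage)))
    # Assumed direction: a low share of words suggests table-like, non-prose content.
    is_nonsense = value < threshold
    return "semi_structured" if is_nonsense else "nlp"
```

With these assumptions, a plain sentence such as "The cat sat on the mat." stays in the natural language processing pipeline, while a row of table cells such as "12.5 | 40 | 7 | 3.2 | n/a" is diverted to the semi-structured data pipeline.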
Abstract
A mechanism is provided in a data processing system for identifying nonsense passages. The mechanism annotates an input passage with linguistic features to form an annotated passage. The mechanism counts a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts. The mechanism determines a value for a metric based on the set of feature counts and compares the value for the metric to a predetermined model threshold. The mechanism identifies whether the input passage is a nonsense passage based on a result of the comparison.
15 Claims
5. The method of claim wherein the metric and the predetermined model threshold are defined in a policy data structure.
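One way to picture a policy data structure that holds both the metric and its threshold is sketched below. The `Policy` fields, the `METRICS` registry, the metric names, and the `flag_if_below` direction flag are hypothetical; the claim only states that the metric and the predetermined model threshold are defined in a policy data structure.

```python
from dataclasses import dataclass

# Hypothetical metric functions keyed by name; each maps feature counts to a value.
METRICS = {
    "word_ratio": lambda counts: counts.get("word", 0) / max(sum(counts.values()), 1),
    "symbol_count": lambda counts: float(counts.get("symbol", 0)),
}


@dataclass(frozen=True)
class Policy:
    metric: str                 # name of the metric to compute (a key of METRICS)
    threshold: float            # the predetermined model threshold
    flag_if_below: bool = True  # assumed comparison direction


def is_nonsense(counts, policy):
    """Evaluate the policy's metric on the feature counts and compare to its threshold."""
    value = METRICS[policy.metric](counts)
    if policy.flag_if_below:
        return value < policy.threshold
    return value > policy.threshold
```

Keeping the metric and threshold in data rather than in code means a different policy, for example `Policy("symbol_count", 3.0, flag_if_below=False)`, could be swapped in without touching the comparison logic.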
6. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program comprises a natural language processing pipeline configured to execute on a data processing system to cause the data processing system to process natural language, wherein the computer readable program comprises:
an annotator in a nonsense identification component within the natural language processing pipeline configured to annotate an input passage with linguistic features to form an annotated passage;
a metric counters component in the nonsense identification component configured to count a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts and determine a value for a metric based on the set of feature counts;
a comparator component of the nonsense identification component configured to compare the value for the metric to a predetermined model threshold; and
a filter component of the nonsense identification component configured to determine whether the input passage is a nonsense passage based on a result of the comparison;
wherein the filter component is configured to send the given evidence passage to a semi-structured data pipeline and to prevent the given evidence passage from proceeding in the natural language processing pipeline responsive to the filter component determining the given evidence passage is a nonsense passage; and
wherein the filter component is configured to pass the given evidence passage to the natural language processing pipeline responsive to the filter component not determining that the given evidence passage is a nonsense passage.
11. An apparatus comprising:
a processor; and
a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to:
annotate, by an annotator in a nonsense identification component within a natural language processing pipeline configured to execute in the data processing system, an input passage with linguistic features to form an annotated passage;
count, by a metric counters component in the nonsense identification component, a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts;
determine, by the metric counters component of the nonsense identification component, a value for a metric based on the set of feature counts;
compare, by a comparator component of the nonsense identification component, the value for the metric to a predetermined model threshold;
determine, by a filter component of the nonsense identification component, whether the input passage is a nonsense passage based on a result of the comparison;
responsive to the filter component determining the given evidence passage is a nonsense passage, send, by the filter component of the nonsense identification component, the input passage to a semi-structured data pipeline configured to execute in the data processing system and prevent the input passage from proceeding in the natural language processing pipeline; and
responsive to the filter component not determining that the input passage is a nonsense passage, pass, by the filter component, the input passage to the natural language processing pipeline.
Specification