Post-processing for identifying nonsense passages in a question answering system

US 10,169,328 B2
Filed: 05/12/2016
Issued: 01/01/2019
Est. Priority Date: 05/12/2016
Status: Active Grant

Abstract

A mechanism is provided in a data processing system for identifying nonsense passages. The mechanism annotates an input passage with linguistic features to form an annotated passage. The mechanism counts a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts. The mechanism determines a value for a metric based on the set of feature counts and compares the value for the metric to a predetermined model threshold. The mechanism identifies whether the input passage is a nonsense passage based on a result of the comparison.
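The flow described in the abstract (annotate, count, compute a metric, compare to a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tiny `TOY_LEXICON` tagger, the noun-per-verb metric, the threshold of 3.0, and the rule that a value above the threshold means "nonsense" are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech annotator.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "mat": "NOUN", "item": "NOUN",
    "price": "NOUN", "total": "NOUN", "row": "NOUN",
    "sat": "VERB", "is": "VERB", "shows": "VERB",
    "on": "ADP",
}


def annotate(passage):
    """Annotate each whitespace-separated token with a POS tag."""
    tokens = passage.lower().split()
    return [(tok, TOY_LEXICON.get(tok.strip(".,;:"), "OTHER")) for tok in tokens]


def count_features(annotated):
    """Count the instances of each feature type in the annotated passage."""
    return Counter(tag for _, tag in annotated)


def metric_value(counts):
    """Assumed metric: nouns per verb, with the verb count floored at 1."""
    return counts["NOUN"] / max(counts["VERB"], 1)


def is_nonsense(passage, threshold=3.0):
    """Compare the metric to the threshold (assumed direction: above = nonsense)."""
    counts = count_features(annotate(passage))
    return metric_value(counts) > threshold
```

With these assumptions, an ordinary sentence such as "The cat sat on the mat." yields a ratio of 2.0 and is kept, while a verbless run of table headers such as "item price total row item price total row" yields 8.0 and is flagged.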

15 Claims

1. A method, in a data processing system, for identifying nonsense passages, the method comprising:
- annotating, by an annotator in a nonsense identification component within a natural language processing pipeline configured to execute in the data processing system, an input passage with linguistic features to form an annotated passage;
  
  counting, by metric counters component in the nonsense identification component, a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts;
  
  determining, by the metric counters component, a value for a metric based on the set of feature counts;
  
  comparing, by a comparator component of the nonsense identification component, the value for the metric to a predetermined model threshold;
  
  determining, by a filter component of the nonsense identification component, whether the input passage is a nonsense passage based on a result of the comparison;
  
  responsive to the filter component determining the given evidence passage is a nonsense passage, sending, by the filter component of the nonsense identification component, the input passage to a semi-structured data pipeline configured to execute in the data processing system and preventing the input passage from proceeding in the natural language processing pipeline; and
  
  responsive to the filter component not determining that the input passage is a nonsense passage, passing, by the filter component, the input passage to the natural language processing pipeline.
- 2. The method of claim 1, wherein annotating the input passage comprises annotating the input passage for linguistic part-of-speech features.
- 3. The method of claim 1, wherein the metric comprises a ratio of a number of instances of a first part-of-speech to a number of instances of a second part-of-speech in the input passage.
- 4. The method of claim 1, wherein the input passage is a candidate evidence passage for a candidate answer in a question answering system.
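The routing step of claim 1, combined with the part-of-speech ratio of claim 3, might look like the sketch below. The function names, the use of plain lists as stand-ins for the two pipelines, the default `NOUN`/`VERB` pair, and the "above threshold means nonsense" direction are assumptions for illustration; the claims do not fix any of them.

```python
from collections import Counter


def pos_ratio(tags, first_pos, second_pos):
    """Ratio of first-POS instances to second-POS instances.

    The denominator is floored at 1 to avoid division by zero
    (an assumed convention, not taken from the claims).
    """
    counts = Counter(tags)
    return counts[first_pos] / max(counts[second_pos], 1)


def route(passage, tags, threshold, nlp_queue, semi_structured_queue,
          first_pos="NOUN", second_pos="VERB"):
    """Send a flagged passage to the semi-structured pipeline, else pass it on."""
    if pos_ratio(tags, first_pos, second_pos) > threshold:
        # Flagged: divert, so it does not proceed in the NLP pipeline.
        semi_structured_queue.append(passage)
        return "semi_structured"
    nlp_queue.append(passage)
    return "nlp"
```

A flagged passage is appended only to the semi-structured queue, which mirrors the claim language that the passage is both sent onward and prevented from proceeding in the natural language processing pipeline.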

5. The method of claim wherein the metric and the predetermined model threshold are defined in a policy data structure.
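One way to picture a policy data structure that holds both the metric and its threshold is a small immutable record. The class name `NonsensePolicy`, its field names, and the ratio-style metric are hypothetical; the claim only says that the metric and the threshold are defined in a policy data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonsensePolicy:
    """Hypothetical policy: a ratio metric plus the threshold it is compared to."""
    numerator_feature: str
    denominator_feature: str
    threshold: float

    def metric(self, counts):
        """Compute the ratio from a mapping of feature name to count."""
        numerator = counts.get(self.numerator_feature, 0)
        # Floor the denominator at 1 to avoid division by zero (assumption).
        denominator = max(counts.get(self.denominator_feature, 0), 1)
        return numerator / denominator

    def flags(self, counts):
        """Return True when the metric exceeds the threshold (assumed direction)."""
        return self.metric(counts) > self.threshold
```

Keeping the metric definition and threshold together in one record means a different policy can be swapped in without touching the annotator or the counting code.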

6. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program comprises a natural language processing pipeline configured to execute on a data processing system to cause the data processing system to process natural language, wherein the computer readable program comprises:
- an annotator in a nonsense identification component with the natural language processing pipeline configured to annotate an input passage with linguistic features to form an annotated passage;
  
  a metric counters component in the nonsense identification component configured to count a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts and determine a value for a metric based on the set of feature counts;
  
  a comparator component of the nonsense identification component configured to compare the value for the metric to a predetermined model threshold; and
  
  a filter component of the nonsense identification component configured to determine whether the input passage is a nonsense passage based on a result of the comparison;
  
  wherein the filter component is configured to send the given evidence passage to a semi-structured data pipeline and to prevent the given evidence passage from proceeding in the natural language processing pipeline responsive to the filter component determining the given evidence passage is a nonsense passage; and
  
  wherein the filter component is configured to pass the given evidence passage to the natural language processing pipeline responsive to the filter component not determining that the given evidence passage is a nonsense passage.
- 7. The computer program product of claim 6, wherein annotating the input passage comprises annotating the input passage for linguistic part-of-speech features.
- 8. The computer program product of claim 6, wherein the metric comprises a ratio of a number of instances of a first part-of-speech to a number of instances of a second part-of-speech in the input passage.
- 9. The computer program product of claim 6, wherein the input passage is a candidate evidence passage for a candidate answer in a question answering system.
- 10. The computer program product of claim 6, wherein the metric and the predetermined model threshold are defined in a policy data structure.

11. An apparatus comprising:
- a processor; and
  
  a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to:
  
  annotate, by an annotator in a nonsense identification component within a natural language processing pipeline configured to execute in the data processing system, an input passage with linguistic features to form an annotated passage;
  
  count, by metric counters component in the nonsense identification component, a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts;
  
  determine, by the metric counters component of the nonsense identification component, a value for a metric based on the set of feature counts;
  
  compare, by a comparator component of the nonsense identification component, the value for the metric to a predetermined model threshold;
  
  determine, by a filter component of the nonsense identification component, whether the input passage is a nonsense passage based on a result of the comparison;
  
  responsive to the filter component determining the given evidence passage is a nonsense passage, send, by the filter component of the nonsense identification component, the input passage to a semi-structured data pipeline configured to execute in the data processing system and prevent the input passage from proceeding in the natural language processing pipeline; and
  
  responsive to the filter component not determining that the input passage is a nonsense passage, pass, by the filter component, the input passage to the natural language processing pipeline.
- 12. The apparatus of claim 11, wherein annotating the input passage comprises annotating the input passage for linguistic part-of-speech features.
- 13. The apparatus of claim 11, wherein the metric comprises a ratio of a number of instances of a first part-of-speech to a number of instances of a second part-of-speech in the input passage.
- 14. The apparatus of claim 11, wherein the input passage is a candidate evidence passage for a candidate answer in a question answering system.
- 15. The apparatus of claim 11, wherein the metric and the predetermined model threshold are defined in a policy data structure.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Beller, Charles E.; Drzewucki, Michael; Phipps, Christopher; Summers, Kristen M.; Yu, Julie T.
Primary Examiner(s): Ries, Laurie A
Application Number: US15/152,747
Publication Number: US 20170329753A1
Time in Patent Office: 964 Days
CPC Class Codes:
G06F 16/367   Ontology
G06F 40/216   using statistical methods
G06F 40/30   Semantic analysis
