Error correction in tables using a question and answer system
First Claim
1. A method, in a data processing system comprising a processor and a memory, for performing tabular data correction in a document, the method comprising:
configuring the processor to implement a natural language processing (NLP) system that performs natural language processing on natural language content at least by processing logical relationships in the natural language content;
responsive to receiving a natural language document with table structures and functional dependencies identified therein;
configuring an erroneous data value analysis engine to analyze a portion of content within a natural language document to identify an erroneous sub-portion within the natural language document comprising an erroneous or missing item of information;
analyzing, by the erroneous data value analysis engine executed by the processor in the data processing system, a portion of content within a natural language document to identify an erroneous sub-portion within the natural language document comprising an erroneous or missing item of information, wherein the portion of content comprises a table data structure present in the natural language document, wherein the erroneous sub-portion comprises a cell in the table data structure having an erroneous or missing data value, wherein the erroneous sub-portion is identified based on the erroneous sub-portion failing to conform with a regular structure associated with the portion of content within the natural language document, and wherein the regular structure is a repeatable pattern within the portion of content within the natural language document;
configuring a question generation engine to generate a semantic signature for the erroneous sub-portion;
generating, by the question generation engine executed by the processor in the data processing system, the semantic signature for the erroneous sub-portion, wherein generating the semantic signature for the erroneous sub-portion comprises:
performing, by the question generation engine, a discovery of functional dependencies between the erroneous or missing data value in the cell of the table data structure and a second portion of content, wherein the functional dependencies are indicated by the repeatable pattern;
analyzing, by the question generation engine, context information surrounding the erroneous sub-portion of content in the natural language document utilizing the functional dependencies between the erroneous or missing data value in the cell of the table data structure and a second portion of content; and
converting, by the question generation engine, the context information into a narrated statement using a narration mechanism;
configuring a Question and Answer (QA) system to generate a query based on the semantic signature and apply the query to a knowledge base to identify a candidate sub-portion of content for correcting the erroneous sub-portion;
generating, by the Question and Answer (QA) system executed by the processor in the data processing system, the query based on the semantic signature;
applying, by the QA system, the query to the knowledge base to identify the candidate sub-portion of content for correcting the erroneous sub-portion;
configuring a correction engine to correct the erroneous sub-portion to generate a corrected natural language document and store the corrected natural language document;
correcting, by the correction engine executed by the processor in the data processing system, the erroneous sub-portion using the identified candidate sub-portion of content to generate the corrected natural language document; and
storing, by the correction engine, the corrected natural language document in a storage device.
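The identification step in the claim (a cell flagged because it fails to conform with a repeatable pattern in the table) can be illustrated with a minimal sketch. The pattern assumed here, that the last column of each row equals the sum of the preceding columns, is a hypothetical example and is not taken from the patent; the function name and sample data are likewise illustrative.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]

def find_nonconforming_cells(rows: List[Row]) -> List[Tuple[int, int, str]]:
    """Flag cells that break the ASSUMED pattern: last column == sum of the others."""
    flagged = []
    for r, row in enumerate(rows):
        # A missing value (None) breaks the pattern by definition.
        missing = [c for c, v in enumerate(row) if v is None]
        if missing:
            flagged.extend((r, c, "missing") for c in missing)
            continue
        *parts, total = row
        if abs(sum(parts) - total) > 1e-9:
            # The mismatch alone does not say which cell is wrong;
            # this sketch arbitrarily blames the total column.
            flagged.append((r, len(row) - 1, "erroneous"))
    return flagged

# Hypothetical table: two value columns followed by a total column.
table = [
    [10.0, 20.0, 30.0],
    [5.0, 7.0, 12.0],
    [4.0, 6.0, 11.0],   # breaks the pattern: 4 + 6 != 11
    [8.0, None, 17.0],  # missing value
]
flagged = find_nonconforming_cells(table)
print(flagged)  # [(2, 2, 'erroneous'), (3, 1, 'missing')]
```

Each flagged entry is a (row, column, reason) triple that a later stage could turn into a question for the QA system.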
Abstract
Mechanisms are provided for performing tabular data correction in a document. The mechanisms receive a natural language document comprising a portion of content and analyze the portion of content within the natural language document to identify an erroneous sub-portion comprising an erroneous or missing item of information. The mechanisms generate a semantic signature for the erroneous sub-portion and generate a query based on the semantic signature. The mechanisms apply the query to a knowledge base to identify a candidate sub-portion of content. The mechanisms correct the erroneous sub-portion using the identified candidate sub-portion of content to generate a corrected natural language document.
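The flow summarized in the abstract (identify the erroneous cell, build a semantic signature, form a query, consult a knowledge base, correct the document) might be sketched end to end as follows. Everything here is an assumption made for illustration: the dictionary standing in for the knowledge base, the dictionary form of the signature, and all function names are hypothetical and do not describe the patented implementation.

```python
from copy import deepcopy

# Hypothetical table extracted from a document: one row per country.
table = [
    {"Country": "France", "Capital": "Paris"},
    {"Country": "Japan", "Capital": ""},        # missing value
    {"Country": "Canada", "Capital": "Ottawa"},
]

# Toy stand-in for the knowledge base: question text -> answer.
knowledge_base = {
    "what is the capital of japan?": "Tokyo",
}

def find_missing(rows):
    """Return (row index, column name) for every empty cell."""
    return [(i, col) for i, row in enumerate(rows)
            for col, val in row.items() if not val]

def semantic_signature(rows, i, col, key="Country"):
    """Describe the empty cell by the column it belongs to and its row's key value."""
    return {"target": col, "key": key, "key_value": rows[i][key]}

def to_query(sig):
    """Turn the signature into a natural language question."""
    return f"What is the {sig['target'].lower()} of {sig['key_value']}?"

def answer(query):
    """Look the question up in the toy knowledge base; None if unknown."""
    return knowledge_base.get(query.lower())

def correct(rows):
    """Return a corrected copy of the table; the input is left unchanged."""
    fixed = deepcopy(rows)
    for i, col in find_missing(rows):
        candidate = answer(to_query(semantic_signature(rows, i, col)))
        if candidate is not None:
            fixed[i][col] = candidate
    return fixed

corrected = correct(table)
print(corrected[1])  # {'Country': 'Japan', 'Capital': 'Tokyo'}
```

A cell with no answer in the knowledge base is left as it was, so the sketch never overwrites a value with a guess.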
63 Citations
14 Claims
1. A method, in a data processing system comprising a processor and a memory, for performing tabular data correction in a document, the method comprising the steps recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer program product comprising a non-transitory computer readable storage medium having a computer readable program for performing tabular data correction in a document stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to:
configure the computing device to implement a natural language processing (NLP) system that performs natural language processing on natural language content at least by processing logical relationships in the natural language content;
responsive to receiving a natural language document with table structures and functional dependencies identified therein;
configuring an erroneous data value analysis engine within the computing device to analyze a portion of content within a natural language document to identify an erroneous sub-portion within the natural language document comprising an erroneous or missing item of information;
analyze, by the erroneous data value analysis engine executed by a processor in the computing device, a portion of content within a natural language document to identify an erroneous sub-portion within the natural language document comprising an erroneous or missing item of information, wherein the portion of content comprises a table data structure present in the natural language document, wherein the erroneous sub-portion comprises a cell in the table data structure having an erroneous or missing data value, wherein the erroneous sub-portion is identified based on the erroneous sub-portion failing to conform with a regular structure associated with the portion of content within the natural language document, and wherein the regular structure is a repeatable pattern within the portion of content within the natural language document;
configure a question generation engine within the computing device to generate a semantic signature for the erroneous sub-portion;
generate, by the question generation engine executed by the processor in the computing device, the semantic signature for the erroneous sub-portion, wherein the computer readable program further causes the computing device to generate a semantic signature for the erroneous sub-portion at least by:
perform, by the question generation engine, a discovery of functional dependencies between the erroneous or missing data value in the cell of the table data structure and a second portion of content, wherein the functional dependencies are indicated by the repeatable pattern;
analyzing, by the question generation engine, context information surrounding the erroneous sub-portion of content in the natural language document; and
converting, by the question generation engine, the context information into a narrated statement using a narration mechanism;
configure a Question and Answer (QA) system within the computing device to generate a query based on the semantic signature and apply the query to a knowledge base to identify a candidate sub-portion of content for correcting the erroneous sub-portion;
generate, by the Question and Answer (QA) system executed by the processor in the computing device, the query based on the semantic signature;
apply, by the QA system, the query to the knowledge base to identify the candidate sub-portion of content for correcting the erroneous sub-portion;
configure a correction engine within the computing device to correct the erroneous sub-portion and store the corrected natural language document;
correct, by the correction engine executed by the processor in the computing device, the erroneous sub-portion using the identified candidate sub-portion of content to generate the corrected natural language document; and
store, by the correction engine, the corrected natural language document in a storage device.
Dependent claims: 9, 10, 11, 12, 13
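The claim recites converting context information around the erroneous cell into a narrated statement. One way to picture such a narration mechanism is a simple template over the table caption, the row header, and the known neighboring cells. The template wording, the function name `narrate`, and the sample values are assumptions for illustration only; the claim text does not specify how narration is performed.

```python
def narrate(caption, row_header, target_column, known_cells):
    """Render the context of one cell as a statement followed by a question.

    caption:       title of the table (hypothetical input)
    row_header:    label of the row holding the erroneous cell
    target_column: header of the column holding the erroneous cell
    known_cells:   mapping of other column headers to their values in that row
    """
    facts = "; ".join(f"{name} is {value}" for name, value in known_cells.items())
    statement = f"In the table '{caption}', for {row_header}, {facts}."
    question = f"What is the {target_column} for {row_header}?"
    return f"{statement} {question}"

# Hypothetical context for a row whose "Q3 sales" cell is missing.
text = narrate(
    caption="Regional sales",
    row_header="the North region",
    target_column="Q3 sales",
    known_cells={"Q1 sales": 120, "Q2 sales": 135},
)
print(text)
# In the table 'Regional sales', for the North region, Q1 sales is 120; Q2 sales is 135. What is the Q3 sales for the North region?
```

The resulting text is the kind of input a QA system could accept as a query, with the statement part supplying context for answer scoring.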
14. An apparatus for performing tabular data correction in a document comprising:
a processor; and
a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to:
configure the processor to implement a natural language processing (NLP) system that performs natural language processing on natural language content at least by processing logical relationships in the natural language content;
responsive to receiving a natural language document with table structures and functional dependencies identified therein;
configuring an erroneous data value analysis engine to analyze a portion of content within a natural language document to identify an erroneous sub-portion within the natural language document comprising an erroneous or missing item of information;
analyze, by the erroneous data value analysis engine executed by a processor in the computing device, a portion of content within a natural language document to identify an erroneous sub-portion within the natural language document comprising an erroneous or missing item of information, wherein the portion of content comprises a table data structure present in the natural language document, wherein the erroneous sub-portion comprises a cell in the table data structure having an erroneous or missing data value, wherein the erroneous sub-portion is identified based on the erroneous sub-portion failing to conform with a regular structure associated with the portion of content within the natural language document, and wherein the regular structure is a repeatable pattern within the portion of content within the natural language document;
configure a question generation engine to generate a semantic signature for the erroneous sub-portion;
generate, by the question generation engine executed by the processor in the computing device, the semantic signature for the erroneous sub-portion, wherein the computer readable program further causes the computing device to generate a semantic signature for the erroneous sub-portion at least by:
perform, by the question generation engine, a discovery of functional dependencies between the erroneous or missing data value in the cell of the table data structure and a second portion of content, wherein the functional dependencies are indicated by the repeatable pattern;
analyzing, by the question generation engine, context information surrounding the erroneous sub-portion of content in the natural language document; and
converting, by the question generation engine, the context information into a narrated statement using a narration mechanism;
configure a Question and Answer (QA) system to generate a query based on the semantic signature and apply the query to a knowledge base to identify a candidate sub-portion of content for correcting the erroneous sub-portion;
generate, by the Question and Answer (QA) system executed by the processor in the computing device, the query based on the semantic signature;
apply, by the QA system, the query to the knowledge base to identify the candidate sub-portion of content for correcting the erroneous sub-portion;
configure a correction engine to correct the erroneous sub-portion and store the corrected natural language document;
correct, by the correction engine executed by the processor in the computing device, the erroneous sub-portion using the identified candidate sub-portion of content to generate the corrected natural language document; and
store, by the correction engine, the corrected natural language document in a storage device.
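The claims refer to a discovery of functional dependencies indicated by a repeatable pattern in the table. A minimal sketch of one possible reading is given below: a column X is treated as determining a column Y when every observed value of X maps to exactly one value of Y across the complete rows. The function name `discover_fds`, the single-column restriction, and the sample rows are assumptions for illustration and are not drawn from the patent.

```python
from itertools import permutations

def discover_fds(rows, columns):
    """Return (lhs, rhs) pairs where each observed lhs value maps to one rhs value."""
    fds = []
    for lhs, rhs in permutations(columns, 2):
        mapping = {}
        holds = True
        for row in rows:
            a, b = row.get(lhs), row.get(rhs)
            if a is None or b is None:
                continue  # rows with a missing value cannot confirm or refute
            if mapping.setdefault(a, b) != b:
                holds = False  # same lhs value seen with two different rhs values
                break
        if holds and mapping:
            fds.append((lhs, rhs))
    return fds

# Hypothetical rows; the last one has a missing "city" cell.
rows = [
    {"zip": "10001", "city": "New York", "state": "NY"},
    {"zip": "10002", "city": "New York", "state": "NY"},
    {"zip": "60601", "city": "Chicago", "state": "IL"},
    {"zip": "60601", "city": None, "state": "IL"},
]
fds = discover_fds(rows, ["zip", "city", "state"])
print(fds)
# [('zip', 'city'), ('zip', 'state'), ('city', 'state'), ('state', 'city')]
```

The dependency from "zip" to "city" suggests that the missing city can be asked about in terms of its row's zip value. The pair ("state", "city") also appears only because the sample is tiny, which shows why dependencies found this way are candidates rather than guarantees.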
Specification