DATA QUALITY ISSUE DETECTION THROUGH ONTOLOGICAL INFERENCING
First Claim
1. A method for use in detecting data quality issues in one or more instances of an incoming set of data, the method comprising:
determining a scope of an incoming data set to be receivable at a processing engine;
obtaining a domain ontology that includes a plurality of TBox statements representing a desired state of the incoming data set;
mapping, using a processing engine, the incoming data set to the domain ontology; and
for each instance of the incoming data set:
determining, using the processing engine, whether the instance can be inferenced into an anticipated TBox statement collection of the domain ontology; and
assessing, using the processing engine, whether the instance has at least one data quality issue based on an outcome of the determining.
Abstract
Systems and methods (e.g., utilities) for automated data quality detection that can be reused across various domains and across multiple data quality spheres. A data structure or schema of an incoming data set is initially mapped to a desired data or knowledge state in a domain ontology made up of a number of TBox statements. Data quality issues in the incoming data set can then be detected by attempting to inference each instance of the incoming data set against, or otherwise match it to, anticipated TBox statements of the domain ontology.
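The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: a real system would presumably use a description-logic reasoner over an OWL ontology, whereas here a toy check (each class requires certain typed properties) stands in for inferencing. All names (`ClassAxiom`, `TBOX`, `MAPPING`, `assess`, the customer fields) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassAxiom:
    """Toy TBox statement: members of `name` carry every listed property with the listed type."""
    name: str
    required: tuple  # tuple of (ontology property, Python type) pairs


# Hypothetical domain ontology (TBox) describing the desired state of customer records.
TBOX = (
    ClassAxiom("Person", (("hasName", str),)),
    ClassAxiom("Customer", (("hasName", str), ("hasAge", int), ("hasEmail", str))),
)

# Hypothetical mapping from the incoming schema's column names to ontology properties.
MAPPING = {"name": "hasName", "age": "hasAge", "email": "hasEmail"}


def map_instance(row, mapping):
    """Rename columns to ontology properties, dropping unmapped columns and empty values."""
    return {mapping[k]: v for k, v in row.items() if k in mapping and v not in (None, "")}


def inferred_classes(assertions, tbox):
    """Return the names of the TBox classes whose requirements the assertions satisfy."""
    return {
        axiom.name
        for axiom in tbox
        if all(isinstance(assertions.get(prop), typ) for prop, typ in axiom.required)
    }


def assess(row, anticipated, mapping=MAPPING, tbox=TBOX):
    """Return the anticipated classes the row could not be inferred into (empty list: no issue found)."""
    return sorted(set(anticipated) - inferred_classes(map_instance(row, mapping), tbox))


rows = [
    {"name": "Ada", "age": 36, "email": "ada@example.com"},
    {"name": "Bob", "age": "unknown", "email": ""},
]
for row in rows:
    missing = assess(row, anticipated={"Person", "Customer"})
    print(row["name"], "->", "ok" if not missing else f"issue: not inferable as {missing}")
```

In this sketch the second row is flagged because its age is not an integer and its email is empty, so it cannot be placed in the anticipated `Customer` class even though it still qualifies as a `Person`.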
26 Claims
14. A system for use in detecting data quality issues in one or more instances of an incoming set of data, the system comprising:
a processing module; and
a memory module logically connected to the processing module and comprising a set of computer readable instructions executable by the processing module to:
determine a scope of an incoming data set to be receivable at a processing engine;
obtain a domain ontology that includes a plurality of TBox statements representing a desired state of the incoming data;
map the incoming data set to the domain ontology; and
for each instance of the incoming data set:
determine whether the instance can be inferred into an anticipated TBox statement collection of the domain ontology; and
assess whether the instance has at least one data quality issue based on an outcome of the determining.
Specification