
Data quality issue detection through ontological inferencing

  • US 10,460,238 B2
  • Filed: 10/11/2011
  • Issued: 10/29/2019
  • Est. Priority Date: 10/11/2011
  • Status: Active Grant
First Claim

1. A method for use in detecting data quality issues in one or more instances of an incoming set of data, the method comprising:

    determining a scope of an incoming data set to be receivable at a processor, wherein said determining a scope of the incoming data set comprises automatically perusing a plurality of instances of data structures comprising the incoming data set, and wherein each instance includes one or more entries;

    obtaining, based on the determined scope, a domain ontology that includes a plurality of TBox statement collections that collectively comprise metadata describing desired or acceptable properties of data corresponding to the determined scope, and wherein the metadata in each TBox statement collection describes a) a plurality of data types, b) at least one format that each data type should have, and c) an indication of whether or not each format is required in order for the data to be considered compliant with the TBox statement collection, wherein at least one of the formats of at least one of the data types is indicated as being required in order for the data to be considered compliant with the TBox statement collection;

    mapping, using the processor, the incoming data set to the domain ontology, wherein said mapping comprises linking specific data structures in the incoming data set to particular TBox statement collections of the obtained domain ontology; and

    for each instance of the incoming data set:

    identifying an anticipated TBox statement collection of the plurality of TBox statement collections of the domain ontology;

    ascertaining, using the processor, whether the instance can be inferenced into the anticipated TBox statement collection of the domain ontology, wherein the instance can be inferenced into the anticipated TBox statement collection when the instance comprises an ABox statement that is compliant with the anticipated TBox statement collection, wherein the instance comprises an ABox statement that is compliant with the anticipated TBox statement collection when at least one data structure in the instance is the at least one of the data types having the at least one of the required formats;

    determining, by the processor, that the instance is free of at least one data quality issue responsive to the instance being inferenced into the anticipated TBox statement collection; and

    determining, by the processor, that the instance has at least one data quality issue responsive to the instance not being inferenced into the anticipated TBox statement collection, wherein when the processor determines that the instance has at least one data quality issue, the method further includes ascertaining whether the instance can be inferenced into any other TBox statement collections of the domain ontology, and wherein the processor:

    determines that the at least one data quality issue comprises a structural and/or formatting issue associated with the instance when it is ascertained that the instance cannot be inferenced into any other TBox statement collections of the domain ontology; and

    determines that the at least one data quality issue comprises a labeling issue associated with the instance when it is ascertained that the instance can be inferenced into another TBox statement collection of the domain ontology.
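
The per-instance decision logic recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names FormatRule, TBoxCollection, can_inference and classify are hypothetical, formats are modeled as regular expressions, and compliance is simplified to "every required format matches", which is an assumption rather than the claim's exact wording.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatRule:
    """Hypothetical stand-in for one piece of TBox metadata."""
    data_type: str   # name of the data type (field) being described
    pattern: str     # format the value should have, as a regular expression
    required: bool   # whether the format is required for compliance


@dataclass(frozen=True)
class TBoxCollection:
    """Hypothetical stand-in for a TBox statement collection."""
    name: str
    rules: tuple


def can_inference(instance, collection):
    """Return True if the instance satisfies every required format rule.

    Assumption: an instance is a dict mapping data type names to values.
    """
    for rule in collection.rules:
        if not rule.required:
            continue
        value = instance.get(rule.data_type)
        if value is None or re.fullmatch(rule.pattern, str(value)) is None:
            return False
    return True


def classify(instance, anticipated, ontology):
    """Classify an instance against its anticipated collection."""
    if can_inference(instance, anticipated):
        return "no issue"
    # The instance failed its anticipated collection: try all the others.
    for other in ontology:
        if other.name != anticipated.name and can_inference(instance, other):
            return "labeling issue"
    return "structural/formatting issue"


# Toy ontology with two collections (illustrative patterns only).
phone = TBoxCollection("Phone", (FormatRule("value", r"\d{3}-\d{3}-\d{4}", True),))
email = TBoxCollection("Email", (FormatRule("value", r"[^@\s]+@[^@\s]+\.[^@\s]+", True),))
ontology = [phone, email]

print(classify({"value": "555-123-4567"}, phone, ontology))
print(classify({"value": "a@b.com"}, phone, ontology))
print(classify({"value": "???"}, phone, ontology))
```

In this sketch an e-mail address mapped to the Phone collection is reported as a labeling issue because it fits the Email collection instead, while a value that fits no collection is reported as a structural/formatting issue.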

  • 7 Assignments