Identifying and resolving data quality issues amongst information stored across multiple data sources
First Claim
1. A system comprising:
one or more processors; and
a memory that stores instructions that are executable by the one or more processors to cause the system to perform operations comprising:
receiving a query that identifies an object;
collecting, based at least in part on the query, first data associated with the object from a first data source, the first data including an attribute defined for the object and the first data source containing the first data associated with the object for first service calls received from a first group of devices;
collecting, based at least in part on the query, second data associated with the object from a second data source, the second data including the attribute defined for the object and the second data source containing the second data associated with the object for second service calls received from a second group of devices, the second data different than the first data;
comparing the first data to the second data to identify a data quality issue associated with the attribute, wherein the data quality issue includes one of:
a first type of data quality issue wherein the attribute collected from the first data source contains a first attribute value and the attribute collected from the second data source contains a missing attribute value;
a second type of data quality issue wherein the attribute collected from the first data source contains the first attribute value that is inconsistent with a second attribute value of the attribute collected from the second data source;
or a third type of data quality issue wherein the attribute collected from the first data source contains the first attribute that is untranslated to a target language;
causing a graphical user interface to be output, the graphical user interface visually distinguishing the attribute associated with the data quality issue from other attributes defined for the object that are not associated with a data quality issue and the graphical user interface providing an option to resolve the data quality issue, wherein the graphical user interface visually distinguishes the attribute from other attributes by presenting the attribute as a first graphical element having a first color and presenting at least one of the other attributes as a second graphical element having a second color that is different than the first color;
receiving, based at least in part on a user selection of the option, an instruction to resolve the data quality issue; and
taking an action to resolve the data quality issue based on the instruction, wherein the action includes one of:
copying the first attribute value to the attribute collected from the second data source that contains the missing attribute value in an event the data quality issue is of the first type of data quality issue;
or replacing the second attribute value with the first attribute value for the attribute collected from the second data source in an event the data quality issue is of the second type of data quality issue.
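The comparison and resolution steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Issue`, `compare_attribute`, and `resolve`, and the use of plain dictionaries as stand-ins for the two data sources, are assumptions made for illustration. The third (untranslated) issue type is left out because the claim does not say how translation status is detected or resolved.

```python
from enum import Enum
from typing import Dict, Optional


class Issue(Enum):
    MISSING = "missing"            # first type: value present in source 1, absent in source 2
    INCONSISTENT = "inconsistent"  # second type: both present, but different


def compare_attribute(first: Optional[str], second: Optional[str]) -> Optional[Issue]:
    """Compare one attribute's value from two sources and classify any issue."""
    if first and not second:
        return Issue.MISSING
    if first and second and first != second:
        return Issue.INCONSISTENT
    return None


def resolve(first: Dict[str, str], second: Dict[str, str], attribute: str) -> Optional[Issue]:
    """Copy (missing) or replace (inconsistent) using the first source's value.

    Treating the first source as authoritative mirrors the two actions
    recited in the claim; it is an assumption of this sketch, not a rule.
    """
    issue = compare_attribute(first.get(attribute), second.get(attribute))
    if issue is not None:
        second[attribute] = first[attribute]
    return issue


# Hypothetical records for one object held by two data sources.
first_source = {"title": "Blue Widget", "color": "blue"}
second_source = {"title": "Blue Widgit", "color": ""}

color_issue = resolve(first_source, second_source, "color")
title_issue = resolve(first_source, second_source, "title")
```

In this sketch an empty string and an absent key are both treated as a missing value. A real system would call `resolve` only after the user selects the option presented in the graphical user interface.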
Abstract
The techniques described herein are directed to identifying data quality issues within information stored across multiple different data sources. For instance, the data quality issues can comprise missing values, inconsistent values, and untranslated values. Once identified, the techniques implement actions to resolve the data quality issues so that consumption or use of the stored information is improved. In at least one example, the identification and resolution of a data quality issue can be implemented in response to receiving a query that identifies an object. Based on the query, the system can collect values, from the multiple different sources, for attributes that have been defined for an item. The system can use algorithms (e.g., a comparison algorithm) to identify a data quality issue and can output a graphical user interface that visually distinguishes between attributes with a data quality issue and attributes without a data quality issue.
20 Claims
1. A system comprising: (recited in full under First Claim above)
Dependent Claims (2, 3, 4, 5)
6. A method comprising:
collecting, from multiple different data sources, a plurality of attribute values for an individual attribute that is associated with an object, wherein each data source includes a plurality of attributes associated with the object, at least some of which contain attribute values;
identifying, by one or more processors, a first data quality issue associated with the individual attribute, the first data quality issue comprising one of:
a first type of data quality issue wherein the individual attribute contains a missing attribute value;
a second type of data quality issue wherein the individual attribute contains a first attribute value that is inconsistent with a second attribute value, the first attribute value is associated with a first data source and the second attribute value is associated with a second data source that is different than the first data source;
or a third type of data quality issue wherein the individual attribute contains an untranslated attribute value;
causing a graphical user interface to be output, the graphical user interface visually distinguishing the individual attribute from at least one other attribute of the plurality of attributes associated with the object that is without the first data quality issue, the graphical user interface presenting the individual attribute as a first graphical element having a first color and presenting the one other attribute as a second graphical element having a second color that is different than the first color;
receiving an instruction to resolve the first data quality issue;
determining a most commonly occurring attribute value of the plurality of attribute values; and
implementing an action, based at least in part on the instruction, to resolve the first data quality issue, the action comprising copying the most commonly occurring attribute value to an instance of the individual attribute that contains the missing attribute value.
Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14)
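The last two steps of claim 6, determining the most commonly occurring attribute value and copying it to instances where the value is missing, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function name `fill_missing_with_most_common`, the source names, and the dictionary mapping each source to its value are hypothetical. The claim does not say how ties are broken; here `collections.Counter.most_common` keeps the value encountered first.

```python
from collections import Counter
from typing import Dict, Optional


def fill_missing_with_most_common(values_by_source: Dict[str, Optional[str]]) -> Optional[str]:
    """Fill missing instances of one attribute with its most common value.

    `values_by_source` maps a (hypothetical) data source name to the value
    that source holds for the attribute; None or "" means the value is missing.
    Returns the value that was used, or None if every source is missing it.
    """
    present = [value for value in values_by_source.values() if value]
    if not present:
        return None  # nothing to copy from
    most_common, _count = Counter(present).most_common(1)[0]
    for source, value in values_by_source.items():
        if not value:
            values_by_source[source] = most_common
    return most_common


# Hypothetical values of a "color" attribute collected from four sources.
color_values = {"catalog": "red", "orders": None, "reviews": "red", "search": "crimson"}
chosen = fill_missing_with_most_common(color_values)
```

Note that the sketch fills only missing instances. The inconsistent value held by `search` is left untouched, because the action recited in claim 6 addresses only the missing attribute value.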
15. A system comprising:
one or more processors; and
a memory that stores instructions that are executable by the one or more processors to cause the system to perform operations comprising:
establishing a first rule that defines:
one or more objects or an object category;
a condition that constitutes a data quality issue for an attribute contained by data sources, the attribute common to the one or more objects or the object category, the data quality issue comprising one of:
a first type of data quality issue wherein the attribute contains a missing attribute value;
a second type of data quality issue wherein the attribute contains a first attribute value that is inconsistent with a second attribute value, the first attribute value is associated with a first data source and the second attribute value is associated with a second data source that is different than the first data source;
or a third type of data quality issue wherein the attribute contains an untranslated attribute value; and
an action to change the attribute to resolve the data quality issue;
determining attribute values contained in a plurality of instances of the attribute across the data sources;
detecting an occurrence of the condition that constitutes the data quality issue;
determining that the action defined by the first rule conflicts with a second rule;
determining a priority order that indicates a first author of the first rule has priority over a second author of the second rule; and
implementing the action to change the attribute to resolve the data quality issue based at least in part on detecting the occurrence and the priority order.
Dependent Claims (16, 17, 18, 19, 20)
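The conflict handling in claim 15, where two rules disagree and the rule whose author ranks higher in a priority order wins, can be sketched in Python. The `Rule` fields, the author names, the action strings, and the representation of the priority order as an ordered list are all assumptions for illustration. The claim does not specify how rules or priorities are stored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    author: str     # who established the rule
    attribute: str  # attribute the rule applies to
    action: str     # illustrative action name, e.g. "copy_from_catalog"


def pick_rule(conflicting: List[Rule], author_priority: List[str]) -> Rule:
    """Return the conflicting rule whose author appears earliest in the priority order.

    Raises ValueError if a rule's author is not listed in `author_priority`.
    """
    return min(conflicting, key=lambda rule: author_priority.index(rule.author))


# Hypothetical rules from two authors that prescribe different actions
# for the same attribute, so they conflict.
first_rule = Rule(author="alice", attribute="color", action="copy_from_catalog")
second_rule = Rule(author="bob", attribute="color", action="clear_value")

winner = pick_rule([second_rule, first_rule], author_priority=["alice", "bob"])
```

In this sketch `winner.action` is the action that would be implemented once the condition defined by the rule has been detected.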
Specification