System for identifying anomalies by automatically generating and analyzing a structure
First Claim
1. A system for automatically analyzing a data model and identifying anomalies within the data model, the system comprising:
- a memory device comprising computer-readable program code;
a communication device; and
a processing device operatively coupled to the memory device and the communication device, wherein the processing device is configured to execute the computer-readable program code to;
based on input received from a graphical user interface, retrieve data comprising a plurality of elements and a plurality of tables, the data retrieved from a database;
identify elements within the plurality of tables as facts or dimensions by analyzing the elements and the plurality of tables based on a data type corresponding to the elements, a column name corresponding to the elements, and one or more constraints corresponding to the elements within the plurality of tables, wherein identifying the elements comprises;
identifying when the elements comprise a qualitative non-numerical data type and identifying the elements as the dimensions, and identifying when the elements have a quantitative numerical type and identifying the elements as the facts;
identifying when the column name of the elements meet a list of terms stored as associated with the facts or the dimensions and identifying the elements as the facts or the dimensions based on the lists of terms; and
identifying when the one or more constraints comprise rules that limit the elements to entries that are the facts or the dimensions based on the rules and identifying the elements as the facts or the dimensions based on the rules;
identify each of the tables associated with the data as either a fact table or a dimension table based on identifying the elements within the tables as the facts or the dimensions, wherein the plurality of tables comprises at least one fact table and at least one dimension table;
determine that a set of elements within a single column of a first dimension table is a primary key for the dimension table by determining that each element of the set is unique to one another;
identify a reference key corresponding to the primary key;
based on identifying the reference key corresponding to the primary key, determine at least one relationship between at least two tables;
generate the data model of the data retrieved from the database;
identify data anomalies from the at least one relationship, wherein data anomalies are ragged hierarchies, many to many relationships, and double counting;
generate a data analysis report comprising;
generating a list of columns of the tables associated with the data model, a list of other tables that share at least one column with the at least one dimension table, one or more lists of data anomalies, and a list of parents and children, wherein a child is a column associated with the reference key and a parent is a column associated with the primary key;
compiling the list of the columns, the list of other tables, the one or more lists of data anomalies, and the list of parents and children; and
present the data analysis report to a user.
1 Assignment
0 Petitions
Accused Products
Abstract
Embodiments of the invention are directed to a system, method, or computer program product for analyzing a data model and identifying anomalies within the data model. Data stored in a database is transformed into a data model by the invention, wherein the invention analyzes the data and generates relationships between tables and information contained within to generate the model. The data model itself is then analyzed in order to accurately identify anomalies such as ragged hierarchies, many-to-many issues, and/or double counting issues present in the data model which might affect the data integrity. Upon analyzing the data model, the invention compiles lists of discovered information and presents the results to a user in the form of a data analysis report.
29 Citations
20 Claims
-
1. A system for automatically analyzing a data model and identifying anomalies within the data model, the system comprising:
-
a memory device comprising computer-readable program code; a communication device; and a processing device operatively coupled to the memory device and the communication device, wherein the processing device is configured to execute the computer-readable program code to; based on input received from a graphical user interface, retrieve data comprising a plurality of elements and a plurality of tables, the data retrieved from a database; identify elements within the plurality of tables as facts or dimensions by analyzing the elements and the plurality of tables based on a data type corresponding to the elements, a column name corresponding to the elements, and one or more constraints corresponding to the elements within the plurality of tables, wherein identifying the elements comprises; identifying when the elements comprise a qualitative non-numerical data type and identifying the elements as the dimensions, and identifying when the elements have a quantitative numerical type and identifying the elements as the facts; identifying when the column name of the elements meet a list of terms stored as associated with the facts or the dimensions and identifying the elements as the facts or the dimensions based on the lists of terms; and identifying when the one or more constraints comprise rules that limit the elements to entries that are the facts or the dimensions based on the rules and identifying the elements as the facts or the dimensions based on the rules; identify each of the tables associated with the data as either a fact table or a dimension table based on identifying the elements within the tables as the facts or the dimensions, wherein the plurality of tables comprises at least one fact table and at least one dimension table; determine that a set of elements within a single column of a first dimension table is a primary key for the dimension table by determining that each element of the set is unique to one another; identify a reference key corresponding to the primary key; based on identifying the reference key corresponding to the primary key, determine at least one relationship between at least two tables; generate the data model of the data retrieved from the database; identify data anomalies from the at least one relationship, wherein data anomalies are ragged hierarchies, many to many relationships, and double counting; generate a data analysis report comprising; generating a list of columns of the tables associated with the data model, a list of other tables that share at least one column with the at least one dimension table, one or more lists of data anomalies, and a list of parents and children, wherein a child is a column associated with the reference key and a parent is a column associated with the primary key; compiling the list of the columns, the list of other tables, the one or more lists of data anomalies, and the list of parents and children; and present the data analysis report to a user. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A computer program product for automatically analyzing a data model and identifying anomalies within the data model, the computer program product comprising at least one non-transitory computer-readable medium having computer-readable program code portions embodied therein, the computer-readable program code portions comprising:
-
an executable portion configured for retrieving data comprising a plurality of elements and a plurality of tables, the data retrieved from a database based on input received from a graphical user interface; an executable portion configured for identifying elements within the plurality of tables as facts or dimensions by analyzing the elements and the plurality of tables based on a data type corresponding to the elements, a column name corresponding to the elements, and one or more constraints corresponding to the elements within the plurality of tables, wherein identifying the elements comprises; identifying when the elements comprise a qualitative non-numerical data type and identifying the elements as the dimensions, and identifying when the elements have a quantitative numerical type and identifying the elements as the facts; identifying when the column name of the elements meet a list of terms stored as associated with the facts or the dimensions and identifying the elements as the facts or the dimensions based on the lists of terms; and identifying when the one or more constraints comprise rules that limit the elements to entries that are the facts or the dimensions based on the rules and identifying the elements as the facts or the dimensions based on the rules; an executable portion configured for identifying each of the tables associated with the data as either a fact table or a dimension table based on identifying the elements within the tables as the facts or the dimensions, wherein the plurality of tables comprises at least one fact table and at least one dimension table; an executable portion configured for determining that a set of elements within a single column of a first dimension table is a primary key for the dimension table by determining that each element of the set is unique to one another; an executable portion configured for identifying a reference key corresponding to the primary key; an executable portion configured for determining at least one relationship between at least two tables based on identifying the reference key corresponding to the primary key; an executable portion configured for generating the data model of the data retrieved from the database; an executable portion configured for identifying data anomalies, wherein data anomalies are ragged hierarchies, many to many relationships, and double counting; an executable portion configured for generating a data analysis report comprising; generating a list of columns of the tables associated with the data model, a list of other tables that share at least one column with the at least one dimension table, one or more lists of data anomalies, and a list of parents and children, wherein a child is a column associated with the reference key and a parent is a column associated with the primary key; compiling the list of the columns, the list of other tables, the one or more lists of data anomalies, and the list of parents and children; and an executable portion configured for presenting the data analysis report to a user. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
-
-
17. A computer-implemented method for automatically analyzing a data model and identifying anomalies within the data model, the method comprising:
-
retrieving data comprising a plurality of elements and a plurality of tables, the data retrieved from a database based on input received from a graphical user interface; identifying elements within the plurality of tables as facts or dimensions by analyzing the elements and the plurality of tables based on a data type corresponding to the elements, a column name corresponding to the elements, and one or more constraints corresponding to the element within the plurality of tables, wherein identifying the elements comprises; identifying when the elements comprise a qualitative non-numerical data type and identifying the elements as the dimensions, and identifying when the elements have a quantitative numerical type and identifying the elements as the facts; identifying when the column name of the elements meet a list of terms stored as associated with the facts or the dimensions and identifying the elements as the facts or the dimensions based on the lists of terms; and identifying when the one or more constraints comprise rules that limit the elements to entries that are the facts or the dimensions based on the rules and identifying the elements as the facts or the dimensions based on the rules; identifying each of the tables associated with the data as either a fact table or a dimension table based on identifying the elements within the tables as the facts or the dimensions, wherein the plurality of tables comprises at least one fact table and at least one dimension table; determining that a set of elements within a single column of a first dimension table is a primary key for the dimension table by determining that each element of the set is unique to one another; identifying a reference key corresponding to the primary key; determining at least one relationship between at least two tables based on identifying the reference key corresponding to the primary key; generating the data model of the data retrieved from the database; identifying data anomalies, wherein data anomalies are ragged hierarchies, many to many relationships, and double counting; generating a data analysis report comprising; generating a list of columns of the tables associated with the data model, a list of other tables that share at least one column with the at least one dimension table, one or more lists of data anomalies, and a list of parents and children, wherein a child is a column associated with the reference key and a parent is a column associated with the primary key; compiling the list of the columns, the list of other tables, the one or more lists of data anomalies, and the list of parents and children; and presenting the data analysis report to a user. - View Dependent Claims (18, 19, 20)
-
Specification