Method and system for detecting deviations in data tables
First Claim
1. A computerized method for detecting deviations in a data table comprising a plurality of records and a plurality of columns, said method comprising the steps of:
(1) selecting a column as a classification column;
(2) executing a classification method for calculating a classification tree with respect to said classification column, whereby each edge of said classification tree is associated with a predicate, whereby a leaf node of said classification tree is associated with a leaf record set comprising a subset of records for which a class predicate comprising all predicates along a path from a root node of said classification tree to said leaf node evaluates to TRUE, and whereby said leaf node is associated with a leaf label representing an expected value in said classification column of said leaf record set; and
(3) determining from said leaf record set all records deviating with respect to said classification column from said leaf label as a deviation set.
Abstract
A method and system for automatically detecting deviations in a data table comprising a multitude of records and a multitude of columns. A column of the data table is selected as a classification column, and a classification tree is calculated with respect to the classification column. Each edge of the classification tree is associated with a predicate. Each leaf node of the classification tree is associated with a leaf record set comprising the subset of records for which the class predicate, comprising all predicates along the path from the root node of the classification tree to that leaf node, evaluates to TRUE. Each leaf node is also associated with a leaf label representing an expected value in the classification column for the corresponding leaf record set. From each leaf record set, all records deviating from the leaf label with respect to the classification column are determined as a deviation set.
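The three steps of the method can be illustrated with a minimal Python sketch. It is not the implementation described in the patent: the Gini-based split criterion, the `min_leaf` threshold, the function names, and the sample city/country table are all assumptions chosen for illustration, and only categorical equality predicates of the form column = value are supported.

```python
from collections import Counter

def majority(records, target):
    # Leaf label: the most frequent value in the classification column.
    return Counter(r[target] for r in records).most_common(1)[0][0]

def gini(records, target):
    # Gini impurity of the classification column (assumed split criterion).
    n = len(records)
    counts = Counter(r[target] for r in records)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def build_leaves(records, target, columns, min_leaf=3):
    """Grow a tree and return its leaves as
    (class_predicate, leaf_label, leaf_record_set) tuples.
    The class predicate is the list of (column, value) edge predicates
    on the path from the root to the leaf."""
    def grow(subset, path, remaining):
        best = None
        for col in remaining:
            groups = {}
            for r in subset:
                groups.setdefault(r[col], []).append(r)
            # Skip splits that are trivial or create very small leaves.
            if len(groups) < 2 or min(len(g) for g in groups.values()) < min_leaf:
                continue
            score = sum(len(g) * gini(g, target) for g in groups.values()) / len(subset)
            if best is None or score < best[0]:
                best = (score, col, groups)
        # Stop when no admissible split reduces the impurity.
        if best is None or best[0] >= gini(subset, target):
            return [(path, majority(subset, target), subset)]
        _, col, groups = best
        rest = [c for c in remaining if c != col]
        leaves = []
        for value, group in groups.items():
            leaves += grow(group, path + [(col, value)], rest)
        return leaves
    return grow(records, [], columns)

def find_deviations(records, target, columns, min_leaf=3):
    # Step (3): within each leaf record set, collect the records whose
    # classification-column value differs from the leaf label.
    result = []
    for path, label, subset in build_leaves(records, target, columns, min_leaf):
        deviating = [r for r in subset if r[target] != label]
        if deviating:
            result.append((path, label, deviating))
    return result

# Hypothetical sample table: record 5 pairs Paris with Germany.
table = [{"id": i, "city": "Paris", "country": "France"} for i in range(5)]
table.append({"id": 5, "city": "Paris", "country": "Germany"})
table += [{"id": i, "city": "Berlin", "country": "Germany"} for i in range(6, 10)]

# Step (1): "country" is selected as the classification column.
deviations = find_deviations(table, target="country", columns=["city"])
for path, label, deviating in deviations:
    print(path, "expected", label, "deviating ids", [r["id"] for r in deviating])
```

In this sketch the tree splits on `city`, the Paris leaf receives the label France, and the single Paris record with country Germany is reported as the deviation set of that leaf. The `min_leaf` threshold is what keeps the tree from isolating the deviating record in a leaf of its own, in which case no deviation would be reported.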
20 Claims
Specification