METHOD AND SYSTEM FOR CLEANSING SEQUENCE-BASED DATA AT QUERY TIME
Abstract
A method and system for cleansing anomalies from sequence-based data at query time. Sequence-based data such as RFID data is loaded into a database. One or more cleansing rules are received at a cleansing rules engine. The cleansing rules engine converts the cleansing rule(s) to a template that includes logic to compensate for anomalies in the sequence-based data. A query to retrieve the sequence-based data is received by a query rewrite engine. The query rewrite engine rewrites the query by applying the template logic. The rewritten query is executed at query time. The result of executing the rewritten query is identical to the result of executing the original query on a data set generated by applying the cleansing rule(s) to all of the sequence-based data.
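The flow in the abstract can be sketched with Python's built-in `sqlite3` module:

1. Store raw reads at load time.
2. Turn a rule into a template.
3. Rewrite the user query.
4. Execute the rewritten query.

This is a minimal illustration under stated assumptions, not the patented implementation. The following are all hypothetical:

- the duplicate-read rule and its 5-unit window
- the `reads(epc, reader, t)` table
- the helpers `rule_to_template` and `rewrite`

Here the template is a SQL derived table, and the rewrite is a plain string substitution of the `FROM` clause.

```python
import sqlite3

# Hypothetical rule (not from the patent): a read is a duplicate if the same
# EPC was already read at the same reader within the preceding WINDOW time
# units. Duplicates are removed.
WINDOW = 5

ROWS = [
    ("e1", "dock", 1),
    ("e1", "dock", 3),    # duplicate of the t=1 read
    ("e1", "dock", 12),
    ("e2", "door", 9),
    ("e2", "door", 11),   # duplicate of the t=9 read
    ("e2", "dock", 15),
]


def rule_to_template(window: int) -> str:
    """Convert the duplicate rule into a SQL template. {src} is a placeholder
    for the table the cleansing logic is applied to."""
    return (
        "(SELECT r.epc AS epc, r.reader AS reader, r.t AS t FROM {src} AS r "
        "WHERE NOT EXISTS (SELECT 1 FROM {src} AS p "
        "WHERE p.epc = r.epc AND p.reader = r.reader "
        "AND p.t >= r.t - " + str(int(window)) + " AND p.t < r.t))"
    )


def rewrite(user_query: str, table: str, template: str) -> str:
    """Naive textual rewrite: read from the cleansed derived table instead of
    the raw table. A real rewrite engine would work on a parsed query."""
    cleansed = template.format(src=table) + " AS " + table
    return user_query.replace("FROM " + table, "FROM " + cleansed, 1)


def run_demo():
    conn = sqlite3.connect(":memory:")
    # Load time: raw reads are stored without any cleansing.
    conn.execute("CREATE TABLE reads (epc TEXT, reader TEXT, t INTEGER)")
    conn.executemany("INSERT INTO reads VALUES (?, ?, ?)", ROWS)
    template = rule_to_template(WINDOW)

    # Query time: rewrite the user query with the template, then execute it.
    user_query = "SELECT epc, reader, t FROM reads WHERE t BETWEEN 8 AND 20 ORDER BY t"
    deferred = conn.execute(rewrite(user_query, "reads", template)).fetchall()

    # Reference answer: cleanse all data eagerly, then run the original query.
    conn.execute("CREATE TABLE eager AS SELECT * FROM " + template.format(src="reads"))
    eager = conn.execute(user_query.replace("FROM reads", "FROM eager")).fetchall()
    conn.close()
    return deferred, eager


if __name__ == "__main__":
    deferred, eager = run_demo()
    print(deferred)
    print(deferred == eager)
```

This rewrite cleanses the whole table inside the derived table, so the equality with eager cleansing holds by construction. The expanded-condition rewrite of claim 21 is what limits cleansing to the rows a query actually needs.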
22 Claims
1. A computer-implemented method of cleansing anomalies from sequence-based data at query time, comprising:
loading sequence-based data into a database managed by a database management system (DBMS) of a computing system, said loading being performed at a load time of said sequence-based data that precedes a query time of said sequence-based data;
receiving a cleansing rule at a cleansing rules engine of said computing system;
automatically converting, by said cleansing rules engine, said cleansing rule to a template, said template including logic to compensate for one or more anomalies in said sequence-based data;
receiving, at said query time and by a query rewrite engine of said computing system, a user query to retrieve said sequence-based data;
automatically rewriting, at said query time and by said query rewrite engine, said user query to provide a rewritten query, said automatically rewriting including applying said logic included in said template to compensate for said one or more anomalies; and
executing, at said query time, said rewritten query by said DBMS, wherein an answer provided by said executing said rewritten query is identical to a result of executing said user query on a set of data generated by an application of said cleansing rule to all of said sequence-based data.
Dependent claims: 2 to 13.
14. A system for cleansing anomalies from sequence-based data at query time in a computing environment, comprising:
means for loading sequence-based data into a database managed by a database management system (DBMS) of a computing system, said loading being performed at a load time of said sequence-based data that precedes a query time of said sequence-based data;
means for receiving a cleansing rule at a cleansing rules engine of said computing system;
means for automatically converting, by said cleansing rules engine, said cleansing rule to a template, said template including logic to compensate for one or more anomalies in said sequence-based data;
means for receiving, at said query time and by a query rewrite engine of said computing system, a user query to retrieve said sequence-based data;
means for automatically rewriting, at said query time and by said query rewrite engine, said user query to provide a rewritten query, said means for automatically rewriting including means for applying said logic included in said template to compensate for said one or more anomalies; and
means for executing, at said query time, said rewritten query by said DBMS, wherein an answer provided by said means for executing said rewritten query is identical to a result provided by a means for executing said user query on a set of data generated by an application of said cleansing rule to all of said sequence-based data.
Dependent claims: 15 to 20.
21. A computer-implemented method of cleansing anomalies from sequence-based data at query time via a rewrite of a query with respect to multiple cleansing rules, comprising:
loading sequence-based data into a database managed by a database management system (DBMS) of a computing system, said sequence-based data including one or more anomalies;
receiving a plurality of cleansing rules $C_1, \ldots, C_n$ at a cleansing rules engine of said computing system;
receiving, by a query rewrite engine of said computing system, a user query Q to retrieve said sequence-based data;
automatically rewriting said user query by said query rewrite engine to provide a rewritten query; and
executing said rewritten query by said DBMS, said executing including generating cleansed data from said sequence-based data, said cleansed data not including said one or more anomalies, wherein said automatically rewriting includes:
for each cleansing rule $C_i$ of said plurality of cleansing rules $C_1, \ldots, C_n$, performing a first loop that includes:
for each context reference X of one or more context references included in a pattern of said cleansing rule $C_i$ on a relational table R, performing a second loop that includes:
setting a correlation condition cr to a list of one or more conjuncts, said one or more conjuncts comprising at least one of: one or more explicit conjuncts included in a condition of said cleansing rule $C_i$ and referring to said context reference X, and one or more implied conjuncts, each implied conjunct being on a cluster key of said relational table R or a sequence key of said relational table R, wherein said correlation condition cr is a correlation condition between said context reference X and T, said T being a target reference included in said pattern;
if said context reference X is a position-based context reference, then retaining in said one or more conjuncts of said correlation condition cr only position-preserving implied conjuncts;
binding s to said target reference T, wherein said s is a query condition on said relational table R and is included in said user query Q;
running a transitivity analysis between said correlation condition cr and said query condition s;
determining d, said d being a set including any conjunct of said correlation condition cr that refers only to context reference X; and
if set d is not empty, adding set d to a context condition $cc_i$, otherwise setting said context condition $cc_i$ to an empty set and breaking out of said second loop, wherein said context condition $cc_i$ defines a context set for context reference X;
if said context condition $cc_i$ is said empty set, breaking out of said first loop and performing a join-back algorithm to generate said rewritten query; and
if no context condition $cc_i$ is said empty set, performing the following:
computing an overall context condition $cc = cc_1 \lor cc_2 \lor \cdots \lor cc_n$;
computing an expanded condition $ec = s \lor cc$, wherein said s is a query condition of said user query Q;
simplifying said query condition s to an optimized query condition $s'$, said simplifying including setting $s' = s - cc$; and
computing an expanded rewrite query $Q_e$ as said rewritten query, said computing said expanded rewrite query $Q_e$ including utilizing an expression $\sigma_{s'}(\Phi_{C_n} \cdots \Phi_{C_1}(\sigma_{ec}(R)))$, wherein each $\Phi_{C_i}(\sigma_{ec}(R))$ of said $\Phi_{C_n} \cdots \Phi_{C_1}(\sigma_{ec}(R))$ is a result of applying said cleansing rule $C_i$ on a data set $\sigma_{ec}(R)$, wherein said data set $\sigma_{ec}(R)$ is a result of directly pushing said expanded condition ec to said relational table R and cleansing data of said relational table R selected by said expanded condition ec.
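The expanded rewrite of claim 21 can be illustrated in plain Python for a special case. In that case, every correlation condition is a fixed time offset between target and context reads, and the query condition s is a time interval. Under those assumptions:

- The transitivity step reduces to shifting the bounds of s.
- Each $cc_i$ is an interval.
- The query becomes $\sigma_s(\Phi_{C_2}(\Phi_{C_1}(\sigma_{ec}(R))))$.

The two rules, their offsets, the sample reads and all function names are hypothetical. This sketch filters with s rather than the simplified s′. It does not model position-based context references or the join-back fallback. It compares the result with cleansing everything first and with naively pushing s down before cleansing.

```python
from typing import Callable, List, NamedTuple, Tuple


class Read(NamedTuple):
    epc: str
    reader: str
    t: int


Interval = Tuple[int, int]  # closed interval: lo <= t <= hi


def phi_duplicate(data: List[Read]) -> List[Read]:
    """Hypothetical C1: drop a read if the same EPC was read at the same
    reader in [t - 5, t - 1] (the context read precedes the target)."""
    return [r for r in data if not any(
        p.epc == r.epc and p.reader == r.reader and r.t - 5 <= p.t <= r.t - 1
        for p in data)]


def phi_cross_read(data: List[Read]) -> List[Read]:
    """Hypothetical C2: drop a 'door' read if the same EPC is read at 'dock'
    in [t + 1, t + 3] (the context read follows the target)."""
    return [r for r in data if not (r.reader == "door" and any(
        p.epc == r.epc and p.reader == "dock" and r.t + 1 <= p.t <= r.t + 3
        for p in data))]


# Each rule: (Phi_Ci, (a, b)) with correlation condition T.t + a <= X.t <= T.t + b.
# Rules are applied in list order, so C1 runs first.
RULES: List[Tuple[Callable[[List[Read]], List[Read]], Interval]] = [
    (phi_duplicate, (-5, -1)),
    (phi_cross_read, (1, 3)),
]


def context_condition(s: Interval, offsets: Interval) -> Interval:
    # Transitivity: lo <= T.t <= hi and T.t + a <= X.t <= T.t + b
    # imply lo + a <= X.t <= hi + b, a condition on X alone.
    (lo, hi), (a, b) = s, offsets
    return (lo + a, hi + b)


def satisfies(t: int, intervals: List[Interval]) -> bool:
    # Disjunction (OR) of interval predicates.
    return any(lo <= t <= hi for lo, hi in intervals)


def expanded_rewrite(R: List[Read], s: Interval) -> List[Read]:
    cc = [context_condition(s, off) for _, off in RULES]  # cc = cc_1 OR ... OR cc_n
    ec = [s] + cc                                          # ec = s OR cc
    data = [r for r in R if satisfies(r.t, ec)]            # sigma_ec(R)
    for phi, _ in RULES:                                   # Phi_Cn(... Phi_C1(.))
        data = phi(data)
    return [r for r in data if satisfies(r.t, [s])]        # final sigma_s


def cleanse_all_then_filter(R: List[Read], s: Interval) -> List[Read]:
    data = list(R)
    for phi, _ in RULES:
        data = phi(data)
    return [r for r in data if satisfies(r.t, [s])]


def naive_pushdown(R: List[Read], s: Interval) -> List[Read]:
    # Incorrect baseline: filter by s first, so context reads outside s are lost.
    return cleanse_all_then_filter([r for r in R if satisfies(r.t, [s])], s)


R = [
    Read("e1", "dock", 2),   # outside s, but context for the t=6 duplicate
    Read("e1", "dock", 6),
    Read("e2", "door", 9),
    Read("e2", "dock", 11),
    Read("e4", "dock", 12),
    Read("e3", "door", 15),
    Read("e3", "dock", 18),  # outside s, but context for the t=15 cross read
    Read("e5", "dock", 30),  # outside ec, never selected by sigma_ec
]
S = (5, 17)  # user query condition s: 5 <= t <= 17

if __name__ == "__main__":
    print(expanded_rewrite(R, S))
    print(expanded_rewrite(R, S) == cleanse_all_then_filter(R, S))
    print(naive_pushdown(R, S))
```

On this data the expanded rewrite skips the read at t = 30, yet it returns the same rows as full cleansing. The naive pushdown keeps the duplicate at t = 6 and the cross read at t = 15, because their context reads (t = 2 and t = 18) lie outside s.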
22. A system for cleansing anomalies from sequence-based data at query time via a rewrite of a query with respect to multiple cleansing rules in a computing environment, comprising:
means for loading sequence-based data into a database managed by a database management system (DBMS) of a computing system, said sequence-based data including one or more anomalies;
means for receiving a plurality of cleansing rules $C_1, \ldots, C_n$ at a cleansing rules engine of said computing system;
means for receiving, by a query rewrite engine of said computing system, a user query Q to retrieve said sequence-based data;
means for automatically rewriting said user query by said query rewrite engine to provide a rewritten query; and
means for executing said rewritten query by said DBMS, said means for executing including means for generating cleansed data from said sequence-based data, said cleansed data not including said one or more anomalies, wherein said means for automatically rewriting includes:
means for performing a first loop for each cleansing rule $C_i$ of said plurality of cleansing rules $C_1, \ldots, C_n$, said means for performing said first loop including:
means for performing a second loop for each context reference X of one or more context references included in a pattern of said cleansing rule $C_i$ on a relational table R, said means for performing said second loop including the following:
means for setting a correlation condition cr to a list of one or more conjuncts, said one or more conjuncts comprising at least one of: one or more explicit conjuncts included in a condition of said cleansing rule $C_i$ and referring to said context reference X, and one or more implied conjuncts, each implied conjunct being on a cluster key of said relational table R or a sequence key of said relational table R, wherein said correlation condition cr is a correlation condition between said context reference X and T, said T being a target reference included in said pattern;
means for retaining in said one or more conjuncts of said correlation condition cr only position-preserving implied conjuncts if said context reference X is a position-based context reference;
means for binding s to said target reference T, wherein said s is a query condition on said relational table R and is included in said user query Q;
means for running a transitivity analysis between said correlation condition cr and said query condition s;
means for determining d, said d being a set including any conjunct of said correlation condition cr that refers only to context reference X; and
means for adding set d to a context condition $cc_i$ if set d is not empty, and otherwise for setting said context condition $cc_i$ to an empty set and breaking out of said second loop, wherein said context condition $cc_i$ defines a context set for context reference X;
means for breaking out of said first loop if said context condition $cc_i$ is said empty set, and means for performing a join-back algorithm to generate said rewritten query; and
means for performing the following if no context condition $cc_i$ is said empty set:
computing an overall context condition $cc = cc_1 \lor cc_2 \lor \cdots \lor cc_n$;
computing an expanded condition $ec = s \lor cc$, wherein said s is a query condition of said user query Q;
simplifying said query condition s to an optimized query condition $s'$, said simplifying including setting $s' = s - cc$; and
computing an expanded rewrite query $Q_e$ as said rewritten query, said computing said expanded rewrite query $Q_e$ including utilizing an expression $\sigma_{s'}(\Phi_{C_n} \cdots \Phi_{C_1}(\sigma_{ec}(R)))$, wherein each $\Phi_{C_i}(\sigma_{ec}(R))$ of said $\Phi_{C_n} \cdots \Phi_{C_1}(\sigma_{ec}(R))$ is a result of applying said cleansing rule $C_i$ on a data set $\sigma_{ec}(R)$, wherein said data set $\sigma_{ec}(R)$ is a result of directly pushing said expanded condition ec to said relational table R and cleansing data of said relational table R selected by said expanded condition ec.
Specification