DATA PROVENANCE AND DATA PEDIGREE TRACKING
First Claim
1. A method comprising:
in response to a request from a processing component to access source data, a data identifier (ID) correlator generating a source data ID, wherein the data ID correlator is loaded in a storage component; and
transmitting the source data in association with the source data ID to the processing component;
the processing component processing the source data to generate result data;
a process ID correlator that is loaded in the processing component, generating a process ID;
transmitting the result data in association with the process ID and the source data ID to a streaming manager;
transmitting the process ID in association with the processing component ID to the streaming manager; and
the streaming manager linking the association of the result data with the process ID and the source data ID with the association of the processing component ID with the process ID.
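The steps of claim 1 can be illustrated with a minimal in-memory sketch. Everything below is an assumption made for illustration: the class names, the `src-`/`proc-` ID formats, the use of `uuid`, and the summing workload are hypothetical and are not taken from the patent.

```python
import uuid


class StorageComponent:
    """Holds source data; its data ID correlator tags every access."""

    def __init__(self, data):
        self._data = data

    def read(self):
        # Data ID correlator (hypothetical): generate a source data ID per request.
        source_data_id = f"src-{uuid.uuid4().hex}"
        return self._data, source_data_id


class StreamingManager:
    """Receives both associations and links them through the process ID."""

    def __init__(self):
        self.result_events = []    # (result, process_id, source_data_id)
        self.process_events = {}   # process_id -> processing component ID

    def record_result(self, result, process_id, source_data_id):
        self.result_events.append((result, process_id, source_data_id))

    def record_process(self, process_id, component_id):
        self.process_events[process_id] = component_id

    def link(self):
        # Join the two associations on their shared process ID.
        return [
            {
                "result": result,
                "source_data_id": source_data_id,
                "process_id": process_id,
                "component_id": self.process_events.get(process_id),
            }
            for result, process_id, source_data_id in self.result_events
        ]


class ProcessingComponent:
    """Processes source data; its process ID correlator tags each run."""

    def __init__(self, component_id, manager):
        self.component_id = component_id
        self.manager = manager

    def run(self, storage):
        data, source_data_id = storage.read()
        result = sum(data)  # stand-in for real processing
        # Process ID correlator (hypothetical): generate a process ID per run.
        process_id = f"proc-{uuid.uuid4().hex}"
        self.manager.record_result(result, process_id, source_data_id)
        self.manager.record_process(process_id, self.component_id)
        return result


manager = StreamingManager()
component = ProcessingComponent("component-A", manager)
component.run(StorageComponent([1, 2, 3]))
linked = manager.link()
```

After the run, `linked` holds one record that ties the result data to its source data ID, its process ID, and the ID of the processing component that produced it.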
1 Assignment
0 Petitions
Abstract
A data provenance and pedigree tracking system may collect, store, and process monitoring data collected by correlators. The monitoring data are events that associate data pedigree, usage rules, and provenance events. Data monitoring may be performed on the data processing and storage functions invoked when performing data analytics, for example. The system can determine, maintain, and persist associations among the components, events, rules, etc. that contributed to generating a data object result. For example, a data provenance and pedigree tracking system can calculate the total cost of processing the data by adding the processing cost of each component.
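The cost calculation mentioned at the end of the abstract amounts to a sum over the components recorded in a result's lineage. The sketch below is a hedged illustration only: the function name, the component names, and the cost figures are hypothetical and do not come from the patent.

```python
def total_processing_cost(lineage, cost_per_component):
    """Add the processing cost of each component that contributed to a result.

    `lineage` is the list of component IDs recorded for one result, and
    `cost_per_component` maps each component ID to its processing cost.
    Both structures are assumptions made for this example.
    """
    return sum(cost_per_component[component_id] for component_id in lineage)


# Hypothetical per-component costs, in arbitrary units.
costs = {"ingest": 1.5, "filter": 0.25, "aggregate": 2.5}
total = total_processing_cost(["ingest", "filter", "aggregate"], costs)
```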
15 Citations
20 Claims
1. A method comprising:
in response to a request from a processing component to access source data, a data identifier (ID) correlator generating a source data ID, wherein the data ID correlator is loaded in a storage component; and
transmitting the source data in association with the source data ID to the processing component;
the processing component processing the source data to generate result data;
a process ID correlator that is loaded in the processing component, generating a process ID;
transmitting the result data in association with the process ID and the source data ID to a streaming manager;
transmitting the process ID in association with the processing component ID to the streaming manager; and
the streaming manager linking the association of the result data with the process ID and the source data ID with the association of the processing component ID with the process ID.
Dependent claims: 2, 3, 4, 5, 6, 7
8. One or more non-transitory machine-readable storage media comprising program code for managing data, the program code to:
in response to a request from a processing component to access source data, generate a source data ID; and
transmit the source data in association with the source data ID to the processing component;
process the source data to generate result data;
generate a process ID;
transmit the result data in association with the process ID and the source data ID to a streaming manager;
transmit the process ID in association with the processing component ID to the streaming manager; and
link, within relational tables of the streaming manager, the association of the result data with the process ID and the source data ID with the association of the processing component ID with the process ID.
Dependent claims: 9, 10, 11, 12, 13
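The linking "within relational tables" recited in claim 8 could be pictured as a join on the shared process ID. This is a minimal sketch using Python's standard `sqlite3` module; the table names, column names, and sample rows are hypothetical, and the claim does not specify a schema or a database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables, one per association received by the streaming manager.
conn.executescript(
    """
    CREATE TABLE result_events (
        result         TEXT,
        process_id     TEXT,
        source_data_id TEXT
    );
    CREATE TABLE process_events (
        process_id   TEXT PRIMARY KEY,
        component_id TEXT
    );
    """
)

# Association 1: result data with the process ID and the source data ID.
conn.execute(
    "INSERT INTO result_events VALUES (?, ?, ?)",
    ("result-1", "proc-1", "src-1"),
)
# Association 2: process ID with the processing component ID.
conn.execute(
    "INSERT INTO process_events VALUES (?, ?)",
    ("proc-1", "component-A"),
)

# Link the two associations through the process ID they share.
rows = conn.execute(
    """
    SELECT r.result, r.source_data_id, r.process_id, p.component_id
    FROM result_events AS r
    JOIN process_events AS p ON p.process_id = r.process_id
    """
).fetchall()
conn.close()
```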
14. An apparatus comprising:
a processor; and
a machine-readable medium having program code executable by the processor to cause the apparatus to,
in response to a request from a processing component to access source data, generate a source data ID; and
transmit the source data in association with the source data ID to the processing component;
process the source data to generate result data;
generate a process ID;
transmit the result data in association with the process ID and the source data ID to a streaming manager;
transmit the process ID in association with the processing component ID to the streaming manager; and
link, within the streaming manager, the association of the result data with the process ID and the source data ID with the association of the processing component ID with the process ID.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification