Determining and validating provenance data in data stream processing system
Abstract
Techniques are disclosed for determining and validating provenance data in data stream processing systems. For example, a method for processing data associated with a data stream received by a data stream processing system, wherein the system comprises a plurality of processing elements, comprises the following steps. Input data elements and output data elements associated with at least one processing element of the plurality of processing elements are obtained. One or more intervals are computed for the processing element using data representing observations of associations between input elements and output elements of the processing element, wherein, for a given one of the intervals, one or more particular input elements contained within the given interval are determined to have contributed to a particular output element. In another method, intervals are first specified and then validated by comparing the specified intervals against intervals computed from observations.
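The interval computation described above can be illustrated with a minimal sketch. It is not the patented implementation: it assumes that every element carries an integer sequence index, that each observation pairs one output index with the input indexes seen to contribute to it, and that intervals are expressed as offsets relative to the output index. The names `Observation`, `compute_intervals`, and `dependency_function` are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical observation format: (output_index, [contributing input indexes]).
Observation = Tuple[int, List[int]]
# Inclusive (start, end) offsets relative to the output index (an assumption).
Interval = Tuple[int, int]


def compute_intervals(observations: Iterable[Observation]) -> List[Interval]:
    """Return one relative input interval per observed output element."""
    intervals = []
    for out_idx, in_idxs in observations:
        if not in_idxs:
            continue  # no association was observed for this output
        intervals.append((min(in_idxs) - out_idx, max(in_idxs) - out_idx))
    return intervals


def dependency_function(intervals: List[Interval]) -> Callable[[int], range]:
    """Build a function mapping an output index to candidate input indexes."""
    if not intervals:
        raise ValueError("at least one computed interval is required")
    # Use the smallest interval that encloses every observed interval.
    lo = min(start for start, _ in intervals)
    hi = max(end for _, end in intervals)

    def depends_on(out_idx: int) -> range:
        # Clamp at zero because indexes are assumed to be non-negative.
        return range(max(0, out_idx + lo), out_idx + hi + 1)

    return depends_on


observations = [(3, [1, 2, 3]), (4, [2, 3, 4]), (5, [4, 5])]
depends_on = dependency_function(compute_intervals(observations))
print(list(depends_on(10)))
```

With these sample observations the enclosing interval spans offsets -2 to 0, so the provenance of output element 10 is reported in terms of input elements 8 to 10. Taking the enclosing interval is a conservative design choice: it may include inputs that did not contribute, but it does not miss any contribution that was observed.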
20 Claims
1. A method for processing data associated with a data stream received by a data stream processing system, wherein the system comprises a plurality of processing elements, the method comprising the steps of:
obtaining a data stream of input elements and a data stream of output elements associated with at least one processing element of the plurality of processing elements, wherein the data stream of input elements are obtained from at least one streaming data source, and wherein the data stream of output elements are generated by the at least one processing element in response to the data stream of input elements;
computing one or more intervals for the at least one processing element, wherein the one or more intervals are computed using data representing observations of associations between the input elements and the output elements of the at least one processing element, wherein, for a given one of the computed intervals, one or more particular input elements contained within the given computed interval are determined to have contributed to a particular output element; and
using the computed one or more intervals to determine a dependency function that enables a provenance of the particular output element to be determined in terms of the one or more particular input elements.
Dependent claims: 2-15
16. A method for processing data associated with a data stream received by a data stream processing system, wherein the system comprises a plurality of processing elements, the method comprising the steps of:
obtaining input data elements and output data elements associated with at least one processing element of the plurality of processing elements, wherein the input data elements are obtained from at least one streaming data source;
specifying one or more intervals for the processing element wherein, for a given one of the intervals, one or more particular input elements contained within the given interval are believed to have contributed to a particular output element thereby determining a provenance of the particular output element in terms of the one or more particular input elements; and
validating the one or more specified intervals by computing one or more intervals for the processing element using data representing observations of associations between inputs elements and output elements of the processing element, and comparing the one or more specified intervals and the one or more computed intervals.
Dependent claims: 17
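The validation step of claim 16 compares specified intervals against intervals computed from observations. The sketch below shows one possible comparison, under the assumption that an interval is an inclusive pair of integer offsets and that a specified interval is consistent when every computed interval lies inside it. The claim does not prescribe this particular test, and the name `validate_interval` is hypothetical.

```python
from typing import List, Tuple

# Inclusive (start, end) offsets; a hypothetical representation.
Interval = Tuple[int, int]


def validate_interval(specified: Interval, computed: List[Interval]) -> List[Interval]:
    """Return the computed intervals that extend beyond the specified interval."""
    s_start, s_end = specified
    return [
        (c_start, c_end)
        for c_start, c_end in computed
        if c_start < s_start or c_end > s_end
    ]


violations = validate_interval((-2, 0), [(-2, 0), (-1, 0), (-3, 0)])
print(violations)
```

An empty result means that no observation contradicts the specified interval. Any interval returned identifies an output whose observed inputs fell outside the range the specification believed to be sufficient.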
18. Apparatus for processing data associated with a data stream received by a data stream processing system, wherein the system comprises a plurality of processing elements, the apparatus comprising:
a memory; and
a processor coupled to the memory and configured to:
obtain a data stream of input elements and a data stream of output elements associated with at least one processing element of the plurality of processing elements, wherein the data stream of input elements are obtained from at least one streaming data source, and wherein the data stream of output elements are generated by the at least one processing element in response to the data stream of input elements;
compute one or more intervals for the at least one processing element, wherein the one or more intervals are computed using data representing observations of associations between the input elements and the output elements of the at least one processing element, wherein, for a given one of the computed intervals, one or more particular input elements contained within the given computed interval are determined to have contributed to a particular output element; and
use the computed one or more intervals to determine a dependency function that enables a provenance of the particular output element to be determined in terms of the one or more particular input elements.
Dependent claims: 19
20. An article of manufacture for processing data associated with a data stream received by a data stream processing system, wherein the system comprises a plurality of processing elements, the article comprising a computer readable storage medium having one or more programs embodied therewith wherein the one or more programs, when executed by a computer, perform steps of:
obtaining a data stream of input elements and a data stream of output elements associated with at least one processing element of the plurality of processing elements, wherein the data stream of input elements are obtained from at least one streaming data source, and wherein the data stream of output elements are generated by the at least one processing element in response to the data stream of input elements;
computing one or more intervals for the at least one processing element, wherein the one or more intervals are computed using data representing observations of associations between the input elements and the output elements of the at least one processing element, wherein, for a given one of the computed intervals, one or more particular input elements contained within the given computed interval are determined to have contributed to a particular output element; and
using the computed one or more intervals to determine a dependency function that enables a provenance of the particular output element to be determined in terms of the one or more particular input elements.
Specification