Automatically executing tasks and configuring access control lists in a data transformation system
First Claim
1. A method comprising:
retrieving a first configuration file, the first configuration file comprising:
    a plurality of different data transformation tasks, each of the tasks denoted using a task identifier that identifies a particular task to apply to a set of input data and associated with task-specific criteria for execution of the particular task;
    a schema definition for a dataset, wherein the schema definition defines a plurality of columns and a data type for each column of the plurality of columns;
retrieving a second configuration file, the second configuration file comprising an access control list that defines one or more access control permissions for the dataset;
receiving an input file that includes an input dataset;
in response to receiving the input file, based on reading the first configuration file and reading the second configuration file:
    applying the plurality of different data transformation tasks to the input dataset based upon the first configuration file to generate an output file including an output dataset that is formatted differently from the input dataset, wherein the output dataset is formatted according to the task-specific criteria and aligns with the plurality of columns as defined by the schema definition of the first configuration file;
    determining output access control permissions for the output dataset based on the access control list of the second configuration file;
wherein the method is performed using one or more processors.
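The steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not fix a file format, task vocabulary, or ACL model, so everything concrete here is an assumption for illustration only: both configuration files are modeled as JSON strings, the task identifiers `remove_blank_lines` and `regex_replace` and their criteria fields are hypothetical, the output is CSV with a header row, and the output permissions are simply copied from the second file's ACL.

```python
import csv
import io
import json
import re

# Hypothetical first configuration file: ordered tasks (task identifier plus
# task-specific criteria) and a schema definition (column names and data types).
FIRST_CONFIG = json.dumps({
    "tasks": [
        {"id": "remove_blank_lines"},
        {"id": "regex_replace", "pattern": r"\s*\|\s*", "replacement": ","},
    ],
    "schema": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "str"},
        {"name": "amount", "type": "float"},
    ],
})

# Hypothetical second configuration file: an ACL mapping principals to permissions.
SECOND_CONFIG = json.dumps({"acl": {"analysts": ["read"], "data-eng": ["read", "write"]}})


def remove_blank_lines(text, criteria):
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.splitlines() if line.strip())


def regex_replace(text, criteria):
    """Regular-expression search/replace driven by the task's criteria."""
    return re.sub(criteria["pattern"], criteria["replacement"], text)


# Hypothetical registry resolving task identifiers to implementations.
TASKS = {"remove_blank_lines": remove_blank_lines, "regex_replace": regex_replace}
TYPES = {"int": int, "float": float, "str": str}


def run(input_text, first_config, second_config):
    """Apply configured tasks, conform rows to the schema, derive output ACL."""
    config = json.loads(first_config)
    acl = json.loads(second_config)["acl"]

    # Apply each data transformation task in its configured order.
    text = input_text
    for task in config["tasks"]:
        text = TASKS[task["id"]](text, task)

    # Align every record with the schema's columns and data types.
    schema = config["schema"]
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) != len(schema):
            raise ValueError(f"expected {len(schema)} columns, got {len(record)}")
        rows.append([TYPES[col["type"]](value) for col, value in zip(schema, record)])

    # Write the output dataset as CSV with a header row taken from the schema.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col["name"] for col in schema])
    writer.writerows(rows)

    # Output permissions derived from the ACL (in this sketch: copied unchanged).
    permissions = {principal: set(perms) for principal, perms in acl.items()}
    return out.getvalue(), permissions
```

For example, `run("1 | alice | 9.50\n\n2 | bob | 3.25\n", FIRST_CONFIG, SECOND_CONFIG)` returns the CSV text `id,name,amount\n1,alice,9.5\n2,bob,3.25\n` together with `{"analysts": {"read"}, "data-eng": {"read", "write"}}`. Resolving tasks through a registry keyed by identifier lets new task types be added without changing `run`.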
Abstract
A computer-implemented system or process is programmed or configured to use a configuration file to specify one or more tasks to apply to raw ingested data. A task may be a sequence of instructions programmed or configured to format raw ingested data into a dataset in CSV format. Examples of tasks may include a parser that converts COBOL data to CSV, a parser that converts XML to CSV, a parser that converts text with fixed-width fields to CSV, a parser that converts files in a zip archive to CSV, a regular-expression search-and-replace function, or formatting logic that removes lines or blank lines from raw ingested data. In one embodiment, the configuration file may specify a schema definition that a task uses to generate a dataset. In one embodiment, the configuration file may also include one or more access control list (ACL) definitions for the generated dataset. In one embodiment, building datasets from the configuration file is automated, for example, to run nightly.
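The task types listed above can be sketched as small Python functions that each produce CSV text or cleaned text. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names and parameters (`widths`, `record_tag`, `fields`, `parse`) are hypothetical, the XML parser assumes one flat element per record, and a COBOL parser is omitted because it depends on record layouts the abstract does not describe.

```python
import csv
import io
import re
import zipfile
import xml.etree.ElementTree as ET


def rows_to_csv(rows):
    """Serialize a list of row lists as CSV text."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def fixed_width_to_csv(text, widths):
    """Cut each line into fields of the given character widths, stripping padding."""
    rows = []
    for line in text.splitlines():
        fields, start = [], 0
        for width in widths:
            fields.append(line[start:start + width].strip())
            start += width
        rows.append(fields)
    return rows_to_csv(rows)


def xml_to_csv(xml_text, record_tag, fields):
    """Emit a header row, then one row per <record_tag> element (child text per field)."""
    root = ET.fromstring(xml_text)
    rows = [fields]
    for record in root.iter(record_tag):
        rows.append([record.findtext(field, default="") for field in fields])
    return rows_to_csv(rows)


def zip_to_csv(zip_bytes, parse):
    """Apply `parse` (text -> CSV text) to each archive member, in name order, and concatenate."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return "".join(
            parse(archive.read(name).decode("utf-8")) for name in sorted(archive.namelist())
        )


def regex_replace(text, pattern, replacement):
    """Regular-expression search/replace over the raw text."""
    return re.sub(pattern, replacement, text)


def remove_blank_lines(text):
    """Drop lines that are empty or whitespace-only."""
    return "\n".join(line for line in text.splitlines() if line.strip())
```

For instance, `fixed_width_to_csv("001Alice  42\n002Bob    7 \n", [3, 7, 2])` yields `001,Alice,42\n002,Bob,7\n`. Because each function takes the raw text plus its own parameters, a configuration file would only need to name the task and supply those parameters.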
21 Claims
12. One or more non-transitory computer-readable media storing instructions, which when executed by one or more processors cause:
retrieving a first configuration file, the first configuration file comprising:
    a plurality of different data transformation tasks, each of the tasks denoted using a task identifier that identifies a particular task to apply to a set of input data and associated with task-specific criteria for execution of the particular task;
    a schema definition for a dataset, wherein the schema definition defines a plurality of columns and a data type for each column of the plurality of columns;
retrieving a second configuration file, the second configuration file comprising an access control list that defines one or more access control permissions for the dataset;
receiving an input file that includes an input dataset;
in response to receiving the input file, based on reading the first configuration file and reading the second configuration file:
    applying the plurality of different data transformation tasks to the input dataset based upon the first configuration file to generate an output file including an output dataset that is formatted differently from the input dataset, wherein the output dataset is formatted according to the task-specific criteria and aligns with the plurality of columns as defined by the schema definition of the first configuration file;
    determining output access control permissions for the output dataset based on the access control list of the second configuration file.
Specification