Metadata-driven workflows and integration with genomic data processing systems and techniques
Abstract
Systems, methods and computer program products configured to provide and perform metadata-based workflow management are disclosed. The inventive subject matter includes a computer readable storage medium having computer readable program instructions embodied therewith. The computer readable program instructions are configured to: initiate a workflow configured to process data; associate the data with metadata; and drive at least a portion of the workflow based on at least some of the metadata. The metadata include anchoring metadata, common metadata, and custom metadata. The inventive subject matter also encompasses a method for managing genomic data processing workflows using metadata. The method includes: initiating a workflow; receiving a request to manage the workflow using metadata comprising anchoring metadata, common metadata, and custom metadata; associating the metadata with the data; and driving at least a portion of the workflow based on the metadata. The workflow involves genomic analyses.
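The three metadata tiers named in the abstract can be pictured as a small record type. The following is a minimal sketch, not the patented implementation: the class name `WorkflowMetadata`, the helper names `new_anchor_id` and `associate`, and the example field values are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class WorkflowMetadata:
    # Anchoring metadata: an alphanumeric string that identifies the workflow.
    anchor_id: str
    # Common metadata: shared characteristics (sample, site, instrument, ...).
    common: dict = field(default_factory=dict)
    # Custom metadata: workflow-specific or data-specific characteristics.
    custom: dict = field(default_factory=dict)


def new_anchor_id() -> str:
    """Hypothetical helper: produce a 32-character alphanumeric identifier."""
    return uuid.uuid4().hex


def associate(data: dict, metadata: WorkflowMetadata) -> dict:
    """Return a copy of the data record tagged with its metadata."""
    return {**data, "metadata": metadata}


meta = WorkflowMetadata(
    anchor_id=new_anchor_id(),
    common={"instrument": "seq-01", "project": "demo"},  # illustrative values
    custom={"avg_sequence_length": 150},
)
tagged = associate({"reads": ["ACGT", "ACGA"]}, meta)
```

Keeping the anchoring identifier separate from the two dictionaries mirrors the abstract's distinction between identifying a workflow and describing it.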
16 Claims
1. A computer program product for driving genomic data processing workflows using metadata, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
initiate, by the processor, a workflow configured to process data, wherein the workflow comprises one or more genomic analysis operations selected from the group consisting of: base calling, variant calling, phylogenetic analysis, primer design, and amplicon design;
receiving, at the processor, a request to manage the workflow using metadata comprising:
  anchoring metadata configured to uniquely identify the workflow by using an alphanumeric string;
  common metadata comprising one or more characteristics selected from the group consisting of: sample characteristics, processing site characteristics, laboratory characteristics, instrument characteristics, assay characteristics, temporal characteristics, security characteristics and project characteristics; and
  custom metadata comprising workflow characteristics and/or data characteristics; and
associate, by the processor, the metadata with the genomic data;
drive, by the processor, at least a portion of the workflow based on the metadata, wherein driving the workflow based at least in part on the metadata comprises:
  determining new data and/or at least one new processing setting to use in connection with repeating at least a portion of the workflow; and
  repeating the portion of the workflow using the new data and/or the new processing setting, wherein the determining is based at least in part on the common metadata and/or the custom metadata; and
wherein the new data and/or the new processing setting comprise a modified number of permissible gaps in an alignment based at least in part on an average sequence length of input sequence data.
Dependent claims: 2, 3, 4
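The "drive" step of claim 1 can be illustrated with a short sketch: derive a new gap allowance from the average input sequence length recorded in the custom metadata, then repeat the alignment with that setting. The claim does not say how the number of permissible gaps is computed, so the proportional rule below (`gaps_per_100`) is an assumption, as are the function names and the placeholder `align` function, which stands in for a real aligner.

```python
def average_length(sequences):
    """Average length of the input sequences."""
    return sum(len(s) for s in sequences) / len(sequences)


def determine_max_gaps(custom, gaps_per_100=2, minimum=1):
    """Assumed rule: allow a fixed number of gaps per 100 bases of average length."""
    scaled = round(custom["avg_sequence_length"] * gaps_per_100 / 100)
    return max(minimum, scaled)


def align(sequences, max_gaps):
    """Placeholder for a real aligner; it only records the setting it was given."""
    return {"aligned": len(sequences), "max_gaps": max_gaps}


def drive_alignment(sequences, metadata):
    """Run the alignment once, then repeat it if the metadata yields a new setting."""
    custom = metadata["custom"]
    result = align(sequences, custom["max_gaps"])

    # Record the observed average length in the custom metadata,
    # then determine the new processing setting from that metadata.
    custom["avg_sequence_length"] = average_length(sequences)
    new_max_gaps = determine_max_gaps(custom)

    if new_max_gaps != custom["max_gaps"]:
        custom["max_gaps"] = new_max_gaps
        result = align(sequences, new_max_gaps)  # repeat with the new setting
    return result


metadata = {"anchor_id": "WF0001", "common": {}, "custom": {"max_gaps": 1}}
sequences = ["A" * 100, "C" * 200]
result = drive_alignment(sequences, metadata)
```

With these illustrative inputs the average length is 150, so the assumed rule raises the allowance from 1 to 3 gaps and the alignment step is repeated once.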
5. A computer-implemented method for managing genomic data processing workflows using metadata, the method comprising:
initiating a workflow, wherein the workflow comprises one or more genomic analysis operations selected from the group consisting of: base calling, variant calling, phylogenetic analysis, primer design, and amplicon design;
receiving a request to manage the workflow using metadata comprising:
  anchoring metadata, wherein the anchoring metadata uniquely identify the workflow by using an alphanumeric string;
  common metadata comprising one or more characteristics selected from the group consisting of: sample characteristics, processing site characteristics, laboratory characteristics, instrument characteristics, assay characteristics, temporal characteristics, security characteristics and project characteristics; and
  custom metadata comprising workflow characteristics and/or data characteristics; and
the method further comprising:
associating the metadata with the genomic data; and
driving at least a portion of the workflow based on the metadata, wherein driving the workflow based at least in part on the metadata comprises:
  determining new data and/or at least one new processing setting to use in connection with repeating at least a portion of the workflow; and
  repeating the portion of the workflow using the new data and/or the new processing setting, wherein the determining is based at least in part on the common metadata and/or the custom metadata; and
wherein the new data and/or the new processing setting comprise a modified number of permissible gaps in an alignment based at least in part on an average sequence length of input sequence data.
Dependent claims: 6, 7, 8, 9, 10
11. A computer program product for driving genomic data processing workflows using metadata, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
receive, at the processor:
  workflow data;
  metadata associated with the workflow data, wherein the metadata comprise a plurality of metadata generations, each metadata generation corresponding to at least one operation of the workflow, each metadata generation including:
    anchoring metadata configured to uniquely identify the workflow by using an alphanumeric string;
    common metadata comprising one or more characteristics selected from: sample characteristics, processing site characteristics, laboratory characteristics, instrument characteristics, assay characteristics, temporal characteristics, security characteristics and project characteristics; and
    custom metadata comprising workflow characteristics and/or data characteristics; and
  a request to manage a workflow using the metadata;
distribute, by the processor, the workflow data and the associated metadata across a plurality of distributed resources of a cloud computing environment; and
associate the metadata with the workflow data by indexing, using the processor, the workflow data according to the metadata; and
drive at least a portion of the workflow based on the metadata, wherein driving the workflow based at least in part on the metadata comprises:
  determining new data and/or at least one new processing setting to use in connection with repeating at least a portion of the workflow; and
  repeating the portion of the workflow using the new data and/or the new processing setting, wherein the determining is based at least in part on the common metadata and/or the custom metadata; and
wherein the new data and/or the new processing setting comprise a modified number of permissible gaps in an alignment based at least in part on an average sequence length of input sequence data.
Dependent claims: 12, 13, 14, 15, 16
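Claim 11 adds two ideas: metadata "generations" (one per workflow operation) and association by indexing the workflow data according to the metadata. The sketch below shows one way such an index could look, using an in-memory inverted index keyed by (field, value) pairs. The record layout, the function names `make_generation` and `build_index`, and the sample values are assumptions for illustration. The cloud distribution step is not modeled here.

```python
from collections import defaultdict


def make_generation(anchor_id, operation, common, custom):
    """One metadata generation, tied to a single workflow operation."""
    return {
        "anchor_id": anchor_id,
        "operation": operation,
        "common": dict(common),
        "custom": dict(custom),
    }


def build_index(records):
    """Map (field, value) pairs to the ids of records carrying that metadata.

    Only the anchoring id, the operation, and the common metadata are indexed;
    values are assumed to be hashable.
    """
    index = defaultdict(set)
    for record_id, record in records.items():
        for generation in record["generations"]:
            index[("anchor_id", generation["anchor_id"])].add(record_id)
            index[("operation", generation["operation"])].add(record_id)
            for key, value in generation["common"].items():
                index[(key, value)].add(record_id)
    return index


records = {
    "r1": {
        "payload": "run1.fastq",  # illustrative file name
        "generations": [
            make_generation("WF0001", "base calling", {"instrument": "seq-01"}, {}),
            make_generation("WF0001", "variant calling", {"instrument": "seq-01"}, {"caller": "demo"}),
        ],
    },
    "r2": {
        "payload": "run2.fastq",
        "generations": [
            make_generation("WF0002", "base calling", {"instrument": "seq-02"}, {}),
        ],
    },
}
index = build_index(records)
```

A lookup such as `index[("operation", "base calling")]` then returns every record that passed through that operation, regardless of which workflow produced it.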
Specification