Methods and apparatus implementing data model for disease monitoring, characterization and investigation
7 Assignments
Abstract
A method comprises receiving metagenomics data, configuring a data model characterizing relationships between aspects of the metagenomics data, and processing the metagenomics data in accordance with the configured data model in order to characterize at least one of a disease, infection or contamination. The data model comprises an abundance score element that relates portions of the metagenomics data comprising reads of biological samples to one or more genomic sequences of an ecogenome, and a comparative score element that relates portions of the metagenomics data comprising characteristics of multiple patients to one another with respect to the disease, infection or contamination. The data model further relates the abundance score element to the comparative score element via one or more additional elements of the data model corresponding to respective other aspects of the metagenomics data. The metagenomics data may comprise metagenomics sequencing results from metagenomics sequencing centers associated with respective data zones.
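The abstract leaves the scoring functions and the linking elements open. The following Python sketch shows one hypothetical way to model these relationships. It is not the patented implementation. All class and function names are illustrative, and two formulas are assumptions. The abundance score is taken as the fraction of a sample's reads assigned to a genomic sequence. The comparative score is taken as the Jaccard similarity of the sequence sets detected in two patients. `Sample` and `Patient` stand in for the "additional elements" that connect the two score types. Each sample also carries a hypothetical `data_zone` label for the sequencing center that produced it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Read:
    read_id: str
    sequence_id: Optional[str]  # ecogenome sequence the read was assigned to; None if unassigned


@dataclass
class Sample:
    sample_id: str
    patient_id: str
    data_zone: str  # hypothetical label for the data zone of the sequencing center
    reads: List[Read] = field(default_factory=list)


@dataclass
class Patient:
    patient_id: str
    samples: List[Sample] = field(default_factory=list)


@dataclass(frozen=True)
class AbundanceScore:
    sample_id: str
    sequence_id: str
    value: float


@dataclass(frozen=True)
class ComparativeScore:
    patient_a: str
    patient_b: str
    condition: str
    value: float


def abundance_scores(sample: Sample) -> List[AbundanceScore]:
    """Assumed metric: fraction of the sample's reads assigned to each genomic sequence."""
    counts: Dict[str, int] = {}
    for read in sample.reads:
        if read.sequence_id is not None:
            counts[read.sequence_id] = counts.get(read.sequence_id, 0) + 1
    total = len(sample.reads)  # never zero when counts is non-empty
    return [AbundanceScore(sample.sample_id, seq, n / total) for seq, n in sorted(counts.items())]


def detected(patient: Patient, threshold: float) -> Set[str]:
    """Sequences at or above `threshold` in any of the patient's samples."""
    return {s.sequence_id for smp in patient.samples for s in abundance_scores(smp) if s.value >= threshold}


def comparative_score(a: Patient, b: Patient, condition: str, threshold: float = 0.1) -> ComparativeScore:
    """Assumed metric: Jaccard similarity of the sequences detected in each patient."""
    sa, sb = detected(a, threshold), detected(b, threshold)
    union = sa | sb
    value = len(sa & sb) / len(union) if union else 0.0
    return ComparativeScore(a.patient_id, b.patient_id, condition, value)


def zones_with_sequence(patients: List[Patient], sequence_id: str, threshold: float = 0.1) -> Set[str]:
    """Data zones whose samples carry `sequence_id` at or above `threshold`."""
    return {
        smp.data_zone
        for p in patients
        for smp in p.samples
        for s in abundance_scores(smp)
        if s.sequence_id == sequence_id and s.value >= threshold
    }


# Toy data: two patients whose samples were sequenced in different data zones.
p1 = Patient("P1", [Sample("S1", "P1", "zone-A",
                           [Read("r1", "seqX"), Read("r2", "seqX"), Read("r3", "seqY"), Read("r4", None)])])
p2 = Patient("P2", [Sample("S2", "P2", "zone-B", [Read("r5", "seqX"), Read("r6", "seqZ")])])

print(comparative_score(p1, p2, "infection").value)  # 1/3: only seqX is shared
print(sorted(zones_with_sequence([p1, p2], "seqX")))  # ['zone-A', 'zone-B']
```

In the toy data, `seqX` appears in samples from both data zones, so `zones_with_sequence` returns both labels. Everything here runs in a single process. The sketch does not model how work is distributed across sequencing centers.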
185 Citations
20 Claims
1. A method comprising:
receiving metagenomics data;
configuring a data model characterizing relationships between different aspects of the metagenomics data; and
processing the metagenomics data in accordance with the configured data model in order to characterize at least one of a disease, infection or contamination;
the data model comprising an abundance score element that relates portions of the metagenomics data comprising reads of biological samples to one or more genomic sequences of an ecogenome;
the data model further comprising a comparative score element that relates portions of the metagenomics data comprising characteristics of multiple patients to one another with respect to said disease, infection or contamination;
wherein the data model further relates the abundance score element to the comparative score element via one or more additional elements of the data model corresponding to respective other aspects of the metagenomics data;
wherein the metagenomics data comprises metagenomics sequencing results from a plurality of metagenomics sequencing centers associated with respective data zones and wherein the disease, infection or contamination is characterized by the data model as involving genomic material from multiple ones of a plurality of biological samples sequenced in different ones of the data zones by corresponding different ones of the metagenomics sequencing centers; and
wherein the method is implemented by at least one processing device comprising a processor coupled to a memory.
2. The method of claim 1 wherein an instance of the abundance score element of the data model is generated at a particular specified one of a plurality of hierarchical levels of granularity and wherein a lower one of the levels corresponds to an abundance score based on a genomic sequence of a particular pathogen and a higher one of the levels corresponds to an abundance score generated as a combination of multiple lower level abundance scores based on respective different genomic sequences of the ecogenome.
3. The method of claim 1 wherein an instance of the comparative score element of the data model is generated for two or more patients with respect to both a disease and an infection condition.
4. The method of claim 1 wherein the data model comprises a disease index element that relates to a corresponding disease and is a function of abundance scores of a plurality of biological samples related to the disease and comparative scores of a plurality of patients related to the disease.
5. The method of claim 4 wherein the data model further comprises a Big Data score element that relates to the corresponding disease and is a function of non-genomic data associated with the disease.
6. The method of claim 1 wherein the data model comprises a comparative index element that relates to a corresponding patient and is a function of abundance scores of a plurality of biological samples related to the patient and comparative scores between the patient and one or more other patients.
7. The method of claim 6 wherein the data model further comprises a Big Data score element that relates to the corresponding patient and is a function of non-genomic data associated with the patient.
8. The method of claim 1 wherein the data model further comprises a roles element related to the ecogenome that specifies one or more roles of the ecogenome.
9. The method of claim 1 wherein the data model further comprises an occurrence statistics element that relates a given one of the reads to the genomic sequence.
10. The method of claim 1 wherein the data model is configured to associate a given one of the reads with different occurrence statistics elements for respective different ones of a plurality of genomic sequences.
11. The method of claim 1 wherein the data model is configured to associate each of a plurality of the reads with respective different occurrence statistics elements for respective different ones of a plurality of genomic sequences.
12. The method of claim 1 wherein the data model comprises host and agent elements configured to characterize the ecogenome as having respective ones of a host role for a disease or an agent role for the disease.
13. The method of claim 1 wherein each of the metagenomics sequencing centers is associated with a corresponding YARN cluster of a multi-cluster distributed data processing platform.
14. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device:
to receive metagenomics data;
to configure a data model characterizing relationships between different aspects of the metagenomics data; and
to process the metagenomics data in accordance with the configured data model in order to characterize at least one of a disease, infection or contamination;
the data model comprising an abundance score element that relates portions of the metagenomics data comprising reads of biological samples to one or more genomic sequences of an ecogenome;
the data model further comprising a comparative score element that relates portions of the metagenomics data comprising characteristics of multiple patients to one another with respect to said disease, infection or contamination;
wherein the data model further relates the abundance score element to the comparative score element via one or more additional elements of the data model corresponding to respective other aspects of the metagenomics data; and
wherein the metagenomics data comprises metagenomics sequencing results from a plurality of metagenomics sequencing centers associated with respective data zones and wherein the disease, infection or contamination is characterized by the data model as involving genomic material from multiple ones of a plurality of biological samples sequenced in different ones of the data zones by corresponding different ones of the metagenomics sequencing centers.
15. The computer program product of claim 14 wherein the data model comprises a disease index element that relates to a corresponding disease and is a function of abundance scores of a plurality of biological samples related to the disease and comparative scores of a plurality of patients related to the disease.
16. The computer program product of claim 14 wherein the data model comprises a comparative index element that relates to a corresponding patient and is a function of abundance scores of a plurality of biological samples related to the patient and comparative scores between the patient and one or more other patients.
17. An apparatus comprising:
at least one processing device having a processor coupled to a memory;
wherein said at least one processing device is configured:
to receive metagenomics data;
to configure a data model characterizing relationships between different aspects of the metagenomics data; and
to process the metagenomics data in accordance with the configured data model in order to characterize at least one of a disease, infection or contamination;
the data model comprising an abundance score element that relates portions of the metagenomics data comprising reads of biological samples to one or more genomic sequences of an ecogenome;
the data model further comprising a comparative score element that relates portions of the metagenomics data comprising characteristics of multiple patients to one another with respect to said disease, infection or contamination;
wherein the data model further relates the abundance score element to the comparative score element via one or more additional elements of the data model corresponding to respective other aspects of the metagenomics data; and
wherein the metagenomics data comprises metagenomics sequencing results from a plurality of metagenomics sequencing centers associated with respective data zones and wherein the disease, infection or contamination is characterized by the data model as involving genomic material from multiple ones of a plurality of biological samples sequenced in different ones of the data zones by corresponding different ones of the metagenomics sequencing centers.
18. The apparatus of claim 17 wherein the data model comprises a disease index element that relates to a corresponding disease and is a function of abundance scores of a plurality of biological samples related to the disease and comparative scores of a plurality of patients related to the disease.
19. The apparatus of claim 17 wherein the data model comprises a comparative index element that relates to a corresponding patient and is a function of abundance scores of a plurality of biological samples related to the patient and comparative scores between the patient and one or more other patients.
20. The apparatus of claim 17 wherein each of the metagenomics sequencing centers is associated with a corresponding YARN cluster of a multi-cluster distributed data processing platform.
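Claim 2 describes a higher-level abundance score formed by combining lower-level, per-sequence scores. Claims 4, 6, 15, 16, 18 and 19 describe disease and comparative indices as functions of abundance and comparative scores, but do not fix those functions. The following is a minimal sketch that assumes summation for the roll-up and a weighted mean with a hypothetical weight `alpha` for both indices. The group names, helper names and toy values are illustrative only.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple

PairScores = Dict[Tuple[str, str], float]  # (patient_a, patient_b) -> comparative score


def roll_up_abundance(per_sequence: Dict[str, float], groups: Dict[str, List[str]]) -> Dict[str, float]:
    """Higher-level abundance per group: sum of the lower-level, per-sequence scores (assumed rule)."""
    return {group: sum(per_sequence.get(seq, 0.0) for seq in members) for group, members in groups.items()}


def weighted_index(abundances: Iterable[float], comparatives: Iterable[float], alpha: float = 0.5) -> float:
    """alpha * mean(abundances) + (1 - alpha) * mean(comparatives); an empty input contributes 0.0."""
    a, c = list(abundances), list(comparatives)
    return alpha * (mean(a) if a else 0.0) + (1 - alpha) * (mean(c) if c else 0.0)


def disease_index(sample_abundance: Dict[str, float], pair_scores: PairScores, alpha: float = 0.5) -> float:
    """Disease index from abundance scores of the disease's samples and comparative scores of its patients."""
    return weighted_index(sample_abundance.values(), pair_scores.values(), alpha)


def comparative_index(patient: str, own_sample_abundance: Dict[str, float], pair_scores: PairScores,
                      alpha: float = 0.5) -> float:
    """Comparative index for one patient: own sample abundances plus only the pairs that include the patient."""
    own_pairs = [v for (a, b), v in pair_scores.items() if patient in (a, b)]
    return weighted_index(own_sample_abundance.values(), own_pairs, alpha)


per_sequence = {"strain-1": 0.25, "strain-2": 0.125, "strain-3": 0.5}
groups = {"genus-A": ["strain-1", "strain-2"], "genus-B": ["strain-3"]}
pairs = {("P1", "P2"): 0.5, ("P1", "P3"): 0.25, ("P2", "P3"): 0.75}

print(roll_up_abundance(per_sequence, groups))  # {'genus-A': 0.375, 'genus-B': 0.5}
print(disease_index({"S1": 0.5, "S2": 0.25}, pairs))  # 0.4375
print(comparative_index("P1", {"S1": 0.75}, pairs))  # 0.5625
```

When lower-level scores are read fractions, summation lets a higher-level score be read as the share of reads attributed to any sequence in the group. Other combinations, such as a maximum or a weighted sum, could be substituted without changing the shape of the model.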
Specification