Methods and apparatus implementing data model for disease monitoring, characterization and investigation

US 10,528,875 B1
Filed: 09/30/2016
Issued: 01/07/2020
Est. Priority Date: 04/06/2015
Status: Active Grant

7 Assignments

0 Petitions

Abstract

A method comprises receiving metagenomics data, configuring a data model characterizing relationships between aspects of the metagenomics data, and processing the metagenomics data in accordance with the configured data model in order to characterize at least one of a disease, infection or contamination. The data model comprises an abundance score element that relates portions of the metagenomics data comprising reads of biological samples to one or more genomic sequences of an ecogenome, and a comparative score element that relates portions of the metagenomics data comprising characteristics of multiple patients to one another with respect to the disease, infection or contamination. The data model further relates the abundance score element to the comparative score element via one or more additional elements of the data model corresponding to respective other aspects of the metagenomics data. The metagenomics data may comprise metagenomics sequencing results from metagenomics sequencing centers associated with respective data zones.
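As an illustration only, the relationships named in the abstract can be sketched in Python. The patent text here does not give formulas, so everything concrete below is an assumption: the class names `Sample` and `Patient`, the exact-substring abundance metric, the Jaccard-style comparative metric, and the `zones_involved` helper with its `threshold` parameter are all hypothetical. The sample record stands in for the "additional elements" that link abundance scores to comparative scores, since it ties reads to a patient and to a data zone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample element: ties reads to a patient and a data zone."""
    sample_id: str
    patient_id: str
    data_zone: str
    reads: tuple[str, ...]


@dataclass(frozen=True)
class Patient:
    """Hypothetical patient element carrying comparable characteristics."""
    patient_id: str
    characteristics: frozenset[str]


def abundance_score(sample: Sample, genomic_sequence: str) -> float:
    """Fraction of the sample's reads found verbatim in the sequence.

    Assumed metric: the source does not define how abundance is computed.
    """
    if not sample.reads:
        return 0.0
    hits = sum(1 for read in sample.reads if read in genomic_sequence)
    return hits / len(sample.reads)


def comparative_score(a: Patient, b: Patient) -> float:
    """Jaccard similarity of two patients' characteristics (assumed metric)."""
    union = a.characteristics | b.characteristics
    if not union:
        return 0.0
    return len(a.characteristics & b.characteristics) / len(union)


def zones_involved(samples: list[Sample], genomic_sequence: str,
                   threshold: float = 0.5) -> set[str]:
    """Data zones whose samples reach the (assumed) abundance threshold."""
    return {
        s.data_zone
        for s in samples
        if abundance_score(s, genomic_sequence) >= threshold
    }
```

Under these assumptions, a result from `zones_involved` containing more than one zone corresponds to the case where genomic material from samples sequenced at different centers is implicated.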

185 Citations

20 Claims

1. A method comprising:

  receiving metagenomics data;

  configuring a data model characterizing relationships between different aspects of the metagenomics data; and

  processing the metagenomics data in accordance with the configured data model in order to characterize at least one of a disease, infection or contamination;

  the data model comprising an abundance score element that relates portions of the metagenomics data comprising reads of biological samples to one or more genomic sequences of an ecogenome;

  the data model further comprising a comparative score element that relates portions of the metagenomics data comprising characteristics of multiple patients to one another with respect to said disease, infection or contamination;

  wherein the data model further relates the abundance score element to the comparative score element via one or more additional elements of the data model corresponding to respective other aspects of the metagenomics data;

  wherein the metagenomics data comprises metagenomics sequencing results from a plurality of metagenomics sequencing centers associated with respective data zones and wherein the disease, infection or contamination is characterized by the data model as involving genomic material from multiple ones of a plurality of biological samples sequenced in different ones of the data zones by corresponding different ones of the metagenomics sequencing centers; and

  wherein the method is implemented by at least one processing device comprising a processor coupled to a memory.

2. The method of claim 1 wherein an instance of the abundance score element of the data model is generated at a particular specified one of a plurality of hierarchical levels of granularity and wherein a lower one of the levels corresponds to an abundance score based on a genomic sequence of a particular pathogen and a higher one of the levels corresponds to an abundance score generated as a combination of multiple lower level abundance scores based on respective different genomic sequences of the ecogenome.

3. The method of claim 1 wherein an instance of the comparative score element of the data model is generated for two or more patients with respect to both a disease and an infection condition.

4. The method of claim 1 wherein the data model comprises a disease index element that relates to a corresponding disease and is a function of abundance scores of a plurality of biological samples related to the disease and comparative scores of a plurality of patients related to the disease.

5. The method of claim 4 wherein the data model further comprises a Big Data score element that relates to the corresponding disease and is a function of non-genomic data associated with the disease.

6. The method of claim 1 wherein the data model comprises a comparative index element that relates to a corresponding patient and is a function of abundance scores of a plurality of biological samples related to the patient and comparative scores between the patient and one or more other patients.

7. The method of claim 6 wherein the data model further comprises a Big Data score element that relates to the corresponding patient and is a function of non-genomic data associated with the patient.

8. The method of claim 1 wherein the data model further comprises a roles element related to the ecogenome that specifies one or more roles of the ecogenome.

9. The method of claim 1 wherein the data model further comprises an occurrence statistics element that relates a given one of the reads to the genomic sequence.

10. The method of claim 1 wherein the data model is configured to associate a given one of the reads with different occurrence statistics elements for respective different ones of a plurality of genomic sequences.

11. The method of claim 1 wherein the data model is configured to associate each of a plurality of the reads with respective different occurrence statistics elements for respective different ones of a plurality of genomic sequences.

12. The method of claim 1 wherein the data model comprises host and agent elements configured to characterize the ecogenome as having respective ones of a host role for a disease or an agent role for the disease.

13. The method of claim 1 wherein each of the metagenomics sequencing centers is associated with a corresponding YARN cluster of a multi-cluster distributed data processing platform.

14. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device:

  to receive metagenomics data;

  to configure a data model characterizing relationships between different aspects of the metagenomics data; and

  to process the metagenomics data in accordance with the configured data model in order to characterize at least one of a disease, infection or contamination;

  the data model comprising an abundance score element that relates portions of the metagenomics data comprising reads of biological samples to one or more genomic sequences of an ecogenome;

  the data model further comprising a comparative score element that relates portions of the metagenomics data comprising characteristics of multiple patients to one another with respect to said disease, infection or contamination;

  wherein the data model further relates the abundance score element to the comparative score element via one or more additional elements of the data model corresponding to respective other aspects of the metagenomics data; and

  wherein the metagenomics data comprises metagenomics sequencing results from a plurality of metagenomics sequencing centers associated with respective data zones and wherein the disease, infection or contamination is characterized by the data model as involving genomic material from multiple ones of a plurality of biological samples sequenced in different ones of the data zones by corresponding different ones of the metagenomics sequencing centers.

15. The computer program product of claim 14 wherein the data model comprises a disease index element that relates to a corresponding disease and is a function of abundance scores of a plurality of biological samples related to the disease and comparative scores of a plurality of patients related to the disease.

16. The computer program product of claim 14 wherein the data model comprises a comparative index element that relates to a corresponding patient and is a function of abundance scores of a plurality of biological samples related to the patient and comparative scores between the patient and one or more other patients.

17. An apparatus comprising:

  at least one processing device having a processor coupled to a memory;

  wherein said at least one processing device is configured:

  to receive metagenomics data;

  to configure a data model characterizing relationships between different aspects of the metagenomics data; and

  to process the metagenomics data in accordance with the configured data model in order to characterize at least one of a disease, infection or contamination;

  the data model comprising an abundance score element that relates portions of the metagenomics data comprising reads of biological samples to one or more genomic sequences of an ecogenome;

  the data model further comprising a comparative score element that relates portions of the metagenomics data comprising characteristics of multiple patients to one another with respect to said disease, infection or contamination;

  wherein the data model further relates the abundance score element to the comparative score element via one or more additional elements of the data model corresponding to respective other aspects of the metagenomics data; and

  wherein the metagenomics data comprises metagenomics sequencing results from a plurality of metagenomics sequencing centers associated with respective data zones and wherein the disease, infection or contamination is characterized by the data model as involving genomic material from multiple ones of a plurality of biological samples sequenced in different ones of the data zones by corresponding different ones of the metagenomics sequencing centers.

18. The apparatus of claim 17 wherein the data model comprises a disease index element that relates to a corresponding disease and is a function of abundance scores of a plurality of biological samples related to the disease and comparative scores of a plurality of patients related to the disease.

19. The apparatus of claim 17 wherein the data model comprises a comparative index element that relates to a corresponding patient and is a function of abundance scores of a plurality of biological samples related to the patient and comparative scores between the patient and one or more other patients.

20. The apparatus of claim 17 wherein each of the metagenomics sequencing centers is associated with a corresponding YARN cluster of a multi-cluster distributed data processing platform.
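
The dependent claims describe a higher-level abundance score formed as a combination of lower-level scores (claim 2), and disease and comparative indexes that are functions of abundance and comparative scores (claims 4 and 6). The claims do not state which combination or function is used, so the following Python sketch is only one possible reading: the arithmetic mean, the `weight` parameter, and the function names `ecogenome_abundance` and `weighted_index` are all assumptions introduced for illustration.

```python
from statistics import fmean


def ecogenome_abundance(sequence_scores: dict[str, float]) -> float:
    """Higher-level abundance score for an ecogenome.

    Combines lower-level scores, one per genomic sequence. Using the
    arithmetic mean is an assumption; the claims only say "combination".
    """
    if not sequence_scores:
        return 0.0
    return fmean(sequence_scores.values())


def weighted_index(abundance_scores: list[float],
                   comparative_scores: list[float],
                   weight: float = 0.5) -> float:
    """Hypothetical index over abundance and comparative scores.

    For a disease index, pass the scores of samples and patients related
    to the disease; for a comparative index, pass those related to one
    patient. The weighted mean is an assumed choice of function.
    """
    mean_abundance = fmean(abundance_scores) if abundance_scores else 0.0
    mean_comparative = fmean(comparative_scores) if comparative_scores else 0.0
    return weight * mean_abundance + (1.0 - weight) * mean_comparative
```

A caller would compute the lower-level scores per pathogen sequence first, then pass them to `ecogenome_abundance` to move up one level of granularity.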

Current Assignee: EMC IP Holding Company LLC (Dell Technologies Inc.)
Original Assignee: EMC IP Holding Company LLC (Dell Technologies Inc.)
Inventors: Florissi, Patricia Gomes Soares; Ukelson, Michal Ziv; Dach, Ran; Benshahar, Arnon
Primary Examiner(s): Negin, Russell S
Application Number: US15/281,248
Time in Patent Office: 1,194 Days
Field of Search: None
US Class Current
CPC Class Codes

G06N 20/00   Machine learning

G06N 5/04   Inference or reasoning models

G16B 20/00   ICT specially adapted for f...

G16B 40/00   ICT specially adapted for b...

G16B 40/20   Supervised data analysis

G16B 40/30   Unsupervised data analysis

G16B 5/00   ICT specially adapted for m...
