Coincidence detection method, products and apparatus
Abstract
A method and system for detecting coincidences in a data set of objects, where each object has a number of attributes. Iteratively, equally-sized subsets of the data set are sampled, and coincidences (co-occurrences of a plurality of attribute values in one or more objects in the subset) are recorded. For each coincidence of interest, the expected coincidence count is determined and compared with the observed coincidence count; this comparison is used to determine a measure of correlation for the plurality of attributes for the coincidence. The resulting set of k-tuples of correlated attributes is reported, a k-tuple of correlated attributes being a plurality of attributes for which the measure of correlation is above a predetermined threshold. The method and system (implemented on an array of processing nodes) are suitable for protein structure analysis, e.g. in HIV research.
39 Claims
-
1. A coincidence detection method for use with a data set of objects, said objects having a number of attributes, the method comprising the steps of:
-
sampling various subsets of the data set for a plurality of iterations, each iteration the sampled subset of the data set having for each object the same subset of attributes;
detecting, and recording counts of, coincidences in each sampled subset of the data set, a coincidence being the co-occurrence of a plurality of attribute values in one or more objects in a sampled subset of the data set, where the plurality of attribute values is the same for each occurrence, the detecting and recording counts of coincidences in each sampled subset of the data set being performed before, at the same time or after sampling, detecting and recording counts of coincidences in other subsets;
determining an expected count for each coincidence of interest that has been detected in the previous step;
comparing, for each coincidence of interest, the observed count of coincidences versus the expected count of coincidences, and from this comparison determining a measure of correlation for the plurality of attributes for the coincidence; and
reporting a set of k-tuples of correlated attributes, where a k-tuple of correlated attributes is a plurality of attributes for which the measure of correlation is above a respective pre-determined threshold.
0. begin
1. read(MATRIX);
2. read(R, T);
3. compute_first_order_marginals(MATRIX);
4. csets := {};
5. for iter = 1 to T do
6.     sampled_rows := rsample(R, MATRIX);
7.     attributes := get_attributes(sampled_rows);
8.     all_coincidences := find_all_coincidences(attributes);
9.     for coincidence in all_coincidences do
10.        if cset_already_exists(coincidence, csets);
11.        then update_cset(coincidence, csets);
12.        else add_new_cset(coincidence, csets);
13.        endif
14.    endfor
15. endfor
16. for cset in csets do
17.     expected := compute_expected_match_count(cset);
18.     observed := get_observed_match_count(cset);
19.     stats := update_stats(cset, hypoth_test(expected, observed));
20. endfor
21. print_final_stats(csets, stats);
22. end.
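The loop structure of steps 0 to 22 can be illustrated with a minimal Python sketch. Several choices here are assumptions made for illustration and are not taken from the claims: a coincidence is counted once per sampled row in which a given k-tuple of attribute values co-occurs, the expected count is derived from first-order marginals under an independence assumption, and a simple observed/expected ratio stands in for `hypoth_test`. The names `detect_coincidences`, `k`, `threshold` and `seed` are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations
from math import prod

def detect_coincidences(matrix, r, t, k=2, threshold=1.5, seed=0):
    """Illustrative only: sample r rows t times, count k-way value
    co-occurrences, and report those seen more often than independence predicts."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Step 3: first-order marginals, i.e. value counts per column over the full matrix.
    marginals = [Counter(row[c] for row in matrix) for c in range(n_cols)]
    csets = Counter()  # step 4: (columns, values) -> observed count
    for _ in range(t):                        # steps 5-15
        for row in rng.sample(matrix, r):     # step 6: r rows, without replacement
            for cols in combinations(range(n_cols), k):
                values = tuple(row[c] for c in cols)
                csets[(cols, values)] += 1    # steps 10-12: update or add a cset
    report = []
    for (cols, values), observed in csets.items():  # steps 16-20
        # Assumption: attributes are independent, so the joint probability
        # is the product of the per-column marginal probabilities.
        p = prod(marginals[c][v] / n_rows for c, v in zip(cols, values))
        expected = t * r * p
        ratio = observed / expected           # stand-in for hypoth_test
        if ratio > threshold:
            report.append((cols, values, observed, expected, ratio))
    return sorted(report, key=lambda item: -item[-1])  # step 21: strongest first
```

Each reported entry pairs a tuple of column indices with the value combination that was over-represented, together with its observed count, expected count and ratio. A proper significance test would replace the ratio in any serious use.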
-
-
5. The coincidence method of claim 1, further comprising the step of representing the objects and attributes in a matrix of objects versus attributes prior to sampling the data set, the data set being sampled by sampling the matrix.
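As an illustration of the matrix representation, the following Python sketch arranges objects as rows and attributes as columns, then samples a sub-matrix in which every sampled object carries the same subset of attributes. The helper names `to_matrix` and `sample_submatrix`, the dictionary input format and the `missing` placeholder are assumptions made for this example only.

```python
import random

def to_matrix(objects, missing=None):
    """Arrange objects (dicts of attribute -> value) as rows of a matrix
    whose columns follow one fixed, sorted attribute order."""
    attributes = sorted({name for obj in objects for name in obj})
    matrix = [[obj.get(name, missing) for name in attributes] for obj in objects]
    return attributes, matrix

def sample_submatrix(matrix, n_rows, n_cols, rng):
    """Pick n_rows objects and n_cols attributes at random; every sampled
    row is restricted to the same sampled columns."""
    row_idx = sorted(rng.sample(range(len(matrix)), n_rows))
    col_idx = sorted(rng.sample(range(len(matrix[0])), n_cols))
    sub = [[matrix[i][j] for j in col_idx] for i in row_idx]
    return row_idx, col_idx, sub
```

Passing an explicit `random.Random` instance keeps the sampling reproducible across runs.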
-
6. A method comprising:
-
the method of claim 1, and the further step of applying rules that are defined by the reported correlated attributes.
-
-
7. The method of claim 1, further comprising the steps of first creating a database of transitions between system states, wherein a system state is represented by a value of a state variable, over a chosen time quantum, and presenting the database, in whole or part, as a data set such that each state to state transition set corresponds to one of the objects and so that each state variable corresponds to an attribute.
-
8. The method of claim 1, further comprising the steps of first creating a database of states and actions covering a chosen time quantum and presenting the database in whole or part, as a data set such that each state/action/state triple corresponds to one of the objects and so that each state variable or action type corresponds to an attribute.
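One way to picture such a data set is sketched below in Python: each state/action/state triple becomes one object (a row), with one attribute per state variable before the action, one per state variable after it, and one boolean attribute per action type. The function name `build_triples`, the `@t` / `@t+1` suffixes and the `action:` prefix are hypothetical naming choices, not taken from the claims.

```python
def build_triples(states, actions):
    """states: list of dicts (state variable -> value), one per time quantum.
    actions: list of action types; actions[i] occurs between states[i] and states[i+1].
    Returns one row (dict of attribute -> value) per state/action/state triple."""
    if len(actions) != len(states) - 1:
        raise ValueError("need exactly one action per consecutive state pair")
    action_types = sorted(set(actions))
    rows = []
    for before, action, after in zip(states, actions, states[1:]):
        row = {f"{name}@t": value for name, value in before.items()}
        # One boolean attribute per action type.
        row.update({f"action:{a}": (a == action) for a in action_types})
        row.update({f"{name}@t+1": value for name, value in after.items()})
        rows.append(row)
    return rows
```

The resulting rows can then be sampled like any other objects-versus-attributes data set.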
-
9. The method of claim 1, wherein at least one of the objects corresponds to a biological sample from a subject and at least one of the attributes corresponds to a biological parameter of genes or gene products.
-
10. The method of claim 9, wherein at least one of the attributes corresponds to a phenotypic aspect.
-
11. The method of claim 9, wherein at least one of the attributes corresponds to expression of a gene.
-
12. The method of claim 11, wherein the expression of at least one gene is measured by mRNA.
-
13. The method of claim 11, wherein the expression of at least one gene is measured by protein product.
-
14. The method of claim 9, wherein at least some of the objects correspond to biological samples from a single subject collected over time and at least one of the attributes corresponds to expression of a gene.
-
15. The method of claim 14, wherein the expression of at least one gene is measured by mRNA.
-
16. The method of claim 14, wherein the expression of at least one gene is measured by protein product.
-
17. The method of claim 1, wherein said plurality of iterations is a predetermined number of iterations.
-
18. A coincidence detection method for use with a data set of objects, each of the objects having at least one attribute, the method comprising the steps of:
-
(1) sampling various subsets of the data set for a plurality of iterations, each iteration the sampled subset of the data set having for each object the same subset of attributes;
(2) detecting attribute coincidences based on results of sampling of the data set;
(3) recording attribute coincidences; and
(4) comparing at least one recorded attribute coincidence count to at least one expected attribute coincidence count, wherein the expected attribute coincidence count is determined for a coincidence that has been detected in the preceding steps.
(5) reporting attribute coincidences based on result of said comparison.
-
-
20. The method of claim 19, the method further comprising the step of applying rules that are defined by the reported attribute coincidences.
-
21. The method of claim 19, wherein step (5) comprises the step of reporting at least one numerical correlation value and at least one k-tuple of correlated attributes.
-
22. The method of claim 18, wherein the objects and the attributes of the data set are represented in a matrix.
-
23. The method of claim 18, the method further comprising the step of separating the data set into subsets for sampling.
-
24. The method of claim 18, wherein more than one subset of objects is sampled and the size of the subsets of objects sampled is a constant.
-
25. The method of claim 18, wherein the comparison of the recorded attribute coincidences to said expected count is done using a Chernoff bound on tail probabilities.
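For orientation, a standard multiplicative Chernoff bound can be sketched in Python as follows. It assumes the observed count is a sum of independent 0/1 trials with mean equal to the expected count; the claim does not state which form of the bound is used, so this particular form and the function name `chernoff_upper_tail` are assumptions.

```python
from math import exp, log

def chernoff_upper_tail(observed, expected):
    """Upper bound on P(X >= observed), where X is a sum of independent
    0/1 trials with mean `expected`. Uses the bound
    P(X >= (1 + d) * mu) <= (e**d / (1 + d)**(1 + d))**mu for d > 0."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= expected:
        return 1.0  # the bound says nothing at or below the mean
    delta = observed / expected - 1.0
    return exp(expected * (delta - (1.0 + delta) * log(1.0 + delta)))
```

A small return value indicates that an observed count this far above the expected count would be unlikely under the independence assumption; a coincidence could then be reported when the bound falls below a chosen significance level.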
-
26. The method of claim 18, wherein step (3) comprises the step of storing a running total of counts of each attribute coincidence detected over all the subsets sampled.
-
27. The method of claim 18, wherein at least one of the objects corresponds to a biological sample from a subject and at least one of the attributes corresponds to a biological parameter of a gene or gene product.
-
28. The method of claim 27, wherein at least one of the attributes corresponds to a phenotypic aspect.
-
29. The method of claim 27, wherein at least one of the attributes corresponds to expression of a gene.
-
30. The method of claim 29, wherein the expression of at least one gene is measured by mRNA.
-
31. The method of claim 29, wherein the expression of at least one gene is measured by protein product.
-
32. The method of claim 27, wherein at least some of the objects correspond to biological samples from a single subject collected over time and at least one of the attributes corresponds to expression of a gene.
-
33. The method of claim 32, wherein the expression of at least one gene is measured by mRNA.
-
34. The method of claim 32, wherein the expression of at least one gene is measured by protein product.
-
35. The method of claim 18, wherein at least one of the objects corresponds to a subject and at least some of the attributes correspond to the subjects' genes or gene expression patterns and the presence of a particular drug side-effect or side-effects after having been administered a particular drug.
-
36. The method of claim 18, wherein at least one of the objects corresponds to a subject and at least some of the attributes correspond to the subjects' genes or gene expression patterns and the presence of a particular drug side-effect or side-effects after having been administered a particular combination of drugs.
-
37. The method of claim 18, wherein at least one of the objects corresponds to a subject and at least some of the attributes correspond to the subjects' genes or gene expression patterns and the response of the subject to treatment using a particular drug.
-
38. The method of claim 18, wherein at least one of the objects corresponds to a subject and at least some of the attributes correspond to the subjects' genes or gene expression patterns and the response of the subject to treatment using a particular combination of drugs.
-
39. The method of claim 18, wherein said plurality of iterations is a predetermined number of iterations.
Specification