Watermarking for data security in bioinformatic sequence analysis
Abstract
Embodiments of the invention protect information stored in graph-based sequence references by “watermarking” the graph with uniquely identifiable information. The watermark identifies the graph or version thereof in a detectable but nonintrusive manner. In one embodiment, insertions and/or deletions are introduced into regions of the graph.
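As a rough illustration of the idea in the abstract, the sketch below adds an insertion to a toy reference graph as an alternate branch, so that the natural path survives next to the watermarked one. The `RefGraph` class, the `add_insertion_watermark` helper, the node names, and the sequences are all hypothetical and are not taken from the patent; in particular, the artifact `TTAGCA` is an arbitrary placeholder, and no claim is made that it is absent from natural genomic DNA.

```python
from dataclasses import dataclass, field


@dataclass
class RefGraph:
    """Toy reference graph (hypothetical): nodes hold sequence fragments."""
    nodes: dict = field(default_factory=dict)  # node id -> sequence fragment
    edges: dict = field(default_factory=dict)  # node id -> list of successor ids

    def add_node(self, node_id, seq):
        self.nodes[node_id] = seq
        self.edges.setdefault(node_id, [])

    def add_edge(self, src, dst):
        self.edges[src].append(dst)

    def paths(self, start, end):
        """Yield (node id list, spelled sequence) for every start-to-end path.

        Assumes the graph is acyclic, as in this toy example.
        """
        stack = [[start]]
        while stack:
            path = stack.pop()
            if path[-1] == end:
                yield path, "".join(self.nodes[n] for n in path)
                continue
            for nxt in self.edges[path[-1]]:
                stack.append(path + [nxt])


def add_insertion_watermark(graph, src, dst, node_id, artifact):
    """Add an alternate branch src -> artifact -> dst.

    The existing direct edge src -> dst is left untouched, so the
    natural sequence remains one of the paths through the graph.
    """
    graph.add_node(node_id, artifact)
    graph.add_edge(src, node_id)
    graph.add_edge(node_id, dst)


g = RefGraph()
g.add_node("n1", "ACGTAC")
g.add_node("n2", "GGATCC")
g.add_edge("n1", "n2")  # natural path
add_insertion_watermark(g, "n1", "n2", "wm1", "TTAGCA")  # placeholder artifact

sequences = sorted(seq for _, seq in g.paths("n1", "n2"))
print(sequences)
```

Under these assumptions the graph spells two sequences between `n1` and `n2`: the unmodified one and one carrying the inserted artifact, which is what makes the mark detectable without displacing the natural path.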
17 Claims
1. A system for modifying a deoxyribonucleic acid (DNA) sequence corresponding to a genome or portion thereof and represented as a reference graph, the system comprising:
- a memory partition for storing the sequence; and
- a watermarking module for modifying the sequence by introducing a watermarking artifact therein, the watermarking artifact comprising at least one of (a) a plurality of variants not found in natural genomic DNA, (b) a variant introduced in a repeat sequence other than the first of a plurality of repeat sequences in a repetitive region, (c) at least one sequence no longer than 30 bp not found in natural genomic DNA, or (d) metadata associated with variants in the reference graph, wherein the graph includes multiple paths at least one of which corresponds to a natural DNA sequence and another of which includes the watermarking artifact.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.

11. A method of watermarking a deoxyribonucleic acid (DNA) sequence corresponding to a genome or portion thereof and represented as a reference graph and stored as a data structure in a computer memory, the method comprising modifying the memory contents corresponding to the sequence by introducing a watermarking artifact therein, the watermarking artifact comprising at least one of (a) a plurality of variants not found in natural genomic DNA, (b) a variant introduced in a repeat sequence other than the first of a plurality of repeat sequences in a repetitive region, (c) at least one sequence no longer than 30 bp not found in natural genomic DNA, or (d) metadata associated with variants in the reference graph, wherein the graph includes multiple paths at least one of which corresponds to a natural DNA sequence and another of which includes the watermarking artifact.
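Option (b) of the claims places a variant in a repeat copy other than the first within a repetitive region. The sketch below is one possible reading of that option on a plain string, not the patented implementation: the function `watermark_repeat`, its parameters, and the example sequence are hypothetical, and the variant is limited to a single-base substitution for brevity.

```python
def watermark_repeat(sequence, unit, copy_index, offset, new_base):
    """Substitute one base in the copy_index-th (0-based) tandem copy of `unit`.

    copy_index 0 is the first copy and is rejected, mirroring option (b).
    Returns the modified sequence and the position that was changed.
    """
    if copy_index < 1:
        raise ValueError("variant must not be placed in the first repeat copy")
    if not 0 <= offset < len(unit):
        raise ValueError("offset outside repeat unit")
    # Locate the first run containing at least copy_index + 1 tandem copies.
    start = sequence.find(unit * (copy_index + 1))
    if start < 0:
        raise ValueError("fewer tandem copies than requested")
    pos = start + copy_index * len(unit) + offset
    if sequence[pos] == new_base:
        raise ValueError("substitution would not change the sequence")
    return sequence[:pos] + new_base + sequence[pos + 1:], pos


# Hypothetical repetitive region: four tandem copies of "CAG".
natural = "TTG" + "CAG" * 4 + "AAT"
marked, pos = watermark_repeat(natural, "CAG", copy_index=2, offset=1, new_base="T")
print(natural)
print(marked, pos)
```

In a reference graph, `natural` and `marked` would be kept as alternate paths through the same region, so the natural sequence remains represented alongside the watermarked one.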
Specification