Abstract: |
Innovative pharma-genomics and personalized medicine services are now possible thanks to the availability of large amounts of genomic data for processing and analysis. Operating on such databases, it is possible to test for predisposition to diseases by searching for genomic variants in whole genomes as well as in exomes, which are collections of protein-coding regions called exons. Genomic data are therefore shared amongst research institutes, public/private operators, and third parties, creating issues of privacy, ethics, and data protection because genome data are strictly personal and identifying. To prevent the damage that could follow a data breach (a likely threat nowadays) and to comply with current data protection regulations, genomic data files should be encrypted, and the data processing algorithms should be privacy-preserving. Such a migration is not always feasible: not all operations can be implemented straightforwardly in a privacy-preserving way; a privacy-preserving version of an algorithm may not be as accurate for the purpose of biomedical analysis as the original; or the privacy-preserving version may not scale when applied to genomic data processing because of excessive computation time. In this work, we demonstrate that, at least for one well-known genomic data procedure, the analysis of copy number variations (CNVs), a privacy-preserving analysis is possible and feasible. Our algorithm relies on Homomorphic Encryption, a cryptographic technique that performs calculations directly on encrypted data. We test our implementation for performance and reliability, giving evidence that it is practical to study copy number variations while preserving genomic data privacy. Our proof-of-concept application successfully and efficiently searches for a patient’s somatic copy number variation changes by comparing the patient’s gene coverage in the whole exome with the coverage of a healthy control exome.
Since all the genomic data are securely encrypted, they remain protected even if they are transmitted or shared via an insecure environment such as a public cloud. As this is the first study of privacy-preserving copy number variation analysis, we demonstrate the potential of recent Homomorphic Encryption tools in genomic applications. |