Abstract: |
Heart failure (HF) and chronic obstructive pulmonary disease (COPD) are chronic conditions that significantly affect the general population, requiring early detection of decompensation or exacerbation to preserve individual health and mitigate disruptions to daily life (Boult et al., 1996). Machine learning (ML) models offer a promising approach to early symptom detection but demand extensive training data, a critical challenge in the medical domain (Gálvez-Barrón et al., 2023). Collecting such data is labor-intensive, requiring effort from both patients and healthcare institutions. To address this challenge, deep learning (DL) generative models have emerged as a solution, synthesizing artificial datasets that share the statistical properties of the original data and can be used for downstream tasks such as ML model training and inference. Synthetic data not only helps overcome data scarcity but also addresses privacy concerns by retaining statistical properties while preventing individual identification (Hernandez et al., 2022), enabling secure data sharing across clinical institutions.
In this work, we investigate the potential of DL generative models to synthesize tabular data tailored to these clinical conditions, focusing on generating privacy-preserving data to enable the early detection of exacerbation or decompensation phases. We consider two well-known families of DL generative models, namely adapted versions of a Variational Autoencoder (VAE) (Fu et al., 2019) and a Generative Adversarial Network (GAN) (Arjovsky et al., 2017). Two baseline approaches, SMOTE (Chawla et al., 2011) and Probabilistic Sampling (PS), are considered for comparison.
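For illustration, a SMOTE-style baseline can be set up with the imbalanced-learn package. The snippet below is a minimal sketch, not the paper's implementation; the array names (X, y) and sampling settings are placeholder assumptions.

```python
# Minimal SMOTE baseline sketch with imbalanced-learn (illustrative only;
# X, y and the chosen parameters are assumptions, not the authors' code).
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(252, 10))      # placeholder for tabular demographic/signal features
y = rng.integers(0, 2, size=252)    # placeholder for stable (0) / exacerbated (1) labels

# SMOTE interpolates between a minority-class sample and its nearest
# neighbours to synthesize new minority-class rows.
smote = SMOTE(k_neighbors=5, random_state=0)
X_res, y_res = smote.fit_resample(X, y)
print(X_res.shape, np.bincount(y_res))
```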
A small dataset (252 samples) containing demographics and signal features extracted from oximetry (SpO2) and heart rate (HR) measurements collected from HF and COPD patients (Gálvez-Barrón et al., 2023) is chosen for evaluation. The prediction task targets the detection of decompensated (exacerbated) versus compensated (stable) heart failure phases based on the signal features. The synthetic data produced is evaluated using three families of metrics: 1) Statistical Fidelity (Kynkäänniemi et al., 2019), 2) Privacy Preservation (Liu et al., 2024), and 3) ML Utility, computed with a recently released data-auditing library, pyMDMA (https://github.com/fraunhoferportugal/pymdma).
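As an illustration of the ML Utility dimension only (not necessarily the exact metric implemented in pyMDMA), a common protocol is train-on-synthetic, test-on-real (TSTR): fit a classifier on synthetic samples and evaluate it on held-out real data. A minimal scikit-learn sketch, assuming placeholder arrays X_syn/y_syn and X_real/y_real:

```python
# TSTR utility check (illustrative sketch; the placeholder arrays and the
# choice of classifier are assumptions, not the study's pipeline).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_syn, y_syn = rng.normal(size=(500, 10)), rng.integers(0, 2, size=500)
X_real, y_real = rng.normal(size=(252, 10)), rng.integers(0, 2, size=252)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_syn, y_syn)                                  # train only on synthetic data
auc = roc_auc_score(y_real, clf.predict_proba(X_real)[:, 1])
print(f"TSTR ROC-AUC on real data: {auc:.3f}")
```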
Results show that the GAN and VAE architectures achieve a reasonable privacy-fidelity trade-off while maintaining stable ML utility. Specifically, the GAN achieved privacy and fidelity scores of 91.2% and 87.0%, respectively, while the VAE was the top performer, with privacy and fidelity scores of 95.5% and 93.0%, respectively. Comparisons with the baselines show that while SMOTE achieved a fidelity of 100%, the data it generated yielded a low privacy score of 5.5%. In contrast, while PS retained a high privacy score of 99.7%, the fidelity of its generated data with respect to the real data was relatively low (27%). In terms of ML Utility, SMOTE was the top performer, followed by the DL methods.
To conclude, these findings highlight the potential of the GAN and VAE architectures as robust models for generating synthetic clinical data that balance privacy, fidelity, and utility, supporting safer and more effective use of synthetic data in healthcare applications.