BIOINFORMATICS 2026 Abstracts


Full Papers
Paper Nr: 83
Title:

MonoPredict-MI: Predicting Post-Myocardial Infarction Complications from Flow Cytometry–Derived Monocyte Subsets Using Neural Networks

Authors:

Nada Al-Dausari, Frans Coenen, Anh Nguyen and Eduard Shantsila

Abstract: Post–myocardial infarction complications remain a significant clinical challenge, underscoring the need for reliable tools that integrate immune profiling with predictive modelling. We developed MonoPredict-MI, a deep learning framework designed to link flow cytometry data with outcome prediction. The framework follows a three-stage design: (i) data preparation, where a FlowJo-based template is used for consistent gating and labelling of granulocytes, monocytes, and lymphocytes; (ii) feature engineering, combining domain-guided filtering with exploratory analysis to retain biologically meaningful markers; and (iii) a three-step neural network pipeline. In this pipeline, the first model distinguishes broad leukocyte groups, the second refines classification into monocyte subsets (Mon1–Mon4), and the third predicts post-MI complications using absolute monocyte counts. Evaluation on the FlowCyto-MI dataset (246 patients, >20 million cells) demonstrated robust performance: F1 ≈ 96% for broad leukocyte classification, ≈ 65% for monocyte subsets, and up to 41% for minority outcome detection. These results highlight the potential of reproducible cytometry–machine learning workflows for precision cardiovascular medicine.

Paper Nr: 115
Title:

Quantum Coherence Analytics with Molecular Dynamics to Identify Protein Receptors Allosteric Pockets and Repurposing Candidates

Authors:

Don Roosan, Mohmmad Masudur Rahman and Rubayat Khan

Abstract: G protein–coupled receptors (GPCRs) are major drug targets, yet achieving subtype selectivity is difficult when binding the highly conserved orthosteric site. Allosteric modulators interact with topographically distinct pockets, enabling greater specificity and functional modulation. However, only a few have reached the market due to cryptic pocket locations, sparse structural data, and challenges in detecting subtle modulatory effects. We present a hybrid computational pipeline that integrates molecular dynamics (MD), residue network analysis, and quantum algorithms to identify GPCR allosteric sites and potential modulators. Data from GPCRdb, the Protein Data Bank, and ~4,500 approved DrugBank compounds form the foundation. Residue interaction graphs were constructed from GPCR structures, and 200 ns MD simulations generated cross-correlation matrices. Quantum PageRank and continuous-time quantum walks identified communication hubs overlooked by classical metrics. Combining MD-correlated residues with quantum central nodes revealed two consensus allosteric sites in the M2 muscarinic receptor: an extracellular vestibule and a deep sodium ion pocket. Virtual docking against these sites ranked the known positive allosteric modulator LY2119620 highest, validating the approach, and identified approved drugs, such as diltiazem and benztropine, as repurposing candidates. This quantum-enhanced framework offers a scalable strategy for allosteric GPCR drug discovery.

Paper Nr: 152
Title:

Kit-Specific and Other Non-Biological-Based Biases in Germline Whole-Exome Analysis at the Gene Level

Authors:

Laura Jarosz, Jiawei Dai, Marcel Ochocki, Julia Merta, Lajos Pusztai and Michał Marczyk

Abstract: Whole-exome sequencing (WES) studies are susceptible to non-biological variation that can confound downstream analyses, particularly when conducted across multiple sequencing centers. These batch effects may originate from numerous sources, such as differing laboratory protocols or bioinformatics pipelines. Currently, there are no tools designed to directly address these issues in variant call data. We investigated germline WES data from 1,194 breast cancer patients across three cohorts, basing the analysis on aggregating the variant information at the gene level. Sample stratification was explored with UMAP and supported by hypothesis testing. We discovered that the initial embeddings are dominated by population structure, which led us to restrict the study to White individuals only (n=806). After controlling for ancestry, five groupings emerged, suggesting potential batch effects. Among multiple clinical and technical factors, the exome capture kit was the main driver of distinction. Gene-level testing revealed that many highly significant genes exhibited near-binary detection patterns when comparing groups enriched with different kits, consistent with ineffective target coverage rather than biology. We outline two potential mitigation strategies, which may be used independently or in combination: joint genotyping to leverage cohort-wide evidence, and genotype imputation to fill capture gaps.

Paper Nr: 199
Title:

A Unified and Interpretable Framework for Evaluating Fluorescence Trace Quality in Transcription Kinetics

Authors:

Yiwen Xing, Wei-Tung Lu, Jinhao Liu, Zhiyong Zou, Rui Zhou, Hao Wang, Yusheng Yang, Yabing Yao, Qian Yang, Xiaomin Xu and Hongpeng Zhou

Abstract: Quantifying transcriptional dynamics from fluorescence traces is a powerful approach to understanding gene regulation, but such analysis critically depends on the quality of the fluorescence signal. Experimental researchers often lack an objective and computationally simple way to assess trace quality before kinetic modeling. In this study, we fill this gap by systematically investigating two key factors, signal-to-noise ratio (SNR) and trace length, using synthetic data generated from a composite-state Hidden Markov Model (cpHMM) simulator. By analyzing thousands of simulated traces, we identified quantitative thresholds (SNR ≥ 30 dB and length ≥ 360) beyond which transcriptional dynamics can be reliably captured for kinetic inference. Building on these findings, we further discovered a unified and easily computable quality indicator based on the difference between the first two autocorrelation lags. A threshold value of approximately 0.07 effectively separates reliable from low-quality traces, providing a simple yet robust criterion for data selection. Together, these results establish a practical framework for assessing fluorescence trace reliability, offering experimental researchers an interpretable and computationally efficient tool to ensure data quality prior to transcription kinetics modeling.
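
The quality indicator described in this abstract, the difference between the first two autocorrelation lags, could be computed along the following lines. This is a minimal sketch, not the authors' implementation: the function names, the biased sample-autocorrelation estimator, and the sign convention (lag 1 minus lag 2) are assumptions. The abstract does not say on which side of the ~0.07 threshold a trace counts as reliable, so the sketch only computes the indicator and leaves the decision rule open.

```python
def autocorr(trace, lag):
    """Sample autocorrelation of `trace` at `lag` (biased estimator, an assumption)."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [v - mean for v in trace]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0  # constant trace: autocorrelation is undefined, report 0
    num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return num / denom


def lag_difference(trace):
    """Hypothetical quality indicator: autocorrelation at lag 1 minus lag 2."""
    return autocorr(trace, 1) - autocorr(trace, 2)


if __name__ == "__main__":
    toy_trace = [1, 2, 3, 4, 5]  # illustrative values, not real fluorescence data
    print(lag_difference(toy_trace))
```

In practice the indicator would be evaluated per trace and compared against the reported threshold once its direction is taken from the full paper.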

Paper Nr: 210
Title:

Statistical Inference and Probabilistic Model Checking on Uncertain Continuous-Time Markov Chains Representing Biochemical Pathways

Authors:

Hailey Sparks and Krishnendu Ghosh

Abstract: Biological systems are stochastic. The imprecision and incompleteness of the data from the experiments often lead to models with assumptions that are drastic simplifications of reality. Quantitative analysis of biological pathways using a continuous-time Markov chain is error-prone due to uncertainty in the rates of reaction. In this work, a novel formalism is created that addresses uncertainty in experimental data. The biological pathways are represented by an uncertain continuous-time Markov chain. A tractable model checking mechanism is proposed and the accuracy of its outcomes is statistically evaluated. Reasoning using continuous-time logics is performed on the formalism. Statistical inference on the queries posed on the formalism is evaluated. Statistical inference using Expectation-Maximization and Markov chain Monte Carlo algorithms is conducted on the formalism. As a case study, evaluation and analysis of the formalism are conducted on a prototype of the RKIP-inhibited ERK pathway. Results from the experiments and software are presented.

Paper Nr: 251
Title:

A Machine Learning Approach to Predict Biological Age and Its Longitudinal Drivers

Authors:

Nazira Dunbayeva, Yulong Li, Yutong Xie and Imran Razzak

Abstract: Predicting an individual’s aging trajectory is a central challenge in preventative medicine and bioinformatics. While machine learning models can predict chronological age from biomarkers, they often fail to capture the dynamic, longitudinal nature of the aging process. In this work, we developed and validated a machine learning pipeline to predict age using a longitudinal cohort with data from two distinct time periods (2019–2020 and 2021–2022). We demonstrate that a model using only static, cross-sectional biomarkers has limited predictive power when generalizing to future time points. However, by engineering novel features that explicitly capture the rate of change (slope) of key biomarkers over time, we significantly improved model performance. Our final LightGBM model, trained on the initial wave of data, successfully predicted age in the subsequent wave with high accuracy (R² = 0.515 for males, R² = 0.498 for females). SHAP analysis revealed that the engineered slope features were among the most impactful predictors, highlighting that an individual’s health trajectory, not simply their static health snapshot, drives biological age. This framework enables real-time tracking and early intervention for age-related risks.
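
The rate-of-change (slope) features mentioned in this abstract can be illustrated with an ordinary least-squares slope over a biomarker's repeated measurements. This is a hedged sketch only: the function name `biomarker_slope`, the time encoding in years, and the example values are hypothetical, and the abstract does not specify how the authors computed their slopes.

```python
def biomarker_slope(times, values):
    """Least-squares slope of `values` against `times` (change per unit time)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two measurements to estimate a slope")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    denom = sum((t - t_mean) ** 2 for t in times)
    if denom == 0:
        raise ValueError("all measurements share the same time stamp")
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return num / denom


if __name__ == "__main__":
    # Hypothetical visits for one biomarker; with two visits this reduces
    # to (v2 - v1) / (t2 - t1).
    visit_years = [2019.5, 2021.5]
    measurements = [5.0, 5.6]
    print(biomarker_slope(visit_years, measurements))
```

Each slope would then be appended to the static biomarker values as an additional input feature for the downstream regressor.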

Paper Nr: 257
Title:

Entropy and Stemness Index–Based Quantification of Epigenetic Dysregulation for Machine Learning–Driven Stratification of Representative Cancer Types

Authors:

Djansel Bukovec, Goran Kungulovski, Sonja Gievska and Monika Simjanoska Misheva

Abstract: Epigenetic dysregulation, quantified by DNA methylation entropy and the methylation-derived stemness index (mDNAsi), has been proposed as a marker of tumor aggressiveness; however, its prognostic relevance in established cancers remains unclear. It was examined whether epigenetic instability represents a hallmark of malignant transformation and whether its variation is associated with tumor outcomes. A three-state Markov model of unmethylated/intermediate/methylated (U–I–M) transitions was fitted to raw β values from 275 paired tumor–normal samples across four representative TCGA cohorts, and the overall adjacent switching rate (sany) was compared between tissues. Consistently higher local switching was observed in tumors than in matched normal tissues, and a strong correlation between genome-wide Shannon entropy and sany was detected. In 2,661 primary tumors, z-scored entropy and mDNAsi were evaluated in multivariable models of primary lymph-node positivity, fraction genome altered (FGA), and overall survival (OS), with adjustment for age and cancer type. After adjustment, no independent association of entropy or mDNAsi with nodal involvement was identified (entropy OR per 1-SD ≈ 1.14, p = 0.466), and survival effects were found to be limited and cancer-type dependent. In linear models of FGA, entropy contributed little after covariate adjustment, whereas higher mDNAsi remained positively associated with increased copy-number burden. Mantel tests further indicated that epigenetic/genomic distance structures aligned with hypoxia but were largely uncoupled from nodal-status distances. These findings suggest that methylation entropy is unlikely to serve as a universal marker of aggressiveness in established tumors and may instead reflect evolutionary history and microenvironmental stress.
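
A rough illustration of the U–I–M idea in this abstract: β values along adjacent CpG sites are discretized into three states, transitions between neighbours are counted, and the share of neighbouring pairs whose state differs gives an adjacent switching rate. Everything specific here is an assumption made for illustration: the cut-offs of 0.2 and 0.8, the function names, and the count-based (maximum-likelihood) transition estimate. The abstract does not describe how the authors fitted their model.

```python
STATES = "UIM"  # unmethylated, intermediate, methylated


def to_state(beta, low=0.2, high=0.8):
    """Map a beta value to U/I/M using assumed cut-offs (not from the abstract)."""
    if beta < low:
        return "U"
    if beta > high:
        return "M"
    return "I"


def transition_matrix(states):
    """Row-normalized counts of transitions between adjacent sites."""
    counts = {a: {b: 0 for b in STATES} for a in STATES}
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    probs = {}
    for a in STATES:
        total = sum(counts[a].values())
        probs[a] = {b: (counts[a][b] / total if total else 0.0) for b in STATES}
    return probs


def switching_rate(states):
    """Fraction of adjacent site pairs whose states differ."""
    pairs = list(zip(states, states[1:]))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    betas = [0.1, 0.5, 0.9, 0.9]  # illustrative values only
    states = [to_state(b) for b in betas]
    print(states, switching_rate(states))
```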

Paper Nr: 260
Title:

Modeling Cadmium Induced Metabolic Shifts in Human Pulmonary Cells

Authors:

Valentin Vigeant, Jean-Paul Comet, Gilles Bernot and Jean-Yves Trosset

Abstract: Non-genotoxic carcinogens (NGTxCs) induce cancer without directly altering the genetic material, making their mechanisms of action particularly challenging to predict. Cadmium, a well-known NGTxC classified as carcinogenic to humans, has been associated with several cancers, including lung cancer through chronic exposure. In this work, we investigate the carcinogenic effects of cadmium through metabolic reprogramming. We adapt a previously formally validated model of metabolic regulation (Gibart et al., 2021c) in order to study cadmium exposure. The adapted model integrates major cadmium-induced perturbations such as oxidative stress, mitochondrial dysfunction, and glycolytic shift. Using the formal verification tool TotemBioNet, we systematically explore and filter all dynamics consistent with the global behaviors of healthy, cancerous, and apoptotic cells. These dynamics are then investigated and classified into three subgroups. Our approach successfully reproduces characteristic features of carcinogenic metabolism, including fermentation activation and elevated ROS production, and demonstrates the relevance of discrete qualitative modeling for a systemic analysis of NGTxC-induced metabolic dysregulations.

Paper Nr: 395
Title:

Integrative Comparison of GeneHancer and Single-Cell Co-Accessibility Reveals Active Enhancer–Gene Interactions

Authors:

Lorenzo Martini, Roberta Bardini, Alessandro Savino and Stefano Di Carlo

Abstract: Linking enhancers to their target genes remains challenging due to the context-independent nature of curated annotations and the noise inherent in data-driven predictions. GeneHancer provides a comprehensive catalogue of enhancer–gene associations, but many elements are inactive in specific biological settings. Conversely, co-accessibility inferred from single-cell chromatin accessibility data captures sample-specific regulatory structure but may reflect indirect or non-functional interactions. This work integrates these complementary perspectives by comparing GeneHancer annotations with co-accessibility networks derived from a human PBMC Multiome dataset. Using Circe to infer peak–peak co-accessibility and GRAIGH to map peaks onto GeneHancer elements, this approach identifies enhancer–gene associations supported both by prior evidence and by accessibility patterns in the dataset. Only a small subset of GeneHancer links is validated by co-accessibility, yet these conserved associations display substantially higher cell-type specificity and stronger accessibility–expression concordance than either the full or “Elite” GeneHancer sets. This refined subset isolates regulatory interactions that are both biologically plausible and active in the sample, reducing redundancy and improving interpretability. Our results show that integrating curated enhancer annotations with single-cell epigenomic evidence yields a focused, high-confidence regulatory map suited for analyzing transcriptional regulation and cell identity in a dataset-specific manner.

Short Papers
Paper Nr: 107
Title:

User-Centred Wearables: Position-Independent Edge Computing for Flexible ADL Recognition

Authors:

Ekgari Kasawala, Antonio Fratini and Surej Mouli

Abstract: The growing demand for unobtrusive health monitoring highlights the requirement for wearable solutions that are both flexible and scalable. This study presents a position-independent Tiny-ML model for Activities of Daily Living (ADL) recognition utilising the Bangle.js 2 consumer-grade smartwatch, emphasising real-time and on-device classification. This approach exploits the ubiquity and adaptability of consumer-grade devices to enhance user engagement and comfort. Statistical analyses confirmed consistent performance across different body locations and activities. With accuracies exceeding 80% across all placements and a peak inference accuracy of 89.72%, this demonstrates the feasibility of deploying machine learning models directly on wearables, broadening their practical application and adoption. This research establishes the foundation for scalable, user-centred wearable systems for affordable continuous monitoring and long-term deployment.

Paper Nr: 110
Title:

Labeling the Unlabelable: A Pilot Study on Breast Density

Authors:

Simona Correra, Ida Maruotto, Giulia Varriano, Dalila De Lucia, Maria Chiara Brunese, Corrado Caiazzo, Antonella Santone, Paolo Gargiulo and Francesco Mercaldo

Abstract: In the medical domain, not all data can be reliably labeled for use as input to Machine Learning models, which limits the potential of supervised learning approaches. Breast density is a risk factor for breast cancer, and its assessment from MRI is subjective and can be inconsistent. The three-dimensional nature of MRI and the lack of clearly distinguishable objects in the images make breast density a perfect case study to explore the feasibility of generating labels directly from data. This study evaluates an alternative approach for radiomic analysis to generate data-driven labels. From a dataset of 136 breast MRI scans, 93 radiomic features were extracted per slice. K-means clustering was employed to obtain unsupervised labels, which were then used to train a Random Forest classifier. The model achieved high performance, particularly with First-Order features (accuracy: 0.89 ± 0.05, precision: 0.90 ± 0.07, specificity: 0.91 ± 0.06, sensitivity: 0.87 ± 0.09). These results demonstrate that clustering-based labeling can capture clinically meaningful patterns, offering a robust and objective alternative to manual annotation. The proposed method introduces a conceptual shift in medical AI, highlighting the feasibility of generating reliable labels directly from imaging data.

Paper Nr: 116
Title:

Quantum-Enhanced Multi-Omics Data Harmonization for Breast Cancer Subtype Classification

Authors:

Don Roosan, Mohmmad Masudur Rahman and Rubayat Khan

Abstract: Breast cancer exhibits marked molecular heterogeneity that complicates PAM50 subtype classification and limits interpretability in multi-omics models. We present QEMOLS, a quantum-ready multi-omics framework that casts feature selection as a quadratic unconstrained binary optimization (QUBO) objective balancing (i) per-feature relevance to subtype, (ii) pairwise redundancy penalties, and (iii) a soft cardinality constraint targeting a panel-sized signature. We optimize the QUBO using the Quantum Approximate Optimization Algorithm (QAOA) evaluated on a classical simulator, yielding an interpretable latent space whose axes correspond to selected molecular measurements. On a complete-case TCGA-BRCA cohort (~1,000 tumors; Luminal A, Luminal B, HER2-enriched, Basal-like), a multinomial logistic regression trained on ~30 selected features achieved micro-average AUROC 0.933 and macro-F1 0.726, matching overall accuracy of a 30-component PCA baseline while improving class-balanced metrics. Class-wise AUROCs were near-perfect for Basal-like and HER2-enriched and lower for Luminal A/B, with most errors confined to LumA↔LumB. The selected panel recapitulates known biology and supports counterfactual perturbation analyses to probe subtype stability. QEMOLS demonstrates that QUBO-based selection yields compact, biologically grounded representations suitable for interpretable, panel-level biomarkers.
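
The three-term QUBO objective described in this abstract (relevance, pairwise redundancy, soft cardinality) can be written out for a toy problem as below. This is an assumption-laden sketch rather than the QEMOLS implementation: the weights `alpha` and `lam`, the relevance and redundancy values, and the function names are hypothetical, and exhaustive enumeration is swapped in for QAOA so that the example runs with the standard library alone. The cardinality penalty lam * (sum(x) - k)^2 is expanded using x_i^2 = x_i for binary variables, with the constant lam * k^2 dropped.

```python
import itertools


def build_qubo(relevance, redundancy, k, alpha=1.0, lam=1.0):
    """Upper-triangular QUBO matrix for selecting about k features.

    Minimizes: -sum_i relevance[i]*x_i
               + alpha * sum_{i<j} redundancy[i][j]*x_i*x_j
               + lam * (sum_i x_i - k)^2   (constant lam*k^2 omitted)
    """
    n = len(relevance)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Linear terms sit on the diagonal because x_i^2 == x_i.
        Q[i][i] = -relevance[i] + lam * (1 - 2 * k)
        for j in range(i + 1, n):
            Q[i][j] = alpha * redundancy[i][j] + 2 * lam
    return Q


def energy(Q, x):
    """QUBO energy x^T Q x for an upper-triangular Q and binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def solve_brute_force(Q):
    """Enumerate all bitstrings; only feasible for very small n."""
    n = len(Q)
    return min(itertools.product((0, 1), repeat=n), key=lambda x: energy(Q, x))


if __name__ == "__main__":
    # Hypothetical toy inputs: features 0 and 1 are relevant but redundant.
    relevance = [0.9, 0.8, 0.1, 0.7]
    redundancy = [
        [0.0, 0.9, 0.0, 0.0],
        [0.9, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    Q = build_qubo(relevance, redundancy, k=2)
    print(solve_brute_force(Q))
```

In this toy case the redundancy penalty steers the selection away from picking both of the two correlated features, which is the behaviour the objective is designed to produce.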

Paper Nr: 159
Title:

Large-Scale Gene Regulatory Network Inference for in vivo Drosophila Gene Expression Data Using Genetic Algorithms

Authors:

Youchuan Wang, Yasir Ahmed-Braimah, Garrett Ethan Katz and Chilukuri K. Mohan

Abstract: Understanding gene regulatory networks is essential for uncovering disease mechanisms and guiding drug discovery, yet the analyses of large-scale in vivo datasets remain challenging due to noise and dimensionality. Existing approaches, including Boolean, Bayesian, and ODE-based models, often struggle to recover sparse and stable structures from real data. We present a genetic algorithm framework that integrates ODE modeling with evolutionary search, separating structure inference from parameter estimation and introducing a synthetic oracle elitism strategy for data without ground truth information. Evaluations on synthetic benchmarks and in vivo Drosophila expression data show that our method outperforms baseline algorithms, recovering high-confidence regulatory interactions.

Paper Nr: 220
Title:

Drug Repurposing for COVID-19

Authors:

Angela Kralevska, Ivana Vichentijevikj and Monika Simjanoska Misheva

Abstract: The COVID-19 pandemic has underscored the urgent need for rapid, adaptable drug discovery strategies capable of delivering viable therapeutic candidates in compressed time frames. Drug repurposing offers a cost-effective and time-efficient alternative to de novo drug development by identifying new indications for existing compounds. This study presents an integrated computational framework that combines deep learning–based drug–target interaction prediction, molecular docking, and large language model (LLM)–driven molecule generation for COVID-19 drug repurposing. Models from the DeepPurpose library were trained and fine-tuned on benchmark datasets (Davis and KIBA) and a COVID-19–specific high-throughput screening dataset for SARS-CoV-2 3CLpro, enabling large-scale virtual screening of candidate compounds. Molecular docking identified oleanolic acid and other compounds with strong predicted affinity to the SARS-CoV-2 main protease (3CLpro), while the DrugGen workflow generated sequence-conditioned SMILES and was able to recover known bioactive molecules such as Carfilzomib. These results show that combining deep learning–based screening, docking-based structural validation, and LLM-guided candidate generation can rapidly surface existing drugs with potential for repurposing. Rather than proposing new chemical matter, this work provides a reproducible in silico pipeline to prioritize repurposable compounds for follow-up in urgent infectious disease settings.

Paper Nr: 245
Title:

A Data Quality-Centric Approach for Predicting Radiology Report Delays

Authors:

Daniela Martins Silva, Pedro Fernandes, Daniel Madureira, Ana M. Freire, Hélder P. Oliveira and Jorge Araújo

Abstract: Radiology report delays can compromise clinical workflows, reduce care quality, and lead to resource inefficiencies in hospital settings. Machine learning offers promising approaches to predict such delays and support the alignment between diagnostic report availability and follow-up appointment scheduling. However, hospital data is often incomplete, inconsistent, and noisy, which severely limits predictive model performance. This paper presents a data-centric pipeline to assess how preprocessing strategies, namely outlier detection and imputation, impact machine learning models for estimating radiology reporting delays. Feature selection was guided by clinical feedback obtained through a questionnaire with 32 valid physician responses, providing clinical relevance. Anonymized real-world hospital data were provided by ByMe, and underwent several preprocessing stages, including manual and automated cleaning. Results show that model performance improved significantly when categorical and numerical outliers were systematically treated. Among the tested models, XGBoost achieved the best results after full preprocessing. The findings suggest that improving data quality can substantially enhance predictive performance in this operational healthcare context.

Paper Nr: 305
Title:

Chromatin States Prediction and Comparison across BRCA Cell Lines Using Deep Learning Provide Insights for Cell State Transitions

Authors:

Xuejing Lyu, Jing Zhang, Rui Cao and Greg Tucker-Kellogg

Abstract: Understanding the chromatin landscape remodeled during cancer progression is essential to uncovering the epigenetic drivers of cell plasticity. The coordinated interplay of multiple histone modifications forms complex regulatory patterns that require powerful computational approaches to decode. Here, we present a deep learning framework for predicting cell-line-specific chromatin states using six histone modification marks (H3K4me3, H3K4me1, H3K27ac, H3K27me3, H3K36me3, and H3K9me3) profiled by CUT&Tag. This model utilizes a Gumbel-Softmax autoencoder to learn 18 discrete chromatin states, capturing canonical regulatory patterns from genome-wide signal data. Using a spectrum of breast cancer cell lines including MCF10A, MCF7, MCF10CA1a, MDA-MB-231 and DKAT, we compare the learned chromatin state composition across epithelial, stem-like and malignant contexts to characterize epigenetic landscapes associated with cell state transitions. Our approach provides a generalizable, data-driven strategy for characterizing epigenetic reprogramming in cancer and builds a foundation for future multi-omics integrative modeling of epigenetic regulation.

Paper Nr: 327
Title:

Leveraging Self-Attention for Heterogeneous Data Integration: A TabTransformer Validation on a Biomedical Case Study

Authors:

Kaiyan Shi and James Li

Abstract: We evaluate the TabTransformer model for integrating heterogeneous clinical and transcriptomic (RNA-Seq) data in a breast cancer risk stratification task. A unified analytical pipeline was developed using 5-fold stratified cross-validation and per-fold feature selection based on Chi-squared tests and DESeq2. The TabTransformer was compared against standard baseline models, including Lasso, Random Forest, and Logistic Regression, using dataset GSE164641 (N=187). The TabTransformer achieved the highest mean AUC-ROC and showed the largest performance gain when combining clinical and gene features. These results demonstrate its strength in modeling feature interactions and its potential for advancing multi-modal biomarker classification.

Paper Nr: 376
Title:

A Causal Framework for Interpretable Estimation of Treatment Effects in Heart Failure from Observational Data

Authors:

Carolina Carvalho, Ricardo Santos and Vânia Guimarães

Abstract: Integrating Artificial Intelligence techniques into clinical practice requires not only good predictive performance but also interpretability and alignment with medical reasoning. Beyond decision support systems, studying the causal effects of therapeutic interventions, such as medications, from observational data provides a scalable complement to randomized controlled trials. However, real-world clinical data, though becoming more accessible, pose significant challenges such as confounding and treatment assignment bias, which make causal conclusions difficult. Causal inference methods can aid in early-stage evaluation of potential interventions at scale. Nevertheless, these methods need a known causal graph, which is rarely available in complex clinical settings. In this work, we introduce a framework to evaluate in-hospital interventions in heart failure patients by estimating their effects on post-discharge mortality over multiple time horizons. Using a publicly available electronic health record dataset, we first apply ensemble-based causal discovery to infer a reliable causal graph and identify upstream treatment nodes that may influence mortality. We then estimate the causal effects of these interventions at both the population and individual levels using established inference techniques and graph-informed adjustment sets. Results from robustness tests confirm that among three widely studied medication groups, only beta-blockers exhibit robust and plausible causal effects on mortality, aligning with clinical evidence. While not a replacement for clinical validation, this framework provides a transparent and interpretable approach for identifying actionable interventions, supporting its future use in discovering new treatment options and validating or optimizing treatments for heart failure patients.

Paper Nr: 398
Title:

Using Machine Learning Approaches for Predicting Time of Death of Human Postmortem Samples Based on Transcriptomic Data

Authors:

Olivia Liau, Tillie Slosser, Ivan Betancourt, Qiaochu Liu, Chun-Kit Ngan, Chen Fu, Ryan W. Logan and Nitya Phani Santosh Oruganty

Abstract: In this work, we introduce a machine learning (ML) pipeline that predicts the time of death (TOD) of a subject from gene expression (GE) profiles, addressing a critical gap in genomic research where TOD data are scarce. Our contributions are fourfold: (1) a data-driven, clinically domain-guided pipeline that learns temporal GE patterns for TOD prediction; (2) a two-stage dimensionality reduction approach combining (I) AutoEncoders and (II) ISOMAP for BA11 and PCA for BA47, preserving the temporal sequence of the data while incorporating domain knowledge and obviating exhaustive searches for optimal circadian gene sequences; (3) systematic training and hyperparameter tuning of 16 ML regressors (five single, eight ensemble, and three deep learning models) to identify the most effective ML model (i.e., ExtraTrees Regressor for BA11 and AdaBoost Regressor for BA47); and (4) a comprehensive evaluation on 146 subjects, examining the expression patterns of 235 circadian genes per subject across five performance metrics. Our method surpasses both non-temporal-encoding-based and temporal-encoding-based models, achieving hourly scaled MAEs of 0.839 (BA11) and 1.227 (BA47), with corresponding MSE and RMSE values of 1.013/1.006 and 2.153/1.467, respectively. Consequently, TOD predictions fall within a one-hour error margin for BA11 and a two-hour margin for BA47.
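
For readers comparing the reported error figures, the sketch below defines MAE, MSE, and RMSE and checks that each reported MSE/RMSE pair is consistent with RMSE = sqrt(MSE). The helper names and the toy predictions are hypothetical; only the values 1.013/1.006 and 2.153/1.467 come from the abstract.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (hours here)."""
    return math.sqrt(mse(y_true, y_pred))


if __name__ == "__main__":
    # Reported MSE values from the abstract; RMSE should be their square root.
    for region, reported_mse, reported_rmse in [("BA11", 1.013, 1.006),
                                                ("BA47", 2.153, 1.467)]:
        print(region, round(math.sqrt(reported_mse), 3), reported_rmse)

    # Hypothetical toy predictions in hours, for illustration only.
    true_hours = [1.0, 2.0, 3.0]
    predicted_hours = [1.0, 2.0, 5.0]
    print(mae(true_hours, predicted_hours), rmse(true_hours, predicted_hours))
```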

Paper Nr: 111
Title:

Biomotors Share the Same Mechanism of ATP Binding

Authors:

Liqiang Dai and Yao-Gen Shu

Abstract: This study presents our initial bioinformatics investigation of biomotors, employing amino acid sequence alignment and structural analysis of key motifs. Our findings reveal that the two rotary motors possess a similar configuration of the ATP catalytic pocket, whereas the three linear motors exhibit a distinct ATP binding site. This subtle structural difference likely explains the chemical irreversibility observed in linear motors, which are unable to retain phosphate groups. Despite the different types of binding sites observed, all five ATP catalytic pockets share similar structural conformations, suggesting a conserved ATP-binding mechanism across these biomotors. Furthermore, a key conclusion drawn is that the motility mode of biomotors correlates more strongly with their structural characteristics than with their amino acid sequence homology.

Paper Nr: 117
Title:

Quantum Tunneling in Adenine Deamination Using Variational Quantum Eigensolver

Authors:

Don Roosan, Rubayat Khan, Saif Nirzhor and Brian Provenchar

Abstract: We employ quantum computing techniques to simulate the electronic structure and reaction energetics of the adenine deamination process catalyzed by an engineered TadA8.20 base editor enzyme. The problem is formulated in terms of molecular electronic Hamiltonians for the reactant and product states, which are mapped onto qubits via the Jordan–Wigner transformation. This mapping encodes each electron orbital as a qubit degree of freedom, allowing quantum hardware to represent the system’s electronic state. We implement a hybrid quantum-classical Variational Quantum Eigensolver (VQE) algorithm on a 25-qubit IBM Qiskit simulator to compute the ground-state electronic energies of the reactant and product molecules. In the VQE approach, a parameterized quantum circuit (ansatz) is iteratively optimized through classical feedback to approximate each molecule’s ground-state wavefunction and minimal energy. The resulting energy difference between product and reactant (ΔE ≈ –0.018 Hartree) is negative, indicating an exergonic reaction. By accurately capturing the reaction’s energetics, this work provides an early demonstration of NISQ-era (noisy intermediate-scale quantum) simulation applied to an enzyme-catalyzed reaction. It underscores the feasibility and promising accuracy of near-term quantum computing for modeling complex biochemical transformations, marking a step toward practical quantum chemistry for biological systems. Although quantum tunneling is discussed in the context of TadA8.20 catalysis, in this work it is treated at a qualitative, mechanistic level because our VQE calculations target only the electronic energies of reduced reactant and product models rather than full kinetic barriers.

Paper Nr: 118
Title:

Quantum Enhanced Modeling of Enzyme Inhibition and Drug Behavior Integrating Tunneling Kinetics and Molecular Dynamics

Authors:

Don Roosan, Md Rahatul Ashakin and Rubayat Khan

Abstract: Enzyme‑catalyzed hydrogen‑transfer reactions can deviate from classical transition‑state theory due to quantum tunneling, often reflected experimentally as elevated kinetic isotope effects (KIEs). We present a proof‑of‑concept workflow that augments inhibitor potency prediction with tunneling‑related descriptors by combining (i) simplified quantum reaction‑coordinate models, (ii) molecular dynamics (MD)–derived flexibility measures, and (iii) classical molecular descriptors in a supervised machine‑learning (ML) model of inhibition constants (Ki). We evaluate two model enzymes with contrasting tunneling signatures: alcohol dehydrogenase (ADH) and dihydrofolate reductase (DHFR). Variational Quantum Eigensolver (VQE) calculations were executed using Qiskit on the statevector simulator (not noisy hardware) to extract tunneling‑proxy features from 1D double‑well models parameterized by donor–acceptor geometry. Across a curated dataset of enzyme–inhibitor pairs compiled from BRENDA, SABIO‑RK, DrugBank, and ChEMBL, adding tunneling features improved held‑out Ki prediction from R² ≈ 0.50 to ≈ 0.80 and reduced error from MAE ≈ 0.30 to ≈ 0.15 in log‑Ki units. MD simulations of human DHFR (50 ns, 300 K) supported a stable donor–acceptor geometry consistent with a well‑defined reactive configuration. While limited in scope, these results suggest that incorporating tunneling‑aware descriptors can improve Ki prediction accuracy for hydrogen‑transfer enzymes and motivate broader benchmarking on larger enzyme families and prospective inhibitor panels.

Paper Nr: 131
Title:

A logicGP Trainer for Classification of Biomedical Data

Authors:

Robin Nunkesser

Abstract: This paper presents logicGP-RLCW, a novel set of trainers for the logicGP framework designed to address the computational challenges associated with the original design and implementation. By restricting the hypothesis space, modifying the Genetic Programming algorithm, and introducing a two-phase final model selection procedure, logicGP-RLCW achieves significant improvements in computational efficiency while maintaining competitive predictive performance. Experiments on both simulated and real-world datasets demonstrate that logicGP-RLCW often outperforms existing methods, such as the algorithms contained in ML.NET’s AutoML. The results highlight the effectiveness of the proposed modifications.

Paper Nr: 207
Title:

Deep Learning Strategies for Molecular Structure Inference from Mass Spectra: A Comparative Study of Transformer Generation and Siamese Metric Learning

Authors:

Ivan Carrera, Pablo del Hierro, Fernando Cardenas, Dylan Villarroel, Ines Dutra and Eduardo Tejera

Abstract: Accurate inference of molecular structure from electron ionization mass spectra (EI-MS) remains a key challenge for biomedical analytics, where rule-based fragmentation and library matching struggle with novel or underrepresented compounds. This work presents a comparative evaluation of two distinct deep learning strategies: (i) a Transformer encoder-decoder for direct SMILES generation, fed a sparse sequential representation (Top-5 peaks), and (ii) a Siamese metric-learning model for nearest-neighbour retrieval, fed a rich spectral fingerprint (2000-bin vector). Using a curated CMFID subset (train/val/test: 80/10/10), we assess performance via SMILES validity, Tanimoto similarity, and retrieval accuracy. The Transformer achieves high syntactic validity (>95%) but low structural fidelity (mean Tanimoto ≈ 0.189). In contrast, the Siamese approach demonstrates near-perfect latent space retrieval (AUC ≈ 0.9947) and achieves a Top-1 structural fidelity (mean Tanimoto ≈ 0.351), nearly double that of the generative model. The results indicate that the retrieval-based strategy, when combined with a high-dimensional binned representation, currently offers superior structural accuracy. This suggests that the quality of the spectral representation is a dominant factor for success, while generative models struggle significantly when limited to sparse inputs. We outline a hybrid retrieval-generation pipeline that first narrows candidates via Siamese retrieval and then ranks/refines with a Transformer, combining recall and precision.
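
The two input representations and the Tanimoto metric named above can be illustrated with a short sketch. It is not the authors' code: the bin width (1 m/z unit over 0–2000), the max-intensity normalisation, and all function names are assumptions made for illustration, and fingerprints are represented simply as sets of "on" bit indices.

```python
def top_k_peaks(peaks, k=5):
    """Keep the k most intense (m/z, intensity) peaks (sparse input)."""
    return sorted(peaks, key=lambda p: p[1], reverse=True)[:k]


def bin_spectrum(peaks, n_bins=2000, mz_max=2000.0):
    """Bin peaks into a fixed-length vector scaled to a maximum of 1.

    Assumes equal-width bins over [0, mz_max); the paper's actual
    binning scheme is not stated in the abstract.
    """
    vec = [0.0] * n_bins
    width = mz_max / n_bins
    for mz, intensity in peaks:
        idx = int(mz / width)
        if 0 <= idx < n_bins:
            vec[idx] = max(vec[idx], intensity)
    top = max(vec)
    return [v / top for v in vec] if top > 0 else vec


def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Toy data, not taken from the dataset.
spectrum = [(43.0, 100.0), (57.2, 50.0), (71.1, 20.0),
            (85.0, 10.0), (99.3, 5.0), (113.0, 1.0)]
sparse = top_k_peaks(spectrum)
dense = bin_spectrum(spectrum)
similarity = tanimoto({1, 2, 3}, {2, 3, 4})
print(len(sparse), len(dense), similarity)
```

A mean Tanimoto of 0.351 versus 0.189 therefore means that, on average, the retrieved structures share a noticeably larger fraction of fingerprint bits with the true structure than the generated ones do.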

Paper Nr: 225
Title:

A Public Dataset of Visible Reflectance and Transmittance Spectra for Machine Learning in Colour Science

Authors:

Marcio Mello and Liliane Ventura

Abstract: Open and well-structured spectral datasets are essential for applying machine learning to colour science and optical metrology. To support reproducible research in these areas, we compiled an open collection of visible-range reflectance and transmittance spectra covering 380–780 nm at 5 nm intervals. The dataset, named MAKVEN, integrates reconstructed, synthetic, and physically measured spectra drawn from colour standards, natural materials, and hyperspectral imagery. All spectra were resampled, normalised, and annotated with metadata describing their origin and acquisition type. MAKVEN provides a compact yet comprehensive foundation for training and validation in data-driven models of colour and light–matter interaction. The dataset is publicly available at https://doi.org/10.6084/m9.figshare.30483962.
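
As a rough illustration of the sampling grid described above, the sketch below builds the 380–780 nm grid at 5 nm steps (81 samples) and resamples a spectrum onto it. Linear interpolation with endpoint clamping is an assumption; the abstract does not state which resampling method was used, and the function names are hypothetical.

```python
def wavelength_grid(start_nm=380, stop_nm=780, step_nm=5):
    """Return the inclusive wavelength grid in nanometres."""
    return list(range(start_nm, stop_nm + 1, step_nm))


def resample_linear(src_wl, src_vals, grid):
    """Linearly interpolate a spectrum onto `grid`.

    `src_wl` must be strictly increasing and `grid` non-decreasing.
    Grid points outside the source range take the nearest endpoint
    value (an assumption, not the dataset's documented behaviour).
    """
    out = []
    j = 0
    for wl in grid:
        if wl <= src_wl[0]:
            out.append(src_vals[0])
            continue
        if wl >= src_wl[-1]:
            out.append(src_vals[-1])
            continue
        # Advance to the source interval that contains wl.
        while src_wl[j + 1] < wl:
            j += 1
        x0, x1 = src_wl[j], src_wl[j + 1]
        t = (wl - x0) / (x1 - x0)
        out.append(src_vals[j] + t * (src_vals[j + 1] - src_vals[j]))
    return out


grid = wavelength_grid()
# Toy two-point reflectance ramp, not a real measurement.
resampled = resample_linear([380, 780], [0.0, 1.0], grid)
print(len(grid), grid[0], grid[-1], resampled[40])
```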

Paper Nr: 255
Title:

A Bioinformatics Methodological Approach for the Construction and Analysis of a Bos indicus Pangenome

Authors:

Chiria Jorotiana, Jean-Claude Richard Rakotozafy, Aimé Richard Hajalalaina, Annie Chateau and Anne-Muriel Arigon

Abstract: Pangenome construction enables the exploration of genomic diversity beyond the limitations of a single reference genome, which can introduce bias in species with high genetic variability such as Bos indicus. This study proposes a methodological pipeline integrating preprocessing, pangenome construction, and post-processing steps. Five chromosome-level genomes and three reference assemblies (Bos indicus 1.0, Bos taurus ARS-UCD1.2, and ARS-UCD2.0) were analyzed. Data preprocessing was performed using SeqKit and Samtools, while Minimap2 was employed to assess genome similarity among samples and references. The pangenome was constructed with Minigraph-Cactus, and presence/absence variant analysis and variant visualization were carried out using Bcftools, UpSetPlot, and Venn diagrams. Results revealed that the choice of reference genome significantly influences variant distribution: pangenomes built with Bos indicus 1.0 contained more shared variants, whereas those based on ARS-UCD2.0 exhibited higher total variant counts. The sample SRR17257200 showed intermediate results and may serve as an alternative reference when no suitable reference is available. Finally, an interactive pipeline is proposed to allow dynamic selection of the reference genome and enhance analytical flexibility for large-scale Bos indicus pangenome studies. Supplementary materials are available on demand.

Paper Nr: 300
Title:

AMP-zGSM: A z-Scoring Enhanced Grouping–Scoring–Modeling Framework for Antimicrobial Peptide Prediction

Authors:

Demet Parlak Sönmez, Burcu Bakir-Gungor and Malik Yousef

Abstract: Peptides are potent antimicrobial agents that offer promising alternatives to conventional antibiotics in addressing the global challenge of antibiotic resistance. Owing to the remarkable diversity of antimicrobial peptides (AMPs) and recent advances in computational biology, the development of robust machine learning algorithms for AMP classification has become increasingly crucial. In this study, we introduce AMP-zGSM, a novel statistical feature-ranking model designed to enhance AMP classification. The AMP-zGSM framework ranks feature groups based on q-values derived from statistical z-scores. The model was trained on three large peptide datasets containing 3145, 12022, and 8346 peptides, respectively. Multiple machine learning algorithms, including Random Forest (RF), Support Vector Machine (SVM), Naive Bayes, XGBoost, AdaBoost, CatBoost, GradientBoost, and their ensemble variants, were trained and evaluated using feature subsets selected according to these q-values. Compared with models employing feature subsets obtained through traditional feature selection techniques (XGBoost, SelectKBest, XGB+SHAP, Information Gain Ratio, and mRMR), the AMP-zGSM model achieved comparable performance across all datasets. Notably, when compared with the three competing models, AMP-zGSM consistently demonstrated superior performance across the evaluated datasets, achieving the highest AUC values of 0.9737, 0.8846, and 0.97 on Datasets 1, 2, and 3, respectively. The datasets and AMP-zGSM model code developed in this study are publicly available at the following link: [https://github.com/DemetParlakSonmez/amp-zGSM].
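
One plausible reading of "q-values derived from statistical z-scores" is sketched below: each group's z-score is turned into a two-sided p-value under a standard normal, and the p-values are adjusted with the Benjamini–Hochberg procedure. This is an assumption for illustration only; the abstract does not specify the adjustment method, and the group names and z-scores here are invented. The authors' actual implementation is in the linked repository.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def rank_groups(group_z):
    """Return (group, q-value) pairs sorted from most to least significant."""
    names = list(group_z)
    q = bh_qvalues([two_sided_p(group_z[n]) for n in names])
    return sorted(zip(names, q), key=lambda pair: pair[1])


# Hypothetical feature groups and z-scores, for illustration only.
ranking = rank_groups({"charge": 3.0, "length": 0.5, "hydrophobicity": -2.0})
for name, q in ranking:
    print(f"{name}: q = {q:.4f}")
```

Groups with the smallest q-values would be selected first when building feature subsets for the downstream classifiers.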

Paper Nr: 333
Title:

A Matrix Factorization and Generative Modeling Framework for Drug–Target Interaction Prediction

Authors:

Vitor Magalhaes Silva, Khadidja Henni, Neila Mezghani and Youcef Abdelliche

Abstract: Predicting drug–target interactions (DTIs) is essential for accelerating drug discovery while reducing reliance on costly and time-consuming experimental screening. However, DTI datasets suffer from extreme class imbalance, sparse positive interactions, and heterogeneous biochemical representations, which severely limit the effectiveness of supervised learning. We propose an integrated framework combining Multiple Similarity Collaborative Matrix Factorization (MSCMF), Wasserstein GAN with Gradient Penalty (WGAN-GP), and a Deep Neural Network (DNN) to address these challenges. MSCMF produces similarity-aware representations of drugs and proteins by integrating multiple chemical and sequence similarity graphs, yielding coherent embedding spaces. WGAN-GP generates high-quality synthetic positive interactions directly in this latent space, correcting imbalance while preserving structural geometry. A DNN then predicts interaction probabilities using real and augmented embeddings. Experiments on the Yamanishi benchmark show that the proposed MSCMF–WGAN-GP–DNN pipeline consistently improves recall, F1-score, ROC-AUC, and PR-AUC across all ligand families. Comparative and ablation analyses demonstrate that WGAN-GP outperforms VAE-based augmentation, and that MSCMF provides superior representations compared to other matrix factorization methods.