TRUSTroke 2026 Abstracts


Area 1 - TRUSTroke

Full Papers
Paper Nr: 6
Title:

How Interpretable Are LLMs? A Multi-Metric Framework for Evaluating Synthetic Explanations in Digital Mental Health

Authors:

Prasan Yapa, Ashala Senanayake and Zilu Liang

Abstract: Large language models (LLMs) have shown promise in digital mental health (DMH) screening through in-context learning (ICL), yet the interpretability of their predictions remains insufficiently understood. While prompt-based synthetic explanations offer potential for understanding model decisions, their quality and clinical utility in DMH contexts lack rigorous evaluation. To address this gap, we propose a novel human-centered interpretability framework that engages certified psychologists and psychiatrists to systematically evaluate free-text explanations generated by LLMs for depression and anxiety screening. Our framework consists of human-grounded metrics for plausibility and informativeness derived from expert assessment of consistency, reliability, and professionalism, while reformulating faithfulness through predictive power evaluation to measure how well explanations predict model decisions independent of gold labels. Human evaluations show that Gemma-7B produces highly plausible and informative explanations (plausibility: 0.47, informativeness: 0.43), while automated analyses demonstrate that faithful explanations substantially enhance downstream performance. Notably, explanations integrated into few-shot examples enable MentaLLaMA to achieve F1-score improvements of 8.80% and 7.84% for depression and anxiety classification on the SMHD corpus. This work provides a principled foundation for evaluating and leveraging LLM-generated explanations in DMH applications.
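A minimal sketch of the kind of explanation-augmented few-shot prompting the abstract describes: each in-context exemplar carries a label and a free-text explanation before the unlabeled query. All field names, example posts, and labels below are illustrative assumptions, not taken from the paper.

```python
# Sketch: building a few-shot ICL prompt whose exemplars include
# free-text explanations alongside labels (illustrative only).

def build_prompt(examples, query):
    """Format labeled exemplars (with explanations) plus an unlabeled query."""
    parts = []
    for ex in examples:
        parts.append(
            f"Post: {ex['post']}\n"
            f"Label: {ex['label']}\n"
            f"Explanation: {ex['explanation']}\n"
        )
    parts.append(f"Post: {query}\nLabel:")
    return "\n".join(parts)

examples = [
    {"post": "I can't get out of bed and nothing feels worth doing.",
     "label": "depression",
     "explanation": "Expresses anhedonia and loss of energy."},
    {"post": "My heart races before every meeting and I can't stop worrying.",
     "label": "anxiety",
     "explanation": "Describes persistent worry with somatic arousal."},
]

prompt = build_prompt(examples, "Lately I dread every small task.")
print(prompt)
```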

Paper Nr: 7
Title:

Towards Federated Breast Microcalcification Analysis: Validating Segmentation Stability on Kaapana Using Synthetic Phantoms

Authors:

Shqipe Salii, Mennan Selimi and Markus Graf

Abstract: Breast microcalcifications are among the earliest radiographic signs of ductal carcinoma in situ, and their morphology and spatial distribution play a critical role in malignancy assessment. Developing robust AI models for microcalcification segmentation, detection, and characterization is challenged by the strict data governance constraints that prevent multi-institution aggregation of mammography data. Federated Learning provides a promising alternative by enabling collaborative training without sharing raw images. This study evaluates the training stability of Kaapana’s federated nnU-Net segmentation workflow under realistic multi-site conditions as a methodological foundation for future microcalcification analysis. Using a controlled synthetic phantom dataset with lesion-like geometric structures of increasing boundary complexity, we conducted a two-site federated experiment across heterogeneous hardware environments. Federated segmentation remained stable across sites, achieving Dice scores above 0.98 for well-performing targets, while lower-performing targets exhibited median Dice scores of approximately 0.38. These findings provide the necessary groundwork for deploying federated microcalcification segmentation pipelines, where high sensitivity, boundary precision, and cross-site reproducibility are essential for diagnostic applications targeting ductal carcinoma in situ.
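The Dice scores reported above measure overlap between predicted and reference segmentations. A minimal sketch of the metric, using flat binary lists as stand-ins for real voxel masks:

```python
# Dice coefficient for binary masks: 2*|P ∩ T| / (|P| + |T|).
# Flat 0/1 lists stand in for segmentation volumes here.

def dice(pred, truth):
    """Overlap between a predicted and a reference binary mask."""
    inter = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention: two empty masks count as perfect agreement
    return 2 * inter / total if total else 1.0

pred  = [1, 1, 0, 1, 0, 0]
truth = [1, 1, 1, 0, 0, 0]
print(dice(pred, truth))  # 2*2 / (3+3) ≈ 0.667
```

A Dice above 0.98, as for the well-performing targets above, means predicted and reference masks are nearly identical; around 0.38 indicates substantial boundary disagreement.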

Paper Nr: 9
Title:

Explainable AI for Post-Stroke Recurrence Prediction: A Random Forest Dashboard for Personalized Risk Profiles

Authors:

Luis Marte, Oier Segura, Judith Recober, Laura Rivera-Sanchez, Carlos A. Molina and Carolina Migliorelli

Abstract: Recurrent stroke remains a major threat in the first year following an initial ischemic event, often leading to worse outcomes than the primary stroke. Accurate prediction of recurrence is critical but rarely actionable, especially when clinical models function as black boxes. We present an explainable machine learning (ML) dashboard using a random forest classifier trained on longitudinal clinical data from 4,745 ischemic stroke patients to predict 1-year recurrence risk. The model achieved good discrimination (AUC = 0.82) and integrated SHapley Additive exPlanations (SHAP) to offer both global and individualized feature attributions. We show how patients with identical predicted risks can have vastly different contributing factors, and provide sensitivity analyses simulating how changes in modifiable variables (e.g., blood pressure, HDL, BMI) affect predictions. These personalized explanations allow clinicians to prioritize interventions based on the most impactful risk factors for each patient. Our dashboard addresses the transparency gap in AI-driven care and enables more precise, patient-centered secondary prevention. The model complies with ethical and data privacy standards, using pseudonymized data and federated learning across centers. This work demonstrates how interpretable AI can turn predictive analytics into actionable tools for real-world stroke care.
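The sensitivity analyses described above ask a what-if question: how does the predicted risk change when one modifiable variable is altered? A sketch of that perturbation loop, where the toy logistic model and its coefficients are stand-ins for the paper's trained random forest, not its actual method:

```python
# What-if sensitivity sketch: perturb one modifiable feature and observe
# the change in predicted risk. The model below is an illustrative toy,
# not the paper's random forest classifier.
import math

def predict_risk(features):
    """Toy 1-year recurrence risk; coefficients are made up for illustration."""
    z = (-4.0
         + 0.02 * features["systolic_bp"]
         - 0.01 * features["hdl"]
         + 0.03 * features["bmi"])
    return 1 / (1 + math.exp(-z))

patient = {"systolic_bp": 160, "hdl": 45, "bmi": 31}
baseline = predict_risk(patient)

# Simulate lowering systolic blood pressure by 20 mmHg
modified = dict(patient, systolic_bp=140)
delta = predict_risk(modified) - baseline
print(f"baseline risk {baseline:.3f}, change after BP reduction {delta:+.3f}")
```

In the dashboard setting, running this loop over each modifiable variable ranks interventions by their expected effect on the individual patient's risk.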

Paper Nr: 10
Title:

Federated Transformer-GNN for Privacy-Preserving Brain Tumor Localization with Modality-Level Explainability

Authors:

Andrea Protani, Riccardo Taiello, Marc Molina Van Den Bosch and Luigi Serio

Abstract: Deep learning models for brain tumor analysis require large and diverse datasets that are often siloed across healthcare institutions due to privacy regulations. We present a federated learning framework for brain tumor localization that enables multi-institutional collaboration without sharing sensitive patient data. Our method extends a hybrid Transformer-Graph Neural Network architecture derived from prior decoder-free supervoxel GNNs and is deployed within CAFEIN®, CERN’s federated learning platform designed for healthcare environments. We provide an explainability analysis through Transformer attention mechanisms that reveals which MRI modalities drive the model predictions. Experiments on the BraTS dataset demonstrate a key finding: while isolated training on individual client data triggers early stopping well before reaching full training capacity, federated learning enables continued model improvement by leveraging distributed data, ultimately matching centralized performance. This result provides strong justification for federated learning when dealing with complex tasks and high-dimensional input data, as aggregating knowledge from multiple institutions significantly benefits the learning process. Our explainability analysis, validated through rigorous statistical testing on the full test set (paired t-tests with Bonferroni correction), reveals that deeper network layers significantly increase attention to T2 and FLAIR modalities (p < 0.001, Cohen’s d=1.50), aligning with clinical practice.
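The multi-institution aggregation the abstract credits for matching centralized performance is typically a federated-averaging step: the server combines client parameter updates weighted by local dataset size. A minimal sketch under that assumption (the paper's CAFEIN® platform may use a different aggregation rule), with plain lists standing in for model parameters:

```python
# Federated-averaging sketch: the server computes a weighted average of
# per-client parameter vectors, weighted by local dataset size.
# Plain lists stand in for real model weights.

def fed_avg(client_weights, client_sizes):
    """Size-weighted average of per-client parameter vectors."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two institutions with different amounts of local data
site_a = [0.2, 0.4, 0.6]
site_b = [0.4, 0.0, 1.0]
global_w = fed_avg([site_a, site_b], [300, 100])
print(global_w)  # first entry: (0.2*300 + 0.4*100) / 400 = 0.25
```

Because the larger site contributes proportionally more, no raw images ever leave either institution, yet each round of aggregation lets training continue past the point where an isolated client would have triggered early stopping.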

Paper Nr: 11
Title:

A Study on the Security Risks of Model Sharing in Federated Learning Systems

Authors:

Francesco Pastore, Marco Di Gennaro, Stefano Zanero and Michele Carminati

Abstract: Federated Learning (FL) frameworks enable multiple clients to collaboratively train a Machine Learning (ML) model without requiring data to leave client devices, supporting applications in which privacy and data security are critical, such as healthcare and finance. In these systems, one of the first steps in the training process is sharing the model between the server and the clients, including both the architecture and the initial weights. However, this model-sharing step introduces a distinct attack surface, exposing FL systems to security threats such as malicious model serialization. This paper presents a systematic analysis of the security risks associated with model sharing in FL systems by examining commonly used techniques, tools, and deployment practices. We show that legacy model formats lacking built-in security mechanisms remain widely adopted, significantly increasing the attack surface, and that the growing popularity of model hubs further amplifies these risks by enabling large-scale distribution of malicious artifacts. While recent approaches have been proposed to improve model security, documented zero-day vulnerabilities demonstrate that the model-sharing process remains fragile in practice. By consolidating existing vulnerabilities and defenses, this work aims to raise awareness of the risks inherent to model sharing and to motivate the adoption of more secure model-sharing practices in privacy-sensitive FL deployments.
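The malicious-serialization risk described above can be made concrete with Python's pickle, a legacy format widely used for model sharing: unpickling runs whatever callable an attacker places in `__reduce__`. The benign side effect below stands in for arbitrary code execution:

```python
# Demonstration of the pickle deserialization risk: loading a crafted
# artifact executes attacker-chosen code. A harmless callable stands in
# for a real payload.
import pickle

log = []

def record(msg):
    log.append(msg)

class MaliciousModel:
    """Mimics a shared model object whose unpickling triggers code."""
    def __reduce__(self):
        # pickle will invoke record(...) automatically at load time
        return (record, ("arbitrary code executed during model load",))

blob = pickle.dumps(MaliciousModel())
pickle.loads(blob)  # merely loading the "model" fires the payload
print(log)
```

This is why formats without embedded executable state (e.g. safetensors-style weight-only files) and artifact signing are preferable for distributing models in privacy-sensitive FL deployments.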