Johanna Driever

Can Machine Learning Models Trained on U.S. Health Data Predict Diseases in the UK? A Study on Domain Adaptation for EHR-based Transformers

July 28, 2025

Master's student Johanna Driever talks about her latest paper, in which she used Domain Adaptation to generalize machine learning predictions across EHR datasets from different healthcare systems.

Structured Electronic Health Records (EHRs), which contain coded diagnoses and prescribed medications among other information, are a rich data source for training machine learning models to predict diseases. However, a key obstacle in building such models is their limited ability to generalize across healthcare systems. When a model trained on data from one country is applied in a different healthcare environment, its predictive performance often degrades because of differences in medical coding standards, clinical practices, population characteristics, and data collection procedures.

Domain Adaptation

To overcome this challenge, we employ Domain Adaptation (DA), a family of machine learning techniques for applying a model trained on one dataset, the source domain, to a different but related dataset, the target domain. DA is useful when target data are scarce or unlabeled but a larger labeled dataset is available elsewhere. It helps the model adjust to differences between datasets, such as shifts in patient demographics or coding systems, which are common in healthcare. In this study, we explore two types of DA: supervised, where some labeled examples are available in the target domain, and unsupervised, where only unlabeled target data are used. Both approaches aim to learn representations that transfer from the source to the target domain, so that the model still performs well on data from a healthcare system it was not trained on.
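To make the supervised setting concrete, the sketch below shows one simple way to exploit a few labeled target examples: minimizing a weighted combination of the source and target empirical risks, in the spirit of weighted Empirical Risk Minimization [7]. This is a generic illustration under our own assumptions, not necessarily the exact formulation in [7] or the code used in the study. The function names and the weight `alpha` are hypothetical, and in practice the probabilities would come from the transformer's prediction head rather than from hand-written lists.

```python
import math
from typing import Sequence


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def empirical_risk(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy over one dataset."""
    return sum(bce(p, y) for p, y in zip(probs, labels)) / len(labels)


def weighted_erm_loss(
    src_probs: Sequence[float],
    src_labels: Sequence[int],
    tgt_probs: Sequence[float],
    tgt_labels: Sequence[int],
    alpha: float,
) -> float:
    """Convex combination of source and target risks (hypothetical weighting).

    alpha = 0 ignores the target data; alpha = 1 is plain fine-tuning on the
    small labeled target set. Values in between trade the two off.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    risk_src = empirical_risk(src_probs, src_labels)
    risk_tgt = empirical_risk(tgt_probs, tgt_labels)
    return (1.0 - alpha) * risk_src + alpha * risk_tgt


# Toy predictions: U.S. (source) patients and a few labeled UK (target) patients.
loss = weighted_erm_loss([0.9, 0.2, 0.7], [1, 0, 1], [0.6, 0.4], [1, 0], alpha=0.5)
print(f"weighted ERM loss: {loss:.4f}")
```

Choosing `alpha` on held-out target data is one common option; because the labeled target set is small, a larger `alpha` fits the UK population more closely at the cost of higher variance.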

Applying Domain Adaptation to a Transformer Model for EHR Data

In our recent study, we investigated whether and how DA methods can transfer a transformer-based disease prediction model across such distinct EHR datasets. We used Ex-Med-BERT [4], a BERT-derived architecture designed specifically for structured EHR data and pre-trained on records of over 3.5 million U.S. patients from the IBM Explorys Therapeutic Datasets [1], and evaluated how well it adapts to UK Biobank [6] data. Our goal was to determine whether DA techniques could successfully address the heterogeneity between U.S. and UK EHR systems.

We applied and compared five DA strategies: three supervised approaches (weighted Empirical Risk Minimization [7], Contrastive Semantic Alignment Loss [5], and Triplet Loss [3]) and two unsupervised approaches (Minimum Class Confusion [2] and Margin Disparity Discrepancy [8]). We then tested how well each adapted model predicts six clinical endpoints.
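The supervised losses in this comparison act on the model's embeddings, while Minimum Class Confusion needs only unlabeled target predictions. The following pure-Python sketch shows textbook forms of three of them on toy vectors: a contrastive semantic alignment loss that pulls same-class source/target pairs together and pushes different-class pairs at least a margin apart (the idea in [5]), a squared-distance triplet loss of the kind used for domain adaptation in [3], and a Minimum Class Confusion loss built from the entropy-weighted, category-normalized class confusion matrix described in [2]. The margins, the temperature of 2.5, averaging over all pairs, and the way domains are mixed in triplets are illustrative assumptions, not the study's settings. Margin Disparity Discrepancy [8] is omitted because it requires adversarial training of an auxiliary classifier. In a real pipeline, these terms would be added to the classification loss and computed on tensors in a deep learning framework.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def csa_loss(src_emb, src_y, tgt_emb, tgt_y, margin: float = 1.0) -> float:
    """Contrastive semantic alignment over all source/target pairs.

    Same-class pairs are pulled together; different-class pairs are pushed
    apart until they are at least `margin` away. Averaging over pairs is an
    assumption of this sketch.
    """
    total, n_pairs = 0.0, 0
    for zs, ys in zip(src_emb, src_y):
        for zt, yt in zip(tgt_emb, tgt_y):
            d2 = sq_dist(zs, zt)
            if ys == yt:
                total += 0.5 * d2  # semantic alignment term
            else:
                total += 0.5 * max(0.0, margin - math.sqrt(d2)) ** 2  # separation term
            n_pairs += 1
    return total / n_pairs


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 1.0) -> float:
    """Hinge on squared distances: the positive must be closer than the negative by `margin`.

    Assumption: the anchor comes from one domain and the same-class positive
    from the other, so minimizing the loss aligns the two domains per class.
    """
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def softmax(logits: Vector, temperature: float) -> List[float]:
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def mcc_loss(batch_logits: Sequence[Vector], temperature: float = 2.5) -> float:
    """Minimum Class Confusion on a batch of unlabeled target logits."""
    probs = [softmax(z, temperature) for z in batch_logits]
    n, k = len(probs), len(probs[0])
    # Uncertainty reweighting: confident (low-entropy) samples get larger weights.
    entropy = [-sum(p * math.log(p + 1e-5) for p in row) for row in probs]
    raw = [1.0 + math.exp(-h) for h in entropy]
    raw_total = sum(raw)
    w = [n * r / raw_total for r in raw]
    # Class confusion matrix C[j][l] = sum_i w_i * p_ij * p_il
    conf = [[sum(w[i] * probs[i][j] * probs[i][l] for i in range(n)) for l in range(k)]
            for j in range(k)]
    # Category normalization: divide each column by its sum.
    col = [sum(conf[j][l] for j in range(k)) for l in range(k)]
    norm = [[conf[j][l] / col[l] for l in range(k)] for j in range(k)]
    # Loss = total off-diagonal (between-class) confusion, averaged over classes.
    return sum(norm[j][l] for j in range(k) for l in range(k) if j != l) / k


# Toy 2-d embeddings and logits for a binary endpoint (control = 0, case = 1).
print(csa_loss([[0.1, 0.2], [0.9, 0.8]], [0, 1], [[0.0, 0.3], [1.0, 1.0]], [0, 1]))
print(triplet_loss([0.1, 0.2], [0.0, 0.3], [1.0, 1.0]))
print(mcc_loss([[2.0, -1.0], [0.5, 0.4], [-1.5, 1.0]]))
```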

The endpoints span neurological (Alzheimer’s disease, dementia, depression, epilepsy, and Parkinson’s disease), cardiovascular (atrial fibrillation/flutter), and respiratory (chronic obstructive pulmonary disease) conditions. To address class imbalance between diagnosed patients and controls, we applied propensity score matching to extract balanced datasets for each endpoint from the UK Biobank’s Primary Care (GP) and Hospital Inpatient (INP) datasets. This yielded two target datasets per endpoint, each a subset of the full GP or INP dataset. Because diagnosis prevalence varies, dataset sizes ranged from 701 patients (epilepsy, GP) to 8,905 patients (depression, INP).
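Propensity score matching pairs each diagnosed patient with a control who has a similar estimated probability of being a case, which yields a balanced cohort. The sketch below shows greedy 1:1 nearest-neighbour matching without replacement. It assumes the propensity scores have already been estimated (for example, by logistic regression on covariates such as age and sex); the covariates, the caliper of 0.05, the greedy strategy, and the patient IDs are illustrative assumptions rather than the study's actual matching protocol.

```python
from typing import Dict, List, Optional, Tuple


def match_controls(
    case_scores: Dict[str, float],
    control_scores: Dict[str, float],
    caliper: Optional[float] = 0.05,
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbour matching on propensity scores, without replacement.

    Scores are P(case | covariates), assumed to be estimated beforehand
    (e.g. by logistic regression). Cases with no control within `caliper`
    stay unmatched; caliper=None disables the check.
    """
    available = dict(control_scores)  # controls that have not been used yet
    pairs: List[Tuple[str, str]] = []
    # Heuristic: match high-score cases first, as they usually have the fewest close controls.
    for case_id, score in sorted(case_scores.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        best = min(available, key=lambda c: abs(available[c] - score))
        if caliper is not None and abs(available[best] - score) > caliper:
            continue  # no control close enough: leave this case unmatched
        pairs.append((case_id, best))
        del available[best]
    return pairs


# Hypothetical patient IDs and precomputed propensity scores.
cases = {"case_a": 0.80, "case_b": 0.30}
controls = {"ctrl_x": 0.79, "ctrl_y": 0.31, "ctrl_z": 0.50}
pairs = match_controls(cases, controls)
balanced_cohort = [pid for pair in pairs for pid in pair]  # equal numbers of cases and controls
print(pairs)  # [('case_a', 'ctrl_x'), ('case_b', 'ctrl_y')]
```

Greedy matching is simple and order-dependent; optimal matching, which minimizes the total score distance across all pairs, is a common alternative when the control pool is small.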

Figure 1 shows the performance for each endpoint on the target data. Performance improvements over the baseline are observed for nearly all endpoints and DA methods. Notably, endpoints with larger datasets, such as chronic obstructive pulmonary disease, show greater improvement than endpoints with smaller datasets, such as Parkinson’s disease. These findings emphasize the critical role of choosing the right adaptation strategy based on the availability of labeled data in the target domain.

Figure 1: AUROC scores across clinical endpoints for each DA method, using the full target dataset. More complex DA techniques outperform simple fine-tuning and weighted Empirical Risk Minimization in most settings. Results are shown with the UK Biobank’s Primary Care (GP) and Hospital Inpatient (INP) datasets as targets.
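Figure 1 reports AUROC, which can be read as the probability that a randomly chosen diagnosed patient receives a higher predicted risk than a randomly chosen control. A minimal sketch computing it directly from that definition (the Mann-Whitney statistic) follows; the function name `auroc` and the toy scores are illustrative and not taken from the study.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen positive patient receives a
    higher score than a randomly chosen negative one; ties count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

For real evaluations, scikit-learn's `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same value more efficiently; the quadratic loop here only keeps the definition visible. An AUROC of 0.5 corresponds to random ranking, which is why the balanced, matched cohorts make the scores in Figure 1 easy to compare across endpoints.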

Citations

[1] IBM Explorys Therapeutic Datasets. https://www.ibm.com/docs/en/announcement_archive/ENUS216-401/ENUS216-401.PDF

[2] Ying Jin et al. “Minimum Class Confusion for Versatile Domain Adaptation”. arXiv:1912.03699 [cs.LG], 2020. https://doi.org/10.48550/arXiv.1912.03699

[3] Pablo Laiz, Jordi Vitrià, and Santi Seguí. “Using the Triplet Loss for Domain Adaptation in WCE”. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). 2019, pp. 399–405. https://doi.org/10.1109/ICCVW.2019.00051

[4] Manuel Lentzen et al. “A Transformer-Based Model Trained on Large Scale Claims Data for Prediction of Severe COVID-19 Disease Progression”. In: IEEE Journal of Biomedical and Health Informatics 27.9 (2023), pp. 4548–4558. https://doi.org/10.1109/JBHI.2023.3288768

[5] Saeid Motiian et al. “Unified Deep Supervised Domain Adaptation and Generalization”. In: IEEE International Conference on Computer Vision (ICCV) (2017). https://doi.org/10.48550/arXiv.1709.10190

[6] UK Biobank. https://www.ukbiobank.ac.uk/

[7] Rongguang Wang et al. “Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience”. 2024. https://doi.org/10.48550/arXiv.2308.03175

[8] Yuchen Zhang et al. “Bridging Theory and Algorithm for Domain Adaptation”. arXiv:1904.05801 [cs.LG], 2019. https://doi.org/10.48550/arXiv.1904.05801

Blog post, July 2025