ClaimsBERT – Predicting Disease Risks Using a Transformer-Based Language Model (BERT) on Statutory Health Insurance Claims Data
To tailor healthcare to individual needs, it is important to identify health risks as early as possible. Statutory health insurance providers routinely generate large amounts of billing data, including structured information on diagnoses, treatments, prescriptions, hospital stays, and other reimbursed services. ClaimsBERT investigates how artificial intelligence can be used to systematically analyze these so-called GKV claims data (GKV: Gesetzliche Krankenversicherung, Germany's statutory health insurance). The project aims to build a foundation model that can help identify health risks at an early stage across various application areas.
The project is based on modern transformer architectures, which excel at recognizing patterns in large datasets and tracking relationships over long time spans. These AI models process data sequentially and additionally incorporate patient characteristics such as age or sex. In this way, they can identify patterns that may indicate future health developments. Results are checked for medical plausibility after training, and the model is fine-tuned to further improve its predictive accuracy.
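To make the idea of sequential processing with patient characteristics more concrete, the following is a minimal sketch of how a claims history might be turned into model inputs. It is not the project's actual preprocessing, which the description does not specify. The `ClaimEvent` type, the `ICD:`/`ATC:` code prefixes, the special tokens, and the functions `build_vocab` and `encode_patient` are all hypothetical names chosen for illustration. The sketch assumes that each billed code becomes one token, that events are ordered by date, and that age and sex are supplied as parallel id sequences alongside the tokens.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical special tokens and sex ids; id 0 is reserved for padding.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]"]
SEX_IDS = {"female": 1, "male": 2, "diverse": 3, "unknown": 4}


@dataclass(frozen=True)
class ClaimEvent:
    """One billed item (illustrative structure, not the real data schema)."""
    day: date
    code: str  # e.g. a diagnosis or prescription code


def build_vocab(histories):
    """Map every special token and every observed code to an integer id."""
    codes = sorted({event.code for history in histories for event in history})
    return {token: idx for idx, token in enumerate(SPECIAL_TOKENS + codes)}


def encode_patient(events, birth_year, sex, vocab, max_len=8):
    """Encode one claims history as fixed-length, parallel id sequences."""
    ordered = sorted(events, key=lambda event: event.day)
    keep = max_len - 1  # one position is used by [CLS]
    ordered = ordered[max(0, len(ordered) - keep):]  # keep most recent events

    tokens = ["[CLS]"] + [event.code for event in ordered]
    # Approximate age at each event as a difference of calendar years.
    ages = [0] + [event.day.year - birth_year for event in ordered]

    input_ids = [vocab.get(token, vocab["[UNK]"]) for token in tokens]
    pad = max_len - len(input_ids)
    return {
        "input_ids": input_ids + [vocab["[PAD]"]] * pad,
        "age_ids": ages + [0] * pad,
        "sex_ids": [SEX_IDS[sex]] * len(tokens) + [0] * pad,
        "attention_mask": [1] * len(tokens) + [0] * pad,
    }


# Invented example history (unsorted on purpose).
history = [
    ClaimEvent(date(2021, 3, 1), "ATC:A10BA02"),
    ClaimEvent(date(2020, 5, 10), "ICD:E11"),
    ClaimEvent(date(2022, 1, 15), "ICD:E16.2"),
]
vocab = build_vocab([history])
encoded = encode_patient(history, birth_year=1960, sex="female",
                         vocab=vocab, max_len=6)
```

In a setup like this, a transformer encoder would typically sum learned embeddings for the code, age, and sex ids at each position, in the same way BERT combines token and position embeddings. Whether ClaimsBERT follows this design is an assumption.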
The foundation model is intended to support a range of medical use cases, such as predicting:
- The onset of the need for long-term care
- Hospital admissions resulting from adverse drug effects
- Incident diagnoses of breast cancer
- Rheumatic disorders or restless legs syndrome
- Hospitalizations due to hypoglycemia or heart failure
A long-term goal is to establish a foundation that can be extended to a broad range of use cases, covering other diseases and health conditions with comparatively little additional effort.
The project is led and coordinated by the AOK Research Institute (WIdO), with Fraunhofer SCAI as consortium partner.
ClaimsBERT is funded with approximately 1.3 million euros through the Innovation Fund of the Federal Joint Committee (G-BA) under grant agreement number 01VSF25038.
Project duration: March 2026 to February 2029