Physicians' Academy for Cardiovascular Education

Identification of 5 HF subtypes using a machine learning approach

Identifying subtypes of heart failure from three electronic health record sources with machine learning: an external, prognostic, and genetic validation study

Literature - Banerjee A, Dashtban A, Chen S, et al. - Lancet Digit Health. 2023 Jun;5(6):e370-e379. doi: 10.1016/S2589-7500(23)00065-1

Introduction and methods


Current HF subtype classifications have not resulted in precision medicine, personalized care, or targeted therapies [1-6]. Moreover, incomplete knowledge of HF subtypes across the wide spectrum of causal factors and populations has limited primary prevention and screening guidelines for this disease [7,8].

Aim of the study

In a large population of patients with incident HF, the authors used machine learning to (1) identify subtypes that are clinically relevant throughout the HF disease course and carry a low risk of bias from patient selection and algorithms; (2) demonstrate internal, external, prognostic, and genetic validity; and (3) develop potential clinical pathways to improve impact.


In this external, prognostic, and genetic validation study, the authors used their 2021 framework for practical machine learning implementation, consisting of 6 stages: clinical relevance, patients, algorithm, internal validation (within dataset and across methods), external validation (across datasets), clinical utility, and effectiveness [9]. Data on patients aged ≥30 years with incident HF were extracted from 2 population-based electronic health record databases in the UK, Clinical Practice Research Datalink (CPRD; n=188,800) and The Health Improvement Network (THIN; n=124,262), covering 1998 to 2018.

The CPRD and THIN datasets yielded 645 factors before and after HF diagnosis, including demographic information, comorbidities, and medication use and persistence. For the algorithm, 87 of these 645 factors were selected. To reduce the risk of algorithmic bias, the following 4 unsupervised machine learning methods were compared: K-means, hierarchical, K-medoids, and mixture modeling.
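The idea of comparing clustering methods can be illustrated with a minimal sketch. It is not the authors' pipeline: it runs only two of the four method families (a plain K-means and a simple alternating K-medoids) on six invented 2-D points and measures how often the two methods agree using the Rand index. The toy data, the starting centers, and the choice of the Rand index as the agreement measure are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points]


def kmeans(points, centers, iters=20):
    """Plain Lloyd-style K-means: centers are the means of their members."""
    centers = list(centers)
    for _ in range(iters):
        labels = assign(points, centers)
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return assign(points, centers)


def kmedoids(points, medoids, iters=20):
    """Alternating K-medoids: each center must be an actual data point."""
    medoids = list(medoids)
    for _ in range(iters):
        labels = assign(points, medoids)
        for k in range(len(medoids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # Pick the member with the smallest total distance to the rest.
                medoids[k] = min(
                    members, key=lambda c: sum(dist(c, m) for m in members))
    return assign(points, medoids)


def rand_index(a, b):
    """Share of point pairs that both labelings treat the same way."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical toy data: two well-separated groups of three "patients".
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
start = [points[0], points[3]]  # assumed starting centers, one per group

km_labels = kmeans(points, start)
kmed_labels = kmedoids(points, start)
agreement = rand_index(km_labels, kmed_labels)
```

High agreement between methods suggests that the grouping reflects structure in the data, not a quirk of one algorithm. On real records with 87 mixed-type factors, preprocessing and the choice of distance measure would matter far more than they do in this toy case.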

Subtypes were identified and evaluated for: (1) external validity; (2) prognostic validity (predictive accuracy for 1-year all-cause mortality); and (3) genetic validity (associations with single nucleotide polymorphisms [SNPs] and polygenic risk scores [PRSs] for HF-related traits, using UK Biobank data [n=9573]).
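One way to make "predictive accuracy for 1-year all-cause mortality" concrete is the concordance statistic (c-statistic, equivalent to the area under the ROC curve). The summary does not state which metric the authors used, so the choice of the c-statistic is an assumption, and the cluster labels and outcomes below are invented. The sketch scores each patient by the observed 1-year death rate of their cluster and then checks how well that score separates patients who died from those who survived.

```python
def cluster_death_rates(clusters, died):
    """Observed 1-year death rate per cluster label."""
    totals, deaths = {}, {}
    for c, d in zip(clusters, died):
        totals[c] = totals.get(c, 0) + 1
        deaths[c] = deaths.get(c, 0) + d
    return {c: deaths[c] / totals[c] for c in totals}


def c_statistic(scores, died):
    """Share of (died, survived) pairs in which the patient who died
    had the higher score; tied scores count as half."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: two clusters with different mortality.
clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]
died = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 = died within 1 year

rates = cluster_death_rates(clusters, died)
scores = [rates[c] for c in clusters]
auc = c_statistic(scores, died)
```

A value of 0.5 means the clusters carry no prognostic information and 1.0 means perfect separation. Because the rates here are estimated and evaluated on the same patients, the result is optimistic; a real assessment would score held-out or external data.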

To assess clinical utility, 5 HF clinicians were asked about the clinical relevance, justification, and interpretability of the results. Based on their input, a model predicting cluster membership and survival was developed, as well as an HF cluster app for routine clinical use.

Main results

Internal and external validations and subtype identification

Prognostic validation

Genetic validation

Clinical utility and effectiveness


Using their 6-stage framework for machine learning implementation, the authors identified 5 HF subtypes (early onset, late onset, AF-related, metabolic, and cardiometabolic) and validated these subtypes based on population-representative data. The 5 subtypes showed good predictive accuracy for 1-year all-cause mortality. To assess effectiveness of their approach, the authors also developed an open-access HF cluster app that clinicians can use to identify the cluster that fits a particular patient and their predicted survival.



Find this article online at Lancet Digit Health. Find here the heart failure cluster app
