Physicians' Academy for Cardiovascular Education

New screening tool uses machine learning to identify individuals with probable FH in large datasets

Precision screening for familial hypercholesterolaemia: a machine learning study applied to electronic health encounter data

Literature - Myers KD, Knowles JW, Staszak D, et al. - Lancet Digit Health. 2019;1(8):e393-e402

Introduction and methods

In the United States, fewer than 10% of individuals with familial hypercholesterolemia (FH) have been identified. The remainder go untreated, despite their likely elevated LDL-c levels and risk of premature coronary artery disease [1-3]. US guidelines recommend screening to identify families with FH, but the best method for large-scale screening has yet to be established. A successful method probably includes both efficient cascade screening and effective identification of index cases [4-8].

The authors have previously reported on the successful application of a machine learning model to identify undiagnosed individuals with FH, built from and applied to single healthcare institutions [9]. This study aimed to build a model that can be applied at both the institutional and the national healthcare database scale to identify new index cases. To this end, the FIND FH machine learning model was constructed. Model characteristics were defined using individuals with FH and individuals presumed not to have FH. Subsequently, it was tested whether the model could identify individuals with a medical profile consistent with FH in independent clinical settings.

Structured electronic health record (EHR) data from four large academic health systems were used to build and train the FIND FH machine learning model. For training of the model, a case was defined as an individual with a clinical diagnosis of FH by a lipid expert (939 individuals, 42% of whom were genetically confirmed), and a presumed control without FH was defined as an individual with no previous diagnosis of FH by a lipid expert in their medical record (83136 individuals).
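To make the composition of the training set concrete, the short Python sketch below works through the arithmetic implied by the numbers above. It is an illustration only: the function names and the boolean input are hypothetical and are not taken from the FIND FH implementation, and only the two counts come from the text.

```python
# Illustrative sketch only. The helper names below are hypothetical and are
# not part of FIND FH; only the two counts are taken from the article.

N_CASES = 939        # clinical FH diagnosis by a lipid expert
N_CONTROLS = 83136   # no previous FH diagnosis by a lipid expert on record


def training_label(diagnosed_by_lipid_expert: bool) -> int:
    """Return 1 for a case and 0 for a presumed control."""
    return 1 if diagnosed_by_lipid_expert else 0


def cohort_summary(n_cases: int, n_controls: int) -> dict:
    """Summarise how unbalanced the training cohort is."""
    total = n_cases + n_controls
    return {
        "total": total,
        "case_fraction": n_cases / total,
        "controls_per_case": n_controls / n_cases,
    }


summary = cohort_summary(N_CASES, N_CONTROLS)
print(f"Total individuals: {summary['total']}")
print(f"Cases: {summary['case_fraction']:.2%}")
print(f"Controls per case: {summary['controls_per_case']:.1f}")
```

Under these definitions, cases make up roughly 1% of the training cohort, with about 89 presumed controls for every case. Note also that a presumed control is defined by the absence of a recorded diagnosis, so some controls may in fact have undiagnosed FH.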

For more details regarding development of the algorithm, we refer to the original article.

Main results


FIND FH is a machine learning model that can identify phenotypic FH when applied to large medical datasets. In two distinct types of large medical datasets, it identified a large number of individuals with probable FH who had not previously been diagnosed. The model was built on longitudinal medical data from individuals with at least one documented CV disease risk factor in their history. It does not rely on specific information such as tendon xanthomas or family history. Importantly, FIND FH does not rely solely on lipid concentrations, which is an advantage because many of the patients identified did not have lipid levels in their EHR.

Editorial comment

After reiterating the risks associated with FH and the potential of treatment when the disease is recognized early, Pereira [10] notes that its dominant inheritance facilitates cascade screening. Nonetheless, most societies fall short in identifying new FH cases, and thus in targeting them with preventive strategies.

Pereira lists many reasons that have been used to explain the low numbers of identified subjects with FH, but he states that none of these justifies the inaction. ‘In an era of personalized medicine, FH is potentially one of the most tractable conditions to deliver the promised society-wide benefits of the implementation of this paradigm,’ he says. Controlling the disease requires a multipronged approach and the orchestrated participation of several parties. The article by Myers and colleagues is ‘a contribution to the toolbox of those fighting under-diagnosis of the disease’.

Pereira does question whether flagging algorithms that are native to electronic health systems will bring real advances in combating the disease. FIND FH, in its current state, only flags potential patients to physicians who choose to receive the information. For the system to translate into better control of diagnosed patients, further steps are required, including additional tools that help administrators monitor the deliverables of this type of system. Data on what is done with the information on identified cases, with regard to their lipid levels, contacted family members, and treatment (response), will reveal new bottlenecks. These should be used to create a roadmap for implementing the new system in such a way that it can curb the disease.

Pereira expresses the hope that the technology that is already available will be put to use, paving the way to truly transformative care of individuals with FH.


Find this article online at Lancet Digit Health
