Risk score based on electronic health record data improves prediction of CAD risk
Coronary Risk Estimation Based on Clinical Data in Electronic Health Records
Introduction and Methods
To identify individuals at risk of developing coronary artery disease (CAD), several risk assessment tools have been designed, such as the pooled cohort equations (PCE) and the polygenic risk score (PRS). The applicability of the conventional clinical risk score, the PCE score, is limited by underestimation and overestimation of CAD risk [1-3], as well as by biases in certain populations [4-6], while the clinical utility of a high PRS for CAD is still under investigation. As electronic health records (EHRs) contain core characteristics of several diseases, they could be a valuable resource for more precise risk prediction and stratification.
Aim of the study
The authors investigated whether a prediction score based on EHR clinical features (EHR score) can improve short-term CAD risk prediction and reclassification beyond that of the PCE score and PRS.
To develop the EHR score, a machine learning–based approach was applied to clinical data extracted from EHRs.
The predictive power of the EHR score, PCE score, and PRS was first assessed in a hospital-based, multiethnic, EHR-linked cohort (BioMe Biobank cohort) and then externally validated in a population-based cohort with available EHR and genotype data (UK Biobank cohort).
Both study populations were selected according to clinical guidelines for initiating statin treatment. Inclusion required an age of 40–79 years; individuals taking statins or with second-degree or closer relatedness were excluded. The BioMe Biobank cohort comprised 555 CAD cases and 6349 control subjects, and the UK Biobank cohort comprised 3130 CAD cases and 378,344 controls.
The predictive performance of the three models was assessed using the area under the receiver-operating characteristic curve and the net reclassification improvement.
The primary analysis assessed prediction of CAD 1 year prior to diagnosis in cases. A secondary analysis was performed on risk of stroke alone and risk of atherosclerotic cardiovascular disease (ASCVD; CAD, angina, and stroke). In addition, the ability of the EHR score and PRS to reclassify individuals for the 1-year CAD risk based on the PCE score was assessed.
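As an illustration of the discrimination metric named above, the area under the receiver-operating characteristic curve can be computed from risk scores and case labels using the rank-sum (Mann-Whitney) identity. This is a minimal sketch, not the authors' code; the function name `auc` and the toy values in the usage note are assumptions made for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: risk scores, higher means higher predicted risk.
    labels: 1 for a case, 0 for a control.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n

    # Assign 1-based ranks, giving tied scores their average rank.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")

    # U statistic for the cases, normalised by the number of case-control pairs.
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, `auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` gives 0.75: of the four case-control pairs, three rank the case above the control. A value of 0.5 corresponds to chance-level ranking.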
- The EHR score improved CAD prediction by 12% in the BioMe Biobank cohort and by 9% in the UK Biobank cohort, compared with the PCE score.
- In the BioMe Biobank cohort, similar improvements in risk prediction were seen for ASCVD and stroke when using the EHR model.
- In individuals with a low CAD risk (PCE score <7.5%), the EHR score improved CAD prediction by 20% (BioMe Biobank cohort) and by 11% (UK Biobank cohort), compared with the PCE score.
- Compared with the PCE or EHR score, the PRS did not improve CAD prediction.
- The EHR score reclassified 25.8% (BioMe Biobank cohort) and 15.2% (UK Biobank cohort) of the study populations for the 1-year CAD risk, compared with the PCE score.
- In the subgroup of low-risk individuals, the EHR score reclassified 34.4% (BioMe Biobank cohort) and 15.2% (UK Biobank cohort) of individuals, compared with the PCE score.
- The PRS did not contribute substantially to reclassification of the CAD risk.
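The reclassification figures above can be made concrete with a small sketch. It computes the share of individuals who change risk category when moving from an old score to a new one, together with the two-category net reclassification improvement, a common form of the metric. The summary does not state which variant or cut-off the authors used, so the single shared `threshold`, the assumption that both scores lie on the same risk scale, and the function name `reclassification` are all hypothetical.

```python
def reclassification(old_risk, new_risk, labels, threshold):
    """Return (share_reclassified, nri) for a low/high split at `threshold`.

    old_risk, new_risk: predicted risks on the same scale (an assumption).
    labels: 1 for a case, 0 for a control.
    """
    up_case = down_case = up_ctrl = down_ctrl = 0
    n_case = n_ctrl = 0

    for old, new, y in zip(old_risk, new_risk, labels):
        old_high = old >= threshold
        new_high = new >= threshold
        moved_up = new_high and not old_high
        moved_down = old_high and not new_high
        if y == 1:
            n_case += 1
            up_case += moved_up
            down_case += moved_down
        else:
            n_ctrl += 1
            up_ctrl += moved_up
            down_ctrl += moved_down

    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one case and one control")

    moved = up_case + down_case + up_ctrl + down_ctrl
    share = moved / (n_case + n_ctrl)
    # Cases should move up and controls should move down for a better score.
    nri = (up_case - down_case) / n_case + (down_ctrl - up_ctrl) / n_ctrl
    return share, nri
```

With toy values `old_risk=[0.05, 0.10, 0.05, 0.10]`, `new_risk=[0.10, 0.10, 0.05, 0.05]`, `labels=[1, 1, 0, 0]` and `threshold=0.075`, one case moves up and one control moves down, so half the individuals are reclassified and the net reclassification improvement is 1.0.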
Positive and negative predictive values, sensitivity, specificity, and false positives
- The positive predictive value (proportion of individuals flagged as high risk who are true cases) of the EHR score was 14% higher (BioMe Biobank cohort) and 23% higher (UK Biobank cohort), compared with the PCE score.
- The sensitivity (percentage of cases identified) of the EHR score was 12% higher (BioMe Biobank cohort) and 13% lower (UK Biobank cohort), compared with the PCE score.
- The negative predictive value of the EHR score was 12% higher (BioMe Biobank cohort) and 2% lower (UK Biobank cohort), compared with the PCE score.
- The specificity of the EHR score was 13% higher (BioMe Biobank cohort) and 29% higher (UK Biobank cohort), compared with the PCE score.
- Among individuals in the top 15% of EHR scores, 5.1% (BioMe Biobank cohort) and 12.9% (UK Biobank cohort) were false positives, compared with 14.4% and 50.4%, respectively, among individuals in the top 15% of PCE scores.
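The four measures listed above all derive from a single confusion matrix. The sketch below flags the top fraction of scores as high risk and computes them from the resulting counts. Flagging by top fraction mirrors the top-15% comparison, but the exact procedure the authors used is not described in this summary; the function name `metrics_at_top_fraction` and the toy data in the usage note are assumptions.

```python
def metrics_at_top_fraction(scores, labels, fraction=0.15):
    """Flag the highest-scoring `fraction` as positive and return
    PPV, NPV, sensitivity and specificity as a dict.

    labels: 1 for a case, 0 for a control.
    Ties at the cut-off are broken by input order (an arbitrary choice).
    """
    n = len(scores)
    n_flag = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n_flag])

    tp = fp = fn = tn = 0
    for i, y in enumerate(labels):
        if i in flagged:
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:
            if y == 1:
                fn += 1
            else:
                tn += 1

    def ratio(a, b):
        # Undefined ratios (empty denominator) are reported as NaN.
        return a / b if b else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),          # flagged individuals who are cases
        "npv": ratio(tn, tn + fn),          # unflagged individuals who are controls
        "sensitivity": ratio(tp, tp + fn),  # cases that were flagged
        "specificity": ratio(tn, tn + fp),  # controls that were not flagged
    }
```

For eight individuals with scores `[0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]`, labels `[1, 0, 1, 0, 0, 0, 0, 0]` and `fraction=0.25`, two individuals are flagged (one case, one control), giving a positive predictive value and sensitivity of 0.5 and a negative predictive value and specificity of 5/6.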
The EHR score improved prediction of the 1-year CAD risk and improved reclassification of individuals for the 1-year CAD risk, compared with the PCE score and PRS, particularly in low-risk individuals. According to the authors, the clinical use case of this score “would be to identify high-risk individuals (who are flagged as low risk by traditional scores), optimizing prevention and care using embedded pathways.”