Proteomic-based CV prediction model outperforms clinical risk model in primary prevention cohort
Improved cardiovascular risk prediction using targeted plasma proteomics in primary prevention
Introduction and methods
Accurate cardiovascular (CV) risk assessment for primary prevention is challenging in asymptomatic individuals [1,2]. Current CV risk models use traditional risk factors and predict future adverse events with limited accuracy [3,4]. Adding single biomarkers to these conventional models only slightly improves prediction of outcomes. As a result, many at-risk individuals in the general population remain unidentified until their first clinical manifestation.
This study used high-throughput proteomics combined with machine learning techniques to evaluate risk assessment for CV events in the primary prevention setting. The proteome-based risk model was compared to a risk model with traditional risk factors and subsequently validated in an independent, external primary prevention cohort.
The derivation cohort was a nested case-control sample derived from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Norfolk prospective population study, which recruited from general practices in the Norfolk area, UK. From baseline data (collected between 1993 and 1997), 822 healthy individuals were selected. Healthy was defined as a cohort participant without a prior history of CVD. A total of 411 participants who developed an acute myocardial infarction (MI) that resulted in hospitalization or death were selected, together with 411 participants who did not develop CVD.
The Progressione della Lesioni Intimale Carotidea (PLIC) cohort was a single-center, observational, cross-sectional, prospective study of volunteers, who were enrolled from 1998 to 2000 in the northern area of Milan and followed for 11 years on average. For the validation cohort, 702 individuals were selected: 351 who had developed atherosclerosis, comprising persons with subclinical atherosclerosis and 44 who experienced a CV event, and 351 matched controls. CV events were defined as coronary heart disease (MI, unstable angina, coronary revascularization, and silent ischemia) and/or cerebrovascular disease (ischemic stroke and transient ischemic attack).
The expression levels of 333 unique plasma proteins related to pathways and/or risk factors involved in atherogenesis were measured from the CV II, CV III, Cardiometabolic and Inflammation panels using the proximity extension assay technology.
Different machine learning models were constructed: 1) a clinical risk model with traditional risk factors that included age, gender, BMI, smoking, diabetes, systolic blood pressure (SBP), antihypertensive medication, total cholesterol, HDL-c, and triglyceride levels (parameters obtained from the Framingham Risk Score, pooled cohort equations, and SCORE); 2) a protein-based model consisting of 50 predictive plasma proteins only; 3) a combined model with both the protein parameters and the clinical risk parameters. Stability selection with extreme gradient boosting was used to identify the best predictive biomarkers in the clinical and proteomics datasets for both short-term (~3 years) and long-term (median follow-up of 20 years) prediction models.
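The idea behind stability selection can be illustrated with a minimal sketch: a feature selector is run on many random subsamples, and only features chosen in a large fraction of runs are kept. The sketch below is not the study's pipeline. It swaps extreme gradient boosting for a simple stand-in selector (ranking features by standardized mean difference between cases and controls) so that it runs with the Python standard library alone, and the data, the function names, the number of rounds, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import random
from statistics import mean, pstdev


def top_k_by_effect_size(X, y, k):
    """Stand-in base selector (NOT gradient boosting): pick the k features
    with the largest absolute standardized case-control mean difference."""
    scored = []
    for j in range(len(X[0])):
        cases = [row[j] for row, label in zip(X, y) if label == 1]
        controls = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(cases + controls) or 1.0  # guard against zero spread
        scored.append((abs(mean(cases) - mean(controls)) / spread, j))
    scored.sort(reverse=True)
    return {j for _, j in scored[:k]}


def stability_selection(X, y, k=2, n_rounds=100, frac=0.5, threshold=0.8, seed=0):
    """Run the base selector on stratified subsamples and keep features
    whose selection frequency reaches the threshold."""
    rng = random.Random(seed)
    case_idx = [i for i, label in enumerate(y) if label == 1]
    control_idx = [i for i, label in enumerate(y) if label == 0]
    counts = [0] * len(X[0])
    for _ in range(n_rounds):
        # Stratified subsample so every round contains cases and controls.
        idx = rng.sample(case_idx, max(1, int(len(case_idx) * frac)))
        idx += rng.sample(control_idx, max(1, int(len(control_idx) * frac)))
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        for j in top_k_by_effect_size(X_sub, y_sub, k):
            counts[j] += 1
    freqs = [c / n_rounds for c in counts]
    stable = [j for j, f in enumerate(freqs) if f >= threshold]
    return stable, freqs


# Synthetic data (illustrative only): 100 controls, 100 cases, 6 features.
# Features 0 and 1 are shifted upward in cases; the rest are pure noise.
data_rng = random.Random(42)
X, y = [], []
for label in (0, 1):
    for _ in range(100):
        row = [data_rng.gauss(0.0, 1.0) for _ in range(6)]
        if label == 1:
            row[0] += 1.5
            row[1] += 1.5
        X.append(row)
        y.append(label)

stable, freqs = stability_selection(X, y)
print("selection frequencies:", freqs)
print("stable features:", stable)
```

In a setting closer to the one described, the stand-in selector would be replaced by a gradient-boosted model fitted on each subsample, with features ranked by the model's importance scores; the subsampling and frequency-thresholding steps stay the same.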
Main results
- Prediction of MI in the derivation cohort using the protein-based model (over a median of 20 years) resulted in a receiver operating characteristic (ROC) area under the curve (AUC) of 0.754±0.011 (permutation test P=0.0099). The traditional risk factor model resulted in an ROC AUC of 0.730±0.015 (permutation test P=0.0099). Combining the traditional risk factor model with the protein-based model gave an ROC AUC of 0.764±0.015 (permutation test P=0.0099). The proteomics model was superior to the clinical risk model (P<0.001).
- The optimal time point for the prediction of MI in the derivation cohort was 1132 days (~3 years), as identified by applying Markov chain Monte Carlo algorithms to the derivation cohort. When focusing on MI event risk in the first 3 years, the performance of the proteomics model increased to an ROC AUC of 0.803±0.093 (permutation test P=0.0145). The clinical model had an ROC AUC of 0.732±0.164 (permutation test P=0.0099). The protein-based model was thus superior to the clinical risk model (P=0.025). Adding traditional risk factors to the protein-based model did not improve the risk prediction for MI and gave an ROC AUC of 0.808±0.085 (permutation test P=0.0178 [P=0.721 compared to the proteomics prediction model]).
- In the validation cohort, the protein-based and clinical models were tested on the 44 cases who suffered a CV event and the 351 healthy controls. The protein-based model significantly outperformed the clinical model with traditional risk factors (ROC AUC of 0.705±0.071, permutation test P=0.0099 vs. 0.609±0.057, permutation test P=0.0700, respectively; P<0.001). The two models combined gave an ROC AUC of 0.692±0.090 (permutation test P=0.0099), which did not improve on the risk prediction of the proteomics model (P=0.618).
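For readers less familiar with the metrics reported above, the following minimal sketch shows how an ROC AUC and a permutation-test P value can be computed. It is an illustration under stated assumptions, not the study's code: the risk scores are synthetic, the group sizes (44 cases, 351 controls) merely mirror the validation cohort, and the function names, the 100 permutations, and the (k+1)/(n+1) estimator are choices made here, not details given in the source.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random case scores higher than a
    random control, with ties counted as one half."""
    cases = [s for label, s in zip(labels, scores) if label == 1]
    controls = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def permutation_p_value(labels, scores, n_permutations=100, seed=0):
    """Shuffle the labels to build a null distribution of AUCs and report
    how often a shuffled AUC is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = roc_auc(labels, scores)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if roc_auc(shuffled, scores) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed labelling itself, so P is never 0.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


# Synthetic risk scores (illustrative only): cases score higher on average.
data_rng = random.Random(7)
labels = [1] * 44 + [0] * 351
scores = [data_rng.gauss(1.5, 1.0) for _ in range(44)]
scores += [data_rng.gauss(0.0, 1.0) for _ in range(351)]

auc, p_value = permutation_p_value(labels, scores)
print(f"AUC = {auc:.3f}, permutation P = {p_value:.4f}")
```

With this estimator and 100 permutations, the smallest attainable P value is 1/101 ≈ 0.0099. That coincides with the smallest permutation P values reported above, but the source does not state how many permutations were used, so this should be read as an observation about the estimator rather than a description of the study's procedure.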
Conclusion
A protein-based CV risk model outperformed a clinical model with traditional risk factors in a primary prevention setting, especially for short-term CV events. Adding traditional risk factors to the protein-based model did not increase its predictive value.