Predictive performance of risk scores for ischemic stroke in AF
Comprehensive comparison of stroke risk score performance: a systematic review and meta-analysis among 6 267 728 patients with atrial fibrillation
Introduction and methods
In patients with atrial fibrillation (AF), the risk of ischemic stroke (IS) can be predicted with several risk scores. However, current risk scores appear to have limited overall ability to predict IS in these patients [1-3]. At the same time, newer risk scores and updates of earlier scores seem to be understudied.
Aim of the study
The objectives of this study were to identify and systematically review all available risk scores predicting IS in AF patients, together with their external validations and updates; to assess the methodological quality of the studies on these risk scores; and to calculate pooled estimates of their predictive performance.
The authors conducted a literature search in PubMed and Web of Science for studies in which a multivariable prognostic IS risk score for AF patients was developed, validated, or updated. The Prediction model Risk Of Bias ASsessment Tool (PROBAST) was used to examine the methodological quality of the studies. To assess discrimination (i.e., a score’s ability to discriminate between events and nonevents), pooled c-statistics were calculated for 6,267,728 patients and 359,373 events of IS using a random-effects meta-analysis. The level of calibration (i.e., agreement between predicted and observed risk) was also estimated based on different statistics. For each of the validation studies included in the meta-analysis, 8 sensitivity analyses were performed.
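The pooling step described above can be illustrated with a minimal sketch. The summary states only that a random-effects meta-analysis was used; the logit transformation, the DerSimonian-Laird estimator of between-study variance, the function name `pool_c_statistics`, and the input values are all assumptions made for illustration and are not taken from the study.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the log-odds scale."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a log-odds value back to the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_c_statistics(studies, z=1.96):
    """Pool c-statistics with a DerSimonian-Laird random-effects model.

    studies: list of (c_statistic, ci_lower, ci_upper) tuples.
    Assumption: pooling is done on the logit scale, and each standard
    error is recovered from the width of the reported 95% CI.
    Returns (pooled_c, pooled_lower, pooled_upper, tau_squared).
    """
    y = [logit(c) for c, _, _ in studies]
    se = [(logit(hi) - logit(lo)) / (2.0 * z) for _, lo, hi in studies]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / s ** 2 for s in se]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # DerSimonian-Laird between-study variance, truncated at zero.
    k = len(studies)
    scale = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / scale) if scale > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (s ** 2 + tau2) for s in se]
    sum_w_re = sum(w_re)
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum_w_re
    se_mu = math.sqrt(1.0 / sum_w_re)

    return (inv_logit(mu),
            inv_logit(mu - z * se_mu),
            inv_logit(mu + z * se_mu),
            tau2)


# Hypothetical validation results (NOT data from the meta-analysis).
example = [(0.62, 0.58, 0.66), (0.66, 0.63, 0.69), (0.70, 0.65, 0.75)]
pooled, lower, upper, tau2 = pool_c_statistics(example)
print(f"pooled c = {pooled:.3f} (95% CI {lower:.3f} to {upper:.3f}), tau^2 = {tau2:.4f}")
```

Working on the logit scale keeps the pooled estimate and its confidence limits inside the 0 to 1 range, which is one common reason for transforming c-statistics before pooling.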
- In total, 19 scores were identified, which were validated in 107 studies (329 external validations in total) and updated at least once in 40 studies (76 updates in total).
Methodological quality of studies
- The PROBAST indicated a risk of methodological bias in all studies, which was mostly due to a high or unclear risk of bias in the outcome domain and a high risk of bias in the analysis domain of this tool.
- For all studies, there were also concerns regarding the applicability of the risk score.
- Discrimination was reported in 16 of the 19 development studies (c-statistic ranged from 0.61 to 0.86).
- Information on calibration (presented as observed vs. expected risks and/or Hosmer–Lemeshow statistic) was available for 8 risk scores and indicated “adequate calibration” or “no evidence of poor calibration.”
- Of the 329 validations, most were performed on the CHA2DS2-VASc score (n=147) and CHADS2 score (n=75). Several of the newer risk scores had not been validated.
- All 107 validation studies presented information on discrimination, with c-statistics indicating poor (<0.60) to reasonable (0.60–0.80) and, in exceptional cases, good (>0.80) discrimination. For risk scores developed after the publication of the CHA2DS2-VASc score in 2010, no c-statistic below 0.60 was reported.
- Calibration was reported in 14 studies (13%), which showed improved calibration measures for the modified CHADS2, ABC, and GARFIELD-AF scores compared with the CHA2DS2-VASc score.
- The 40 update studies only reported on the CHA2DS2-VASc score (n=25), CHADS2 score (n=7), or both scores (n=8).
- For the CHA2DS2-VASc score, 46/49 updates showed improved discrimination compared with the original CHA2DS2-VASc score, whereas 13/17 updates for the CHADS2 score indicated improved discrimination compared with the original CHADS2 score.
- For 10 scores (CHA2DS2-VASc, CHADS2, AFI, Framingham, ATRIA, SPAF, ABC, modified CHADS2, GARFIELD-AF, and GARFIELD-AF II), pooled c-statistics could be calculated, which ranged from 0.598 (95%CI: 0.558–0.636; 12 validations) for the AFI to 0.715 (95%CI: 0.674–0.754; 10 validations) for the modified CHADS2 score.
- The pooled c-statistic was 0.644 (95%CI: 0.635–0.653) for the CHA2DS2-VASc score and 0.658 (95%CI: 0.644–0.672) for the CHADS2 score.
- Most sensitivity analyses showed no or only marginal differences, which indicated that factors such as age, anticoagulation therapy, and the PROBAST score had no or only limited effect on the discriminative abilities of most risk scores.
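To relate the pooled estimates to the discrimination categories used for the validation studies (poor <0.60, reasonable 0.60–0.80, good >0.80), the thresholds can be written as a small helper. This is only an illustrative sketch: the function name `discrimination_category` is hypothetical, and applying the thresholds to pooled values rather than to individual validations is an assumption. The c-statistics in the example are the pooled values reported above.

```python
def discrimination_category(c_statistic):
    """Classify a c-statistic using the thresholds quoted in the summary.

    poor: below 0.60; reasonable: 0.60 to 0.80; good: above 0.80.
    """
    if not 0.0 <= c_statistic <= 1.0:
        raise ValueError("c-statistic must lie between 0 and 1")
    if c_statistic < 0.60:
        return "poor"
    if c_statistic <= 0.80:
        return "reasonable"
    return "good"


# Pooled c-statistics as reported in the summary.
pooled_c = {
    "AFI": 0.598,
    "CHA2DS2-VASc": 0.644,
    "CHADS2": 0.658,
    "modified CHADS2": 0.715,
}

for score, c in pooled_c.items():
    print(f"{score}: {c:.3f} -> {discrimination_category(c)}")
```

Under these thresholds, only the AFI falls just below the cutoff for reasonable discrimination, and none of the pooled estimates reaches the range labeled good.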
This systematic review and meta-analysis identified 19 risk scores (with 329 validations and 76 updates in total) to predict IS risk in patients with AF. Although all risk scores showed poor to reasonable performance, newer scores tended to have slightly better discriminative ability than the conventional, guideline-endorsed CHA2DS2-VASc score. The modified CHADS2 score showed the best overall discriminative ability. Methodological bias was observed in all studies. The authors stress that “additional external validations and data on calibration are required before considering the newer scores in clinical practice.”