Missing data was handled inconsistently in UK prediction models: a review of method used

Research output: Contribution to journal › Article › peer-review


Objective: No clear guidance exists on handling missing data at each stage of developing, validating and implementing a clinical prediction model (CPM). We aimed to review the approaches to handling missing data that underlie the CPMs currently recommended for use in UK healthcare.
Study design and Setting: A descriptive cross-sectional meta-epidemiological study that identified CPMs recommended by the National Institute for Health and Care Excellence (NICE) and summarized how missing data are handled across their pipelines.
Results: 23 CPMs were included through ‘sampling strategy’. Six missing data strategies were identified: complete case analysis (CCA), multiple imputation, imputation of mean values, k-nearest neighbours imputation, using an additional category for missingness, and considering missing values as risk-factor-absent. 52% of the development articles and 48% of the validation articles did not report how missing data were handled. CCA was the most common approach used for development (40%) and validation (44%). At implementation, 57% of the CPMs required complete data entry, whilst 43% allowed missing values. Three CPMs had consistent paths in their pipelines.
Conclusion: A broad variety of methods for handling missing data underlie the CPMs currently recommended for use in UK healthcare. Missing data handling strategies were generally inconsistent. Better quality assurance of CPMs needs greater clarity and consistency in the handling of missing data.

Bibliographical metadata

Original language: English
Journal: Journal of Clinical Epidemiology
Volume: 140C (2021) pp. 149-158
Publication status: Accepted/In press - 7 Sep 2021