Importance of Quality Control in ‘Big Data’: implications for statistical inference of electronic health records in clinical cardiology

Research output: Contribution to journal › Article


Routinely collected health data are helping to initiate clinical cardiology investigations in patient groups that may better reflect the patients seen in routine clinical practice than those enrolled in classical randomised controlled trials. This presents numerous opportunities for clinical and basic science communities, such as utilising advances in imaging technology, using large-scale data to quantify the ‘real world’ clinical impact of an intervention, or using the experiences of patients to make predictions about similar patients in the future. However, such opportunities depend on addressing the data quality issues that are common in electronic health records. In this commentary article, we discuss two common data quality issues inherent in routinely collected health data: first, inaccurate recording of diagnostic codes and, second, incomplete data. We discuss the potential biases that might be inherent in the collection of such data, such as informative observation, and the need to address these at the analysis stage to ensure conclusions remain robust. There needs to be a collective effort to reduce occurrences of missing data and improve data quality in large-scale electronic health records and other routinely collected health data; wider involvement of patients in their own data collection and storage is one potential solution.

Bibliographical metadata

Original language: English
Journal: Cardiovascular Research
Early online date: 25 Mar 2019
Publication status: Published - 2019