BACKGROUND: A regulator's credibility may be threatened if stakeholders perceive that the performance assessments made by its inspectors are unreliable. Yet there is little published research on the reliability of inspectors' assessments of health care organizations' services.
OBJECTIVES: We investigated the inter-rater reliability of assessments made by inspectors of acute hospitals in England during the piloting of a new regulatory model implemented by the Care Quality Commission (CQC) during 2013 and 2014. Multi-professional teams of inspectors rated service provision on a four-point scale for each of five domains: safety; effectiveness; caring; responsiveness; and leadership.
METHODS: In an online survey, we asked individual inspectors to assign a domain and a rating to each of 10 vignettes of service information extracted from CQC inspection reports. We used these data to simulate the ratings that might be produced by teams of inspectors. We also observed inspection teams in action, and interviewed inspectors and staff from hospitals that had been inspected.
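The team-level simulation described above could be sketched as follows. This is a minimal illustration, not the study's actual procedure: the aggregation rule (majority vote within randomly sampled teams), the team size, and the function names are all assumptions, since the abstract does not specify how team ratings were derived from individual ratings.

```python
import random
from collections import Counter

def simulate_team_ratings(individual_ratings, team_size=5, trials=1000, seed=0):
    """Sample hypothetical inspection teams from the pool of individual
    ratings for one vignette, and record each team's majority rating.

    Assumption: teams are modeled as a simple majority vote; the study
    may have used a different aggregation rule.
    """
    rng = random.Random(seed)
    team_results = Counter()
    for _ in range(trials):
        # Draw a team's worth of individual ratings without replacement
        team = rng.sample(individual_ratings, team_size)
        # The team's rating is the modal rating within the sampled team
        modal_rating, _ = Counter(team).most_common(1)[0]
        team_results[modal_rating] += 1
    return team_results

# Hypothetical individual ratings for one vignette on a four-point scale
ratings = [2, 2, 3, 2, 3, 3, 2, 4, 2, 3, 2, 3]
distribution = simulate_team_ratings(ratings)
print(distribution)
```

The spread of the resulting distribution gives one rough indication of how much simulated team ratings would vary for the same vignette.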
RESULTS: Levels of agreement varied substantially from vignette to vignette. Inspector characteristics such as professional background explained only a very small part of this variation. Overall, agreement was higher on ratings than on domains, and higher for groups of inspectors than for individual inspectors. We identified several potential causes of disagreement, including differences over the weight that should be given to contextual factors and general uncertainty in interpreting the rating and domain categories.
CONCLUSION: Groups of inspectors produced more reliable assessments than individual inspectors, and there is evidence that structured discussion between inspectors improves reliability. Domain allocations were less reliable than ratings. It is important to define the domain categories and rating levels clearly, and to train inspectors in their use. Further research is needed to replicate these results now that the model has been fully implemented, and to better understand the impact that inspector uncertainty and disagreement may have on published CQC ratings.