Evidential Reasoning for Preprocessing Uncertain Categorical Data for Trustworthy Decisions: An Application on Healthcare and Finance

Research output: Contribution to journalArticlepeer-review


The uncertainty attributed by discrepant data in AI-enabled decisions is a critical challenge in highly regulated domains such as health care and finance. Ambiguity and incompleteness due to missing values in output and input attributes, respectively, is ubiquitous in these domains. It could have an adverse impact on a certain unrepresented set of people in the training data without a developer’s intention to discriminate. The inherently non-numerical nature of categorical attributes than numerical attributes and the presence of incomplete and ambiguous categorical attributes in a dataset increases the uncertainty in decision-making. This paper addresses the challenges in handling categorical attributes as it is not addressed comprehensively in previous research. Three sources of uncertainties in categorical attributes are recognised in this research. The informational uncertainty, unforeseeable uncertainty in the decision task environment, and the uncertainty due to lack of pre-modelling explainability in categorical attributes are addressed in the proposed methodology on maximum likelihood evidential reasoning (MAKER). It can transform and impute incomplete and ambiguous categorical attributes into interpretable numerical features. It utilises a notion of weight and reliability to include subjective expert preference over a piece of evidence and the quality of evidence in a categorical attribute, respectively. The MAKER framework strives to integrate the recognised uncertainties in the transformed input data that allow a model to perceive data limitations during the training regime and acknowledge doubtful predictions by supporting trustworthy pre-modelling and post modelling explainability. The ability to handle uncertainty and its impact on explainability is demonstrated on a real-world healthcare and finance data for different missing data scenarios in three types of AI algorithms: deep-learning, tree-based, and rule-based model.

Bibliographical metadata

Original languageEnglish
JournalExpert Systems with Applications
Publication statusAccepted/In press - 11 Jul 2021