Predicting Customer Churn using Voice of the Customer. A Text Mining Approach.

UoM administered thesis: PhD

  • Authors:
  • Carolina Martínez Troncoso


High levels of competition across all sectors of the economy have led service providers to implement churn-predictive models. One way to improve the model-building process is through the type of data, or input variables, used. In this context, the voice of the customer (VOC) is attracting interest: it is a source of service-related data that results from the proactive role played by customers who increasingly interact with companies through remote channels. In particular, unsolicited comments written by customers in their own words are considered information-rich, full of dynamic evaluations of the service experienced, and subject to little response bias. The extent to which unsolicited, direct and unstructured VOC interactions can help predict churn depends mainly on advances in text mining (TM) and on companies adopting this technique to move from manual to automated analysis. However, there is a lack of research that uses unstructured VOC data to generate service-related variables for churn-predictive models. As a result, there is also a lack of frameworks that guide the extraction of the causes of customer churn and their conversion into usable data for predicting this behaviour. Existing frameworks require a high level of knowledge to extract information from textual VOC interactions, are domain-dependent, or provide insufficient information about the causes of churn behaviour. No previous study has automated the extraction of churn determinants from VOC interactions using TM techniques. Previous attempts have used key terms as predictors rather than extracting constructs such as churn determinants. This implies that there is no library/dictionary available to automate the extraction of the reasons that lead customers to abandon service providers.
This thesis helps to address these gaps in the literature by: (i) proposing a ready-to-use framework to extract customer churn determinants from VOC interactions; (ii) proposing a text-mining process to automate the extraction of customer churn determinants from VOC interactions; and (iii) proposing a text-mining model, applicable in a specific domain, that allows more concrete retention-oriented tactics to be generated. The proposed framework adapts and extends the eight-incident model of Keaveney (1995) with two new determinants: churn intentions and multiple incidents. The proposed text-mining process comprises three main stages (model construction, training and testing) and, additionally, a library validation procedure. The approach is demonstrated by means of a case study that processed 23,195 unsolicited, direct and unstructured VOC interactions from 14,531 customers, provided by a Chilean bank. The results show that it is possible to build a churn-predictive model using only service-related predictors (churn determinants) extracted from VOC interactions. The performance of 40 different models was compared. The best results were obtained using a support vector machine (SVM) classifier with a radial basis function (RBF) kernel, which reached an overall accuracy of 58.3%, a precision of 58.5% and a recall of 58.4%. Furthermore, churners were shown to report significantly more churn intentions than non-churners, and pricing, dishonest behaviour, catastrophic failures and churn intentions were found to be the factors with the greatest impact on customer churn. These results provide concrete guidance for designing not only retention-oriented campaigns but also communication campaigns in the retail banking domain.
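
To illustrate the general idea of turning unstructured VOC text into churn-determinant predictors, the following is a minimal sketch of a dictionary-based extraction step. It is not the library or process developed in the thesis: the four determinant names are taken from the abstract, but the trigger phrases, the function names (compile_library, extract_determinants, customer_features) and the per-customer counting scheme are all hypothetical, chosen only for illustration.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical mini-library mapping each churn determinant to trigger phrases.
# The determinant names come from the abstract; the phrases are invented
# for illustration and are NOT the validated library described in the thesis.
DETERMINANT_LIBRARY: Dict[str, List[str]] = {
    "pricing": ["fee", "fees", "interest rate", "expensive"],
    "dishonest_behaviour": ["lied", "misled", "hidden", "unfair"],
    "catastrophic_failure": ["lost my money", "account blocked", "system down"],
    "churn_intention": ["close my account", "switch to another bank", "cancel"],
}


def compile_library(library: Dict[str, List[str]]) -> Dict[str, Pattern[str]]:
    """Compile one case-insensitive, whole-word regex per determinant."""
    patterns = {}
    for determinant, phrases in library.items():
        # Longest phrases first so multi-word phrases are tried before short ones.
        ordered = sorted(phrases, key=len, reverse=True)
        alternation = "|".join(re.escape(p) for p in ordered)
        patterns[determinant] = re.compile(
            r"\b(?:" + alternation + r")\b", re.IGNORECASE
        )
    return patterns


PATTERNS = compile_library(DETERMINANT_LIBRARY)


def extract_determinants(text: str) -> Dict[str, int]:
    """Return a 0/1 flag per determinant for a single VOC interaction."""
    return {d: int(bool(p.search(text))) for d, p in PATTERNS.items()}


def customer_features(interactions: List[str]) -> Dict[str, int]:
    """Count, per determinant, how many of a customer's interactions mention it."""
    counts = {d: 0 for d in PATTERNS}
    for text in interactions:
        for determinant, hit in extract_determinants(text).items():
            counts[determinant] += hit
    return counts


if __name__ == "__main__":
    messages = [
        "You charged me hidden fees again.",
        "I will close my account if this continues.",
    ]
    print(customer_features(messages))
```

Feature vectors of this kind could then be passed to any classifier, such as the SVM with an RBF kernel mentioned above. A real library would also need the validation procedure the abstract describes, since simple phrase matching misses negation, misspellings and paraphrases.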


Original language: English
Awarding Institution
Award date: 1 Aug 2019