Segmentation Approaches for the Identification of Analogies in a Forecasting Context

UoM administered thesis: PhD

  • Authors: Emiao Lu

Abstract

This thesis considers the problem of analogy identification in the context of forecasting. We develop and test a range of segmentation approaches, with the aim of improving the accuracy of forecasting methods that employ analogies.

The first manuscript of the thesis outlines our core methodological framework. This framework describes a forecasting process that integrates a multicriteria segmentation approach, using a weighted-sum method, to identify analogies during the segmentation stage. At the level of the distance function, this approach combines information from past realizations of a set of time series with information about the factors that govern the observed patterns. Using simulated and real-world data, we illustrate that considering multiple criteria concurrently at the segmentation stage can help to achieve better clustering results, which feed forward into improved forecasting accuracy. This paper contributes the first methodological framework for the forecasting of analogous time series. Multicriteria segmentation approaches demonstrate a significant improvement in forecasting performance compared with single-criterion segmentation methods.

The second manuscript discusses the model selection problem that arises when multicriteria clustering approaches are used. Although multicriteria approaches to clustering ultimately improve forecasting accuracy, they introduce an additional model selection challenge at the segmentation stage: even for the same number of clusters, multicriteria clustering approaches may return sets of clustering solutions that reflect different trade-offs between the conflicting criteria. This thesis therefore also includes work addressing the model selection problem for multicriteria clustering in a forecasting context. We demonstrate that the quality of clustering solutions is best assessed in the problem-specific (forecasting) context. Because this is computationally the most expensive approach, we describe a compromise that uses a standard internal validation technique (the Silhouette Width measure) for the determination of clusters, but performs weight selection based on the best average (historical) forecasting performance of the forecasting algorithm.

The third manuscript addresses instability issues stemming from the clustering procedures by integrating bagging techniques into the forecasting process. Segmentation of analogies has been reported to increase final forecasting accuracy, but applying clustering techniques at the segmentation stage may introduce instabilities related to the model selection step. Combining the forecasts derived from multiple models through an aggregation scheme is expected to reduce the uncertainty of the results. We therefore employ bootstrap aggregation to further improve the forecasting process, which results in a further boost to forecasting accuracy.

In the final manuscript, we consider the use of multicriteria approaches in time series clustering where multiple criteria (i.e., distance metrics and/or normalization techniques) are available, but where these relate to time series data alone. Different distance metrics and standardization techniques may emphasize different notions of similarity. In applications where we are not sure which notion of similarity is accurate, or where several notions of similarity are relevant, we might benefit from combining multiple distance metrics and standardization techniques to capture complementary notions of similarity. Our findings suggest a continued advantage of multicriteria clustering approaches in this context.
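
As a rough illustration of the weighted-sum idea described in the abstract, the following Python sketch combines a time-series distance matrix with a distance matrix over explanatory factors, clusters on the combined distance, and scores the result with the Silhouette Width. It is not the thesis's implementation: the Euclidean distances, the max-scaling of each matrix, the simple k-medoids routine, and all function names (`pairwise_euclidean`, `weighted_sum_distance`, `k_medoids`, `silhouette_width`) are assumptions chosen only to keep the example self-contained.

```python
import numpy as np


def pairwise_euclidean(X):
    """Return the n x n Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def weighted_sum_distance(d_series, d_factors, w):
    """Combine two distance matrices as w * d_series + (1 - w) * d_factors.

    Assumption: each matrix is first divided by its maximum so that
    neither criterion dominates purely because of its scale.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * d_series / d_series.max() + (1.0 - w) * d_factors / d_factors.max()


def k_medoids(D, k, max_iter=100):
    """Cluster with a basic k-medoids loop on a precomputed distance matrix.

    A stand-in clustering method; initialisation is deterministic
    (most central point first, then farthest-first).
    """
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    labels = np.argmin(D[:, medoids], axis=1)
    for _ in range(max_iter):
        new_medoids = []
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size == 0:  # keep the old medoid for an empty cluster
                new_medoids.append(medoids[j])
                continue
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(within)]))
        new_labels = np.argmin(D[:, new_medoids], axis=1)
        if new_medoids == medoids and np.array_equal(new_labels, labels):
            break
        medoids, labels = new_medoids, new_labels
    return labels


def silhouette_width(D, labels):
    """Average Silhouette Width from a distance matrix (needs >= 2 clusters)."""
    clusters = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton clusters score 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())


# Toy data (made up for illustration): six series and two factors per series.
series = np.array([[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.8, 3.2, 3.9],
                   [9.0, 8.0, 7.0, 6.0], [9.2, 7.9, 7.1, 6.1], [8.8, 8.1, 6.9, 5.8]])
factors = np.array([[0.0, 1.0], [0.1, 0.9], [0.0, 1.1],
                    [5.0, 4.0], [5.1, 4.2], [4.9, 3.9]])

D = weighted_sum_distance(pairwise_euclidean(series), pairwise_euclidean(factors), w=0.5)
labels = k_medoids(D, k=2)
print(labels, silhouette_width(D, labels))
```

In this sketch the weight `w` is fixed by hand. The compromise described in the abstract would instead keep an internal measure such as the Silhouette Width for the clustering step and choose `w` by comparing historical forecasting performance across candidate weights, which is not shown here.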

Details

Original language: English
Awarding Institution
Supervisors/Advisors
Award date: 1 Aug 2018