Multi-view Clustering of Heterogeneous Health Data: Application to Systemic Sclerosis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

  • External authors:
  • Adán José-García
  • Julie Jacques
  • Alexandre Filiot
  • David Launay
  • Vincent Sobanski
  • Clarisse Dhaenens

Abstract

Electronic health records (EHRs) involve heterogeneous data types such as binary, numeric and categorical attributes. As traditional clustering approaches require the definition of a single proximity measure, different data types are typically transformed into a common format or amalgamated through a single distance function. Unfortunately, this early transformation step largely pre-determines the cluster analysis results and can cause information loss, as the relative importance of different attributes is not considered. This exploratory work aims to avoid this premature integration of attribute types prior to cluster analysis through a multi-objective evolutionary algorithm called MVMC. This approach makes it possible to integrate multiple data types into the clustering process, explore trade-offs between them, and determine consensus clusters that are supported across these data views. We evaluate our approach in a case study focusing on systemic sclerosis (SSc), a highly heterogeneous auto-immune disease that can be considered a representative example of an EHR data problem. Our results highlight the potential benefits of multi-view learning in an EHR context. Furthermore, this comprehensive classification, which integrates multiple and varied data sources, will help to better understand disease complications and treatment goals.
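
The idea of keeping one proximity measure per data type, rather than merging them early, can be illustrated with a small sketch. This is not the authors' MVMC implementation: the toy records, the choice of distances (Euclidean for the numeric view, simple matching for the categorical view), the within-cluster compactness objective and all function names are assumptions made for illustration only. The sketch scores a few hand-written candidate partitions on each view separately and keeps those that are not Pareto-dominated, which is the kind of trade-off set a multi-objective search would explore.

```python
from itertools import combinations

# Hypothetical toy data: four patients described by two views.
numeric_view = [[1.0, 2.0], [1.2, 1.9], [8.0, 9.0], [8.1, 9.2]]
categorical_view = [["a", "x"], ["b", "y"], ["a", "x"], ["b", "y"]]

def euclidean(u, v):
    """Euclidean distance for the numeric view."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def mismatch(u, v):
    """Fraction of categorical attributes that differ (simple matching)."""
    return sum(a != b for a, b in zip(u, v)) / len(u)

def distance_matrix(rows, dist):
    """One pairwise distance matrix per view; the views are never merged."""
    n = len(rows)
    return [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def within_cluster_distance(labels, dmat):
    """Mean distance between records sharing a label (lower is more compact)."""
    pairs = [(i, j) for i, j in combinations(range(len(labels)), 2)
             if labels[i] == labels[j]]
    if not pairs:
        return 0.0
    return sum(dmat[i][j] for i, j in pairs) / len(pairs)

def objectives(labels, views):
    """One objective value per view, all to be minimised."""
    return tuple(within_cluster_distance(labels, d) for d in views)

def dominates(f, g):
    """True if f is no worse than g everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def pareto_front(candidates, views):
    """Keep the candidate partitions that no other candidate dominates."""
    scored = [(labels, objectives(labels, views)) for labels in candidates]
    return [(l, f) for l, f in scored
            if not any(dominates(g, f) for _, g in scored)]

views = [distance_matrix(numeric_view, euclidean),
         distance_matrix(categorical_view, mismatch)]

# Hand-written candidate partitions standing in for an evolving population.
candidates = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 0, 1], [0, 1, 1, 0]]
front = pareto_front(candidates, views)

for labels, scores in front:
    print(labels, scores)
```

In this toy setting the numeric view favours grouping patients 0 and 1 together, while the categorical view favours grouping patients 0 and 2, so no single partition is best on both objectives. A full method would also need a search procedure to generate candidates and a rule for selecting a consensus solution from the front, neither of which is shown here.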

Bibliographical metadata

Original language: English
Title of host publication: Parallel Problem Solving from Nature – PPSN XVII
Pages: 352-367
Publication status: Published - 15 Aug 2022
Event: 17th International Conference on Parallel Problem Solving from Nature - Dortmund, Germany
Event duration: 10 Sep 2022 → 14 Sep 2022

Publication series

Name: Lecture Notes in Computer Science
Volume: 13399
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Parallel Problem Solving from Nature
Abbreviated title: PPSN XVII
Country/Territory: Germany
City: Dortmund
Period: 10/09/22 → 14/09/22