Evolving Controllably Difficult Datasets for Clustering

Citation formats

Standard

Evolving Controllably Difficult Datasets for Clustering. / Shand, Cameron; Allmendinger, Richard; Handl, Julia; Webb, Andrew; Keane, John.

Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO '19). 2019.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Harvard

Shand, C, Allmendinger, R, Handl, J, Webb, A & Keane, J 2019, Evolving Controllably Difficult Datasets for Clustering. in Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO '19). The Genetic and Evolutionary Computation Conference, Prague, Czech Republic, 13/07/19.

APA

Shand, C., Allmendinger, R., Handl, J., Webb, A., & Keane, J. (Accepted/In press). Evolving Controllably Difficult Datasets for Clustering. In Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO '19)

Vancouver

Shand C, Allmendinger R, Handl J, Webb A, Keane J. Evolving Controllably Difficult Datasets for Clustering. In Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO '19). 2019

Author

Shand, Cameron; Allmendinger, Richard; Handl, Julia; Webb, Andrew; Keane, John. / Evolving Controllably Difficult Datasets for Clustering. Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO '19). 2019.

Bibtex

@inproceedings{b1821b0b05d04fb897c9cf762cfb8768,
title = "Evolving Controllably Difficult Datasets for Clustering",
abstract = "Synthetic datasets play an important role in evaluating clustering algorithms, as they can help shed light on consistent biases, strengths, and weaknesses of particular techniques, thereby supporting sound conclusions. Despite this, there is a surprisingly small set of established clustering benchmark data, and many of these are currently handcrafted. Even then, their difficulty is typically not quantified or considered, limiting the ability to interpret algorithmic performance on these datasets. Here, we introduce HAWKS, a new data generator that uses an evolutionary algorithm to evolve cluster structure of a synthetic data set. We demonstrate how such an approach can be used to produce datasets of a pre-specified difficulty, to trade off different aspects of problem difficulty, and how these interventions directly translate into changes in the clustering performance of established algorithms.",
author = "Cameron Shand and Richard Allmendinger and Julia Handl and Andrew Webb and John Keane",
year = "2019",
month = "4",
day = "17",
language = "English",
booktitle = "Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO '19)",

}

RIS

TY - GEN

T1 - Evolving Controllably Difficult Datasets for Clustering

AU - Shand, Cameron

AU - Allmendinger, Richard

AU - Handl, Julia

AU - Webb, Andrew

AU - Keane, John

PY - 2019/4/17

Y1 - 2019/4/17

N2 - Synthetic datasets play an important role in evaluating clustering algorithms, as they can help shed light on consistent biases, strengths, and weaknesses of particular techniques, thereby supporting sound conclusions. Despite this, there is a surprisingly small set of established clustering benchmark data, and many of these are currently handcrafted. Even then, their difficulty is typically not quantified or considered, limiting the ability to interpret algorithmic performance on these datasets. Here, we introduce HAWKS, a new data generator that uses an evolutionary algorithm to evolve cluster structure of a synthetic data set. We demonstrate how such an approach can be used to produce datasets of a pre-specified difficulty, to trade off different aspects of problem difficulty, and how these interventions directly translate into changes in the clustering performance of established algorithms.

AB - Synthetic datasets play an important role in evaluating clustering algorithms, as they can help shed light on consistent biases, strengths, and weaknesses of particular techniques, thereby supporting sound conclusions. Despite this, there is a surprisingly small set of established clustering benchmark data, and many of these are currently handcrafted. Even then, their difficulty is typically not quantified or considered, limiting the ability to interpret algorithmic performance on these datasets. Here, we introduce HAWKS, a new data generator that uses an evolutionary algorithm to evolve cluster structure of a synthetic data set. We demonstrate how such an approach can be used to produce datasets of a pre-specified difficulty, to trade off different aspects of problem difficulty, and how these interventions directly translate into changes in the clustering performance of established algorithms.

M3 - Conference contribution

BT - Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO '19)

ER -