ANALYSING THE PERFORMANCE OF MACHINE LEARNING ALGORITHMS FOR BIO-AEROSOL DETECTION

UoM administered thesis: PhD

  • Authors: Simon Ruske

Abstract

Primary biological aerosol particles, including bacteria, fungal spores and pollen, are a crucial component of the atmosphere. These particles have a significant impact on human health, especially for those who suffer from hay fever and asthma. They can also influence the weather by acting as ice nuclei (IN) or cloud condensation nuclei (CCN), and can affect agriculture through the destruction of crops by fungal diseases. It is also vital to monitor these particles indoors, in hospitals as well as in cleanrooms used for pharmaceutical manufacturing. There are several techniques that scientists can use to analyse bio-aerosol. Manual techniques, including tape and filter sampling, are often used by meteorological institutions to provide a pollen forecast. Alternatively, in a clean indoor environment, pharmaceutical manufacturers may use culture-based techniques to help prevent contamination. These approaches, however, can suffer from low time resolution and substantial human labour costs. As a consequence, interest in ultraviolet light-induced fluorescence (UV-LIF), which can provide real-time particle measurements, has increased in the last decade. Such technological advancements, especially instruments that produce high-resolution data, bring a wealth of complex data that can be difficult to analyse. Consequently, it becomes ever more important to question the algorithmic approach used to analyse the data. One potential solution is machine learning, which has seen increased interest in recent years across a vast number of applications. As more laboratory data have become available, it has been possible to test the applicability of a wider variety of algorithms. In this thesis, we evaluate the applicability of these algorithms to data collected using both an earlier instrument, the WIBS, and an updated version of it, the MBS.
Hierarchical agglomerative clustering is a technique that has been used by researchers in previous studies using the WIBS. In this thesis, we present the first paper using the MBS and extend the current body of research that uses the WIBS, showing some previously unseen disadvantages of this technique. We demonstrate excellent discrimination between laboratory particles for the newer instrument using supervised machine learning techniques, but identify critical areas where the application of such algorithms to ambient data is limited and needs improvement.
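
For readers unfamiliar with hierarchical agglomerative clustering, the following is a minimal sketch of the general idea only. It is not the thesis's implementation: the data are synthetic two-dimensional points rather than WIBS or MBS measurements, average linkage with Euclidean distance is an assumed choice, and the helper names (`average_linkage`, `agglomerative`, `make_synthetic`) are hypothetical.

```python
import math
import random


def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge y into x
        del clusters[y]
    return clusters


def make_synthetic(seed=0, n_per_class=15):
    """Two well-separated Gaussian blobs standing in for two particle types (assumption)."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (20.0, 20.0)]
    points, labels = [], []
    for label, (cx, cy) in enumerate(centres):
        for _ in range(n_per_class):
            points.append((cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)))
            labels.append(label)
    return points, labels


points, labels = make_synthetic()
clusters = agglomerative(points, n_clusters=2)
for members in clusters:
    # Show which true labels ended up in each cluster, and the cluster size.
    print(sorted({labels[i] for i in members}), len(members))
```

The labels are used only to inspect the result afterwards; the clustering itself never sees them, which is what distinguishes this unsupervised approach from the supervised techniques mentioned in the abstract.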

Details

Original language: English
Awarding Institution
Supervisors/Advisors
Award date: 1 Aug 2020