Learning Hierarchical Speech Representations Using Deep Convolutional Neural Networks

UoM administered thesis: Master of Philosophy

  • Author: Darren Hau


Deep learning has proven effective on complex AI problems, especially visual perception tasks. Key to its success is the ability to learn hierarchical feature representations at increasing levels of abstraction. Motivated by this success in the visual domain, researchers have recently begun to apply deep learning to speech. In this study, we investigate the feasibility of using deep convolutional neural networks (CNNs) in the speech domain. CNNs were designed based on models of the visual system and have been shown to learn hierarchical feature representations on vision tasks. As many vision tasks have an auditory analogue, we believe deep CNNs could learn an effective hierarchical representation for speech. Most deep architectures in the speech domain have been based on the Restricted Boltzmann Machine (RBM). A secondary aim of this study is therefore to determine whether a different building block can be used effectively in the speech domain. We construct a deep architecture that uses the CNN as its building block and is trained using unsupervised learning only. We compare our work against a Convolutional RBM-based model on various speech perception tasks, showing that it is indeed possible to use an alternative to the RBM in the speech domain. Our analysis also leads to some non-trivial observations on the suitability of CNN-based deep architectures in the speech domain.


Original language: English
Awarding Institution
Award date: 1 Aug 2014