In naturalistic environments, emotion perception depends on the integration of dynamic facial, body, and vocal expressions. Previous research suggests that emotional expressions are integrated more efficiently than neutral expressions. One possible mechanism facilitating multisensory integration of emotional expressions is cross-modal prediction, which can relate to the formal, spatial, or temporal structure of events. The aim of this thesis is to examine the role of temporal prediction in multisensory emotion perception. More specifically, the experiments presented test the hypothesis that the temporal information provided by emotional facial and body expressions facilitates the integration of subsequent vocal expressions, and explore factors, such as attention and presentation mode, that may influence this effect. To test this hypothesis, five experiments were conducted in which participants were presented with audiovisual clips of emotional (anger and fear) and neutral facial, body, and vocal expressions. In each experiment, the timing of the vocal expression was manipulated such that it could occur early, on-time, or late with respect to the natural sound onset. Of these five experiments, two behavioural studies were used to determine temporal sensitivity to asynchronies in audiovisual emotional expressions. Based on the results of these experiments, three electroencephalography (EEG) experiments were designed to investigate the role of temporal prediction in multisensory emotion perception when attention was directed to the emotion (Chapter 4), synchrony (Chapter 5), or interjection (Chapter 6) of the audiovisual clip. Moreover, in the final experiment, a mixed design was used to explore whether presentation mode modulates the effect of temporal prediction on multisensory emotion perception.
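The early/on-time/late timing manipulation described above can be sketched in code. This is a hypothetical illustration only, not the stimulus-construction procedure used in the thesis: the function name is invented, and the 360 ms offset is borrowed from the asynchrony threshold reported in Chapter 4 rather than from the experiments' actual design parameters.

```python
def shifted_onset(natural_onset_s: float, condition: str,
                  offset_s: float = 0.360) -> float:
    """Return the vocal-expression onset time (in seconds) for a timing condition.

    'early'   -> audio leads the natural onset by `offset_s`
    'on-time' -> audio occurs at the natural onset
    'late'    -> audio lags the natural onset by `offset_s`
    """
    shifts = {"early": -offset_s, "on-time": 0.0, "late": +offset_s}
    return natural_onset_s + shifts[condition]

# Example: a clip whose vocal expression naturally begins 1.2 s into the video.
for cond in ("early", "on-time", "late"):
    print(cond, round(shifted_onset(1.2, cond), 3))
```

In practice such a shift would be applied to the audio track before it is recombined with the video, leaving the visual stream untouched so that only the audiovisual asynchrony varies across conditions.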
In each EEG experiment, both event-related potential (ERP) and time-frequency analyses were used to determine the effects of temporal prediction on neural indices of multisensory integration (ERP) and the role of oscillatory activity in the generation of multisensory temporal predictions and prediction errors. This complementary approach provided a more comprehensive perspective on the neural dynamics underpinning multisensory emotion perception. The results of these studies yielded several novel findings that provide a basis for future research investigating the role of temporal prediction in multisensory emotion perception. In Chapter 4, results from a temporal order judgment task show that individuals can reliably detect asynchronies of ±360 milliseconds (ms) in dynamic facial, body, and vocal expressions, but that this threshold may be smaller (indicating higher temporal sensitivity) for emotional compared to neutral stimuli. Results of the EEG experiment that follows show that temporal prediction may facilitate the integration of fearful expressions, as reflected by modulation of the auditory N1 ERP component and of pre-stimulus beta-band activity. In Chapter 5, the results of a synchrony judgment task suggest that emotion improves temporal sensitivity for auditory-leading (early) asynchronies, and that detection of auditory-leading asynchronies is accompanied by a decrease in alpha power for early compared to on-time and late conditions. In terms of ERPs, main effects of emotion and temporal prediction were observed, but no interaction. Finally, Chapter 6 showed an effect of emotion in the N1 time window and an effect of timing in the P2 time window, but no significant interaction and no significant effects in the time-frequency domain. Collectively, the findings from these studies suggest that temporal prediction facilitates the integration of fearful expressions, but only when attention is directed to the emotional quality of the stimulus.