Previous research suggests that emotional prosody processing is a highly rapid and complex process. In particular, it has been shown that different basic emotions can be differentiated in an early event-related brain potential (ERP) component, the P200. Often, the P200 is followed by later long lasting ERPs such as the late positive complex. The current experiment set out to explore in how far emotionality and arousal can modulate these previously reported ERP components. In addition, we also investigated the influence of task demands (implicit vs. explicit evaluation of stimuli). Participants listened to pseudo-sentences (sentences with no lexical content) spoken in six different emotions or in a neutral tone of voice while they either rated the arousal level of the speaker or their own arousal level. Results confirm that different emotional intonations can first be differentiated in the P200 component, reflecting a first emotional encoding of the stimulus possibly including a valence tagging process. A marginal significant arousal effect was also found in this time-window with high arousing stimuli eliciting a stronger P200 than low arousing stimuli. The P200 component was followed by a long lasting positive ERP between 400 and 750 ms. In this late time-window, both emotion and arousal effects were found. No effects of task were observed in either time-window. Taken together, results suggest that emotion relevant details are robustly decoded during early processing and late processing stages while arousal information is only reliably taken into consideration at a later stage of processing.