Understanding verbs typically activates posterior temporal regions and, in some circumstances, motion perception area V5. However, the nature and role of this activation remain unclear: does language alone activate V5? And are posterior temporal representations modality-specific motion representations, or supra-modal, motion-independent event representations? Here, we address these issues by comparing sentences describing human and object motion with corresponding state descriptions. We adopted the blank screen paradigm, which is known to encourage visual imagery, and used a localizer to identify V5 and temporal structures responding to motion. Analyses in each individual brain suggested that, in most participants, language modulated activity in the posterior temporal lobe but not within V5. Moreover, posterior temporal structures responded strongly to both motion sentences and human static sentences. These results suggest that descriptive language alone need not recruit V5 and instead engages more schematic event representations in temporal cortex that encode animacy and motion. © 2013.