Event-driven configuration of a neural network CMP system over an homogeneous interconnect fabric

Research output: Contribution to journal › Article › peer-review

  • External authors:
  • M M Khan
  • Alexander Rast
  • Javier Navaridas Palma
  • X Jin
  • Mikel Lujan
  • S Temple
  • C Patterson
  • D Richards
  • J Miguel-Alonso

Abstract

Configuring a million-core parallel system at boot time is a difficult process when the system has neither specialised hardware support for the configuration process nor a preconfigured default state that puts it in operating condition. The architecture of SpiNNaker, a parallel chip multiprocessor (CMP) system for neural network simulation, is in this class. To function as a universal neural chip, SpiNNaker uses an event-driven model with complete system virtualisation so that all components are generic and identical. Where most large CMP systems feature a sideband network to complete the boot process, SpiNNaker has a single homogeneous network interconnect for both application inter-processor communications and system control functions. This network improves fault tolerance and makes it easier to support dynamic run-time reconfiguration; however, it requires a boot process compatible with the application's communications model. Here, we present such a boot loader, capable of bringing a generic, initially unconfigured parallel system into a working configuration. Since SpiNNaker uses event-driven asynchronous communications throughout, the loader operates with purely local control: there is no global synchronisation, state information, or transition sequence. A novel two-stage "unfolding" boot-up process efficiently configures the SpiNNaker hardware and loads the application using a high-speed flood-fill technique with support for run-time reconfiguration. SystemC simulation of a multi-CMP SpiNNaker system indicates an error-free CMP configuration time of ∼1.37 ms, while a high-level simulation of a full-scale system (64 K CMPs) indicates a mean application-loading time of ∼20 ms (for a 100 KB application), which is virtually independent of the size of the system. Further hardware-level Verilog simulation verified the cycle-accurate functionality of CMP configuration.
The complete process illustrates a useful method for configuring large-scale event-driven parallel systems without having to provide dedicated hardware boot support or rely on system state assumptions. © 2010 Elsevier B.V. All rights reserved.
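
The flood-fill idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's loader: the 4-neighbour 2D torus, the function name `flood_fill`, and the hop-count bookkeeping are all assumptions made for illustration. Each node applies a purely local rule (forward a block the first time it arrives, drop duplicates), and the queue here only simulates the order in which those local events would occur.

```python
from collections import deque

def flood_fill(width, height, origin=(0, 0)):
    """Simulate flood-fill on a hypothetical width x height torus.

    Returns a dict mapping each node to the hop count at which it
    first received the block.
    """
    received = {origin: 0}
    queue = deque([origin])
    while queue:
        x, y = queue.popleft()
        hop = received[(x, y)]
        # Local rule: forward to every neighbour (4-neighbour torus assumed).
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbour = ((x + dx) % width, (y + dy) % height)
            if neighbour not in received:  # duplicates are dropped locally
                received[neighbour] = hop + 1
                queue.append(neighbour)
    return received

hops = flood_fill(8, 8)
print(len(hops), max(hops.values()))
```

In this toy model every node is reached, and the worst-case hop count follows the side length of the mesh rather than the total node count, which gives some intuition for why propagation delay can grow slowly with system size.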

Bibliographical metadata

Original language: English
Pages (from-to): 392-409
Number of pages: 18
Journal: Parallel Computing
Volume: 37
Issue number: 8
Publication status: Published - Aug 2011