EXPLORING PHOTONIC BENEŠ SWITCHING FABRICS FOR FUTURE HPC AND DATACENTRES

UoM administered thesis: PhD

Abstract

Scalable photonic interconnection networks are highly desirable for both the High-Performance Computing (HPC) and the datacentre domains. Their appeal lies in their potential energy efficiency and increased bandwidth capacity compared to networks based on electronics. One of the main challenges in realising large-scale photonic interconnection networks is the adoption of network switches that can internally employ (1) high-port-count and (2) fast, broadband photonic switching fabrics (PSFs). These fabrics are created by composing multiple stages of individual photonic devices which, when controlled with thermal or electrical tuning, can act as network switches. Including PSFs at any level of the network still faces many obstacles, related both to photonic device design and to control functionality for the switching fabric. This thesis contributes to the latter. It presents a simulation-based, network-traffic-driven evaluation of PSFs that are constructed using electrically/thermally tuned Mach-Zehnder Interferometers (MZIs) and formed using the Beneš network topology. These MZIs are broadband and fast switching, as they exhibit switching behaviour on a nanosecond timescale across a continuous 30 nm spectral segment. The Beneš topology requires the fewest MZIs, thereby reducing the PSF control complexity and increasing the photonic performance. Furthermore, the thesis enables simulating the deployment of such switching fabrics in the context of future HPC systems and datacentres. First, the thesis discusses the main concepts enabling photonic communication, as well as the state-of-the-art in PSFs, and outlines the design challenges related to photonic switching. It then describes a simulation-driven methodology for evaluating the relationship among communication traffic configuration, PSF-internal routing algorithm and photonic performance for a given PSF.
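As a rough illustration of why the Beneš topology keeps the device count low, the following minimal sketch computes the number of stages and 2×2 switching elements in an N×N Beneš network, using the standard result of 2·log2(N) − 1 stages with N/2 elements each. The function name `benes_dimensions` is hypothetical and is not taken from the thesis; the sketch assumes one MZI per 2×2 element.

```python
def benes_dimensions(ports: int) -> tuple[int, int]:
    """Return (stages, elements) for an N x N Benes network of 2x2 elements.

    Assumes one MZI per 2x2 element (an illustrative assumption).
    """
    # The classic Benes construction is defined for power-of-two port counts.
    if ports < 2 or ports & (ports - 1):
        raise ValueError("ports must be a power of two, at least 2")
    log2_ports = ports.bit_length() - 1   # exact log2 for powers of two
    stages = 2 * log2_ports - 1           # 2*log2(N) - 1 stages
    elements = (ports // 2) * stages      # N/2 elements in every stage
    return stages, elements


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        stages, elements = benes_dimensions(n)
        print(f"{n}x{n}: {stages} stages, {elements} elements")
```

Under these assumptions a 16×16 fabric would need 7 stages and 56 elements, so every path crosses 7 devices regardless of the source and destination ports.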
The methodology is evaluated by simulating two state-of-the-art PSFs selected from the literature and comparing the results with their reported performance. The simulation accuracy is established against the published data (insertion loss within 0.05 dB, photonic crosstalk within 3 dB). The thesis then proposes the concept of “Hardware-Inspired Routing strategies” (HIRs), which are a collection of routing algorithms for the studied PSFs. They leverage both the state-based asymmetry in device photonic performance and the path-based asymmetry offered by the switch fabric topology to reduce photonic losses and switching energy per bit when using Circuit Switching (CS). Depending on the communication traffic configuration, the two best HIRs can be effective at reducing the photonic losses which compose the combined photonic power penalty. The power penalty determines the required signal power for the PSF and therefore the energy efficiency. Compared to the state-of-the-art “Looping Algorithm”, the HIRs can reduce the photonic power penalty by ∼15–20% on average and by ∼19–15% in the worst case as the PSF size increases. When considering an on-chip deployment scenario, this can lead to laser power savings of ∼20–77% on average and ∼24–42% in the worst case. The thesis then proposes augmenting the HIRs with Time-Division Multiplexing (TDM), and investigates deploying a 16×16 PSF, selected from the literature, within a top-of-rack switch. When using TDM, flows are partitioned into equal-sized segments, which are then interleaved by the PSF controller to reduce the timing penalty of switch fabric contention incurred by CS. The simulations show that when employing TDM, communication time within the PSF can be reduced by up to ∼20% compared to CS, depending on the employed workload, while not affecting insertion loss or switching energy per bit.
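The TDM idea of partitioning flows into segments and interleaving them can be sketched as follows. This is only an illustration under stated assumptions: the round-robin interleaving order, the shorter final segment, and the names `segment_flow` and `interleave` are all hypothetical and are not the controller design from the thesis.

```python
from collections import deque


def segment_flow(flow_id, size_bytes, segment_bytes):
    """Split one flow into segments of segment_bytes (the last may be shorter).

    Returns a queue of (flow_id, segment_index) pairs.
    """
    count = -(-size_bytes // segment_bytes)  # ceiling division
    return deque((flow_id, index) for index in range(count))


def interleave(flows, segment_bytes):
    """Interleave contending flows, one segment per flow per turn.

    `flows` maps a flow id to its size in bytes. Round-robin ordering is an
    assumption made for this sketch.
    """
    queues = [segment_flow(fid, size, segment_bytes) for fid, size in flows.items()]
    schedule = []
    while any(queues):          # an empty deque is falsy
        for queue in queues:
            if queue:
                schedule.append(queue.popleft())
    return schedule


if __name__ == "__main__":
    print(interleave({"A": 3000, "B": 1000}, segment_bytes=1000))
```

With circuit switching, flow B in this example would wait for all of flow A to finish; with interleaving, B's single segment is sent after A's first segment.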
The thesis concludes by investigating the joint impact of traffic arbitration policy, PSF-internal routing algorithm and workload on the switch performance (insertion loss, communication time within the PSF, switching energy per bit). The results indicate that communication time is affected the most by the arbitration policy, with differences generally at ∼10% and, in some extreme cases, over 30%. Switching energy per bit is affected less significantly, with differences of ∼4–5% (at most 15%), while insertion loss is negligibly affected. These results indicate that arbitration in these PSFs could be designed independently from routing. The least-frequently used policy was found to be the best overall, particularly with regular workloads, in which tasks progress at the same pace, with clear communication phases of fixed size. In these workloads, the arbitration policy reduces the communication time by ∼30%, while in irregular workloads the policy increases the communication time by ∼6%. On the other hand, one of the novel policies proposed, accelerated round-robin, excels with irregular workloads, in which tasks progress at a pace dictated by traffic causality.
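One possible reading of a least-frequently used arbitration policy is sketched below: among the ports currently requesting the fabric, the arbiter grants the one that has been granted the fewest times so far. This interpretation, the tie-break on the lowest port number, and the class name `LeastFrequentlyUsedArbiter` are assumptions made for illustration; the abstract does not define the policy in detail.

```python
from collections import Counter


class LeastFrequentlyUsedArbiter:
    """Grant the requesting port with the fewest grants so far (assumed policy)."""

    def __init__(self):
        self.grants = Counter()  # port -> number of grants received

    def grant(self, requesting_ports):
        """Pick one winner among the requesting ports, or None if there are none."""
        if not requesting_ports:
            return None
        # Fewest past grants wins; ties go to the lowest port number (assumption).
        winner = min(requesting_ports, key=lambda port: (self.grants[port], port))
        self.grants[winner] += 1
        return winner


if __name__ == "__main__":
    arbiter = LeastFrequentlyUsedArbiter()
    for requests in ([0, 1, 2], [0, 1, 2], [0, 1], [0, 1, 2]):
        print(requests, "->", arbiter.grant(requests))
```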

Details

Original language: English
Awarding Institution
Supervisors/Advisors
  • Mikel Luján (Supervisor)
Award date: 31 Dec 2022