On Wednesday, April 24th at 10 AM EST, PICS will host a colloquium with Eric Cyr, Principal Member of the Technical Staff at the Computer Science Research Institute at Sandia National Laboratories. This colloquium will be held in-person in PICS 534 with refreshments provided.

### Title

Exploiting time-domain parallelism to accelerate neural network training and PDE constrained optimization

### Speaker

Eric Cyr, Principal Member of the Technical Staff at the Computer Science Research Institute at Sandia National Laboratories

### Abstract

This talk will explore methods for accelerating numerical optimization constrained by transient problems using parallelism. Two types of transient problems will be considered. In the first case, training algorithms for Neural ODEs will be discussed. Neural ODEs are a class of neural network architectures in which the depth of the network (its layers) is modeled as a continuous time domain. In the second case, transient PDE-constrained optimization problems will be described. In either setting, simulation-based optimization requires repeated executions of the simulator's forward and backward (adjoint) time integration schemes. Consequently, the arrow of time creates a major sequential bottleneck in the optimization process. Moreover, the performance of these methods depends strongly on the parallelism available to the forward and adjoint solves. Thus, when the forward and adjoint solvers are already operating at the limit of strong scaling and hardware utilization, the arrow-of-time bottleneck cannot be overcome by additional parallelization across the spatial grid or network layers.
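To make the "depth as continuous time" idea concrete, here is a minimal, hedged sketch (not the speaker's code; all names are illustrative): discretizing a Neural ODE with forward Euler recovers a residual-network-style forward pass, where each time step plays the role of a layer.

```python
import numpy as np

def f(theta, x):
    # ODE right-hand side dx/dt = f(theta(t), x(t)); here a simple tanh layer.
    W, b = theta
    return np.tanh(W @ x + b)

def forward_euler(x0, thetas, T=1.0):
    # Discretizing the depth domain [0, T] with one step per parameter set:
    # x_{k+1} = x_k + dt * f(theta_k, x_k), i.e. a residual-network update.
    dt = T / len(thetas)
    x = x0
    for theta in thetas:      # each time step acts as one "layer"
        x = x + dt * f(theta, x)
    return x

rng = np.random.default_rng(0)
d = 4
thetas = [(0.1 * rng.standard_normal((d, d)), np.zeros(d)) for _ in range(8)]
x0 = rng.standard_normal(d)
out = forward_euler(x0, thetas)
```

Refining the time grid adds layers without adding parameters per layer, which is what makes the continuous-depth viewpoint natural for multilevel methods in time.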

Deep neural networks are a powerful machine learning tool with the capacity to learn complex nonlinear relationships described by large data sets. Despite their success, training these models remains a challenging and computationally intensive undertaking. We will present a layer-parallel training algorithm that exploits a multigrid scheme to accelerate both forward and backward propagation. Introducing a parallel decomposition between layers requires inexact propagation through the neural network. The multigrid method used in this approach stitches these subdomains together with sufficient accuracy to ensure rapid convergence. We demonstrate an order-of-magnitude wall-clock speedup over the serial approach, opening a new avenue for parallelism that is complementary to existing approaches. We will also discuss applying the layer-parallel methodology to recurrent neural networks and transformer architectures.
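The "stitching subdomains together" mechanism can be illustrated with parareal, the classical two-level special case of parallel-in-time multigrid. The sketch below (hedged and illustrative, not the talk's algorithm) applies it to a scalar ODE: fine solves on each time subdomain are independent, and a cheap sequential coarse sweep corrects the mismatches at subdomain interfaces.

```python
import numpy as np

lam = -1.0                        # model problem: dx/dt = lam * x

def propagate(x, t0, t1, steps):
    # Forward Euler with `steps` substeps from t0 to t1.
    dt = (t1 - t0) / steps
    for _ in range(steps):
        x = x + dt * lam * x
    return x

def parareal(x0, T=1.0, n_sub=8, fine_steps=64, iters=4):
    ts = np.linspace(0.0, T, n_sub + 1)
    # Initial guess from the cheap coarse propagator (1 step per subdomain).
    U = np.empty(n_sub + 1); U[0] = x0
    for k in range(n_sub):
        U[k + 1] = propagate(U[k], ts[k], ts[k + 1], 1)
    for _ in range(iters):
        # Fine solves on each subdomain are independent -> parallel in time.
        F = np.array([propagate(U[k], ts[k], ts[k + 1], fine_steps)
                      for k in range(n_sub)])
        Unew = np.empty_like(U); Unew[0] = x0
        for k in range(n_sub):
            G_new = propagate(Unew[k], ts[k], ts[k + 1], 1)
            G_old = propagate(U[k],    ts[k], ts[k + 1], 1)
            Unew[k + 1] = G_new + F[k] - G_old   # parareal correction
        U = Unew
    return U

U = parareal(1.0)
serial = propagate(1.0, 0.0, 1.0, 8 * 64)   # sequential fine reference
```

After a few iterations the parallel solution matches the serial fine integration, while the only sequential work per iteration is the coarse sweep. In the layer-parallel setting the time axis is the network depth, and the full multigrid scheme replaces this two-level version.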

The second half of this talk focuses on PDE-constrained optimization formulations. Solving optimization problems with transient PDE constraints is computationally costly due to the number of nonlinear iterations and the cost of solving large-scale KKT systems, whose size scales with the size of the spatial discretization times the number of time steps. We propose a new two-level domain decomposition preconditioner for these linear systems when the constraint is the heat equation. Our approach leverages the observation that the Schur complement is elliptic in time, and thus amenable to classical domain decomposition methods. Further, applying the preconditioner reuses existing time integration routines, which facilitates implementation and maximizes software reuse. The performance of the preconditioner is examined in an empirical study demonstrating that the approach is scalable with respect to the number of time steps and subdomains.
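As a schematic of the structure involved (notation illustrative, not taken from the talk): for state \(u\), control \(z\), and adjoint \(p\), a Newton step on the KKT conditions solves a saddle-point system of the form

```latex
% Schematic KKT system for a transient PDE-constrained problem.
% H: state Hessian block, R: control regularization,
% J_u, J_z: constraint Jacobians w.r.t. state and control.
\begin{pmatrix}
  H   & 0   & J_u^T \\
  0   & R   & J_z^T \\
  J_u & J_z & 0
\end{pmatrix}
\begin{pmatrix} \delta u \\ \delta z \\ \delta p \end{pmatrix}
= -\begin{pmatrix} g_u \\ g_z \\ c \end{pmatrix},
\qquad
S = R + J_z^T J_u^{-T} H\, J_u^{-1} J_z .
```

Eliminating the state and adjoint yields the reduced Schur complement \(S\); each application of \(J_u^{-1}\) is a forward time integration and each application of \(J_u^{-T}\) an adjoint (backward) integration, which is precisely why Schur-complement solves are sequential in time and why an elliptic-in-time structure that admits domain decomposition is valuable.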

### Bio

Eric Cyr is a Principal Member of the Technical Staff at the Computer Science Research Institute at Sandia National Laboratories in Albuquerque, NM. Eric's research focuses on the simulation of multiphysics PDEs, advanced analysis techniques, scalable algorithms, and scientific machine learning.