NPS-NRL-Rice-UIUC Collaboration on Navy Atmosphere-Ocean Coupled Models on Many-Core Computer Architectures
Lead PI: Dr. Lucas Wilcox, Naval Postgraduate School
Start Year: 2014 | Duration: 5 Years
Partners: Naval Postgraduate School, Rice University, NRL Stennis Space Center & University of Illinois, Urbana-Champaign
The goal of this project is threefold. The first goal is to identify the performance bottlenecks of the Nonhydrostatic Unified Model of the Atmosphere (NUMA) and to circumvent them through: 1) analytical tools that identify the most computationally intensive parts of both the dynamics and the physics; 2) intelligent, performance-portable use of heterogeneous accelerator-based many-core machines for the dynamics, such as General Purpose Graphics Processing Units (GPGPU, or GPU for short) or Intel's Many Integrated Core (MIC) architecture; and 3) intelligent use of accelerators for the physics.
The second goal is to implement Earth System Modeling Framework (ESMF) interfaces for the accelerator-based computational kernels of NUMA, allowing the study of coupling many-core-based components. We will investigate whether the ESMF data structures can streamline the coupling of models on these new computer architectures, where memory accesses must be carefully orchestrated to maximize both cache hits and bus occupancy for out-of-cache requests.
The third goal is to implement NUMA as an ESMF component so that it can serve as the atmospheric component in a coupled Earth system application. A specific outcome of this goal will be a demonstration of a coupled air-ocean-wave-ice system involving NUMA, HYCOM, Wavewatch III, and CICE within the Navy Earth System Prediction Capability (ESPC). The understanding gained through this investigation will directly benefit the Navy ESPC, which is currently under development.
NUMA has already been shown to scale to tens of thousands of processors on CPU-based distributed-memory platforms. This scalability was achieved by using the Message Passing Interface (MPI) to exchange data between processors. The proposed work will further increase the performance of NUMA, especially for the most costly operations, which currently take place on-processor. Examples of such operations include forming the right-hand-side (RHS) vectors with the high-order continuous/discontinuous Galerkin (CG/DG) spatial operators, the implicit time-integration strategy, and the sub-grid-scale physics.