43. TRANSLATING MPI APPLICATIONS TO A LATENCY-TOLERANT, DATA-DRIVEN FORM

Department: Computer Science & Engineering
Faculty Advisor(s): Scott B. Baden

Primary Student
Name: Nhat Tan Nguyen Thanh
Email: nnguyent@ucsd.edu
Phone: 858-534-9916
Grad Year: 2014

Abstract
Applications running on exascale computers will invest heavily in optimizations that reduce data motion costs, including techniques to overlap communication with computation. Since present-day compiler technology cannot perform the required optimizations, masking communication delays entails significant, intrusive performance programming that challenges even the expert programmer. We present Bamboo, a custom source-to-source translator that transforms MPI C source into a data-driven form that automatically overlaps communication with available computation. Running on up to 98,304 processors of NERSC's Hopper system, Bamboo speeds up an MPI implementation of a 3D Jacobi iterative solver. Depending on the number of cores, Bamboo's generated code meets or exceeds the performance of a painstakingly optimized, hand-written MPI variant that uses split-phase coding, the method classically employed to hide communication. We achieved these results with only modest amounts of programmer annotation and no intrusive reprogramming of the original application source.
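
For readers unfamiliar with split-phase coding, the following is a minimal sketch of the pattern, not Bamboo's output and not the solver measured above. It assumes a 1D Jacobi sweep instead of the 3D solver, and it simulates two neighbouring ranks inside a single process so that it compiles without an MPI library. All function names are hypothetical. The comments mark where the nonblocking MPI calls (MPI_Irecv, MPI_Isend, MPI_Waitall) would sit in a hand-written version.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of each local block: n owned cells in u[1..n],
 * ghost cells in u[0] and u[n+1]. */

/* Update the cells whose stencil touches no ghost cell (2..n-1). */
void jacobi_interior(const double *u, double *unew, size_t n) {
    for (size_t i = 2; i < n; ++i)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
}

/* Update the two edge cells, which read the ghost cells. */
void jacobi_boundary(const double *u, double *unew, size_t n) {
    unew[1] = 0.5 * (u[0] + u[2]);
    unew[n] = 0.5 * (u[n - 1] + u[n + 1]);
}

/* One split-phase sweep over two neighbouring blocks (simulated ranks). */
void split_phase_sweep(double *left, double *right,
                       double *left_new, double *right_new, size_t n) {
    assert(n >= 2);

    /* Phase 1: start the halo exchange. With MPI this would be
     * MPI_Irecv / MPI_Isend; here the messages are just captured. */
    double to_right = left[n];
    double to_left = right[1];

    /* Phase 2: interior work proceeds while messages are "in flight". */
    jacobi_interior(left, left_new, n);
    jacobi_interior(right, right_new, n);

    /* Phase 3: complete the exchange (MPI_Waitall in real code). */
    left[n + 1] = to_left;
    right[0] = to_right;

    /* Phase 4: only now update the cells that depend on ghost data. */
    jacobi_boundary(left, left_new, n);
    jacobi_boundary(right, right_new, n);
}

/* Compare the split-phase sweep against a plain sweep over the
 * undivided array; returns the largest absolute difference. */
double split_phase_max_error(void) {
    enum { N = 4 };
    double g[2 * N + 2], gnew[2 * N + 2];
    double left[N + 2], right[N + 2];
    double left_new[N + 2] = {0}, right_new[N + 2] = {0};

    for (int i = 0; i < 2 * N + 2; ++i) g[i] = (double)(i * i);
    for (int i = 1; i <= 2 * N; ++i) gnew[i] = 0.5 * (g[i - 1] + g[i + 1]);

    for (int i = 0; i <= N; ++i) left[i] = g[i];           /* left[0] is the physical boundary */
    left[N + 1] = 0.0;                                     /* stale ghost, filled in phase 3 */
    for (int i = 1; i <= N + 1; ++i) right[i] = g[N + i];  /* right[N+1] is the physical boundary */
    right[0] = 0.0;                                        /* stale ghost, filled in phase 3 */

    split_phase_sweep(left, right, left_new, right_new, N);

    double err = 0.0;
    for (int i = 1; i <= N; ++i) {
        double dl = left_new[i] - gnew[i];
        double dr = right_new[i] - gnew[N + i];
        if (dl < 0) dl = -dl;
        if (dr < 0) dr = -dr;
        if (dl > err) err = dl;
        if (dr > err) err = dr;
    }
    return err;
}
```

The point of the restructuring is that the interior update in phase 2 needs no remote data, so it can run while the halo messages travel. Writing this split by hand for every communication step is the intrusive reprogramming that the abstract says Bamboo avoids.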
