Wednesday, 13 December 2017

Session 7: Best Industrial Paper Candidates

Test-Driving Intel Xeon Phi

Authors:

Jianbin Fang (TU Delft)
Henk Sips (TU Delft)
LiLun Zhang (NUDT)
Chuanfu Xu (NUDT)
Yonggang Che (NUDT)
Ana Lucia Varbanescu (UvA)

Abstract:

Based on Intel’s Many Integrated Core (MIC) architecture, Intel Xeon Phi is one of the few truly many-core CPUs, featuring around 60 fairly powerful cores, two levels of caches, and graphics memory, all interconnected by a very fast ring. Given its promised ease of use and high performance, we took Xeon Phi out for a test drive. In this paper, we present this experience at two different levels: (1) the microbenchmark level, where we stress "each nut and bolt" of Phi in the lab, and (2) the application level, where we study Phi’s performance response in a real-life environment. At the microbenchmarking level, we show the high performance of five components of the architecture, focusing on their maximum achieved performance and the prerequisites to achieve it. Next, we choose a medical imaging application (Leukocyte Tracking) as a case study. We observed that it is rather easy to get functional code and start benchmarking, but the first performance numbers can be far from satisfying. Our experience indicates that a simple data structure and massive parallelism are critical for Xeon Phi to perform well. When compiler-driven parallelization and/or vectorization fails, programming Xeon Phi for performance can become very challenging.
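
The abstract's point that Xeon Phi rewards simple data structures plus massive, compiler-friendly parallelism can be illustrated with the kind of flat-array OpenMP loop the compiler can both thread and vectorize. The sketch below is only illustrative (a generic scale-and-add kernel, not code from the paper), assuming an OpenMP 4.0 compiler as used for Phi's offload/native programming model:

/* Illustrative sketch only: a contiguous array traversed by one parallel,
 * vectorizable loop -- the "simple data structure, massive parallelism"
 * pattern the abstract says Xeon Phi needs to perform well. */
#include <stddef.h>

void scale_add(float *restrict y, const float *restrict x, float a, size_t n)
{
    /* Threads spread chunks of the loop over the ~60 cores (240 hardware
     * threads); the simd clause asks the compiler to vectorize each chunk
     * for Phi's 512-bit SIMD units. */
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

When the compiler cannot prove such a loop safe to parallelize or vectorize (irregular data structures, pointer aliasing), the abstract's closing caveat applies: performance tuning becomes much harder.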

DOI: 10.1145/2568088.2576799

Full text: PDF


A Power-Measurement Methodology for Large-Scale, High-Performance Computing

Authors:

Thomas R. W. Scogland (Virginia Tech)
Craig P. Steffen (University of Illinois)
Torsten Wilde (Leibniz Supercomputing Center)
Florent Parent (Calcul Québec)
Susan Coghlan (Argonne National Laboratory)
Natalie Bates (Energy Efficient HPC Working Group)
Wu-chun Feng (Virginia Tech)
Erich Strohmaier (Lawrence Berkeley National Laboratory)

Abstract:

Improvement in the energy efficiency of supercomputers can be accelerated by improving the quality and comparability of efficiency measurements. The ability to generate accurate measurements at extreme scale is just now emerging. The realization of system-level measurement capabilities can be accelerated with a commonly adopted, high-quality measurement methodology for use while running a workload, typically a benchmark. This paper describes a methodology that has been developed collaboratively through the Energy Efficient HPC Working Group to support architectural analysis and comparative measurements for rankings such as the Top500 and Green500. To support measurements requiring varying amounts of effort and equipment, we present three distinct levels of measurement, which provide increasing levels of accuracy. Level 1 is similar to the Green500 run rules today: a single average power measurement extrapolated from a subset of a machine. Level 2 is more comprehensive, but still widely achievable. Level 3 is the most rigorous of the three methodologies, but is only possible at a few sites. However, the Level 3 methodology generates a high-quality result that exposes details the other methodologies may miss. In addition, we present case studies from the Leibniz Supercomputing Centre (LRZ), Argonne National Laboratory (ANL), and Calcul Québec Université Laval that explore the benefits and difficulties of gathering high-quality, system-level measurements on large-scale machines.
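
The quantity all three levels ultimately report is average power (and energy) over a benchmark run, derived from timestamped power samples; the levels differ in how much of the machine is instrumented, the sampling rate, and what may be extrapolated. The sketch below is a rough illustration of that core calculation; the sample struct and function name are invented for this example and are not part of the EE HPC WG methodology:

#include <stddef.h>

/* Illustrative sketch only: integrate timestamped power samples over a run
 * (trapezoidal rule) and return the run's average power. A Level 1 style
 * measurement would extrapolate this figure from an instrumented subset of
 * the machine to the full system. */
struct power_sample {
    double t_seconds;  /* time since the start of the measured run */
    double watts;      /* system power reading at that time */
};

double average_power(const struct power_sample *s, size_t n)
{
    if (n < 2)
        return n ? s[0].watts : 0.0;

    double energy_joules = 0.0;
    for (size_t i = 1; i < n; ++i) {
        double dt = s[i].t_seconds - s[i - 1].t_seconds;
        energy_joules += 0.5 * (s[i].watts + s[i - 1].watts) * dt;
    }
    /* Average power = total energy / elapsed time of the run. */
    return energy_joules / (s[n - 1].t_seconds - s[0].t_seconds);
}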

DOI: 10.1145/2568088.2576795

Full text: PDF
