Saturday, 17 April 2021


Keynote Talk

Session Chair: Klaus Lange (Hewlett Packard)

Software Knows Best: Portable Parallelism Requires Standardized Measurements of Transparent Hardware


David A. Patterson (University of California, Berkeley)


The hardware trend of the last 15 years of dynamically trying to improve performance with little software visibility is not only irrelevant today, it's counterproductive; adaptivity must be at the software level if parallel software is going to be portable, fast, and energy-efficient. A portable parallel program is an oxymoron today; there is no reason to be parallel if it's slow, and parallel can't be fast if it's portable. Hence, portable parallel programs of the future must be able to understand and measure /any/ computer on which they run so that they can adapt effectively, which suggests that hardware measurement should be standardized and processor performance and energy consumption should become transparent.

In addition to software-controlled adaptivity for execution efficiency through techniques like autotuning and dynamic scheduling, modern software environments adapt to improve /programmer/ efficiency [1]. Classic examples include dynamic linking, dynamic memory allocation, garbage collection, interpreters, just-in-time compilers, and debugger support. A more recent example is selective embedded just-in-time specialization (SEJITS) [2] for highly productive languages like Python and Ruby. Thus, the future of programming is likely to involve program generators at many levels of the hierarchy tailoring the application to the machine. These productivity advances via adaptivity should be reflected in modern benchmarks: virtually no one writes the statically linked, highest-level-optimized C programs that are the foundation of most benchmark suites.

The dream is to improve productivity without sacrificing too much performance. Indeed, how often have you heard the claim that a new productive environment is now "almost as fast as C" or "almost as fast as Java"? The implication of the necessary tie between productivity and performance in the manycore era is that these modern environments must be able to utilize manycore hardware well, or the gap between highly efficient code and highly productive code will grow with the number of cores.

For industry's bet on manycore to win, therefore, both very high level and very low level programming environments will need to be able to understand and measure their underlying hardware and adapt their execution so as to be portable, relatively fast, and energy-efficient.

Hence, we argue that a standard of accurate hardware operation trackers (SHOT) would have a huge positive impact on making parallel software portable with good performance and energy efficiency, similar to the impact that the IEEE-754 standard had on the portability of numerical software. In particular, we believe SHOT will lead to much larger improvements in the portability, performance, and energy efficiency of parallel codes than recent architectural fads like opportunistic "turbo modes," transactional memory, or reconfigurable computing.

DOI: 10.1145/1712605.1712607

Full text: PDF


Invited Talk

Session Chair: Samuel Kounev (University of Karlsruhe)

BAP: A Benchmark-driven Algebraic Method for the Performance Engineering of Customized Services


Jerry Rolia (Hewlett Packard Laboratories)
Diwakar Krishnamurthy (University of Calgary)
Giuliano Casale (SAP Research, Belfast)
Stephen Dawson (SAP Research, Belfast)


This paper describes our joint research on performance engineering methods for services in shared resource utilities. The techniques support the automated sizing of a customized service instance and the automated creation of performance validation tests for the instance. The performance tests permit fine-grained control over inter-arrival time and service time burstiness to validate sizing and facilitate the development and validation of adaptation policies. Our novel research on sizing also takes into account the impact of workload factors that contribute to such burstiness. The methods are automated, integrated, and exploit an algebraic approach to workload modelling that relies on per-service benchmark suites with benchmarks that can be automatically executed within utilities. The benchmarks and their performance results are reused to support a Benchmark-driven Algebraic method for the Performance (BAP) engineering of customized services.

DOI: 10.1145/1712605.1712609

Full text: PDF