Tuesday, 12 December 2017

Session 2a: Performance Measurements, Analysis and Tools I

Session Chair: Kai Sachs (TU Darmstadt)

Rapid Development of Extensible Profilers for the Java Virtual Machine with Aspect-Oriented Programming

Authors:

Danilo Ansaloni (University of Lugano)
Walter Binder (University of Lugano)
Alex Villazón (University of Lugano)
Philippe Moret (University of Lugano)

Abstract:

Many profilers for Java applications are implemented with low-level bytecode instrumentation techniques, which is tedious, error-prone, and complicates maintenance and extension of the tools. In order to reduce development time and cost, we promote building Java profilers using high-level aspect-oriented programming (AOP). We show that the use of aspects yields concise profilers that are easy to develop, extend, and maintain, because low-level instrumentation details are hidden from the tool developer. Our profiler relies on inter-advice communication, an extension to common AOP languages that enables efficient data passing between advice woven into the same method. We illustrate our approach with two case studies. First, we show that an existing, instrumentation-based tool for listener latency profiling can be easily recast as an aspect. Second, we present an aspect for comprehensive calling context profiling. In order to reduce profiling overhead, our aspect parallelizes application execution and profile creation.

DOI: 10.1145/1712605.1712616

Full text: PDF


Exploring Large Profiles with Calling Context Ring Charts

Authors:

Philippe Moret (University of Lugano)
Walter Binder (University of Lugano)
Alex Villazón (University of Lugano)
Danilo Ansaloni (University of Lugano)

Abstract:

Calling context profiling is an important technique for analyzing the performance of object-oriented software with complex inter-procedural control flow. A common data structure is the Calling Context Tree (CCT), which stores dynamic metrics, such as CPU time, separately for each calling context. As CCTs may comprise millions of nodes, there is need for a condensed visualization that eases the location of performance bottlenecks. In this paper, we discuss Calling Context Ring Charts (CCRCs), a compact visualization for CCTs, where callee methods are represented in ring segments surrounding the caller's ring segment. In order to reveal hot methods, their callers, and callees, the ring segments can be sized according to a chosen dynamic metric. We describe a case study where CCRCs help detect and fix performance problems in an application. An evaluation confirms that our implementation efficiently handles large CCTs with millions of nodes.
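As an illustration of the data structure the abstract describes, the following minimal Python sketch builds a small CCT and derives ring-segment angles from it. It is not the authors' implementation: the names `CCTNode`, `record`, and `ring_segments` are hypothetical, and sizing each callee segment in proportion to its inclusive metric is an assumption about one plausible layout rule.

```python
from dataclasses import dataclass, field


@dataclass
class CCTNode:
    """One calling context: a method reached through a specific chain of callers."""
    method: str
    metric: float = 0.0  # metric attributed to this context itself (e.g. CPU time)
    children: dict = field(default_factory=dict)

    def child(self, method):
        # Reuse the existing child for this callee, or create it on first use.
        if method not in self.children:
            self.children[method] = CCTNode(method)
        return self.children[method]

    def inclusive(self):
        # Own metric plus the metric of all transitive callees.
        return self.metric + sum(c.inclusive() for c in self.children.values())


def record(root, call_stack, value):
    """Attribute `value` to the context identified by `call_stack` (outermost first)."""
    node = root
    for method in call_stack:
        node = node.child(method)
    node.metric += value


def ring_segments(node, start=0.0, end=360.0, depth=0):
    """Yield (depth, method, start_angle, end_angle) for every node.

    Callees occupy the next ring outward and split the caller's angular span
    in proportion to their inclusive metric (assumed layout rule).
    """
    yield depth, node.method, start, end
    total = node.inclusive()
    if total <= 0:
        return
    angle = start
    for c in node.children.values():
        span = (end - start) * c.inclusive() / total
        yield from ring_segments(c, angle, angle + span, depth + 1)
        angle += span


# Made-up sample data for illustration only.
root = CCTNode("root")
record(root, ["main", "a"], 3.0)
record(root, ["main", "b"], 1.0)
record(root, ["main", "a"], 1.0)
segments = list(ring_segments(root))
```

In this sketch a hot calling context shows up as a wide segment, and its callers and callees can be read off the rings directly inside and outside it.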

DOI: 10.1145/1712605.1712617

Full text: PDF


Session 2b: Performance Measurements, Analysis and Tools II

Session Chair: Kalyan Kumaran (Argonne National Laboratory)

Analytical Modeling of Lock-based Concurrency Control with Arbitrary Transaction Data Access Patterns

Authors:

Pierangelo Di Sanzo (Università di Roma)
Roberto Palmieri (Università di Roma)
Bruno Ciciani (Università di Roma)
Francesco Quaglia (Università di Roma)
Paolo Romano (INESC-ID)

Abstract:

The 2-Phase-Locking (2PL) concurrency control algorithm still plays a core role in the construction of transactional systems (e.g. database systems and transactional memories). Hence, any technique allowing accurate analysis and prediction of the performance of 2PL-based systems can be of wide interest and applicability. In this article we present an accurate analytical model of 2PL concurrency control, which overcomes several limitations of preexisting analytical results. In particular, our model captures relevant features of realistic data access patterns by taking into account access distributions that depend on transactions' execution phases. Also, our model provides significantly more accurate performance predictions in heavy contention scenarios, where the number of transactions enqueued due to conflicting lock requests is expected to be non-minimal. The accuracy of our model has been verified against simulation results based on both synthetic data access patterns and patterns derived from the TPC-C benchmark.

DOI: 10.1145/1712605.1712619

Full text: PDF


MPInside: A Performance Analysis and Diagnostic Tool for MPI Applications

Authors:

Daniel Thomas (SGI)
Jean-Pierre Panziera (SGI)
John Baron (SGI)

Abstract:

Performance analysis and prediction of parallel applications using the Message-Passing Interface (MPI) standard is a challenging task. Collecting, organizing, and making sense of profiling data for MPI jobs of even modest scale is difficult and time-consuming. The task is further complicated by the inherent difficulty in interpreting the resulting communication measurements. In this paper we introduce MPInside, a new profiling and diagnostic tool that overcomes these constraints with carefully considered choices for measurement techniques, capabilities, and output formats. Using examples from real-world applications, we illustrate the innovative features of the tool--including late senders for point-to-point calls and unaligned collective calls--all in an instrumentation-free framework. We also demonstrate the in-flight modeling capabilities of MPInside with several "what if" experiments.

DOI: 10.1145/1712605.1712620

Full text: PDF


Workload-Intensity-Sensitive Timing Behavior Analysis for Distributed Multi-User Software Systems

Authors:

Matthias Rohr (BTC Business Technology Consulting AG & University of Oldenburg)
André van Hoorn (University of Oldenburg)
Wilhelm Hasselbring (University of Oldenburg & University of Kiel)
Marco Lübcke (CeWe Color AG & Co. OHG)
Sergej Alekseev (Nokia Siemens Networks & University of Applied Sciences)

Abstract:

In many multi-user software systems, such as online shopping systems, varying workload intensity causes high statistical variance in timing behavior distributions. However, this major impact on timing behavior is often ignored. This paper introduces WITiBA (Workload-Intensity-Sensitive Timing Behavior Analysis), an approach that considers interdependencies between concurrent executions of software operations within a distributed system in order to reduce the standard deviation for subsequent analysis steps. This can benefit analysis or simulation methods through tighter confidence intervals or shorter simulations.
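The effect the abstract refers to can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WITiBA method itself: workload intensity is approximated here by a hypothetical per-observation concurrency level, the function name `stdev_by_intensity` is invented, and the sample values are made up.

```python
from collections import defaultdict
from statistics import pstdev


def stdev_by_intensity(samples):
    """Group (intensity, response_time) pairs by intensity and return
    the population standard deviation of response times per group."""
    groups = defaultdict(list)
    for intensity, response_time in samples:
        groups[intensity].append(response_time)
    return {level: pstdev(times) for level, times in groups.items()}


# Made-up observations: (concurrent executions, response time in ms).
samples = [(1, 10.0), (1, 12.0), (5, 50.0), (5, 54.0)]

# Ignoring workload intensity mixes fast and slow regimes into one distribution.
overall = pstdev([rt for _, rt in samples])

# Conditioning on intensity gives a much tighter distribution per group.
per_group = stdev_by_intensity(samples)
```

With these sample values, the spread within each intensity level is far smaller than the spread of the pooled data, which is the kind of reduction that yields tighter confidence intervals in later analysis steps.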

DOI: 10.1145/1712605.1712621

Full text: PDF
