Thursday, 14 December 2017

Works-in-Progress

Work-In-Progress Chairs' Welcome Message

Authors:

David J. Lilja (University of Minnesota)
Raffaela Mirandola (Politecnico di Milano)

Full text: PDF

In search for contention-descriptive metrics in HPC cluster environment

Authors:

Sergey Blagodurov (Simon Fraser University)
Alexandra Fedorova (Simon Fraser University)

Abstract:

In this paper, we argue that modern HPC cluster environments contain several bottlenecks, both within the multicore cluster nodes and between them in the cluster interconnects. These bottlenecks represent resources that can be in high demand by several jobs executing concurrently on the cluster. As such, the jobs can compete for access to these resources and experience performance degradation due to contention. We point out that, although contention for shared resources such as the memory hierarchy of the cluster nodes, the cluster interconnects, or the floating-point unit can severely degrade the performance of the cluster workload, state-of-the-art cluster schedulers lack adequate means of addressing it. To fill this gap, we propose a new set of metrics that model shared-resource contention and capture fine-grained information about each job's resource utilization and communication patterns. The necessary information can be obtained from performance counters within the cluster nodes and from monitoring of the cluster interconnects between them.

DOI: 10.1145/1958746.1958815

Full text: PDF

Automatic Performance Model Synthesis From Hardware Verification Models

Authors:

Robert H. Bell Jr. (IBM)
Matyas Sustik (IBM)
David W. Cummings (IBM)
Jonathan R. Jackson (IBM)

Abstract:

Performance models are typically written by hand for a new model or assembled piecemeal from the prior simulation code of an old model. In either case, many man-months of work may be required to write the new model and validate design details against a prior or current design. In reality, most of the information about the performance of the design already exists in the design structure of the old hardware model, the new model, or both. To harvest this information and eliminate the significant duplicate coding and validation efforts, we propose that a performance model be automatically synthesized from a prior or current hardware design using a bottom-up, design-oriented approach. We demarcate the performance-critical boundaries of the design and perform backward-trace cone analysis to identify logic to include in the performance model.

We then abstract specific components for design changes and expend modeling effort only on the few functions relevant to a particular design study. Engineering effort then becomes focused on workload selection and quality, defining and projecting new designs, and assessing design tradeoffs and sensitivities: the small set of tasks with the highest potential to improve design performance. We present a case study showing that even the simplest proposed transformations on a high-performance IBM L2 cache design result in a simulation speedup of 3.9, with evidence that an order-of-magnitude speedup can be obtained using a few additional modeling abstractions.
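The backward-trace cone analysis mentioned in the abstract can be illustrated, in a much simplified form, as backward reachability over a signal graph. The sketch below is not the authors' tool: the netlist, the signal names, and the `backward_cone` function are all hypothetical, and a real hardware model would carry far more structure.

```python
from collections import deque

def backward_cone(fanin, outputs):
    """Return every signal that can influence any signal in `outputs`.

    `fanin` maps a signal name to the signals that drive it
    (a hypothetical, flattened netlist representation).
    """
    cone = set(outputs)
    queue = deque(outputs)
    while queue:
        signal = queue.popleft()
        for driver in fanin.get(signal, ()):
            if driver not in cone:
                cone.add(driver)
                queue.append(driver)
    return cone

# Hypothetical toy netlist: `hit` is treated as performance-critical,
# while `parity_err` is assumed to be irrelevant to timing behavior.
fanin = {
    "hit": ["tag_match", "valid"],
    "tag_match": ["addr", "tag_array"],
    "parity_err": ["data_array"],
}
cone = backward_cone(fanin, ["hit"])
```

Only the logic inside `cone` would be carried into the performance model under this simplification; everything outside it (here, the parity path) would be dropped.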

DOI: 10.1145/1958746.1958816

Full text: PDF

Engineering SSL-Based Systems for Enhancing System Performance

Authors:

Norman Lim (Carleton University)
Shikharesh Majumdar (Carleton University)
Vineet Srivastava (Cistech Limited)

Abstract:

Security in a distributed system often comes at the cost of a performance penalty. Because of the CPU-intensive security algorithms involved, transferring data over SSL is known to be significantly slower than transferring it over a non-secure channel. This paper presents an initial set of results from a university-industry research collaboration on a performance enhancement technique called security sieve, which separates the classified and non-classified components of a document and sends them on a secure and a (faster) non-secure channel, respectively. Experimental results presented in the paper demonstrate the effectiveness of the technique.
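The splitting idea behind the security sieve can be sketched as follows. This is an illustration only, under the assumption that a document is a list of `(text, is_classified)` parts; the `sieve` and `reassemble` functions are hypothetical names, and the actual transport over SSL and plain channels is left out.

```python
def sieve(document):
    """Split a document into classified and non-classified parts.

    `document` is assumed to be a list of (text, is_classified) pairs.
    Each part keeps its position so the receiver can restore the order.
    """
    secure = [(i, text) for i, (text, flag) in enumerate(document) if flag]
    plain = [(i, text) for i, (text, flag) in enumerate(document) if not flag]
    return secure, plain

def reassemble(secure, plain):
    """Merge the parts received on both channels back into document order."""
    return [text for _, text in sorted(secure + plain)]

# Hypothetical document with one classified part.
document = [("Public summary", False), ("Salary table", True), ("Footer", False)]
secure_parts, plain_parts = sieve(document)
restored = reassemble(secure_parts, plain_parts)
```

In this sketch only `secure_parts` would need to travel over the secure channel, which is where the expected saving comes from when most of a document is non-classified.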

DOI: 10.1145/1958746.1958817

Full text: PDF

Performance Modeling of Distributed Collaboration Services

Authors:

Toqeer Israr (University of Ottawa)
Gregor v. Bochmann (University of Ottawa)

Abstract:

This paper deals with performance modeling of distributed applications, service compositions and workflow systems. From the functional perspective, the distributed application is modeled as a collaboration involving several roles, and its behavior is defined as a composition of several sub-collaborations using the standard sequencing operators found in UML Activity Diagrams and similar formalisms. From the performance perspective, each collaboration is characterized by a certain number of independent input events and dependent output events, and the performance of the collaboration is defined by the minimum delays that apply to a given output event with respect to each input event on which it depends. We use a partial order to model these delays. The paper explains how these minimum delays can be measured through testing. It also provides general formulas by which the performance of a composed collaboration can be calculated from the performance of its constituent sub-collaborations and the control structure that determines the order of execution of these sub-collaborations. Proofs of correctness for these formulas are given and a simple example is discussed throughout the paper.
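One way to picture the composition of minimum delays is a max-plus style combination for two collaborations executed in sequence. This is a sketch under assumptions, not the formulas from the paper: it assumes the output events of the first collaboration are exactly the input events of the second, and that an output can occur no earlier than every input it depends on plus the corresponding delay. The function name and event names are hypothetical.

```python
def compose_sequence(first, second):
    """Combine two delay tables for collaborations run in sequence.

    Each table maps (input_event, output_event) -> minimum delay.
    For every path input -> intermediate -> output the delays add up,
    and the largest sum is the minimum delay of the composition.
    """
    result = {}
    for (i, k), d1 in first.items():
        for (k2, o), d2 in second.items():
            if k == k2:
                best = result.get((i, o), float("-inf"))
                result[(i, o)] = max(best, d1 + d2)
    return result

# Hypothetical delays: `req` reaches `done` through `x` or through `y`.
a = {("req", "x"): 2.0, ("req", "y"): 5.0}
b = {("x", "done"): 4.0, ("y", "done"): 0.5}
composed = compose_sequence(a, b)
```

Here the path through `x` takes 6.0 and the path through `y` takes 5.5, so under the stated assumption `done` cannot occur earlier than 6.0 after `req`.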

DOI: 10.1145/1958746.1958818

Full text: PDF

On-Line Analysis of Hardware Performance Events for Workload Characterization and Processor Frequency Scaling Decisions

Authors:

Robert Schöne (Technische Universität Dresden, Center for Information Services and HPC (ZIH))
Daniel Hackenberg (Technische Universität Dresden, Center for Information Services and HPC (ZIH))

Abstract:

Energy efficiency optimizations of computational resources continue to grow in importance for both classical datacenter workloads and high performance computing environments. New hardware generations introduce more and more energy efficiency features, resulting in a power consumption variation of at least a factor of four between idle and full load. Even the power consumption of different full-load workloads can vary substantially, clearly showing that there is energy saving potential beyond the traditional "race to idle". In this paper we present a configurable CPU frequency governor that adapts processor frequencies based on performance counter measurements instead of processor load. We use the SPEC OMP benchmark suite to determine the potential of our approach and present governor configurations for two up-to-date x86_64 microarchitectures. Moreover, we show that substantial follow-up work is required to assess further efficiency optimization potential in this field.
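The core decision of a counter-driven governor can be sketched as a simple rule on a performance counter ratio. The counter names, the threshold value, and the `choose_frequency` function below are assumptions made for illustration; the abstract does not state which events or thresholds the authors' governor uses.

```python
def choose_frequency(counters, frequencies, threshold=0.02):
    """Pick a CPU frequency from counter readings of the last interval.

    `counters` is assumed to hold "instructions" and "llc_misses".
    A high miss ratio is taken as a sign of a memory-bound phase,
    for which a lower core frequency is assumed to cost little time.
    """
    instructions = counters["instructions"]
    if instructions == 0:
        # Nothing retired in this interval: treat the core as idle.
        return min(frequencies)
    misses_per_instruction = counters["llc_misses"] / instructions
    if misses_per_instruction > threshold:
        return min(frequencies)
    return max(frequencies)

# Hypothetical frequency steps in MHz and two sample intervals.
steps = [1600, 2000, 2667]
memory_bound = choose_frequency({"instructions": 1000000, "llc_misses": 50000}, steps)
compute_bound = choose_frequency({"instructions": 1000000, "llc_misses": 100}, steps)
```

A load-based governor would run both sample intervals at the highest frequency, since the core is busy in both; the counter-based rule distinguishes them.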

DOI: 10.1145/1958746.1958819

Full text: PDF

NAT/Firewall Traversal Cost Model for Publish-Subscribe Systems

Authors:

Debmalya Biswas (SAP Business Objects)
Florian Kerschbaum (SAP Research)

Abstract:

We consider large scale Publish/Subscribe systems deployed across multiple organizations. However, such cross organizational deployment is often hindered by firewalls and Network Address Translators (NATs). Several workarounds have been proposed to allow firewall and NAT traversal, e.g. VPN, connection reversal, relay routers. However, each traversal mechanism in turn leads to trade-offs with respect to implementation complexity, infrastructure overhead, latency, etc. We focus on the latency aspect in this work. We propose a cost-performance model that allows quantitative evaluation of the performance latency induced by the different firewall traversal mechanisms. The utility of the model is that for a given network configuration, it is able to provide a (close) approximation of the performance latencies based on simulation results, without actually having to deploy them in practice. This also allows selecting the best traversal mechanism for a given configuration. Finally, experimental results are given to show the validity of the proposed model.

DOI: 10.1145/1958746.1958820

Full text: PDF

Combined profiling: practical collection of feedback information for code optimization

Authors:

Paul Berube (University of Alberta)
Adam Preuss (University of Alberta)
Jose Nelson Amaral (University of Alberta)

Abstract:

Feedback-directed optimization (FDO) depends on profiling information that is representative of a typical execution of a given application. For most applications of interest, multiple data inputs need to be used to characterize the typical behavior of the program. Thus, profiling information from multiple runs of the program needs to be combined. We are working on a new methodology to produce statistically sound combined profiles from multiple runs of a program. This paper presents the motivation for combined profiling (CP), the requirements for a practical and useful methodology to combine profiles, and introduces the principal ideas under development for the creation of this methodology. We are currently working on implementations of CP in both the LLVM compiler and the IBM XL suite of compilers.
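As a rough illustration of what combining profiles from several runs could look like, the sketch below normalizes edge counts within each run and then summarizes each edge by the mean and standard deviation of its share across runs. This is an assumed, simplified scheme rather than the CP methodology itself; the profile layout and the `combine_profiles` function are hypothetical.

```python
from statistics import mean, pstdev

def combine_profiles(runs):
    """Summarize edge profiles from several runs.

    `runs` is assumed to be a list of dicts mapping an edge name to a
    raw execution count. Counts are normalized per run so that long
    runs do not dominate, then summarized as (mean, std deviation).
    """
    edges = sorted(set().union(*runs))
    combined = {}
    for edge in edges:
        fractions = []
        for run in runs:
            total = sum(run.values())
            fractions.append(run.get(edge, 0) / total if total else 0.0)
        combined[edge] = (mean(fractions), pstdev(fractions))
    return combined

# Hypothetical branch profiles from two inputs with different behavior.
runs = [{"A->B": 90, "A->C": 10}, {"A->B": 50, "A->C": 50}]
combined = combine_profiles(runs)
```

Keeping a spread alongside the mean lets an optimizer tell an edge that is consistently hot from one that is hot only for some inputs.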

DOI: 10.1145/1958746.1958821

Full text: PDF

Towards Studying the Performance Effects of Design Patterns for Service Oriented Architecture

Authors:

Nariman Mani (Carleton University)
Dorina C. Petriu (Carleton University)
Murray Woodside (Carleton University)

Abstract:

Patterns employed for the development of a service oriented system may affect its non-functional properties, including performance. Service Oriented Architecture (SOA) design patterns provide generic solutions for many architectural, design and implementation problems, and any pattern may have an impact on performance, either positive or negative. This research considers how to characterize the performance impact of a SOA design pattern, which includes characterizing some aspects of the design and usage environment as a whole (for example, the scale of the workload and the availability of concurrent platforms for the eventual deployment). The approach uses performance models to characterize the application and the impact of the pattern on it.

The planned approach exploits the context of model driven engineering (MDE) to give rapid feedback to developers about the potential impact of a pattern. Model transformations are used to generate the performance model, and to propagate the effect of applying a SOA design pattern to the performance model. The approach is sketched here with a preliminary case study, demonstrating its feasibility.

DOI: 10.1145/1958746.1958822

Full text: PDF

Using Observation Ageing to Improve Markovian Model Learning in QoS Engineering

Authors:

Radu Calinescu (Aston University)
Kenneth Johnson (Aston University)
Yasmin Rafiq (Aston University)

Abstract:

This paper describes our joint research on performance engineering methods for services in shared resource utilities. The techniques support the automated sizing of a customized service instance and the automated creation of performance validation tests for the instance. The performance tests permit fine-grained control over inter-arrival time and service time burstiness to validate sizing and facilitate the development and validation of adaptation policies. Our novel research on sizing also takes into account the impact of workload factors that contribute to such burstiness. The methods are automated, integrated, and exploit an algebraic approach to workload modelling that relies on per-service benchmark suites with benchmarks that can be automatically executed within utilities. The benchmarks and their performance results are reused to support a Benchmark-driven Algebraic method for the Performance (BAP) engineering of customized services.

DOI: 10.1145/1712605.1712609

Full text: PDF
