Wednesday, 13 December 2017

Session 6: Performance Analysis and Benchmarking I

DataMill: Rigorous Performance Evaluation Made Easy

Authors:

Augusto Born de Oliveira (University of Waterloo)
Jean-Christophe Petkovich (University of Waterloo)
Thomas Reidemeister (University of Waterloo)
Sebastian Fischmeister (University of Waterloo)

Abstract:

Empirical systems research is facing a dilemma. Minor aspects of an experimental setup can have a significant impact on its associated performance measurements and potentially invalidate conclusions drawn from them. Examples of such influences, often called hidden factors, include binary link order, process environment size, compiler-generated randomized symbol names, and group scheduler assignments. The growth in complexity and size of modern systems will further aggravate this dilemma, especially given the time pressure of producing results. So how can one trust any reported empirical analysis of a new idea or concept in computer science?

This paper introduces DataMill, a community-based, easy-to-use, services-oriented open benchmarking infrastructure for performance evaluation. DataMill facilitates producing robust, reliable, and reproducible results. The infrastructure incorporates the latest results on hidden factors and automates the variation of these factors. Multiple research groups already participate in DataMill.

DataMill is also of interest for research on performance evaluation itself. The infrastructure supports quantifying the effect of hidden factors and disseminating those results beyond mere reporting, and it provides a platform for investigating interactions and compositions of hidden factors.
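
As a small, purely illustrative sketch of what it means to vary a hidden factor (hypothetical code, not part of DataMill; the benchmark binary ./my_benchmark is a placeholder), the following Python snippet sweeps the size of the process environment across repeated runs so that its influence on the measured runtime is exposed rather than silently fixed at one arbitrary value:

    # Hypothetical sketch: vary one hidden factor (process environment size)
    # across repeated benchmark runs instead of fixing it by accident.
    import os
    import subprocess
    import time

    BENCHMARK = ["./my_benchmark"]  # placeholder benchmark binary

    def run_once(env_padding_bytes):
        """Run the benchmark once with an artificially padded environment."""
        env = dict(os.environ)
        env["PADDING"] = "x" * env_padding_bytes  # shifts environment/stack layout
        start = time.perf_counter()
        subprocess.run(BENCHMARK, env=env, check=True)
        return time.perf_counter() - start

    # Sweep the hidden factor and repeat each setting several times.
    results = {pad: [run_once(pad) for _ in range(5)]
               for pad in (0, 128, 1024, 4096)}
    for pad, times in results.items():
        print(f"env padding {pad:5d} B: mean {sum(times) / len(times):.4f} s")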

DOI: 10.1145/2479871.2479892

Full text: PDF

Workload Resampling for Performance Evaluation of Parallel Job Schedulers

Authors:

Netanel Zakay (The Hebrew University of Jerusalem)
Dror G. Feitelson (The Hebrew University of Jerusalem)

Abstract:

Evaluating the performance of a computer system is based on using representative workloads. Common practice is either to use real workload traces to drive simulations or to use statistical workload models that are based on such traces. Such models allow various workload attributes to be manipulated, thus providing desirable flexibility, but may lose details of the workload’s internal structure. To overcome this, we suggest combining the benefits of real traces and flexible modeling. Focusing on the problem of evaluating the performance of parallel job schedulers, we partition each trace into independent subtraces representing different users, and then re-combine them in various ways, while maintaining features like the daily and weekly cycles of activity. This facilitates the creation of longer workload traces that enable longer simulations, the creation of multiple statistically similar workloads that can be used to gauge confidence intervals, and the creation of workloads with different load levels.
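
A minimal sketch of this resampling idea in Python (not the authors' implementation; the trace record fields user, submit, runtime, and procs are assumed here for illustration) splits a trace into per-user subtraces, samples users with replacement, and shifts each sampled subtrace by a whole number of weeks so that daily and weekly cycles stay aligned:

    # Hypothetical sketch of user-based trace resampling with week-aligned shifts.
    import random
    from collections import defaultdict

    WEEK = 7 * 24 * 3600  # seconds

    def resample(trace, n_users, seed=0):
        """trace: list of dicts with assumed fields 'user', 'submit' (s),
        'runtime', 'procs'. Returns a new trace with n_users resampled users."""
        rng = random.Random(seed)
        by_user = defaultdict(list)
        for job in trace:
            by_user[job["user"]].append(job)

        new_trace = []
        subtraces = list(by_user.values())
        for new_uid in range(n_users):
            jobs = rng.choice(subtraces)             # sample a user with replacement
            shift = rng.randrange(0, 52) * WEEK      # whole-week shift keeps cycles
            for job in jobs:
                new_trace.append({**job, "user": new_uid,
                                  "submit": job["submit"] + shift})
        return sorted(new_trace, key=lambda j: j["submit"])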

DOI: 10.1145/2479871.2479893

Full text: PDF

Improving the Scalability of a Multi-core Web Server

Authors:

Raoufehsadat Hashemian (University of Calgary)
Diwakar Krishnamurthy (University of Calgary)
Martin Arlitt (HP Labs)
Niklas Carlsson (Linköping University)

Abstract:

Improving the performance and scalability of Web servers enhances user experiences and reduces the costs of providing Web-based services. The advent of multi-core technology motivates new studies to understand how efficiently Web servers utilize such hardware. This paper presents a detailed performance study of a Web server application deployed on a modern 2-socket server with 4 cores per socket. Our study shows that default, “out-of-the-box” Web server configurations can cause the system to scale poorly with increasing core counts. We study two different types of workloads, namely a workload that imposes intense TCP/IP-related OS activity and the SPECweb2009 Support workload, which incurs more application-level processing. We observe that the scaling behaviour is markedly different for these two types of workloads, mainly due to the difference in the performance characteristics of static and dynamic requests. The results of our experiments reveal that with workload-specific Web server configuration strategies a modern multi-core server can be utilized up to 80% while still serving requests without significant queuing delays; utilizations beyond 90% are also possible while still serving requests with acceptable response times.
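
As one hypothetical example of the kind of configuration knob such studies tune (this is not the authors' setup, only an illustration), the Python sketch below starts one worker process per core and pins each worker to its own core with Linux's sched_setaffinity, so that request handling does not migrate across cores or sockets:

    # Hypothetical core-affinity configuration sketch; Linux-only.
    import multiprocessing
    import os
    import socketserver

    class Handler(socketserver.BaseRequestHandler):
        def handle(self):
            self.request.recv(4096)  # read (and ignore) the request
            self.request.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")

    def worker(core, port):
        os.sched_setaffinity(0, {core})  # pin this worker to a single core
        with socketserver.TCPServer(("", port), Handler) as srv:
            srv.serve_forever()

    if __name__ == "__main__":
        procs = [multiprocessing.Process(target=worker, args=(c, 8000 + c))
                 for c in range(os.cpu_count() or 1)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()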

DOI: 10.1145/2479871.2479894

Full text: PDF

Modeling Performance of a Parallel Streaming Engine: Bridging Theory and Costs

Authors:

Ivan Bedini (Alcatel-Lucent Bell Labs)
Sherif Sakr (Alcatel-Lucent Bell Labs)
Bart Theeten (Alcatel-Lucent Bell Labs)
Alessandra Sala (Alcatel-Lucent Bell Labs)
Peter Cogan (Alcatel-Lucent Bell Labs)

Abstract:

While data are growing at a speed never seen before, parallel computing is becoming more and more essential to process this massive volume of data in a timely manner. Concurrent computation has therefore been receiving increasing attention, due to the widespread adoption of multi-core processors and the emerging advancements of cloud computing technology. The ubiquity of mobile devices, location services, and sensor pervasiveness are examples of new scenarios that have created the crucial need for building scalable computing platforms and parallel architectures to process vast amounts of generated streaming data. In practice, efficiently operating these systems is hard due to the intrinsic complexity of these architectures and the lack of a formal and in-depth understanding of the performance models and the consequent system costs. The Actor Model theory has been presented as a mathematical model of concurrent computation that has had enormous success in practice and has inspired a number of contemporary works in this area. Recently, the Storm system has been presented as a realization of the principles of the Actor Model theory in the context of large-scale processing of streaming data. In this paper, we present, to the best of our knowledge, the first set of models that formalize the performance characteristics of a practical distributed, parallel, and fault-tolerant stream processing system that follows the Actor Model theory. In particular, we model the characteristics of the data flow, the data processing, and the system management costs at a fine granularity within the different steps of executing a distributed stream processing job. Finally, we present an experimental validation of the described performance models using the Storm system.
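
To make the flavour of such cost models concrete, here is a deliberately simplified additive sketch in Python (the paper's models are far finer grained; the components, placements, and costs below are made up for illustration). Per-tuple latency through a linear Storm-like topology is taken as the sum of each component's execution cost plus a transfer cost for every hop, where the transfer cost depends on whether the hop stays inside one worker process or crosses the network:

    # Hypothetical additive latency model for a linear stream-processing topology.
    def tuple_latency(components, placement, exec_cost, local_cost, remote_cost):
        """
        components: ordered component names, e.g. ["spout", "parse", "count"]
        placement:  dict mapping component name -> worker id
        exec_cost:  dict mapping component name -> per-tuple execution time (s)
        local_cost / remote_cost: per-hop transfer time within a worker / over the network
        """
        latency = sum(exec_cost[c] for c in components)
        for upstream, downstream in zip(components, components[1:]):
            same_worker = placement[upstream] == placement[downstream]
            latency += local_cost if same_worker else remote_cost
        return latency

    # Example: three-stage topology whose last stage runs on a different worker.
    print(tuple_latency(["spout", "parse", "count"],
                        {"spout": 0, "parse": 0, "count": 1},
                        {"spout": 1e-5, "parse": 4e-5, "count": 2e-5},
                        local_cost=2e-6, remote_cost=5e-4))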

DOI: 10.1145/2479871.2479895

Full text: PDF
