Important Links
- RG Cloud Working Group
- RG DevOps Performance Working Group
- RG Security Benchmarking Working Group
- RG Power Working Group
- RG Quality of Experience Working Group
- Latest Newsletter Issue
- ICPE International Conference
- RG Zenodo Artefact Repository
Upcoming Events
- ICPE 2021, Rennes, France
April 19-23, 2021
Session 6: Big Data & Database
A Constraint Programming Based Hadoop Scheduler for Handling MapReduce Jobs with Deadlines on Clouds
Authors:
Norman Lim (Carleton University)
Shikharesh Majumdar (Carleton University)
Peter Ashwood-Smith (Huawei, Canada)
Abstract:
A novel MapReduce constraint programming based matchmaking and scheduling algorithm (MRCP) is devised that can handle MapReduce jobs with deadlines while achieving high system performance. The MRCP algorithm is incorporated into Hadoop, a widely used open source implementation of the MapReduce programming model, as a new scheduler called the CP-Scheduler. This paper originates from collaborative research with our industrial partner on engineering resource management middleware for high performance. It describes our experiences and the challenges we encountered in designing and implementing the prototype CP-based Hadoop scheduler. A detailed performance evaluation of the CP-Scheduler is conducted on Amazon EC2 to determine the CP-Scheduler’s effectiveness and to obtain insights into system behaviour and performance. In addition, the CP-Scheduler’s performance is compared with an earliest deadline first (EDF) Hadoop scheduler, which is implemented by extending Hadoop’s default FIFO scheduler. The experimental results demonstrate the CP-Scheduler’s effectiveness in handling an open stream of MapReduce jobs with deadlines in a Hadoop cluster.
DOI: 10.1145/2668930.2688058
Full text: PDF
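The constraint programming approach described in this abstract can be illustrated with a small sketch. The model below is hypothetical and only loosely inspired by the paper: it schedules toy map/reduce-style tasks on a limited number of slots so that each job meets its deadline, using Google's OR-Tools CP-SAT solver (an assumed dependency); it does not reproduce the paper's MRCP algorithm or the CP-Scheduler.

```python
# Hypothetical sketch: scheduling MapReduce-style tasks with job deadlines
# via constraint programming. Task data, slot count, and horizon are toy
# values for illustration only; this is not the paper's MRCP formulation.
from ortools.sat.python import cp_model

# (job_id, task_duration, job_deadline) -- illustrative data
TASKS = [
    ("job1", 4, 10), ("job1", 3, 10),
    ("job2", 5, 12), ("job2", 2, 12),
    ("job3", 6, 15),
]
NUM_SLOTS = 2   # task slots available in the toy cluster
HORIZON = 30    # upper bound on all start/end times

model = cp_model.CpModel()
starts, intervals = [], []
for i, (job, dur, deadline) in enumerate(TASKS):
    start = model.NewIntVar(0, HORIZON, f"start_{i}")
    end = model.NewIntVar(0, HORIZON, f"end_{i}")
    interval = model.NewIntervalVar(start, dur, end, f"task_{i}")
    model.Add(end <= deadline)   # every task of a job finishes before its deadline
    starts.append(start)
    intervals.append(interval)

# At most NUM_SLOTS tasks may run concurrently (simple cumulative resource).
model.AddCumulative(intervals, [1] * len(TASKS), NUM_SLOTS)

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for i, (job, dur, deadline) in enumerate(TASKS):
        print(f"{job} task {i}: start={solver.Value(starts[i])}, deadline={deadline}")
```

If no assignment satisfies all deadlines, the solver reports infeasibility, which is the point at which a real scheduler would have to reject or delay jobs.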
An Empirical Performance Evaluation of Distributed SQL Query Engines
Authors:
Stefan van Wouw (Azavista & Delft University of Technology)
José Viña (Azavista)
Alexandru Iosup (Delft University of Technology)
Dick Epema (Delft University of Technology)
Abstract:
Distributed SQL Query Engines (DSQEs) are increasingly used in a variety of domains, but users, especially those in small companies with little expertise, may face the challenge of selecting an appropriate engine for their specific applications. Although both industry and academia are attempting to come up with high level benchmarks, the performance of DSQEs has never been explored or compared in-depth. We propose an empirical method for evaluating the performance of DSQEs with representative metrics, datasets, and system configurations. We implement a micro-benchmarking suite of three classes of SQL queries for both a synthetic and a real world dataset and we report response time, resource utilization, and scalability. We use our micro-benchmarking suite to analyze and compare three state-of-the-art engines, viz. Shark, Impala, and Hive. We gain valuable insights for each engine and we present a comprehensive comparison of these DSQEs. We find that different query engines have widely varying performance: Hive is consistently outperformed by the other engines, but whether Impala or Shark is the best performer highly depends on the query type.
DOI: 10.1145/2668930.2688053
Full text: PDF
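A micro-benchmarking suite of the kind described here is, at its core, a harness that runs classes of queries repeatedly and records response times. The sketch below is a hypothetical illustration: the query texts, table names, and the generic DB-API cursor are placeholders, not the paper's actual suite or workloads.

```python
# Hypothetical sketch of a query-class micro-benchmark harness: run each
# class of SQL query several times against an engine and report mean and
# standard deviation of response time. Queries and schema are illustrative.
import time
import statistics

QUERY_CLASSES = {
    "scan":      "SELECT COUNT(*) FROM lineitem",
    "aggregate": "SELECT l_returnflag, SUM(l_quantity) FROM lineitem GROUP BY l_returnflag",
    "join":      "SELECT COUNT(*) FROM lineitem l JOIN orders o ON l.l_orderkey = o.o_orderkey",
}

def benchmark(cursor, repetitions=5):
    """Run each query class `repetitions` times and return (mean, stdev) timings."""
    results = {}
    for name, sql in QUERY_CLASSES.items():
        timings = []
        for _ in range(repetitions):
            t0 = time.perf_counter()
            cursor.execute(sql)
            cursor.fetchall()  # force full result materialization before stopping the clock
            timings.append(time.perf_counter() - t0)
        results[name] = (statistics.mean(timings), statistics.stdev(timings))
    return results
```

A DB-API cursor for Hive or Impala (for example from the pyhive or impyla packages, both assumptions here) could be passed to `benchmark`; comparing the returned timings across engines mirrors the response-time dimension of the evaluation described in the abstract.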
IoTAbench: An Internet of Things Analytics Benchmark
Authors:
Martin Arlitt (HP Labs)
Manish Marwah (HP Labs)
Gowtham Bellala (HP Labs)
Amip Shah (HP Labs)
Jeff Healey (HP Vertica)
Ben Vandiver (HP Vertica)
Abstract:
The commoditization of sensors and communication networks is enabling vast quantities of data to be generated by and collected from cyber-physical systems. This “Internet-of-Things” (IoT) makes possible new business opportunities, from usage-based insurance to proactive equipment maintenance. While many technology vendors now offer “Big Data” solutions, a challenge for potential customers is understanding quantitatively how these solutions will work for IoT use cases. This paper describes a benchmark toolkit called IoTAbench for IoT Big Data scenarios. This toolset facilitates repeatable testing that can be easily extended to multiple IoT use cases, including a user’s specific needs, interests or dataset. We demonstrate the benchmark via a smart metering use case involving an eight-node cluster running the HP Vertica analytics platform. The use case involves generating, loading, repairing and analyzing synthetic meter readings. The intent of IoTAbench is to provide the means to perform “apples-to-apples” comparisons between different sensor data and analytics platforms. We illustrate the capabilities of IoTAbench via a large experimental study, where we store 22.8 trillion smart meter readings totaling 727 TB of data in our eight-node cluster.
DOI: 10.1145/2668930.2688055
Full text: PDF
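The smart metering use case rests on generating large volumes of synthetic meter readings before loading and analyzing them. The sketch below is a hypothetical generator loosely in the spirit of that step; the sinusoidal daily load shape, CSV layout, and function name are illustrative assumptions, not IoTAbench's actual data generator.

```python
# Hypothetical sketch of a synthetic smart-meter reading generator.
# Writes hourly (meter_id, timestamp, kWh) rows to a CSV file; the load
# curve and noise model are illustrative, not the IoTAbench generator.
import csv
import math
import random
from datetime import datetime, timedelta

def generate_readings(num_meters, start, hours, path):
    """Write hourly readings for num_meters meters over `hours` hours."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["meter_id", "timestamp", "kwh"])
        for h in range(hours):
            ts = start + timedelta(hours=h)
            # crude daily consumption curve, low at night and higher in the day
            base = 0.5 + 0.4 * math.sin((ts.hour - 6) / 24 * 2 * math.pi)
            for meter in range(num_meters):
                kwh = max(0.0, random.gauss(base, 0.1))
                writer.writerow([meter, ts.isoformat(), round(kwh, 3)])

# Example: one day of readings for 100 meters (tiny compared to the paper's
# 22.8 trillion readings, but the same generate-then-load pattern).
generate_readings(num_meters=100, start=datetime(2015, 1, 1), hours=24, path="readings.csv")
```

Scaling such a generator up and loading its output into an analytics platform such as HP Vertica is what allows the repeatable, apples-to-apples comparisons the abstract describes.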