Wednesday, 13 December 2017

Award Winners 2011

Workload Generation for Microprocessor Performance Evaluation

by Luk Van Ertvelde

Press Release
Extended Abstract

Although the availability of standardized benchmark suites has streamlined the process of performance evaluation, computer architects and engineers still face several important benchmarking challenges:

  1. Benchmarks should be representative of the applications that are expected to run on the target computer system; however, it is not always possible to compose a set of representative benchmarks. The main reason is that standardized benchmark suites are typically derived from open-source programs – because industry hesitates to share proprietary applications – which may not be representative of the real-world applications of interest.
  2. The amount of redundancy within and across benchmarks should be as small as possible. However, contemporary benchmark suites execute trillions of instructions to stress microprocessors in a meaningful way. As a result, it is infeasible to simulate entire benchmark suites using detailed cycle-accurate simulators.
  3. Benchmarks should enable micro-architecture, architecture and compiler research and development. Although existing benchmark suites satisfy this requirement, this is often not the case for benchmark reduction techniques because they typically operate at the assembly level.

In this dissertation, I propose three novel benchmark generation and reduction techniques to address these challenges:

  1. Code mutation [1] is a novel methodology that mutates a proprietary application to complicate reverse-engineering so that it can be distributed as a benchmark among industry and academia. These benchmark mutants hide the functional semantics of proprietary applications, while exhibiting similar performance characteristics. Consequently, they can be used as proxies for proprietary software to help drive performance evaluation by third parties.
  2. Code mutation conceals the intellectual property of an application, but it does not lend itself to the generation of short-running benchmarks. Sampled simulation, on the other hand, reduces the simulation time of a benchmark by simulating only a small sample of a complete benchmark execution in detail. In sampled simulation, the performance bottleneck is the establishment of the micro-architectural state (particularly the state of the caches) at the beginning of each sampling unit, often referred to as the cold-start problem. I address this problem by proposing a new cache warm-up methodology, namely NSL-BLRL [2, 3], which reduces sampled simulation time by an order of magnitude compared to the state of the art.
  3. Although code mutation can be used in combination with sampled simulation to generate short-running workloads that can be distributed to third parties without revealing intellectual property, the limitation is that this approach operates at the assembly level, which excludes the resulting workloads from being used for architecture and compiler research. I therefore propose a novel benchmark synthesis methodology and framework [4, 5] that aims at generating small but representative benchmarks in a high-level programming language, so that they can be used to explore both the architecture and compiler spaces.
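The cold-start problem mentioned in item 2 can be illustrated with a small sketch. This is not NSL-BLRL; it is a toy, fully associative LRU cache model, and the trace generator, cache size, and function names are assumptions made only for this illustration. It compares the miss rate of one sampling unit when the cache starts empty, when a bounded number of preceding references is replayed as warm-up, and when the full history is replayed.

```python
from collections import OrderedDict
import random


class LRUCache:
    """Toy fully associative LRU cache that only tracks hits and misses."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()

    def access(self, block):
        """Return True on a hit and False on a miss, updating the LRU order."""
        if block in self.lines:
            self.lines.move_to_end(block)
            return True
        self.lines[block] = None
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used block
        return False


def miss_rate(trace, start, length, warmup, num_lines=64):
    """Miss rate of trace[start:start+length].

    The `warmup` references just before `start` are replayed first to build
    up cache state; their hits and misses are not counted.
    """
    cache = LRUCache(num_lines)
    for block in trace[max(0, start - warmup):start]:
        cache.access(block)
    misses = sum(not cache.access(block) for block in trace[start:start + length])
    return misses / length


def make_trace(n=20000, working_set=48, seed=1):
    """Hypothetical synthetic trace: uniform random accesses to a small working set."""
    rng = random.Random(seed)
    return [rng.randrange(working_set) for _ in range(n)]


trace = make_trace()
start, length = 10000, 200

cold = miss_rate(trace, start, length, warmup=0)      # cache starts empty
warm = miss_rate(trace, start, length, warmup=2000)   # bounded warm-up
full = miss_rate(trace, start, length, warmup=start)  # replay everything before the unit

print(f"cold: {cold:.3f}  warm: {warm:.3f}  full-history: {full:.3f}")
```

In this toy setting the cold run overestimates the miss rate, while a bounded warm-up reproduces the full-history result at a fraction of the replay cost. Choosing how much warm-up is enough for real workloads is the hard part that warm-up methodologies address.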

[1] Luk Van Ertvelde and Lieven Eeckhout, “Dispersing Proprietary Applications as Benchmarks through Code Mutation”, In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2008, 201-210
[2] Luk Van Ertvelde, Filip Hellebaut, Lieven Eeckhout and Koen De Bosschere, “NSL-BLRL: Efficient Cache Warmup for Sampled Processor Simulation”, In Proceedings of the Annual Simulation Symposium (ANSS), 2006, 87-96
[3] Luk Van Ertvelde, Filip Hellebaut and Lieven Eeckhout, “Accurate and Efficient Cache Warmup for Sampled Processor Simulation through NSL-BLRL”, In The Computer Journal, Vol. 51, No. 2, 192-206, March 2008
[4] Luk Van Ertvelde and Lieven Eeckhout, “Benchmark Synthesis for Architecture and Compiler Exploration”, In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), 2010, 106-116
[5] Luk Van Ertvelde and Lieven Eeckhout, “Workload Reduction and Generation Techniques”, In IEEE Micro, Nov/Dec 2010, Vol.30, No.6

Full Text

Hosted by SPEC RG

Performance Modeling and Benchmarking of Event-Based Systems

by Kai Sachs

Press Release

Event-based systems (EBS) are increasingly used as underlying technology in many mission-critical areas and large-scale environments, such as environmental monitoring and location-based services. Moreover, novel event-based applications are typically highly distributed and data intensive, with stringent requirements for performance and scalability. Common approaches to address these requirements are benchmarking and performance modeling. However, there was a lack of general performance modeling methodologies for EBS as well as of test harnesses and benchmarks using representative workloads for EBS. Therefore, this thesis focused on approaches to benchmark EBS as well as the development of a performance modeling methodology. In this context, novel extensions for queueing Petri nets (QPNs) were proposed. The motivation was to support the development and maintenance of EBS that meet certain Quality-of-Service (QoS) requirements.

To address the lack of representative workloads, we developed the first industry-standard benchmark for EBS jointly with the Standard Performance Evaluation Corporation (SPEC); the author was involved in its development and specification as a chief benchmark architect and lead developer. Our efforts resulted in the SPECjms2007 standard benchmark. Its main contributions were twofold: based on the feedback of industrial partners, we specified a comprehensive standardized workload with different scaling options, and we implemented the benchmark using a newly developed complex and flexible framework. Using the SPECjms2007 benchmark, we introduced a methodology for performance evaluation of message-oriented middleware platforms and showed how the workload can be tailored to evaluate selected performance aspects. The standardized workload can be applied to other EBS. For example, we developed an innovative research benchmark for publish/subscribe-based communication named jms2009-PS based on the SPECjms2007 workload. The proposed benchmarks are now the de facto standard benchmarks for evaluating messaging platforms and have already been used successfully by several industrial and research organizations as a basis for further research on performance analysis of EBS.

To describe workload properties and routing behavior we introduced a novel formal definition of EBS and their performance aspects. Furthermore, we proposed an innovative approach to characterize the workload and to model the performance aspects of EBS. We used operational analysis techniques to describe the system traffic and derived an approximation for the mean event delivery latency. We showed how more detailed performance models based on QPNs could be built and used to provide more accurate performance prediction. It is the first general performance modeling methodology for EBS and can be used for an in-depth performance analysis as well as to identify potential bottlenecks. A further contribution is a novel terminology for performance modeling patterns targeting common aspects of event-based applications using QPNs.
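As a rough illustration of how operational analysis can yield a mean delivery latency, the sketch below applies the utilization law (U = X · S) to each node on a delivery path and sums an open-queue residence-time approximation, S / (1 − U). This is not the approximation derived in the thesis; the formula choice, the node names, and all rates and service times are assumptions made for the example.

```python
def mean_delivery_latency(path, arrival_rates, service_times):
    """Approximate mean latency of an event travelling along `path`.

    For each node, the utilization law gives U = X * S, where X is the event
    arrival rate (events/s) and S the mean service time (s). The residence
    time per node is approximated as S / (1 - U), which assumes an open,
    single-server queue; this is a simplifying assumption of the sketch.
    """
    latency = 0.0
    for node in path:
        utilization = arrival_rates[node] * service_times[node]
        if utilization >= 1.0:
            raise ValueError(f"node {node!r} is saturated (U = {utilization:.2f})")
        latency += service_times[node] / (1.0 - utilization)
    return latency


# Hypothetical three-hop delivery path through a broker network.
arrival_rates = {          # events per second seen by each node
    "publisher_broker": 400.0,
    "core_broker": 700.0,
    "subscriber_broker": 300.0,
}
service_times = {          # mean service time per event, in seconds
    "publisher_broker": 0.001,
    "core_broker": 0.001,
    "subscriber_broker": 0.002,
}
path = ["publisher_broker", "core_broker", "subscriber_broker"]

latency = mean_delivery_latency(path, arrival_rates, service_times)
print(f"approximate mean delivery latency: {latency * 1000:.2f} ms")
```

The node with the highest utilization dominates the estimate, which is also how such a calculation points to potential bottlenecks before a more detailed QPN model is built.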

To improve the modeling power of QPNs, we defined several extensions of the standard QPNs. They allow us to build models in a more flexible and general way and address several limitations of QPNs. By introducing an additional level of abstraction, it is possible to distinguish between logical and physical layers in models. This makes it possible to map logical to physical resources flexibly and thus makes it easy to customize the model to a specific deployment. Furthermore, we addressed two limiting aspects of standard QPNs: constant cardinalities and lack of transition priorities.

Finally, we validated our methodology for modeling EBS in two case studies and successfully predicted system behavior and performance under load. As part of the first case study, we extended SIENA, a well-known distributed EBS, with a runtime measurement framework and predicted the runtime behavior, including delivery latency, for a basic workload. In the second case study, we developed a comprehensive model of the complete SPECjms2007 workload. To model the workload, we applied our performance modeling patterns as well as our QPN extensions. We considered a number of different scenarios with varying workload intensity (up to 4,500 transactions / 30,000 messages per second) and compared the model predictions against measurements. The results demonstrated the effectiveness and practicality of the proposed modeling and prediction methodology in the context of a real-world scenario.

This thesis opens up new avenues of frontier research in the area of event-based systems. Our performance modeling methodology can be used to build self-adaptive EBS using automatic model extraction techniques. Such systems could dynamically adjust their configuration to ensure that QoS requirements are continuously met.

Full Text

Hosted by TU Darmstadt