Tuesday, 12 December 2017

Session 9: Measurements and Benchmarks - Part 2

Workload Characterization of Cryptography Algorithms for Hardware Acceleration

Authors:

Jed Kao-Tung Chang (University of California, Irvine)
Chen Liu (Florida International University)
Shaoshan Liu (Microsoft Corp.)
Jean-Luc Gaudiot (University of California, Irvine)

Abstract:

Data encryption/decryption has become an essential component of modern information exchange. However, executing these cryptographic algorithms often incurs a huge overhead, and the need to reduce this overhead arises correspondingly. In this paper, we select nine widely adopted cryptography algorithms and study their workload characteristics. Unlike many previous works, we consider the overhead not only from the perspective of computation but also from that of the memory access pattern. We break down the function execution time to identify the software bottlenecks suitable for hardware acceleration. Then we categorize the operations needed by these algorithms. In particular, we introduce a concept called 'Load-Store Block' (LSB) and perform LSB identification on the various algorithms. Our results illustrate that for cryptographic algorithms, the execution rate of most hotspot functions is more than 60%; the memory access instruction ratio is mostly more than 60%; and LSB instructions account for more than 30% for selected benchmarks. Based on our findings, we suggest future directions in designing either a hardware accelerator associated with a microprocessor or a specific microprocessor for cryptography applications.

DOI: 10.1145/1958746.1958800

Full text: PDF


Characterization, Monitoring and Evaluation of Operational Performance Trends on Server Processor Hardware

Authors:

Ernest Sithole (University of Ulster)
Sally McClean (University of Ulster)
Bryan Scotney (University of Ulster)
Gerard Parr (University of Ulster)
Adrian Moore (University of Ulster)
Stephen Dawson (SAP Research)

Abstract:

Enterprise IT environments have seen a sharp growth in content use due to the popularity of on-demand, data-intensive applications. In turn, the huge demand for content has spawned major developments such as the growth and distribution of computing nodes as well as the adoption of various implementation technologies. Given the complexity that addressing these factors brings to the makeup of business computing environments, the critical planning task of determining the appropriate infrastructure sizes for supporting firm Quality of Service (QoS) guarantees becomes very challenging to fulfil. Benchmarking methods are widely employed in calibrating attainable performance in IT solutions, but they have the drawback of presenting output performance metrics as composite measurements that give only an end-to-end perspective. As an enhancement to benchmarking approaches, we explore the use of Performance Monitoring Counters (PMCs) in obtaining detailed operational performance of CPU and memory hardware. PMCs are on-chip registers found on most modern processor hardware. We use PMC-derived measurements to validate cache performance trends that have been derived analytically, and in the course of these validations, PMC data is also used to investigate the nature and character of surges in cache miss events, which emerge as the memory load generated by runtime processes increases.

DOI: 10.1145/1958746.1958801

Full text: PDF


Instrumentation-based Tool for Latency Measurements

Authors:

Pekka Pääkkönen (VTT Technical Research Centre of Finland)
Jarmo Prokkola (VTT Technical Research Centre of Finland)
Ali Lattunen (VTT Technical Research Centre of Finland)

Abstract:

Software has to be tested from functional and performance viewpoints in order to create products that fulfill customer demands. The need for testing has led to the development of a plethora of testing tools. Performance measurement of SW latencies on local and distributed SW platforms has not yet been completely solved, and this is the research problem of this paper. In particular, the paper concentrates on GPS-based time synchronization and the performance of the proof-of-concept. The approach is to instrument the SW implementation under study and to collect measurement data with the presented tool. The results indicate a resolution of 590 ns, which can be achieved with high-performance reference clocks. CPU processing can be kept lower than 5% even with a high event transmission rate. In addition, the presented GPS synchronization method can be used for other purposes, such as data packet time-stamping in network monitoring solutions.

DOI: 10.1145/1958746.1958802

Full text: PDF
