Friday, 28 April 2017

LIKWID: Like I Knew What I'm Doing


LIKWID is a set of command-line tools that help software developers, benchmarkers and application users get the best performance out of a given system. LIKWID is available for the Linux operating system, works with any standard Linux kernel and depends only on the GNU compiler and Perl to build. Some of the tools are restricted to x86 processors. The tools fall roughly into three categories: system information and control, performance and energy profiling, and microbenchmarking.

The first group comprises likwid-topology, likwid-features, likwid-pin, likwid-mpirun, likwid-setFrequencies and likwid-powermeter. likwid-topology presents all node information a software developer needs in a structured way. This covers core and memory hierarchy topology, NUMA topology and cache properties. It also supports an ASCII art output that gives an accessible overview of the complete node.

On many Intel x86 processors certain properties of the chip can be controlled at runtime by toggling bits in MSR control registers. likwid-features gives access to these control registers on all Intel processors starting with Intel Core 2. One of its main benefits is that it allows turning the various hardware prefetchers on and off for benchmarking or performance purposes.

Full control over the mapping of threads to physical compute resources is crucial for deterministic and well-founded benchmarking. Testing different mappings and measuring how sensitive a code is to those changes gives important insight into how the code interacts with the hardware. likwid-pin is a simple-to-use wrapper tool that allows full affinity control for a variety of threading programming models (e.g. most OpenMP implementations, TBB, Cilk and pthreads). It works without any code changes and offers a powerful thread group syntax with logical pinning. This thread group syntax is used consistently in almost all LIKWID tools. As an extension to likwid-pin, likwid-mpirun supports the same simple thread group syntax to pin pure MPI or hybrid OpenMP/MPI applications for certain tool-chain combinations.

Another degree of freedom on current processors is the frequency setting, together with a feature called Turbo mode on Intel processors. For benchmarking one often wants to scan a sequence of frequency settings. likwid-setFrequencies is a simple command-line tool that makes it easy to query and set frequency settings from user space. It can also enable or disable Turbo mode on Intel processors. Turbo mode allows chips to overclock dynamically within a certain overall power budget. To interpret performance measurements it is important to know which overclocking steps are supported. likwid-powermeter gives access to all supported Turbo mode steps from user space.
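To make the idea of affinity control concrete, here is a minimal Python sketch of pinning at the operating-system level. It assumes a Linux system where os.sched_setaffinity is available. It is not how likwid-pin is implemented, and the helper names pin_to_cpu and run_pinned are hypothetical. The sketch only restricts the caller to one logical CPU while a piece of work runs and then restores the previous mask.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the caller to one logical CPU and return the previous mask."""
    previous = os.sched_getaffinity(0)  # pid 0 refers to the caller
    os.sched_setaffinity(0, {cpu})
    return previous


def run_pinned(cpu, func, *args):
    """Run func(*args) while pinned to `cpu`, then restore the previous mask."""
    previous = pin_to_cpu(cpu)
    try:
        return func(*args)
    finally:
        os.sched_setaffinity(0, previous)


if __name__ == "__main__":
    # Assumption: any CPU the caller may already run on is fine for the demo.
    first_cpu = min(os.sched_getaffinity(0))
    seen = run_pinned(first_cpu, os.sched_getaffinity, 0)
    print("affinity while pinned:", sorted(seen))
```

Running the same workload through run_pinned with different CPU numbers is one simple way to observe how sensitive a code is to its placement.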

The second group of tools within LIKWID covers performance and energy profiling. The main tool in this category is likwid-perfctr, which gives access to hardware counter data on a variety of Intel and AMD processors. likwid-perfctr supports low-overhead measurements. Built-in affinity control establishes the connection between a measurement and the source code triggering the events. The main benefit of likwid-perfctr is its so-called performance groups, which provide event sets and derived metrics for most areas a software developer or benchmarker is interested in, in a way that is portable across all supported processors. Typical groups are, for example, MEM (main memory data volumes and bandwidth), L2 (L2 cache data volumes and bandwidth), DATA (load to store ratio) and ENERGY (power and energy consumption). likwid-perfctr supports multiple usage modes. It can be used as a wrapper for simple end-to-end measurements, with time-based sampling (timeline mode), or together with an instrumentation API for fine-grained measurements of code regions. It can also be used as a generic monitoring tool that measures what is currently happening on a compute node. The results are presented in ASCII tables listing raw counts and derived metrics, or as RFC 4180 conformant CSV for further processing. All functionality is also available as a C, Lua and (optional) Python API and can be used to implement self-profiling of applications or to base custom tool development on LIKWID. The energy measurements use the Intel RAPL interface available in recent Intel processors. Apart from using the ENERGY group in likwid-perfctr, this data can also be measured end-to-end with likwid-powermeter.
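As an illustration of how a derived metric follows from raw counts, here is a minimal sketch of the arithmetic behind a memory bandwidth figure such as the one reported by the MEM group. It assumes that every counted read or write event moves one 64-byte cache line. The function name and the counter values are hypothetical, and the actual events and formulas differ between processors.

```python
CACHE_LINE_BYTES = 64  # assumption: one counted event moves one 64-byte line


def memory_metrics(read_events, write_events, runtime_s):
    """Turn raw read/write event counts into data volume and bandwidth."""
    if runtime_s <= 0:
        raise ValueError("runtime_s must be positive")
    volume_bytes = (read_events + write_events) * CACHE_LINE_BYTES
    return {
        "data_volume_gbytes": volume_bytes / 1e9,
        "bandwidth_mbytes_per_s": volume_bytes / runtime_s / 1e6,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from a real measurement.
    print(memory_metrics(read_events=1_000_000, write_events=500_000, runtime_s=0.5))
```

Keeping the raw counts next to the derived values, as the ASCII tables do, makes it possible to check such a formula by hand.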

Microbenchmarking a machine can be a tedious job, and writing good benchmarks requires taking care of thread and memory placement as well as high-resolution timing. likwid-bench is a framework that eases the development of small assembly loop kernels. Assembly kernels were chosen to focus on the ISA interface without the additional layer of abstraction and complexity that a compiler adds. The developer focuses on the loop code, and the framework takes care of threading, data allocation and placement, timing and result presentation. Moreover, it currently ships with 89 benchmarks performing 12 different kernel operations with different processor features such as AVX, SSE, FMA or non-temporal stores, and can therefore also be used as a standalone benchmarking application.
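The division of labour described above can be sketched in a few lines. The following Python stand-in is not likwid-bench and does not use assembly kernels. It only illustrates the parts a benchmarking framework handles around a kernel: allocating the data, warming up, timing with a high-resolution clock and reporting the result. The name run_copy_benchmark and the chosen sizes are hypothetical.

```python
import time


def run_copy_benchmark(size_bytes, iterations):
    """Time a simple copy kernel and report the moved data and bandwidth."""
    # Data allocation: one source and one destination buffer.
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)

    # Warm-up pass so that both buffers are touched before timing starts.
    dst[:] = src

    # Timed section: the "kernel" is a plain buffer copy.
    start_ns = time.perf_counter_ns()
    for _ in range(iterations):
        dst[:] = src
    elapsed_s = max(time.perf_counter_ns() - start_ns, 1) / 1e9

    # Each iteration reads size_bytes and writes size_bytes.
    moved_bytes = 2 * size_bytes * iterations
    return {
        "seconds": elapsed_s,
        "moved_bytes": moved_bytes,
        "mbytes_per_s": moved_bytes / elapsed_s / 1e6,
    }


if __name__ == "__main__":
    result = run_copy_benchmark(size_bytes=1 << 20, iterations=10)
    print(f"{result['mbytes_per_s']:.1f} MBytes/s over {result['seconds']:.6f} s")
```

The numbers from an interpreted stand-in like this say little about the hardware, which is why the real kernels are written in assembly.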

Requirements
  • GNU/Linux OS
  • GCC
  • Perl
Developed by
  • HPC group of the Regional Computing Center Erlangen
Contact
  • Thomas Röhl, Thomas.Roehl(at)

    Regional Computing Center Erlangen
    Martensstr. 1, 91058 Erlangen, Germany

  • Jan Eitzinger, Jan.Eitzinger(at)

    Regional Computing Center Erlangen
    Martensstr. 1, 91058 Erlangen, Germany
Version
  • 4.2
License
  • GPLv3
Publications/Projects using the tool
  • M. Gutierrez, S. Rahman, D. Tamir and A. Qasem, "Neural network methods for fast and portable prediction of CPU power consumption," Green Computing Conference and Sustainable Computing Conference (IGSC), 2015 Sixth International, Las Vegas, NV, 2015, pp. 1-4. doi: 10.1109/IGCC.2015.7393702
  • J. Treibig, G. Hager and G. Wellein. "Performance patterns and hardware metrics on modern multicore processors: Best practices for performance engineering." Euro-Par 2012: Parallel Processing Workshops. Springer Berlin Heidelberg, 2012.
  • T. Malas, G. Hager, H. Ltaief and D. Keyes, "Towards Fast Reverse Time Migration Kernels using Multi-threaded Wavefront Diamond Tiling." Second EAGE Workshop on High Performance Computing for Upstream. 2015.
  • T. Gasc, F. De Vuyst, M. Peybernes, R. Poncet, "Performance modeling & prediction of 2nd-order staggered Lagrange-Remap Hydrodynamics solvers: modeling, measurements & validation." GAMNI Mécanique des fluides numérique, IHP, Paris. doi: 10.13140/2.1.4313.6802
  • M.F. Dolz, J. Kunkel, K. Chasapis, S. Catalán, "An analytical methodology to derive power models based on hardware and software metrics." Computer Science-Research and Development (2015): 1-10.
  • A. Dzhagaryan, A. Milenković, "Impact of thread and frequency scaling on performance and energy in modern multicores: a measurement-based study." Proceedings of the 2014 ACM Southeast Regional Conference. ACM, 2014.
  • T. Roehl, J. Eitzinger, G. Hager, G. Wellein, "Overhead analysis of performance counter measurements." Parallel Processing Workshops (ICPPW), 2014 43rd International Conference on. IEEE, 2014.
  • B. Weyers, C. Terboven, D. Schmidl, "Visualization of memory access behavior on hierarchical NUMA architectures." Visual Performance Analysis (VPA), 2014 First Workshop on Visual Performance Analysis. IEEE, 2014. doi: 10.1109/VPA.2014.12
  • Y. Liu, L. Deng, "Acceleration of CFD Engineering Software on GPU and MIC." Algorithms and Architectures for Parallel Processing. Springer International Publishing, 2015. 835-848.
  • J. Kunkel, A. Aguilera, N. Hübbe, M. Wiedemann, M. Zimmer, "Monitoring energy consumption with SIOX." Computer Science-Research and Development 30.2 (2015): 125-133.
  • H. Baik, H. Song, "A complexity-based adaptive tile partitioning algorithm for HEVC decoder parallelization." 2015 IEEE International Conference on Image Processing (ICIP), IEEE, 2015.
  • R. A. Shafik, A.K. Das, S. Yang, G.V. Merrett, B. Al-Hashimi, "Thermal-aware adaptive energy minimization of openMP parallel applications." DATE2015: Workshop on Designing with Uncertainty - Opportunities & Challenges in Conjunction with Design and Test in Europe (DATE) Conference, Grenoble, 2015, 1-3.