From May 15 to 17, CSCS organized a 2-day training course on performance engineering approaches at the compute node level. "Performance engineering" here means developing a thorough understanding of the interactions between software and hardware. The instructors were Prof. Gerhard Wellein and Dr. Georg Hager from RRZE, Germany.

Introduction to Node-level Performance Engineering »

  • Modern computer architectures
  • Performance modeling & engineering approaches
  • Accelerators

Node topology and programming models »

  • Performance modeling
  • Thread/Process core affinity
  • The LIKWID tools

Micro-benchmarking for architectural exploration »

  • The LIKWID tools: online demos
  • Micro-benchmarking for architectural exploration
  • Understanding the memory hierarchy
  • Case study: OpenMP sparse matrix-vector multiplication

Performance modeling: The Roofline Model / Case study A 3D Jacobi smoother (part 1) »

Performance modeling: The Roofline Model / Case study A 3D Jacobi smoother (part 2) »

Understanding the memory hierarchy: Cache Mapping (part 1) »

  • Optimal utilization of parallel resources
  • Reading x86 assembly code and exploiting SIMD parallelism (part 1)

Understanding the memory hierarchy: Cache Mapping (part 2) »

  • Reading x86 assembly code and exploiting SIMD parallelism (part 2)

ccNUMA and Data Locality »

  • Performance analysis with hardware metrics
  • Online demo: likwid-perfctr

Multicore Scaling: the ECM model »

  • Performance modeling of stencil codes
  • Examples and case studies