


2011: GPU Programming Workshop


CSCS is pleased to announce a two-and-a-half-day intensive course on GPU programming. Michael Wolfe and Dave Norton, senior members of the PGI compiler development team, will teach the two-day tutorial and provide hands-on training. The final half day is an optional hands-on session on accelerating classical molecular dynamics simulations using GPUs, presented by CSCS.

Registration deadline: March 21, 2011


Instructors: Michael Wolfe and Dave Norton from the PGI compiler development team.

Venue: CSCS, Manno. Please visit our website to find out how to reach CSCS: www.cscs.ch/233.0.html

9:00-17:00 on the first two days

9:00-13:00 on the last day (hands-on session on MD codes)


The CSCS visualization and GPGPU development cluster, Eiger (http://www.cscs.ch/505.0.html), will be available for the hands-on training. Participants are expected to bring a laptop.

Maximum number of participants: 28



Course Syllabus

1. CPU Architecture vs. GPU Architecture
CPU Architecture Basics
 Multicore and multiprocessor basics
 GPU Architecture basics
 How is the GPU connected to the host?
 Why is parallel programming for GPUs different from programming for multicore?
 What is a GPU thread and how does it execute?
 How can I identify my GPU?

Part II. CUDA C and Fortran

1. Low-level GPU Programming and CUDA
How does data get to the GPU?
 How does a program run on the GPU?
 What kinds of parallelism are appropriate for a GPU?
 The CUDA programming model
 Host code to control GPU, allocate memory, launch kernels
 Kernel code to execute on GPU
 Scalar routine executed on one thread
 Launched in parallel on a grid of thread blocks
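
To make the last two points concrete, here is a minimal CPU-side sketch of the idea that a kernel is a scalar routine executed by each thread and launched over a grid of thread blocks. It is an emulation in Python, not CUDA code, and it is not taken from the course material. The names `saxpy_thread` and `launch` are hypothetical, and the emulated threads run one after another rather than in parallel. The index formula mirrors the usual CUDA expression `blockIdx.x * blockDim.x + threadIdx.x`.

```python
# CPU-side emulation of a 1-D CUDA-style launch; illustrative only, not GPU code.

def saxpy_thread(block_idx, block_dim, thread_idx, n, a, x, y):
    """Scalar routine run by one emulated thread: updates at most one element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < n:  # guard: the last block may contain surplus threads
        y[i] = a * x[i] + y[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a launch by visiting every thread of every block in turn."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

n = 10
block_dim = 4                                # threads per block (assumed value)
grid_dim = (n + block_dim - 1) // block_dim  # round up so all n elements are covered
x = [float(i) for i in range(n)]
y = [1.0] * n
launch(saxpy_thread, grid_dim, block_dim, n, 2.0, x, y)
```

Because the grid size is rounded up, the launch creates 12 emulated threads for 10 elements, which is why the routine checks `i < n` before touching memory.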

2. The Host Program
Declaring and allocating device memory data
 Moving data to and from the device
 Launching kernels


3. Writing Kernels
What is allowed in a kernel vs. what is not allowed
  Grids, Blocks, Threads, Warps

4. Building and Running CUDA Programs
Compiler options
  Running your program
  The CUDA Runtime API
  CUDA Fortran vs. CUDA C


5. Performance Tuning, Tips and Tricks
Measuring performance, using cudaprof
  Optimizing your kernels
  Optimize communication between host and GPU
  Optimize device memory accesses, shared memory usage
  Optimize the kernel code
    loop unrolling
    thread block unrolling
    grid unrolling
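
As a rough illustration of the first item, loop unrolling, the sketch below shows the transformation on a dot product: the loop body is replicated four times and a short remainder loop handles leftover elements. It is written in Python purely to show the shape of the rewrite and is an assumption of this example, not course material. In Python it gives no speedup; in kernel code the point is to reduce loop overhead and expose independent operations. The function names are hypothetical, and thread block and grid unrolling are not shown.

```python
def dot(x, y):
    """Reference dot product: one multiply-add per loop iteration."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total

def dot_unrolled4(x, y):
    """Same computation with the loop body unrolled by a factor of four."""
    n = len(x)
    total = 0.0
    main = n - n % 4  # largest multiple of 4 that does not exceed n
    for i in range(0, main, 4):
        total += (x[i] * y[i] + x[i + 1] * y[i + 1]
                  + x[i + 2] * y[i + 2] + x[i + 3] * y[i + 3])
    for i in range(main, n):  # remainder loop for the last n % 4 elements
        total += x[i] * y[i]
    return total
```

The unrolled version adds terms in a different order, so with general floating-point data its result can differ from the reference in the last bits.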


Part III. PGI Accelerator Model

1. High-level GPU Programming using the PGI Accelerator Model
What role does a high-level model play?
  Basic concepts and directive syntax
  Accelerator compute and data regions
  Appropriate algorithms for a GPU

2. Building and Running Accelerator Programs
Command line options
  Enabling compiler feedback


3. Accelerator Directive Details
Compute regions
  Clauses on the compute region directive
  What can appear in a compute region
  Obstacles to successful acceleration
  Loop directive
  Clauses on the loop directive
  Loop schedules
  Data regions
  Clauses on the data region directive


4. Interpreting Compiler Feedback
Using pgprof source browser
  Hindrances to parallelism
  Data movement feedback
  Reading kernel schedules


5. Performance Tuning, Tips and Tricks
Appropriate algorithm
  Optimizing data movement between host and GPU
  Data regions, mirrored / reflected data, CUDA data
  Optimizing kernel performance
  Tuning the kernel schedule
    unroll clauses
  Choosing accelerator device
  PGI Unified Binary
  Performance profiling information
  GPU initialization time on Linux


Part IV. Wrapup, Questions

1. Accelerators in HPC
Past, present, future role of accelerators in HPC
  Past, present, future of programming models for accelerators

Day 3, Molecular Dynamics Codes on GPGPUs - CSCS: Sadaf Alam, Jeff Poznanovic, and Tim Robinson

Pre-requisites: familiarity with running parallel MD on clusters

- Introduction of GPGPU technologies for scientific computing
- Overview of parallel classical molecular dynamics software
- Evolution of GPU acceleration for classical molecular dynamics software
- Walkthrough using GPU accelerated NAMD / pmemd (Rosa vs. Eiger)
- Demo with case studies
- Tips and tricks for optimal usage of GPU accelerated simulations
- Advanced topics and future outlook
- LAB session

