2011: GPU Programming Workshop

 

CSCS is pleased to announce a two-and-a-half-day intensive course on GPU programming. Michael Wolfe and Dave Norton, senior members of the PGI compiler development team, will conduct the two-day tutorial and provide hands-on training. The final half day is an optional hands-on session on accelerating classical molecular dynamics simulations using GPUs, presented by CSCS staff.

Registration deadline: March 21, 2011

Instructors

Michael Wolfe and Dave Norton from the PGI compiler development team. 

Venue

CSCS, Manno. Please visit our website to find out how to reach CSCS: www.cscs.ch/233.0.html

Time

9:00 - 17:00 on the first two days

9:00 - 13:00 on the last day (hands-on session on MD codes)

Prerequisites

The CSCS visualization and GPGPU development cluster Eiger (http://www.cscs.ch/505.0.html) will be available for the hands-on training. Participants are expected to bring their own laptop.

Maximum number of participants

28

 

******

Course Syllabus

1. CPU Architecture vs. GPU Architecture
 
CPU Architecture basics
 Multicore and multiprocessor basics
 GPU Architecture basics
 How is the GPU connected to the host?
 Why is parallel programming for GPUs different than for multicore?
 What is a GPU thread and how does it execute?
 How can I identify my GPU?

Part II. CUDA C and Fortran

1. Low-level GPU Programming and CUDA
 
How does data get to the GPU?
 How does a program run on the GPU?
 What kinds of parallelism are appropriate for a GPU?
 The CUDA programming model
   Host code to control GPU, allocate memory, launch kernels
   Kernel code to execute on GPU
     Scalar routine executed on one thread
     Launched in parallel on a grid of thread blocks

2. The Host Program
 
Declaring and allocating device memory data
 Moving data to and from the device
 Launching kernels

 SHORT LAB

3. Writing Kernels
 
What is allowed in a kernel vs. what is not allowed
  Grids, Blocks, Threads, Warps

4. Building and Running CUDA Programs
 
Compiler options
  Running your program
  The CUDA Runtime API
  CUDA Fortran vs. CUDA C

  LAB

5. Performance Tuning, Tips and Tricks
 
Measuring performance, using cudaprof
  Optimizing your kernels
  Optimize communication between host and GPU
  Optimize device memory accesses, shared memory usage
  Optimize the kernel code
    loop unrolling
    thread block unrolling
    grid unrolling
    pipelining
  Debugging

  PERFORMANCE LAB

Part III. PGI Accelerator Model

1. High-level GPU Programming using the PGI Accelerator Model
 
What role does a high-level model play?
  Basic concepts and directive syntax
  Accelerator compute and data regions
  Appropriate algorithms for a GPU

2. Building and Running Accelerator Programs
 
Command line options
  Enabling compiler feedback

  SHORT LAB

3. Accelerator Directive Details
 
Compute regions
  Clauses on the compute region directive
  What can appear in a compute region
  Obstacles to successful acceleration
  Loop directive
  Clauses on the loop directive
  Loop schedules
  Data regions
  Clauses on the data region directive

  LAB

4. Interpreting Compiler Feedback
 
Using pgprof source browser
  Hindrances to parallelism
  Data movement feedback
  Reading kernel schedules

  LAB

5. Performance Tuning, Tips and Tricks
 
Appropriate algorithm
  Optimizing data movement between host and GPU
  Data regions, mirrored / reflected data, CUDA data
  Optimizing kernel performance
  Tuning the kernel schedule
    unroll clauses
  Choosing accelerator device
  PGI Unified Binary
  Performance profiling information
  GPU initialization time on Linux

  PERFORMANCE LAB

Part IV. Wrapup, Questions

1. Accelerators in HPC
 
Past, present, future role of accelerators in HPC
  Past, present, future of programming models for accelerators

Day 3, Molecular Dynamics Codes on GPGPUs - CSCS: Sadaf Alam, Jeff Poznanovic, and Tim Robinson

Prerequisites: familiarity with running parallel MD on clusters

- Introduction to GPGPU technologies for scientific computing
- Overview of parallel classical molecular dynamics software
- Evolution of GPU acceleration for classical molecular dynamics software
- Walkthrough using GPU-accelerated NAMD / pmemd (Rosa vs. Eiger)
- Demo with case studies
- Tips and tricks for optimal usage of GPU-accelerated simulations
- Advanced topics and future outlook
- LAB session

******


