Event Detail



Course/Workshop: GPU Programming with CUDA Fortran and the PGI Accelerator Programming Model (2010)


The HP2C platform is pleased to announce a two-day intensive course on GPU programming using CUDA Fortran and the PGI directive-based accelerator programming model (syllabus below). Senior members of the PGI compiler development team, Michael Wolfe and Dave Norton, will conduct the two full-day tutorials and provide hands-on training.

Registration deadline: 19th September 2010


Instructors: Michael Wolfe and Dave Norton from the PGI compiler development team

Time: 9:00 - 17:00 both days

The CSCS visualization and GPGPU development cluster Eiger (http://www.cscs.ch/505.0.html) will be available for the hands-on training, so please ensure that you have an account on this system. For further information contact help(at)cscs.ch. Participants are expected to bring a laptop for the hands-on training.

Maximum number of participants: 20
Target audience: This course is specifically aimed at HP2C users.

Participants are kindly requested to make their own arrangements for accommodation. CSCS regularly works with the following hotels:

Hotel Federale, Via Paolo Regazzoni 8, 6900 Lugano
Tel: +41 (0)91 910 08 08, www.hotel-federale.ch

Art Hotel Stella, Via F. Borromini 5, 6903 Lugano
Tel: +41 (0)91 966 33 70, www.arthotelstella.ch

Hotel San Carlo, via Nassa 28, 6900 Lugano
Tel: +41 (0)91 922 71 07, www.hotelsancarlolugano.com

For further accommodation options please visit the Lugano tourism website.

Course Syllabus

1. Introduction

  • CPU architecture vs. GPU architecture
  • CPU architecture basics
  • Multicore and multiprocessor basics
  • GPU architecture basics
  • How is parallel programming for GPUs different from programming for multicore?
  • What is a GPU thread and how does it execute?
2. CUDA: C and Fortran

  • The CUDA programming model
  • Host code to control GPU, allocate memory, launch kernels
  • Kernel code to execute on GPU
  • The host program
  • Declaring and allocating device memory data
  • Moving data to and from the device
  • Launching kernels
  • Writing kernels
  • What is allowed in a kernel vs. what is not allowed
  • Grids, blocks, threads, warps
  • Building and running CUDA programs
  • Compiler options
  • Running your program
  • The CUDA runtime API
  • CUDA Fortran vs. CUDA C
  • Performance tuning tips and tricks
  • Measuring performance using CUDAPROF
  • Occupancy, memory coalescing
  • Optimizing your kernels
  • Optimize communication between host and GPU
  • Optimize device memory accesses, shared memory usage
  • Optimize the kernel code
  • Debugging using emulation
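
As a taste of the material in this section, here is a minimal CUDA Fortran sketch of the host/kernel split: declaring device memory, moving data by assignment, and launching a kernel over a grid of thread blocks. It is an illustration only (the array and kernel names are invented), assumes the PGI compiler (e.g. pgfortran -Mcuda), and requires a CUDA-capable GPU to run.

```fortran
module kernels
contains
  ! Kernel code: each thread scales one element of x.
  attributes(global) subroutine scale(x, a, n)
    real :: x(*)
    real, value :: a
    integer, value :: n
    integer :: i
    ! Global index from 1-based block and thread coordinates
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) x(i) = a * x(i)
  end subroutine
end module

program main
  use cudafor
  use kernels
  integer, parameter :: n = 1024
  real :: h(n)               ! host array
  real, device :: d(n)       ! device array, allocated on the GPU
  h = 1.0
  d = h                      ! host-to-device copy via assignment
  ! Launch enough 256-thread blocks to cover all n elements
  call scale<<<(n + 255) / 256, 256>>>(d, 2.0, n)
  h = d                      ! device-to-host copy
  print *, h(1)
end program
```

The assignment-based data movement and the chevron launch syntax are the CUDA Fortran counterparts of cudaMemcpy and kernel launches in CUDA C.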
3. PGI Accelerator Programming Model

  • High-level GPU programming using the PGI Accelerator model
  • What role does a high-level model play?
  • Basic concepts and directive syntax
  • Accelerator compute and data regions
  • Appropriate algorithms for a GPU
  • Building and running PGI Accelerator programs
  • Command line options
  • Enabling and interpreting compiler feedback
  • Using the PGPROF source browser
  • Data movement feedback
  • Reading kernel schedules
  • Accelerator directive details
  • Compute regions
  • Clauses on the compute region directive
  • What can appear in a compute region
  • Obstacles to successful acceleration
  • Loop directive
  • Clauses on the loop directive
  • Loop schedules
  • Data regions
  • Clauses on the data region directive 
  • Performance tuning tips and tricks
  • PGI Unified Binary for multiple host or multiple accelerator targets
  • Performance profiling information
  • Selecting an appropriate algorithm
  • Optimising data movement between host and GPU
  • Optimising kernel performance
  • Tuning the kernel schedule
  • Optimising initialization time
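
For a flavour of the directive syntax covered in this section, here is a minimal PGI Accelerator sketch in Fortran. The routine name and loop bounds are illustrative; it assumes compilation with a PGI accelerator target flag such as -ta=nvidia, and a GPU to run on.

```fortran
subroutine saxpy(n, a, x, y)
  integer :: n, i
  real :: a, x(n), y(n)
  ! Inside the compute region the compiler generates a GPU kernel
  ! for the loop and manages the host/device copies of x and y.
  !$acc region
  do i = 1, n
     y(i) = a * x(i) + y(i)
  end do
  !$acc end region
end subroutine
```

Compared with CUDA Fortran, no explicit kernel, device arrays, or launch configuration appear in the source; compiler feedback (enabled via command line options) reports the generated loop schedule and data movement.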
4. Wrap-up and Questions

  • Accelerators in HPC
  • Past, present, future role of accelerators in HPC
  • Past, present, future of programming models for accelerators
  • How to reach an exaflop

