Efficient and distributed training with TensorFlow on Piz Daint
The Swiss National Supercomputing Centre is pleased to announce that the "Efficient and distributed training with TensorFlow on Piz Daint" workshop will be held on March 14-15, 2019, at CSCS in Lugano, Switzerland.
The Piz Daint supercomputer at CSCS is an ideal platform for intensive deep learning workloads, as it comprises thousands of compute nodes equipped with Tesla GPUs and connected by a high-speed interconnect. In this two-day course, we will look at how to run distributed deep learning workloads with TensorFlow on Piz Daint. We will use simple examples to demonstrate best practices for building efficient input pipelines that maximize the throughput of deep learning models with TensorFlow. TensorFlow, one of the most popular numerical libraries for deep learning, provides an extensive collection of algorithms optimized to use the hardware as efficiently as possible.
The course will include the following topics:
- Running TensorFlow on Piz Daint.
- Creating efficient input pipelines with TensorFlow's Dataset API to maximize throughput on Piz Daint.
- Reading and writing data as TFRecords files.
- Understanding the stochastic gradient descent and distributed synchronous stochastic gradient descent algorithms.
- Performing distributed training with TensorFlow and the ring allreduce algorithm implemented in Horovod, using both Keras and TensorFlow's Estimator API.
- Understanding Horovod's and TensorFlow's operation timelines.
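As a rough illustration of the ring allreduce idea mentioned above, the sketch below simulates it in a single Python process. It is a simplified teaching model, not Horovod's implementation: Horovod delegates the actual communication to libraries such as MPI or NCCL. Here each simulated worker holds one gradient vector, split into N chunks. In the scatter-reduce phase, partial sums travel around the ring until each worker owns one fully summed chunk. In the allgather phase, those finished chunks circulate until every worker holds the complete sum. The function names `chunk_bounds` and `ring_allreduce` are hypothetical and were chosen for this example.

```python
# Hypothetical single-process simulation of ring allreduce (sum).
# Not Horovod's code; real implementations exchange chunks over MPI/NCCL.

def chunk_bounds(length, n):
    """Split range(length) into n contiguous chunks, as evenly as possible."""
    base, extra = divmod(length, n)
    bounds, start = [], 0
    for k in range(n):
        size = base + (1 if k < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def ring_allreduce(grads):
    """Return one summed copy of the gradients per worker (grads: N equal-length lists)."""
    n = len(grads)
    data = [list(g) for g in grads]  # copy so the inputs are left untouched
    bounds = chunk_bounds(len(data[0]), n)

    # Phase 1: scatter-reduce. After n-1 steps, worker i holds the
    # fully summed chunk (i + 1) % n.
    for step in range(n - 1):
        # All workers send to their right neighbour "at the same time",
        # so every message is collected before any of them is applied.
        messages = []
        for i in range(n):
            c = (i - step) % n
            lo, hi = bounds[c]
            messages.append(((i + 1) % n, c, data[i][lo:hi]))
        for dst, c, payload in messages:
            lo, _ = bounds[c]
            for k, v in enumerate(payload):
                data[dst][lo + k] += v

    # Phase 2: allgather. Each finished chunk is passed around the ring
    # and overwrites the receiver's partial copy.
    for step in range(n - 1):
        messages = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            messages.append(((i + 1) % n, c, data[i][lo:hi]))
        for dst, c, payload in messages:
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload

    return data


if __name__ == "__main__":
    workers = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    print(ring_allreduce(workers)[0])  # [111, 222, 333, 444]
```

Each worker sends and receives 2(N-1) chunks of roughly 1/N of the vector, so the traffic per worker stays nearly constant as workers are added. This is the property that makes the ring well suited to synchronous data-parallel SGD. In that setting, the summed gradient is then typically divided by the number of workers to obtain the average.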
The material presented during the course (slides and notebooks) is available on GitHub: https://github.com/eth-cscs/tensorflow-training
This course is addressed to scientists who are planning to run, or are already running, intensive machine learning workloads and wish to start using TensorFlow on Piz Daint.
Thursday, March 14, 2019
10:00-10:15 Introduction to the course
10:15-11:00 Running TensorFlow on Piz Daint
11:00-12:00 Lecture/Hands-on: Getting started with input pipelines
13:00-15:00 Lecture/Hands-on: Input pipelines (continuation)
15:00-15:20 Coffee Break
15:20-16:00 Lecture/Hands-on: TensorFlow's timeline
Friday, March 15, 2019
09:00-10:10 Lecture/Hands-on: Stochastic gradient descent and distributed stochastic gradient descent
10:10-10:30 Coffee Break
10:30-11:30 Lecture/Hands-on: Distributed stochastic gradient descent (continuation)
11:30-12:00 Machine room visit
13:00-15:00 Hands-on: Horovod
15:00-15:20 Coffee Break
15:20-16:00 Lecture: Cray-ML plugin and TensorFlow's CollectiveAllReduceStrategy
All participants must register for the course. The registration fee includes coffee breaks and lunches throughout the two-day course.
Course Fee: 160 CHF
Deadline for registration: Tuesday, March 5, 2019
Please contact Rafael Sarmiento (firstname.lastname@example.org) or Guilherme Peretti-Pezzi (email@example.com) with questions about the course content, and firstname.lastname@example.org with questions about the event logistics.
Kindly note that no parking space is available at the Swiss National Supercomputing Centre. The closest bus stop to the centre is Lugano, Stadio. From Lugano railway station, you should take bus number 4 or 6.
Suggestions regarding travel and accommodation are available here.
You are encouraged to travel by public transportation or to use the Park & Ride Resega parking lot, a five-minute walk from CSCS.
We look forward to welcoming you at CSCS!