This event has passed.

Machine Learning on HPC systems (1 day)

May 16, 2023 @ 9:30 am - 5:00 pm

You want to train a neural network for your research project, and you have just been given access to a high-performance computing cluster with a lot of powerful hardware. Great! But how can you make sure that you use these (expensive!) resources effectively? In this course, you will learn how to get the most out of the computational budget you were granted.

In this course you will learn:

How to set up your software environment, and why the preinstalled software modules are useful;
How the file I/O might limit your training speed, and how to overcome that;
About the technical capabilities of modern CPUs and GPUs (reduced-precision datatypes, vector/matrix instructions);
How to find bottlenecks in your code by creating a (PyTorch) profile;
How to use multiple CPUs or GPUs in a single training run (parallel computing for deep learning).

Who?

Machine learning researchers whose requirements for training their neural networks have outgrown their local computer, and who are using or planning to use a high-performance computing cluster (such as Snellius) to train their models.

Prerequisites

Basic knowledge of PyTorch, TensorFlow or a similar framework;
Basic knowledge of Python programming. Some experience in using Jupyter notebooks is desirable, but not essential;
Basic knowledge of using a high-performance computing cluster (see our course ‘Introduction to cluster and supercomputing’). Specifically: know how to submit a job, and how to interact with the module environment.

Sign up: https://events.surf.nl/kort4/open/8e6e2802-7165-479f-ae4e-be242555961e
Please note: registration closes May 15!

Details

Date: May 16, 2023
Time: 9:30 am - 5:00 pm
Event Category: Training
Website: https://events.surf.nl/kort4/open/8e6e2802-7165-479f-ae4e-be242555961e

Venue

SURF Amsterdam
Science Park 140
Amsterdam, 1098 XG Netherlands