In this course you will learn:
- How to set up your software environment, and why the preinstalled software modules are useful;
- How file I/O can limit your training speed, and how to overcome that;
- About the technical capabilities of modern CPUs and GPUs (reduced-precision data types, vector/matrix instructions);
- How to find bottlenecks in your code by creating a (PyTorch) profile;
- How to use multiple CPUs or GPUs in a single training run (parallel computing for deep learning).
Who?
Machine learning researchers whose requirements for training neural networks have outgrown their local computer, and who are using or planning to use a high-performance computing cluster (such as Snellius) to train their models.
Prerequisites
- Basic knowledge of PyTorch, TensorFlow or a similar framework;
- Basic knowledge of Python programming. Some experience in using Jupyter notebooks is desirable, but not essential;
- Basic knowledge of using a high-performance computing cluster (see our course ‘Introduction to cluster and supercomputing’);
- Specifically: you should know how to submit a job and how to interact with the module environment.