Recent advances in Machine and Deep Learning (ML/DL) have led to many exciting challenges and opportunities. Modern ML/DL and Data Science frameworks, including TensorFlow, PyTorch, and Dask, offer high-performance training and deployment for various types of ML models and Deep Neural Networks (DNNs). This tutorial provides an overview of recent trends in ML/DL and the role of cutting-edge hardware architectures and interconnects in moving the field forward. We present an overview of different DNN architectures and ML/DL frameworks, with a special focus on parallelization strategies for model training. We highlight new challenges and opportunities for communication runtimes to exploit high-performance CPU/GPU architectures and efficiently support large-scale distributed training, and we describe our co-design efforts to use MPI for large-scale DNN training on the cutting-edge CPU/GPU architectures available in modern HPC clusters. The tutorial covers training traditional ML models (K-Means, linear regression, nearest neighbors) with the cuML framework accelerated by MVAPICH2-GDR, and presents MPI4Dask, an MPI-based backend for Dask, for accelerating GPU-based Data Science applications. Throughout the tutorial, hands-on exercises enable attendees to gain first-hand experience running distributed ML/DL training and Dask on a modern GPU cluster.
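To give a flavor of the data-parallel training strategy discussed above, the following minimal sketch uses PyTorch's DistributedDataParallel. It is illustrative rather than taken from the tutorial material, and it assumes a launcher such as torchrun that sets the usual RANK, WORLD_SIZE, and LOCAL_RANK environment variables (the tutorial's MPI-centric runs would instead use an MPI launcher and an MPI-aware communication backend, but the structure of the training loop is the same):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # The launcher (e.g., torchrun) provides rank/world-size and the
    # rendezvous address; "nccl" selects GPU-aware collectives.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank holds a full model replica; DDP all-reduces gradients
    # across ranks during backward(), keeping the replicas in sync.
    model = nn.Linear(128, 10).cuda()
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(100):
        # Random tensors stand in for a per-rank shard of real data.
        inputs = torch.randn(32, 128).cuda()
        labels = torch.randint(0, 10, (32,)).cuda()
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), labels)
        loss.backward()  # gradient all-reduce happens here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=4 train.py`, this runs four ranks on one node; the same script scales across nodes once the rendezvous settings point at a common master.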
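Similarly, the multi-GPU K-Means exercise has the flavor of the sketch below, which uses the public RAPIDS Dask API (module and class names such as cuml.dask.cluster.KMeans and dask_cuda.LocalCUDACluster reflect that API, not tutorial code). MPI4Dask would, in principle, replace the default TCP communication between the Dask workers underneath without changing this application-level code:

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
from cuml.dask.cluster import KMeans
from cuml.dask.datasets import make_blobs

if __name__ == "__main__":
    # One Dask worker per local GPU; on a real cluster the scheduler and
    # workers would be launched separately (e.g., via MPI4Dask's
    # mpirun-based startup).
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Synthetic clustered data, partitioned across the GPU workers.
    X, _ = make_blobs(n_samples=1_000_000, n_features=16,
                      centers=8, client=client)

    # Distributed K-Means: each worker computes partial assignments and
    # centroid updates on its shard, which are then combined.
    km = KMeans(n_clusters=8)
    km.fit(X)
    labels = km.predict(X)
    print(labels.compute()[:10])

    client.close()
    cluster.close()
```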