Tutorial 1

Title

Accelerating Big Data Processing with Hadoop, Spark, and Memcached Over High-Performance Interconnects

Speakers

Dhabaleswar K. (DK) Panda and Xiaoyi Lu (Ohio State University)

Abstract

Apache Hadoop and Spark are gaining prominence in handling Big Data and analytics. Similarly, Memcached in Web 2.0 environments is becoming important for large-scale query processing. These middleware systems are traditionally written with sockets and do not deliver the best performance on data centers with modern high-performance networks. In this tutorial, we will provide an in-depth overview of the architecture of Hadoop components (HDFS, MapReduce, RPC, HBase, etc.), Spark, and Memcached. We will examine the challenges in re-designing the networking and I/O components of these middleware systems with modern interconnects, RDMA-capable protocols (such as InfiniBand, iWARP, RoCE, and RSockets), and storage architectures. Using the publicly available software packages from the High-Performance Big Data (HiBD, http://hibd.cse.ohio-state.edu) project, we will provide case studies of the new designs for several Hadoop/Spark/Memcached components and their associated benefits. Through these case studies, we will also examine the interplay between high-performance interconnects, storage systems (HDD and SSD), and multi-core platforms in achieving the best solutions for these components.
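
To make the contrast with plain sockets concrete, here is a minimal sketch in C, assuming libibverbs; it is an illustration of the general technique, not code from the HiBD packages. It shows how an RDMA-capable middleware component posts a one-sided RDMA write; the queue pair, memory region, and the remote buffer's address and rkey are assumed to have been exchanged during connection setup.

    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <string.h>

    /* Post a one-sided RDMA write: data moves directly into the remote
     * buffer without involving the remote CPU, unlike a sockets
     * send()/recv() pair. qp, mr, remote_addr, and rkey are assumed to
     * have been set up and exchanged out of band. */
    static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                               void *local_buf, uint32_t len,
                               uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)local_buf,
            .length = len,
            .lkey   = mr->lkey,                 /* local registration key */
        };
        struct ibv_send_wr wr, *bad_wr = NULL;

        memset(&wr, 0, sizeof(wr));
        wr.opcode              = IBV_WR_RDMA_WRITE;
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.send_flags          = IBV_SEND_SIGNALED; /* request a completion */
        wr.wr.rdma.remote_addr = remote_addr;
        wr.wr.rdma.rkey        = rkey;              /* remote registration key */

        return ibv_post_send(qp, &wr, &bad_wr);     /* 0 on success */
    }

The completion is then harvested from the completion queue with ibv_poll_cq(); bypassing the kernel socket path in this way is the kind of redesign the case studies above cover.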

Bio

Dhabaleswar K. (DK) Panda

Dhabaleswar K. (DK) Panda is a Professor of Computer Science at the Ohio State University. He obtained his Ph.D. in computer engineering from the University of Southern California. His research interests include parallel computer architecture, high-performance computing, communication protocols, file systems, network-based computing, and Quality of Service. He has published over 350 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, High-Speed Ethernet (HSE), and RDMA over Converged Enhanced Ethernet (RoCE). His research group is currently collaborating with national laboratories and leading InfiniBand and 10GigE/iWARP companies on designing various subsystems of next-generation high-end systems. The MVAPICH2 (High-Performance MPI over InfiniBand, iWARP, and RoCE) open-source software package, developed by his research group, is currently being used by more than 2,400 organizations worldwide (in 75 countries). This software has enabled several InfiniBand clusters (including the 7th-ranked one) to get into the latest TOP500 ranking. These software packages are also available with the OpenFabrics stack for network vendors (InfiniBand and iWARP), server vendors, and Linux distributors. The new RDMA-enabled Apache Hadoop and Memcached packages, consisting of acceleration for HDFS, MapReduce, RPC, and Memcached, along with support for clusters with Lustre file systems, are publicly available from http://hibd.cse.ohio-state.edu. Dr. Panda's research is supported by funding from the US National Science Foundation, the US Department of Energy, and several industry partners, including Intel, Cisco, Sun, Mellanox, QLogic, NVIDIA, and NetApp. He is an IEEE Fellow and a member of the ACM. More details about Dr. Panda, including a comprehensive CV and publications, are available here.

Xiaoyi Lu

Dr. Xiaoyi Lu is a Research Scientist in the Department of Computer Science and Engineering at the Ohio State University, USA. He obtained his Ph.D. in Computer Science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. His current research interests include high-performance interconnects and protocols, Big Data, the Hadoop/Spark ecosystem, parallel computing models (MPI/PGAS), GPU/MIC, virtualization, and cloud computing. He has published over 60 papers in major journals and international conferences related to these research areas, and has been actively involved in various professional activities for academic journals and conferences. Dr. Lu is currently doing research on, and working on the design and development of, the High-Performance Big Data project (http://hibd.cse.ohio-state.edu). He is a member of the IEEE. More details about Dr. Lu are available here.


Tutorial 2

Title

Designing and Developing Performance Portable Network Codes

Speakers

Pavel Shamis (ARM), Alina Sklarevich (Mellanox Technologies), Swen Boehm (Oak Ridge National Laboratory)

Abstract

Developing high-performing, portable communication libraries that cater to different programming paradigms and different architectures is a challenging and resource-consuming task. For performance, these libraries have to leverage the rich capabilities provided by network interconnects, which often requires a deep understanding of a wide range of network capabilities, of the interplay between software and hardware components, and of their impact on the library implementation and on application communication characteristics. In this tutorial, we will provide insights into the challenges and complexity of developing such communication libraries. We will share our experience developing Unified Communication X (UCX), a framework of network APIs and implementations for compute- and data-intensive scientific computing.

UCX is a collaboration among national laboratories, academia, and industry to develop the next-generation communication framework for current and emerging programming models. The framework provides a collection of APIs that enables customizing network functionality for application requirements and a given system architecture. Further, it provides a simple way to map application requirements to network capabilities, with little or no user involvement. UCX is an open-source project whose participants include many leading HPC researchers from IBM, NVIDIA, Mellanox Technologies, ARM, LANL, ANL, ORNL, the University of Tennessee, Knoxville, and the University of Houston.
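
As a rough, hedged illustration of how UCX lets an application state its requirements and have them mapped to network capabilities, the C sketch below brings up a UCP context and worker with the open-source UCX API, requesting only tagged-messaging support. It is a minimal example, not material from the tutorial itself.

    #include <ucp/api/ucp.h>

    /* Minimal UCP bring-up: the application declares what it needs
     * (here, tag-matching semantics) and UCX selects suitable
     * transports and devices underneath. */
    int main(void)
    {
        ucp_params_t params = {
            .field_mask = UCP_PARAM_FIELD_FEATURES,
            .features   = UCP_FEATURE_TAG,
        };
        ucp_worker_params_t wparams = {
            .field_mask  = UCP_WORKER_PARAM_FIELD_THREAD_MODE,
            .thread_mode = UCS_THREAD_MODE_SINGLE,
        };
        ucp_context_h context;
        ucp_worker_h  worker;

        if (ucp_init(&params, NULL, &context) != UCS_OK)
            return 1;                       /* NULL: default configuration */
        if (ucp_worker_create(context, &wparams, &worker) != UCS_OK) {
            ucp_cleanup(context);
            return 1;
        }

        /* Endpoints (ucp_ep_create) would be created here once the
         * peer's worker address has been exchanged out of band. */

        ucp_worker_destroy(worker);
        ucp_cleanup(context);
        return 0;
    }

Note that no transport (InfiniBand, shared memory, TCP, etc.) is named anywhere: selecting and mixing transports is UCX's job, which is what "little or no user involvement" means in practice.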

Bio

Pavel Shamis

Pavel Shamis is a Principal Research Engineer at ARM. His research interests include high-performance communication networks, communication middleware, and programming models. Prior to joining ARM, he spent five years at Oak Ridge National Laboratory (ORNL) as a research scientist in the Computer Science and Mathematics Division (CSMD). In this role, Pavel was responsible for research and development on multiple projects in the high-performance communication domain, including Collective Communication Offload (CORE-Direct & Cheetah), OpenSHMEM, and OpenUCX. Before joining ORNL, Pavel spent ten years at Mellanox Technologies, where he led the Mellanox HPC team and was responsible for the development of the HPC software stack, including the OFA software stack, Open MPI, MVAPICH, OpenSHMEM, and others. Pavel earned his M.C.S. in Computer Science from Colorado State University and his B.Sc. in Education in Technology and Computer Science from the Technion, Israel Institute of Technology. Pavel is a recipient of an R&D 100 Award for the development of the CORE-Direct collective offload technology.

Alina Sklarevich

Alina Sklarevich is a software developer in the HPC group at Mellanox Technologies. Over the past five years she has worked on the development of MXM, the Mellanox Messaging Accelerator, contributed to Open MPI, and is currently working on the development of the OpenUCX project. Alina earned her B.Sc. in Communication Systems Engineering from Ben-Gurion University in Israel and is now an M.Eng. student in Systems Engineering at the Technion, Israel Institute of Technology.

Swen Boehm

Mr. Swen Boehm is a research assistant in Oak Ridge National Laboratory's Computer Science and Mathematics Division. His research interests include programming models for high-performance computing (HPC), networking libraries, and system software and tools for the HPC ecosystem. Mr. Boehm earned his M.Sc. from the University of Reading, England.

Tutorial 3

Title

Data-Center Interconnection (DCI) Technology Innovations in Transport Network Architectures

Speakers

Loukas Paraschis and Abhinava Shivakumar Sadasivarao (Infinera)

Abstract

This tutorial reviews the most important technology innovations in Data-Center Interconnection (DCI) and their increasingly important role in transport network architectures. It focuses primarily on the interplay between new transport technologies across layers 1-3 and the SDN-motivated network programmability and control-plane evolution being adopted to address the resulting transport network challenges.

The increasing availability of fast and reliable network connectivity has enabled the transition to an Internet-based service delivery model, commonly referred to as the "cloud". The underlying infrastructure consists of data centers with massive computing and storage resources. Networking is crucial to interconnecting this "cloud infrastructure" and to optimizing its cost-performance. As a result, the interconnection of data centers, or DCI, is one of the largest contributors to the increased traffic demands on next-generation Internet transport networks. The increasing importance of DCI has also been motivating the evolution of traditional optical transport, routing, traffic engineering, and security. We analyze the characteristics and implications of the most significant advancements in transport technologies, systems, network control planes, and standards, notably including source packet routing, flex-spectrum super-channel coherent DWDM transmission, and software programmability, automation, and abstraction. We also evaluate the interplay among intra- and inter-data-center networking architectures, system design, and the enabling interconnection technology innovations. Finally, future network research topics and related emerging standards will also be discussed.
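
As one concrete flavor of the "source packet routing" mentioned above, the C sketch below models the core idea of segment routing: the ingress node encodes the path as an ordered list of segment identifiers, so transit nodes need no per-flow state. The structures and SID values are purely hypothetical and not taken from any standard or vendor implementation.

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_SEGMENTS 8

    /* Illustrative segment list: ordered segment IDs (labels) applied
     * by the ingress node; each transit node consumes the top entry. */
    struct segment_list {
        uint32_t sid[MAX_SEGMENTS];
        int      depth;
    };

    static void push_segment(struct segment_list *sl, uint32_t sid)
    {
        if (sl->depth < MAX_SEGMENTS)
            sl->sid[sl->depth++] = sid;
    }

    int main(void)
    {
        /* Hypothetical SIDs steering a flow through two transit nodes
         * to an egress DCI gateway. */
        struct segment_list path = { .depth = 0 };
        push_segment(&path, 16001);   /* transit node A */
        push_segment(&path, 16007);   /* transit node B */
        push_segment(&path, 16042);   /* egress DCI gateway */

        for (int i = 0; i < path.depth; i++)
            printf("segment[%d] = %u\n", i, path.sid[i]);
        return 0;
    }

Because the path lives in the packet itself, the core stays stateless; this is one reason source packet routing fits the SDN-controlled DCI architectures the tutorial discusses.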

Bio

Loukas (Lucas) Paraschis

Loukas Paraschis is Senior Director of data-center transport for Internet cloud and content providers at Infinera. From 2006 to 2015, Loukas was Cisco's senior technology architect for WAN transport, and from 2000 to 2006 Cisco's technical leader in optical networking and routing. He completed graduate studies at Stanford University (PhD in applied physics, 1999; MS in electrical engineering, 1998), has (co)authored more than 100 peer-reviewed publications, invited and tutorial presentations, a book, two book chapters, and two patents, and has served as an associate editor of the Journal of Optical Communications and Networking, a guest editor of the Journal of Lightwave Technology, and chair of multiple conference organizing committees. He is a Fellow of the OSA, a senior member of the IEEE, and was an IEEE Photonics Society Distinguished Lecturer (2009). Loukas was born in Athens, Greece, where he completed his undergraduate studies.

Abhinava (Abhinav) Shivakumar Sadasivarao

Abhinava (Abhinav) Shivakumar Sadasivarao is a senior systems architecture engineer at Infinera. He has been with Infinera since 2012 and works in the Systems Architecture group, focusing on system requirements specification and software architecture for Infinera's DCI/Cloud platforms. In the past, he was involved in multiple first-of-a-kind proofs of concept of SDN applicability to optical transport, resulting in numerous vendor interoperability demonstrations. He completed his graduate studies at Carnegie Mellon University (MS '12) and has (co)authored peer-reviewed (and invited) publications at IEEE, ACM, and OSA conferences. Abhinav hails from the beautiful garden city of Bengaluru, India, where he completed his undergraduate studies.

Tutorial 4

Title

Efficient Communication in GPU Clusters with GPUDirect Technologies

Speakers

Davide Rossetti and Sreeram Potluri (NVIDIA)

Abstract

Discrete GPUs have become ubiquitous in computing platforms. State-of-the-art GPUs are connected to a compute node via the PCI Express bus and have dedicated on-board high-bandwidth memory. Efficiently feeding data to the GPU and streaming results out of it is critical to maximizing the utilization of the GPU's compute resources. GPUDirect is a family of technologies that allows peer GPUs, CPUs, third-party network adapters, solid-state drives, and other devices to directly read from and write to GPU device memory. These technologies eliminate unnecessary memory copies and dramatically reduce the CPU overhead involved in moving data to and from GPU device memory, which can yield significant improvements in communication performance for applications. GPUDirect Async, the most recent addition to this technology suite, also allows a GPU to trigger, and poll for the completion of, operations performed by a third-party I/O device. In this tutorial, we provide an overview of the GPUDirect family of technologies and then go into the details of each one, including GPUDirect Peer-to-Peer, GDRCopy, GPUDirect RDMA, and GPUDirect Async. We present the capabilities in NVIDIA GPU hardware and software that enable these technologies, and we provide details of the user-mode and kernel-mode APIs that allow developers to add GPUDirect capabilities to communication libraries and network drivers. We also provide an overview of how node architectures can impact the performance of GPUDirect technologies.
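
As a hedged sketch of what adding GPUDirect capabilities to a communication library can look like, the C fragment below registers GPU device memory directly with an InfiniBand adapter via ibv_reg_mr(). On systems with GPUDirect RDMA enabled (e.g., via the nv_peer_mem kernel module), the adapter can then DMA straight to and from that memory, with no staging through host buffers. Error handling is elided for brevity, and the helper is illustrative rather than part of any shipped library.

    #include <cuda_runtime.h>
    #include <infiniband/verbs.h>

    /* Register a GPU buffer for RDMA. With GPUDirect RDMA support,
     * ibv_reg_mr() accepts a CUDA device pointer, so the adapter can
     * read/write GPU memory directly, eliminating host-memory copies. */
    static struct ibv_mr *register_gpu_buffer(struct ibv_pd *pd,
                                              size_t len, void **gpu_buf)
    {
        cudaMalloc(gpu_buf, len);   /* device memory, not host memory */
        return ibv_reg_mr(pd, *gpu_buf, len,
                          IBV_ACCESS_LOCAL_WRITE |
                          IBV_ACCESS_REMOTE_READ |
                          IBV_ACCESS_REMOTE_WRITE);
    }

The returned memory region's lkey/rkey can then be used in ordinary verbs work requests, exactly as with host memory, which is how GPUDirect RDMA slots into existing communication libraries.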

Bio

Davide Rossetti

Davide Rossetti is the lead engineer for GPUDirect at NVIDIA. Before that, he spent more than 15 years as a researcher at INFN (the Italian National Institute for Nuclear Physics), where he was a member of the APE experiment, participated in the design and development of two generations of APE supercomputers, and was the main architect of two FPGA-based cluster interconnects, APEnet and APEnet+. His research and development activities center on parallel computing architectures and high-speed networking interconnects optimized for numerical simulations, in particular Lattice Quantum Chromodynamics (LQCD) simulations, while his interests span areas such as HPC, computer graphics, operating systems, I/O technologies, GPGPUs, embedded systems, digital design, and real-time systems. He holds a magna cum laude Laurea in Theoretical Physics (roughly equivalent to a Master's degree) and has published more than 50 papers.

Sreeram Potluri

Sreeram Potluri is a Senior Software Engineer at NVIDIA. He works on designing technologies that enable high-performance, scalable communication on clusters with NVIDIA GPUs. His research interests include high-performance interconnects, heterogeneous architectures, parallel programming models, and high-end computing applications. He received his PhD in Computer Science and Engineering from The Ohio State University. He has published over 30 papers in major peer-reviewed journals and international conferences.

IMPORTANT DATES (Tutorials)

Materials due: August 10