Artificial Intelligence, Data Science and Machine Learning are nowadays present in many research fields and engineering works. Applying such techniques often involves large volumes of data, heavy processes requiring many hours of computing, large numbers of repetitive executions, or exhaustive experiments to be run. In those situations, running experiments or applications on our laptop or even our workstation is not enough, and we need the bigger machines found in data centers.
As not everyone is familiar with High Performance Computing (HPC) environments and the capabilities they offer, such as the distribution of data processes, in this course we will go through basic concepts like performance, parallelism and virtualization. On the other hand, for those who need to run such workloads but are not familiar with machine learning and data analytics, we will overview those concepts, including supervised and unsupervised learning as well as neural networks, showing how different machine learning experiments can leverage parallelism and HPC. Aside from some theory introducing the important concepts, this course is fundamentally based on practice and experimentation, introducing use cases and exercises on Apache Spark, a platform for distributing data processes, and Intel BigDL, a Spark library optimized for neural networks and deep learning.
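As a first taste of the parallelism ideas covered in the course, the sketch below distributes a repetitive computation across CPU cores using only the Python standard library. It is not Spark or BigDL (the course uses those for work at scale); the `simulate` function is a hypothetical stand-in for one of the many repetitive experiments mentioned above.

```python
# Minimal illustration of data-parallel execution with the Python
# standard library. Each "experiment" runs independently, so the
# work can be spread across worker processes.
from multiprocessing import Pool

def simulate(seed):
    """Hypothetical stand-in for one repetitive experiment."""
    x = seed
    for _ in range(1000):
        # Arbitrary deterministic computation (a linear congruential step)
        x = (x * 1103515245 + 12345) % (2 ** 31)
    return x % 100

if __name__ == "__main__":
    seeds = range(8)
    # Run the 8 independent "experiments" in parallel on 4 workers
    with Pool(processes=4) as pool:
        results = pool.map(simulate, seeds)
    print(len(results))  # one result per experiment
```

Frameworks like Spark generalize this same idea: independent pieces of work are mapped over a pool of workers, only across many machines instead of the cores of one.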
The contents of the course are materialized as a set of video tutorials, including the corresponding slides, and exercises that the student can follow and practice at home, in order to understand and experiment with the contents presented here. Additionally, we provide scripts and guides to set up the different technologies shown, so that students can deploy those environments at home and at work.
This is version 1 of the course. We will continue improving it and adding material based on the feedback we receive, as well as method and technology updates.
About this course
Students, Researchers and Professionals
This tutorial is an introductory course for undergraduate CS students who want to practice with some AI and data-center examples, for professionals from different disciplines requiring AI algorithms and data-center resources in their daily jobs, and for researchers from non-CS fields who can leverage HPC systems and AI frameworks to enhance their research and experiments with data. This is not an advanced course, so if you already know about machine learning or computer architecture and systems, you will probably only be interested in some parts of this tutorial, but we invite you to check it out anyway in case you find something new to learn. We prepared this course thinking of the researchers working around us at the Barcelona Supercomputing Center and in partner research groups, who, coming from fields like Mathematics, Biology, Genomics or Earth sciences, need big computers to process large numbers of experiments, but never had the occasion to use frameworks that can definitely improve their daily work at the lab or office.
This course has been financed in part by Intel's Academic Educational Mindshare Initiative for AI, allowing us to present the following tutorials and hands-on examples. All the technologies presented here have some kind of Free or Open Source license, avoiding restrictive software. Although we show preference for technologies provided by Intel, students will find plenty of alternative technologies from other companies and foundations, in case they have some other preference or constraint. So don't worry if you have to use libraries different from ours, as all the concepts presented here are "universal" in computer science, and we use these technologies simply as a way to materialize them.
We are the Data Centric Computing research group at the Barcelona Supercomputing Center. This effort has been carried out by PhD. Josep L. Berral and Eng. Francisco Javier Jurado. Our main objective is to make it easier for professionals from different disciplines and sciences to understand these new technologies of AI, computing and data science. We also want to thank D. Carrera (BSC, UPC), A. Gutierrez-Torre (BSC), D. Buchaca (BSC), F. Portella (BSC, Petrobras) and N. Poggi (Databricks) for their support and help on this project. Special thanks as well to the people from Communications at BSC.
Technologies used in this Course
Python lets you work quickly and integrate systems more effectively.
Fast and general engine for large-scale data processing.
BigDL is a distributed deep learning library for Apache Spark.
VirtualBox is an x86 and AMD64/Intel64 virtualization hypervisor.
Scripts and Guide to Deploy the Environment
Because appropriately setting up a running Spark and BigDL environment is no small task, we provide a pre-made, container-based solution for you to run the exercises and play around with. In that section you will find the necessary scripts and guidelines to get your BigDL and Spark simulated environments up and running, as well as instructions on how to run Jupyter notebooks, either the ones we provide as exercises or your own.
Go to Scripts
Copyright © Barcelona Supercomputing Center, 2019-2020 - All Rights Reserved - AI in DataCenters