CÉCI HPC Training 2017

Each year, the CÉCI offers a full-fledged programme of training sessions for researchers using, or willing to use, the CÉCI clusters. The training sessions are hosted in Louvain-la-Neuve and are organised in collaboration with the CISM, SMCS, ELIC, INMA, CP3, and NAPS. The programme ranges from baby steps into the Linux world to extreme MPI programming and GPU computing. The sessions are cluster-oriented, so knowledge of a programming language or of a scientific computing software package is assumed. Training sessions are delivered in English.

Click on a topic in the image below to get detailed information. Practical information is available at the bottom of this page. The next sessions will be organised in October 2017. The slides of the 2016 edition are available below.

2017 Tentative Schedule

 

Introduction to high-performance computing

This first session introduces the field of high-performance computing and presents the full training programme.

Contents:

  • Introduction to cluster computing: strengths and weaknesses
  • Presentation of the CÉCI clusters and collaborators, and Tier-1
  • Presentation of the training sessions
  • Presentation of the account creation process

Slides: v2015 (PDF)

No prerequisite.
Prerequisite for: all the other sessions.

Type: Lecture
Target audience: Everyone
Must: This session is mandatory.

 

Introduction to Linux and the command line

Linux is the most common operating system in the HPC world. A basic understanding of Linux and the GNU commands is necessary to take advantage of the clusters. The Linux session is organised by the CISM and is aimed at people new to Linux.

Contents:

  • Introduction to Linux
  • The file system and the permissions
  • The command line and the GNU utilities
  • Using a command line file editor
  • Basic shell usage (redirections, pipes, etc.)
  • Basic Bash scripting (conditionals, loops, etc.)
  • Transferring and compressing files (rsync, tar)
  • Task control (setting a task in the background, using cron)
  • Getting help (man, info, help)

Material: v2015 (ZIP)

No technical prerequisite.
Prerequisite for: all other sessions.

Type: Hands-on
Target audience: Everyone not familiar with a terminal.
Must: This session is mandatory for the target audience.

 

Connecting with SSH: Introduction and advanced topics

SSH is the protocol used to connect to the clusters. This session presents the complete use of the tools that make access to the clusters easy (without being harassed by passphrase requests, coping with firewalls in a transparent manner, transferring files from one cluster to another, etc.).

Contents:

  • SSH client usage and common errors
  • SSH keys, passphrases and agents
  • SSH configuration file
  • Passphrase managers
  • Tunnels, proxies and (pseudo-)VPNs
  • SSH-based file transfer (SCP, rsync, Unison, SSHFS)

Slides: v2016 (PDF), Xming+PuTTY package installer, Xming ZIP file

Prerequisite:

  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
Prerequisite for: all the other sessions

Type: Hands-on
Target audience: Everyone
Must: This session is mandatory.

 

Introduction to structured programming with Fortran

Fortran is the historical language of scientific computing. Many mathematical libraries in common use today were originally written in Fortran and have evolved with both the language and the hardware.

Contents:

  • Historical perspective
  • Syntax
  • Printing and formatting
  • Memory management
  • Fortran I/O

Slides: v2016 (PDF), Samples (ZIP).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)

Type: Hands-on
Target audience: Rookie programmer
Must: This session is a nice-to-have for those who need to develop fast scientific software.

 

Introduction to Object-Oriented programming with C++

C++ is widely regarded as a fast, general-purpose programming language. Its object-oriented paradigm makes it possible to map natural or, in a scientific context, mathematical objects onto a representation that a computer can process.

Contents:

  • OO programming
  • Compilation process
  • Overloading and inheritance
  • Templates

Slides: v2016 (PDF).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)

Type: Hands-on
Target audience: Rookie programmer
Must: This session is a nice-to-have for those who need to develop fast scientific software.

 

Introduction to Python

Python is a programming language that can be used both for scientific computing (as a replacement for Fortran or C/C++) and for everyday scripting (as a replacement for Bash or Perl), but also to develop full programs with a GUI as well as headless services. It is a very nice tool to include in any scientist's toolbox.

Contents:

  • Language syntax
  • The core modules
  • Installing external modules
  • Writing and distributing modules

Slides: v2016 (PDF).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Notions of programming

Type: Hands-on
Target audience: Rookie programmer
Must: This session is a nice-to-have for those who do not know Python.

 

Introduction to scientific software development and deployment

Often, the workflow for researchers is to acquire a piece of software and either modify it, wrap it in scripts, or simply install it on the clusters, or all of that at the same time, on many clusters. This session introduces the tools that can make this whole process easier.

Contents:

  • Programming paradigms
  • Types of languages and the choice of the language
  • Tools for deploying software
  • The programmer's toolkit
  • Writing comments and elements of style

Slides: v2016 (PDF)

Prerequisite:

  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)

Type: Lecture
Target audience: Rookie programmers
Must: This session is a nice-to-have.

 

Introduction to compilers and compiling, and optimized libraries

The choice of the compiler (many clusters have several, among others GCC and Intel) is important, as is the choice of the compiling options. This session reviews the strengths and weaknesses of the compilers and their optimal use.

Contents:

  • Using modules
  • The GCC Compiler and its options
  • Intel Compiler Studio
  • The different levels of code optimization
  • Using external libraries
  • Using configure and make
  • Downloading and compiling source code
  • Using linear algebra, signal processing libraries, etc. (BLAS, MKL, ACML, etc.)

Slides: v2016 (PDF)

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Passive knowledge of C or Fortran

Type: Hands-on
Target audience: Rookie programmers
Must: This session is a must for anyone who needs to compile software from sources.

 

Introduction to scripting and interpreted languages (Python, R, Octave)

Interpreted languages are often thought of as less performance-oriented than compiled languages. Yet they are often much easier to use, development time is much shorter, and many tools are available to bring their performance close to that of compiled languages.

Contents:

  • Use cases, strengths and weaknesses
  • The trilogy: Octave, R, Python
  • Compiling and linking the interpreter
  • Installing additional libraries
  • Using compiled code inside interpreted code
  • Compiling interpreted code

Slides: v2016 (PDF)

Prerequisite:

  • Have a basic understanding of procedural programming (conditionals, loops, etc.)
  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)

Type: Hands-on
Target audience: Rookie programmers
Must: This session is a nice-to-have.

 

Introduction to parallel computing

Before diving into the concrete programming examples with MPI and OpenMP, this session introduces some theoretical concepts and presents the several paradigms and tools offered by Linux for parallel computing when a program itself is not able to run in parallel.

Contents:

  • Theoretical concepts: parallelism, speedup, scaling, overhead, etc.
  • Common parallel computing paradigms (SPMD, Map/Reduce, etc.)
  • GNU tools for parallel computing (xargs and GNU parallel)
  • Parallel computing with pipelines (UNIX pipes and FIFO files)
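
As an illustration of the notions of speedup and scaling listed above, here is a minimal Python sketch of Amdahl's law, which bounds the speedup of a program when only part of its run time can be parallelised. It is not taken from the session material; the function names and the 95% parallel fraction are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Ideal speedup when only `parallel_fraction` of the run time scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


def parallel_efficiency(parallel_fraction, n_workers):
    """Speedup per worker: 1.0 means perfect scaling."""
    return amdahl_speedup(parallel_fraction, n_workers) / n_workers


if __name__ == "__main__":
    # Hypothetical program in which 95% of the run time can be parallelised.
    for workers in (1, 2, 4, 8, 16, 64):
        speedup = amdahl_speedup(0.95, workers)
        efficiency = parallel_efficiency(0.95, workers)
        print(f"{workers:3d} workers: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

With a 5% serial part the speedup can never exceed 20, however many workers are used, which is why measuring the serial fraction is worthwhile before requesting many cores.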

Slides: v2016 (PDF)

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)

Type: Hands-on
Target audience: Everyone
Must: This session is a nice-to-have.

 

Parallel programming with MPI

MPI is a standard for passing messages between processes running on distinct computers. It offers high-level primitives for efficient communication.

Contents:

  • Parallel programming paradigms
  • Shared-memory vs message passing
  • Compiling an MPI program
  • Collective communication
  • Reduction operations
  • Communication modes

Slides: v2016 (PDF)

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Passive knowledge of C or Fortran

Type: Hands-on
Target audience: Rookie programmer
Must: This session is a must-have for anyone writing or even simply using MPI programs.

 

Parallel programming with OpenMP

OpenMP is an easy alternative to pthreads for multithreaded computing. OpenMP extensions now exist in most C and Fortran compilers and allow flagging loops and other constructs for efficient multithreading with little supplementary programming effort.

Contents:

  • Parallel programming paradigms
  • OpenMP Execution model
  • Compiling an OpenMP program
  • Data races
  • Parallel loops
  • Barriers and synchronisation

Slides: v2016 (PDF).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Passive knowledge of C or Fortran


Type: Hands-on
Target audience: Rookie programmer
Must: This session is a must-have for anyone writing programs in C/C++ or Fortran.

 

Preparing, submitting and managing jobs with Slurm

Slurm is the job manager installed on all CÉCI clusters. The session teaches attendees how to prepare a submission script and how to submit, monitor, and manage jobs on the clusters.

Contents:

  • Role and duties of a job scheduler/resource manager
  • Creating and submitting a job
  • Setting job constraints and parameters
  • Managing and monitoring jobs
  • Working interactively
  • Getting accounting information for the jobs
  • How priorities are computed
  • Creating parallel jobs with shared-memory software
  • Creating parallel jobs with message passing software
  • Creating parallel jobs with master/slave software

Slides: v2016 (PDF).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Passive knowledge of parallelisation techniques (OpenMP, MPI)
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)

Type: Hands-on
Target audience: Everyone
Must: This session is mandatory.

 

Using a Checkpoint/restart program to overcome time limits

Checkpointing and Restarting, or the art of stopping some computations to continue them later, or on another computer, is a very convenient way to get past time limits set on the clusters, and to protect against hardware or software failure on the compute nodes.

Contents:

  • Use and challenges of checkpointing
  • The different approaches
  • Checkpointing in Slurm
  • Using DMTCP for checkpointing

Slides: v2016 (PDF).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Passive knowledge of either C, Fortran, Octave, Python or R

Type: Hands-on
Target audience: Everyone
Must: This session is a must-have for anyone feeling oppressed by time limits.

 

Checkpoint/restart programming technique

Checkpointing and Restarting, or the art of stopping some computations to continue them later, or on another computer, is a very convenient way to get past time limits set on the clusters, and to protect against hardware or software failure on the compute nodes.

Contents:

  • Checkpointing recipes
  • Using signals to trigger a checkpoint
  • A note about parallel programs
  • Checkpointing in Slurm
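
The idea of using a signal to trigger a checkpoint can be sketched in Python as follows. This is only an illustration under stated assumptions, not the recipe taught in the session: the toy computation (a running sum), the file name `checkpoint.pkl`, the `flags` dictionary and all function names are hypothetical, and the choice of SIGTERM is an assumption (which signal is sent before the time limit depends on how the job is configured).

```python
import os
import pickle
import signal
import tempfile

# Shared flag; the handler only records the request, the main loop does the saving.
flags = {"checkpoint_requested": False}


def request_checkpoint(signum, frame):
    """Signal handler: note that a checkpoint was requested."""
    flags["checkpoint_requested"] = True


def save_state(path, state):
    """Write to a temporary file, then rename, so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp_path, path)


def load_state(path):
    """Return the saved state, or a fresh one when no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    return {"step": 0, "total": 0}


def run(path, n_steps):
    """Sum the integers below n_steps, resuming from a checkpoint if one is present.

    Returns the total when finished, or None when interrupted by a checkpoint request.
    """
    state = load_state(path)
    while state["step"] < n_steps:
        if flags["checkpoint_requested"]:
            save_state(path, state)
            flags["checkpoint_requested"] = False
            return None  # interrupted; call run() again to continue
        state["total"] += state["step"]
        state["step"] += 1
    return state["total"]


if __name__ == "__main__":
    # Assumption: the job receives SIGTERM shortly before its time limit.
    signal.signal(signal.SIGTERM, request_checkpoint)
    print(run("checkpoint.pkl", 100_000))
```

Keeping the handler trivial and saving the state from the main loop means the checkpoint is always written between two complete steps, so the saved state is consistent.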

Slides: v2016 (PDF).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Passive knowledge of either C, Fortran, Octave, Python or R

Type: Hands-on
Target audience: Everyone
Must: This session is a must-have for anyone feeling oppressed by time limits.

 

Using a workflow manager to handle large amounts of jobs

Many scientific workflows involve submitting and managing a large number of jobs. This session explores the software that makes that process easier.

Contents:

  • Notions of scientific workflows
  • Slurm submission scripting
  • Presentation of BOSCO, Fireworks, and Clusterlib

Slides: Part 1 v2016 (PDF), Part 2 v2016 (PDF).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Passive knowledge of either C, Fortran, Octave, Python or R

Type: Hands-on
Target audience: Everyone
Must: This session is a must-have for anyone who happens to write scripts that create submission scripts and automatically submit them.

 

Debugging and profiling scientific code, and commercial optimized libraries

When a piece of software does not work the way it is expected to, it needs debugging. Then, when it works, it needs profiling to remove the bottlenecks.

Contents:

  • Debugging principles
  • The GNU debugger (gdb)
  • The Intel debugger
  • Advanced features of Intel Cluster Studio
    • the support of MIC architecture (Xeon Phi)
    • the Guided Auto Parallelism
    • the Coarray Fortran support
  • Intel's MKL

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Passive knowledge of either C, Fortran, Octave, Python or R
  • Working knowledge of C or Fortran
  • Familiarity with OpenMP and MPI

Type: Hands-on
Target audience: Programmers
Must: This session is important for programmers who want to optimize their code for usage on a cluster.

 

Introduction to hardware accelerators (GPUs and Phis)

Accelerators, or co-processors, are increasingly common in HPC, especially GPUs and Xeon Phis. They offer very high performance (sometimes that of a whole cluster in a single compute node), but they also come with challenges that must be addressed to benefit from their compute power. This session gives the tools to decide whether or not one should invest time and effort into porting one's code to accelerators, and how to configure and compile software to use them.

Contents:

  • The several types of accelerators
  • GPU computing
  • GPU Libraries in many languages
  • Xeon Phi computing
  • The several usage modes

Slides from v2016 (PDF).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Passive knowledge of either C, Fortran, Octave, Python or R
  • Working knowledge of C or Fortran
  • Familiarity with OpenMP and MPI

Type: Hands-on
Target audience: Everyone
Must: This session is a nice-to-have.

 

Programming hardware accelerators (GPUs and Phis)

Accelerators, or co-processors, are increasingly common in HPC, especially GPUs and Xeon Phis. They offer very high performance (sometimes that of a whole cluster in a single compute node), but they also come with challenges that must be addressed to benefit from their compute power. This session gives the tools to decide whether or not one should invest time and effort into porting one's code to accelerators, and how to configure and compile software to use them.

Contents:

  • Writing CUDA code for GPUs
  • Using the CUDA profiler
  • Native and Offload OpenMP on Phis
  • Native and Hybrid MPI on Phis

Slides from v2016 (PDF).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Working knowledge of C
  • Familiarity with OpenMP and MPI

Type: Hands-on
Target audience: Confirmed programmer
Must: This session is a nice-to-have.

 

Introduction to code versioning

Code versioning is very important to master, even for non-programmers. It allows tracking the changes made to a submission script, a piece of code, a configuration file, or even a dataset, and propagating the changes in a consistent and systematic way to all clusters.

Contents:

  • Notions of code versioning
  • Working as a team with code versioning
  • Using git and svn to access code from others
  • Publishing code
  • Using mercurial to manage one's own code/file

Slides from v2016 (PDF).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)

Type: Hands-on
Target audience: Rookie programmer
Must: This session is a must-have for anyone not familiar with Git or Mercurial.

 

Introduction to data storage and access

Storing data efficiently is very important for many scientific applications. Yet, most of the time, a myriad of small files is used, which places a heavy burden on the file system, wastes a lot of time in file access, and makes transfers very inefficient. Other solutions exist and are presented in this session.

Contents:

  • Storing in files vs in database
  • Using an in-memory database
  • Using HDF5 CLI tools and libraries
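
As a rough illustration of storing results in a database rather than in one small file per result, the following Python sketch uses the standard-library `sqlite3` module with an in-memory database. It is not taken from the session material; the table name, column names and sample values are hypothetical.

```python
import sqlite3


def store_results(connection, results):
    """Insert (run_id, value) pairs into a single table in one transaction."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS results (run_id INTEGER PRIMARY KEY, value REAL)"
    )
    with connection:  # commits on success, rolls back on error
        connection.executemany(
            "INSERT INTO results (run_id, value) VALUES (?, ?)", results
        )


def mean_value(connection):
    """Aggregate over all stored runs with one query instead of opening many files."""
    (mean,) = connection.execute("SELECT AVG(value) FROM results").fetchone()
    return mean


def demo():
    """Store a thousand hypothetical results in memory and return their mean."""
    connection = sqlite3.connect(":memory:")
    store_results(connection, [(i, float(i)) for i in range(1000)])
    return mean_value(connection)


if __name__ == "__main__":
    print(demo())
```

Replacing `":memory:"` with a file path would keep the same thousand results in a single file on disk, which is far easier on a shared file system than a thousand separate files.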

Slides from v2016 (PDF).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Passive knowledge of either C, Fortran, Octave, Python or R
  • Working knowledge of C or Fortran
  • Familiarity with OpenMP and MPI

Type: Hands-on
Target audience: Everyone
Must: This session is a must-have for anyone who thinks generating a million small files is an optimal way of storing data.

 

Efficient use of Matlab on the cluster

Matlab and its free, mostly compatible alternative, Octave, are becoming more and more suitable for cluster computing with the rise of parallel computing toolboxes and syntactical constructs. The session explores the many ways Matlab code can be parallelised and submitted to a cluster (taking into account the tough problem of licensing).

Contents:

  • Using Matlab in batch mode
  • The Matlab compiler
  • The Octave alternative
  • Implicit multithreading
  • Explicit multithreading/processing
  • Matlab and the MPI

Slides: v2016 (PDF).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Working knowledge of Matlab/Octave

Type: Hands-on
Target audience: Confirmed Matlab user
Must: This session is a must-have for anyone who wonders why Matlab is not installed on the CÉCI clusters.

 

Efficient use of Python on the cluster

The use of Python for scientific computing is rising thanks to modules such as numpy, scipy and matplotlib. This session explores the efficient uses of Python in that context for situations where numpy and co. are of less use. It assumes a working knowledge of Python and is delivered by Bertrand Chenal (CÉCI).

Contents:

  • Installing libraries
  • Numpy
  • Scipy
  • Matplotlib
  • MPI python
  • Multithreading
  • Compiling

Slides: v2016 (HTML).

Prerequisite:

  • Being able to use SSH with private keys
  • Being familiar with a text editor
  • Mastering the Linux command line and the GNU utilities (mkdir, cp, scp, etc.)
  • Working knowledge of Python

Type: Hands-on
Target audience: Confirmed Python user
Must: This session is a must-have for anyone who thinks Python is slow.

Venue

The sessions will be organised in Louvain-la-Neuve, in either the DAO room or the Cérès room.

DAO room

Vinci building, room A-182, Bâtiment Vinci, Place du Levant 1. The Vinci building is labelled '2' on the map of Louvain-la-Neuve, in square E8. Enter the building through its east-side gate entrance (located on Place Sainte Barbe, not through the main doors) and then turn right in the inner courtyard. Enter the building through the door in front of you and go towards the right corridor.

Cérès room

Mendel building, room C-169, Croix du Sud, 2. The Mendel building is labelled '17' on the map of Louvain-la-Neuve, in square F8. Enter through the east entrance (not through the main doors) near the parking places. Enter the corridor and turn right.

Parking

The closest parking space available for visitors is the parking Baudouin 1er.

Access to the computers

Participants are encouraged to request a CÉCI account prior to attending the training sessions. A set of generic accounts will be available for participants who are not eligible for a CÉCI account.

Certificate of participation

All participants will receive a certificate of participation provided they sign an attendance sheet at the end of each session they attend. The certificates of participation will be sent in early December.

Wifi

Should you want to bring your laptop rather than using the computers in the training room, you are welcome to do so provided you are able to connect to the Eduroam wifi on your own.

Registration

Registration is free but mandatory. The room capacity is 25 workstations + 25 additional seats. Participants will be welcomed on a first-come, first-served basis. If you register but are unable to attend, you are required to notify us as soon as possible so that we can welcome other participants who are on the waiting list.

Beware that sessions that are marked as mandatory must be attended; all the material covered in those sessions is a requirement for all the subsequent sessions. For instance, after the session about SSH, we will assume all participants are able to connect to the cluster without assistance. Notions such as SSH agent and SSH proxy must be mastered by the participants. Please pay attention to the requirements of the sessions you register for.

You can direct all your questions to egs-cism@listes.uclouvain.be. Be sure to include '[Training]' in the subject of your email for faster processing.

© CÉCI.