Clusters at CÉCI

The aim of the Consortium is to provide researchers with access to powerful computing equipment (clusters). Clusters are installed and managed locally at the different sites of the universities taking part in the Consortium, but they are accessible by all researchers from the member universities. A single login/passphrase is used to access all clusters through SSH.

All of them run Linux and use Slurm as the job manager. Basic parallel computing libraries (OpenMP, MPI, etc.) are installed, as well as optimized computing subroutines (BLAS, LAPACK, etc.). Common interpreters such as R, Octave, and Python are also installed. See each cluster's FAQ for more details.
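
As a minimal illustration, a Slurm job is described in a batch script and submitted with sbatch; the module name and resource values below are purely illustrative and vary between clusters:

  #!/bin/bash
  #SBATCH --job-name=example          # name shown in the queue
  #SBATCH --ntasks=16                 # number of MPI ranks
  #SBATCH --time=0-02:00:00           # walltime limit (days-hh:mm:ss)
  #SBATCH --mem-per-cpu=2048          # memory per core, in MB

  # Load the software environment (module names differ from cluster to cluster)
  module load OpenMPI

  # Launch the MPI program through Slurm
  srun ./my_program

The script is then submitted with "sbatch job.sh" and the queue can be inspected with "squeue".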

Cluster | Host | CPU type | CPU count* | RAM/node | Network | Filesystem** | Accelerator | Max time | Preferred jobs***
Lemaitre4 | UCLouvain | Genoa 2.4 GHz | 5120 (40 x 128) | 766 GB | HDR Ib | BeeGFS 320 TB | None | 2 days | MPI
NIC5 | ULiège | Rome 2.9 GHz | 4672 (73 x 64) | 256 GB..1 TB | HDR Ib | BeeGFS 520 TB | None | 2 days | MPI
Hercules2 | UNamur | Naples 2 GHz / SandyBridge 2.20 GHz | 1024 (30 x 32 + 2 x 64) / 512 (32 x 16) | 64 GB..2 TB | 10 GbE | NFS 20 TB | None | 15 days | serial / SMP
Dragon2 | UMons | SkyLake 2.60 GHz | 592 (17 x 32 + 2 x 24) | 192..384 GB | 10 GbE | RAID0 3.3 TB | 4x Volta V100 | 21 days | serial / SMP
Lemaitre3 | UCL | SkyLake 2.3 GHz / Haswell 2.6 GHz | 1872 (78 x 24) / 112 (4 x 28) | 95 GB / 64 GB | Omnipath | BeeGFS 440 TB | None | 2 days / 6 hours | MPI
Dragon1 | UMons | SandyBridge 2.60 GHz | 416 (26 x 16) / 32 (2 x 16) | 128 GB | GbE | RAID0 1.1 TB | 4x Tesla C2075, 4x Tesla Kepler K20m | 41 days | serial / SMP
NIC4* | ULiège | SandyBridge 2.0 GHz / IvyBridge 2.0 GHz | 2048 (120 x 16 + 8 x 16) | 64 GB | QDR Ib | FHGFS 144 TB | None | 3 days | MPI
Vega* | ULB | Bulldozer 2.1 GHz | 896 (14 x 64) | 256 GB | QDR Ib | GPFS 70 TB | None | 14 days | serial / SMP / MPI
Hercules* | UNamur | SandyBridge 2.20 GHz | 512 (32 x 16) | 64..128 GB | GbE | NFS 20 TB | None | 63 days | serial / SMP
Lemaitre2* | UCL | Westmere 2.53 GHz | 1380 (115 x 12) | 48 GB | QDR Ib | Lustre 120 TB | 3x Quadro Q4000 | 3 days | MPI
Hmem* | UCL | MagnyCours 2.2 GHz | 816 (17 x 48) | 128..512 GB | QDR Ib | FHGFS 30 TB | None | 15 days | SMP
Decommissioned clusters are marked with an asterisk after their name.

The Consortium also provides users with access to Tier-1 facilities, which are not operated by the universities.

Cluster | Host | CPU type | CPU count* | RAM/node | Network | Filesystem** | Accelerator | Max time | Preferred jobs***
Lucia | Cenaero | Milan 2.45 GHz / Milan 2.6 GHz | 38400 (300 x 128) / 1600 (50 x 32) | 241 GB | HDR Ib | GPFS 3.2 PB | None / 200 (50 x 4) Tesla A100 | 48 hours | MPI / GPU
Zenobe | Cenaero | Haswell 2.50 GHz / IvyBridge 2.7 GHz | 5760 (240 x 24) / 8208 (342 x 24) | 64..256 GB / 64 GB | QDR Ib / FDR Ib + QDR Ib | GPFS 350 TB | t.b.a. | 24 hours | MPI
* In this context, a CPU is to be understood as a core or a hardware thread; count = #nodes x CPU/node
** Filesystem = global scratch space (other than /home); RAID is a filesystem local to the nodes
*** SMP = all processes/threads on the same node; MPI = multi-node

[Polar plot comparing the capabilities of Lemaitre4, Dragon2, Lemaitre3, NIC5 and Hercules2]

The CÉCI clusters have been designed to accommodate the large diversity of workloads and needs of the researchers from the five universities.

The polar plot (also known as a spider plot) above represents the capabilities of the CÉCI clusters.

At one end is the sequential workload. That type of workload needs very fast CPUs, accelerators, and often a large maximum job time (several weeks, or even months), which requires limiting the number of jobs a user can run simultaneously to allow fair sharing of the cluster.

At the other end is the massively parallel workload. For such workloads, individual core performance is less crucial as long as many cores are available. A job is allowed to use a very large number of CPUs, but only for a limited period of time (a few days at most) to ensure fair sharing of the cluster. Parallel workloads also generally require a fast, low-latency network and a large parallel filesystem.

Finally, some workloads need huge amounts of memory, be it RAM or local disk space. Such workloads often also need many CPUs on the same node to take advantage of the large memory available (so-called "fat nodes").

The clusters have been installed gradually since early 2011, first at UCL, with HMEM being a proof of concept. At that time, the whole account infrastructure was designed and deployed so that every researcher from any university was able to create an account and log in to HMEM. Then, LEMAITRE2 was set up as the first cluster entirely funded by the F.N.R.S. for the CÉCI. DRAGON1, HERCULES, VEGA and NIC4 followed, in that order.

Common storage

We provide a central storage solution which is visible from all the frontends and compute nodes of all CÉCI clusters. This system is deployed on a private, dedicated, fast (10 Gbps) network connecting all CÉCI sites. To move to your personal share on this common storage, simply run

 cd $CECIHOME

from any of the CÉCI clusters. As that common share is mounted on all of them, each file you copy there will be accessible from any CÉCI cluster.
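
For instance (file names are purely illustrative), a file copied to the common storage on one cluster is directly visible on the others:

  # On, say, NIC5: copy a result file to the common CÉCI storage
  cp results.tar.gz $CECIHOME/

  # Later, on Lemaitre4: the same file is directly accessible
  ls -lh $CECIHOME/results.tar.gz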

Please take a careful look at the documentation to learn about the other shares, for fast transfer of big files between clusters and for group projects.

Lemaitre4

Hosted at UCLouvain (CISM), this cluster consists of more than 5000 AMD Epyc Genoa cores at 3.7 GHz. All the nodes are interconnected by a 100 Gbps Infiniband HDR network. The compute nodes have access to a 320 TB fast BeeGFS /scratch space.

Suitable for:

MPI parallel jobs (several dozen cores) with many communications and/or a lot of parallel disk I/O, and SMP/OpenMP parallel jobs; 2 days max.

Resources

  • Home directory (100 GB quota per user)
  • Global working directory /scratch ($GLOBALSCRATCH)
  • Node local working directory $LOCALSCRATCH dynamically defined in jobs
  • default batch queue*

Access/Support:

SSH to lemaitre4.cism.ucl.ac.be (port 22) through your university gateway, with the appropriate login and id_rsa.ceci file.
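
As a sketch, the connection can be made in one command with OpenSSH's ProxyJump option; the gateway hostname and login below are placeholders to be replaced with your actual university gateway and CÉCI login:

  # Jump through the university gateway (example hostname) to the Lemaitre4 front-end
  ssh -i ~/.ssh/id_rsa.ceci -J jdoe@gateway.example.be jdoe@lemaitre4.cism.ucl.ac.be

The same -i and -J options (or an equivalent Host block in ~/.ssh/config) apply to the other CÉCI clusters.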

SUPPORT: CISM

Server SSH key fingerprint: (What's this?)

  • ECDSA: SHA256:krYWLlE32ygG0u8uYbXUNBRTpbxDoDVyCvg3B1zLvGQ
  • ED25519: SHA256:mWlgUkE+tBNbklXLgvrt7pL/3Ohn7uidqFfBUU0fSkQ
  • RSA: SHA256:NIhjzqQgxgkG7K1x4kqoFnNSGrbc9b8AUG8+JT68jg4
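
One way to check these values, assuming the front-end is reachable from where the command is run (e.g. from the gateway), is to recompute the fingerprints of the keys the server presents:

  # List the SHA256 fingerprints of the server's host keys and compare with the values above
  ssh-keyscan lemaitre4.cism.ucl.ac.be 2>/dev/null | ssh-keygen -lf -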

NIC5

Hosted at the University of Liège (SEGI), this cluster consists of 4672 cores spread across 73 compute nodes, each with two 32-core AMD Epyc Rome 7542 CPUs at 2.9 GHz. The default partition holds 70 nodes with 256 GB of RAM, and a second "hmem" partition with 3 nodes with 1 TB of RAM is also available. All the nodes are interconnected by a 100 Gbps Infiniband HDR network (blocking factor 1.2:1). The compute nodes have access to a 520 TB fast BeeGFS /scratch space.

Suitable for:

MPI parallel jobs (several dozen cores) with many communications and/or a lot of parallel disk I/O, and SMP/OpenMP parallel jobs; 2 days max.

Resources

  • Home directory (100 GB quota per user)
  • Global working directory /scratch ($GLOBALSCRATCH)
  • Node local working directory $LOCALSCRATCH dynamically defined in jobs
  • default batch queue* (Max 2 days, 256GB of RAM nodes)
  • hmem queue* (Max 2 days, 1 TB RAM nodes, only for jobs that cannot run on the 256 GB nodes; see the sketch after this list)
  • Max 320 cpus per user
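
For jobs that genuinely need the 1 TB nodes, the hmem partition is requested explicitly in the batch script; a minimal sketch (the memory value and program name are illustrative):

  #!/bin/bash
  #SBATCH --partition=hmem            # target the 1 TB nodes
  #SBATCH --mem=500G                  # more than a 256 GB node can provide
  #SBATCH --time=2-00:00:00           # at most 2 days

  srun ./my_large_memory_program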

Access/Support:

SSH to nic5.uliege.be (port 22) through your university gateway, with the appropriate login and id_rsa.ceci file.

FAQ: https://www.campus.uliege.be/nic5

SUPPORT: CECI support form

Server SSH key fingerprint: (What's this?)

  • ECDSA: SHA256:xKYPziAtsf0FwtIYYa3NDL1ibZGbhUCf9B5A8p0MR30
  • ED25519: SHA256:27uhpA+zocCxLayg5g1ogej/6zJnx3kLNOftg1IOXpE
  • RSA: SHA256:oHCr1TlkQb+4Sjq/9wzBmsd8v2QfP9jJJRO+L2284gU

HERCULES2

Hosted at the University of Namur, this system currently consists of 1536 cores spread across 30 AMD Epyc Naples and 32 Intel Sandy Bridge compute nodes. The AMD group is composed of 24 nodes with a single 32-core AMD Epyc 7551P CPU at 2.0 GHz and 256 GB of RAM, 4 nodes with the same CPU and 512 GB of RAM, and 2 nodes with dual 32-core AMD Epyc 7501 CPUs at 2.0 GHz and 2 TB of RAM. The Intel nodes have dual 8-core Xeon E5-2660 CPUs at 2.2 GHz and 64 or 128 GB of RAM (8 nodes). All the nodes are interconnected by a 10 Gigabit Ethernet network and have access to three NFS file systems for a total capacity of 100 TB.

Suitable for:

Shared-memory parallel jobs (OpenMP or Pthreads), or resource-intensive sequential jobs, especially those requiring a lot of memory.

Resources

  • Home directory (200 GB quota per user)
  • Working directory /workdir (400 GB per user) ($WORKDIR)
  • Local working directory /scratch ($LOCALSCRATCH) dynamically defined in jobs
  • Nodes have access to internet
  • default queue* (Max 15 days)
  • hmem queue* (at least 64GB per core, Max 15 days)
  • Max 128 cpus/user on all partitions

Access/Support:

SSH to hercules2.ptci.unamur.be (port 22) with the appropriate login and id_rsa.ceci file.

SUPPORT: ptci.support@unamur.be

Server SSH key fingerprint: (What's this?)

  • MD5:66:50:e1:67:91:d8:17:1e:b7:be:48:00:e2:2c:7a:9f
  • SHA256:SyLaaBe7CuO7Dpa6vJa0vbAUxnYSpl30xaJo5yBF//c

DRAGON2

Hosted at the University of Mons, this cluster is made of 17 compute nodes, each with two 16-core Intel Skylake Xeon 6142 processors at 2.6 GHz, 15 of them with 192 GB of RAM and 2 with 384 GB, all with 3.3 TB of local scratch disk space. Two additional nodes with two 12-core Intel Skylake Xeon 6126 processors at 2.6 GHz each have two high-end NVIDIA Tesla V100 GPUs (5120 CUDA cores/16 GB HBM2/7.5 TFlops double precision). The compute nodes are interconnected with a 10 Gigabit Ethernet network.

Suitable for:

Long (max. 21 days) shared-memory parallel jobs (OpenMP or Pthreads), or resource-intensive (cpu speed and memory) sequential jobs.

Resources

  • Home directory (40GB quota per user)
  • Local working directory $LOCALSCRATCH (/scratch)
  • Global working directory $GLOBALSCRATCH (/globalscratch)
  • No internet access from nodes
  • long queue* (Max 21 days, 48 cpus/user)
  • gpu queue* (Max 5 days, 24 cpus/user, 1 gpu/user)
  • debug queue* (Max 30 minutes, 48 cpus/user)
  • Generic resource*: gpu
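
To use one of the V100 GPUs, a job requests the gpu generic resource on the gpu queue; a minimal sketch (partition name taken from the queue list above, program name illustrative):

  #!/bin/bash
  #SBATCH --partition=gpu             # gpu queue, max 5 days
  #SBATCH --gres=gpu:1                # one GPU per user
  #SBATCH --cpus-per-task=4           # example CPU request

  srun ./my_gpu_program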

Access/Support:

SSH to dragon2.umons.ac.be (port 22) with the appropriate login and id_rsa.ceci file.

SUPPORT: CECI Support form

Server SSH key fingerprint: (What's this?)

  • MD5:0e:a7:21:df:a5:a0:27:6c:47:ba:61:57:76:d0:82:ad
  • SHA256:LEX1JwKes2Sg1P+95Ymf+uwwrVyZaEjUMts5xejtW9A

LEMAITRE3

Lemaitre3 replaces Lemaitre2. It is hosted at Université catholique de Louvain (CISM). It features 78 compute nodes with two 12-core Intel SkyLake 5118 processors at 2.3 GHz and 95 GB of RAM (3970 MB/core), interconnected with an OmniPath network (OPA-56Gbps), and having exclusive access to a fast 440 TB BeeGFS parallel filesystem.

Suitable for:

Massively parallel jobs (MPI, several dozen cores) with many communications and/or a lot of parallel disk I/O, 2 days max.

Resources

  • Home directory (100G quota per user)
  • Working directory /scratch ($GLOBALSCRATCH)
  • Nodes have access to internet
  • Max 100 running jobs per user
  • Default queue* (max 2 days walltime per job, SkyLake processors) and debug queue (max 6 hours, Haswell processors)

Access/Support:

SSH to lemaitre3.cism.ucl.ac.be (port 22) with the appropriate login and id_rsa.ceci file.

SUPPORT: egs-cism@listes.uclouvain.be

Server SSH key fingerprints: (What's this?)

  • ECDSA: SHA256:1Z6M2WISLylvdH9gD8vHqJ9Z7bCDdJ03avlEXO9BKsc
  • ED25519: SHA256:63mf1cm89YoPvZnpVnUXn4JjNiIpafSCfuXG+Z/LzrI
  • RSA: SHA256:eWHb7N10/Wn+sdG2ED8NqudyZ2kcWTiR33BCq2PKD7Y

NIC4

New CÉCI accounts are no longer created on NIC4, and existing accounts are no longer automatically renewed. Existing users are strongly encouraged to back up their important data (/home and /scratch), delete what is not needed, and migrate to NIC5.

Hosted at the University of Liège (SEGI facility), it features 128 compute nodes with two 8-core Intel E5-2650 processors at 2.0 GHz and 64 GB of RAM (4 GB/core), interconnected with a QDR Infiniband network, and having exclusive access to a fast 144 TB FHGFS parallel filesystem.

Suitable for:

Massively parallel jobs (MPI, several dozen cores) with many communications and/or a lot of parallel disk I/O, 3 days max.

Resources

  • Home directory (20 GB quota per user)
  • Working directory /scratch ($GLOBALSCRATCH)
  • Nodes have access to internet
  • Default queue* (3 days, 448 cores max per user, 64 jobs max per user, among which max 32 running, 256 CPUs max per job)

Access/Support:

SSH to login-nic4.segi.ulg.ac.be (port 22) from your CECI gateway with the appropriate login and id_rsa.ceci file.

FAQ: https://www.campus.uliege.be/nic4

SUPPORT: CECI support form

Server SSH key fingerprint: (What's this?)

  • MD5:94:6c:d6:cc:f8:ca:b2:d0:79:38:3c:e9:d3:e3:a7:6f
  • SHA256:5mQYQTjeW1XVYDFhIfMaGyFEJiTen56r2Kyz5ocj72I

VEGA

This cluster was decommissioned in October 2020.

Hosted at the University of Brussels, it features 14 fat compute nodes with 64 cores (four 16-core AMD Bulldozer 6272 processors at 2.1 GHz) and 256 GB of RAM, interconnected with a QDR Infiniband network, and 70 TB of high performance GPFS storage.

Suitable for:

Many-core (SMP and MPI) jobs and many single-core jobs, 14 days max.

Resources

  • Home/Working directory /home ($GLOBALSCRATCH=$HOME, 200GB quota)
  • Nodes have access to internet
  • Def queue* (Max 14 days, 400 cpus/user, 350 running jobs/user, 1000 jobs in queue per user)

HERCULES

This cluster was decommissioned in August 2019.

Hosted at the University of Namur, this system currently consists of 512 cores spread across 32 Intel Sandy Bridge compute nodes, each with two 8-core E5-2660 processors at 2.2 GHz and 64 or 128 GB of RAM (8 nodes). All the nodes are interconnected by a Gigabit Ethernet network and have access to three NFS file systems for a total capacity of 100 TB.

Suitable for:

Long (max. 63 days) shared-memory parallel jobs (OpenMP or Pthreads), or resource-intensive sequential jobs.

Resources

  • Home directory (200 GB quota per user)
  • Working directory /workdir (400 GB per user) ($WORKDIR)
  • Local working directory /scratch ($TMPDIR) dynamically defined in jobs
  • No internet access from nodes
  • cpu queue* (Max 63 days, 48 cpus/user)

DRAGON1

Hosted at the University of Mons, this cluster is made of 28 compute nodes: 26 nodes with two Intel Sandy Bridge 8-core E5-2670 processors at 2.6 GHz and 2 nodes with two Intel Sandy Bridge 8-core E5-2650 processors at 2.00 GHz, each node having 128 GB of RAM and 1.1 TB of local scratch disk space. The compute nodes are interconnected with a Gigabit Ethernet network (10 Gigabit for the 36 TB NFS file server). Two of the compute nodes each have 2 x Tesla M2075 GPUs (512 Gflops float64), and two others each have 2 x Tesla Kepler K20m GPUs (1.1 Tflops float64).

Suitable for:

Long (max. 41 days) shared-memory parallel jobs (OpenMP or Pthreads), or resource-intensive (cpu speed and memory) sequential jobs.

Resources

  • Home directory (20GB quota per user)
  • Local working directory /scratch ($LOCALSCRATCH)
  • No internet access from nodes
  • Long queue*: long (Max 41 days, 40 cpus/user, 500 jobs/user)
  • Def queue*: batch (Max 5 days, 40 cpus/user, 500 jobs/user)
  • Generic resource*: gpu (Max 15 days, gres=gpu:kepler:1 or gres=gpu:tesla:1)
  • Generic resource*: lgpu (Max 21 days, gres=gpu:1)

Access/Support:

SSH to dragon1.umons.ac.be (port 22) with the appropriate login and id_rsa.ceci file.

FAQ: http://dragon1.umons.ac.be/

SUPPORT: CECI Support form

Server SSH key fingerprint: (What's this?)

  • MD5: 2e:98:38:cf:99:68:89:2c:1f:6a:0e:19:fb:3b:02:d1
  • SHA256: dbPE5/40W2M7mF7B+pc4pSo00/bqYwuv4QycU5yv+IQ

LEMAITRE2

This cluster was decommissioned in July 2018.

Hosted at Université catholique de Louvain, it comprises 112 compute nodes with two 6-core Intel E5649 processors at 2.53 GHz and 48 GB of RAM (4 GB/core). The cluster has exclusive access to a fast 120 TB Lustre parallel filesystem. All compute and management nodes (NFS, Lustre, frontend, etc.) are interconnected with a fast QDR Infiniband network.

Suitable for:

Massively parallel jobs (MPI, several dozen cores) with many communications and/or a lot of parallel disk I/O, 3 days max.

Resources

  • Home directory (50GB quota per user)
  • Working directory /scratch ($GLOBALSCRATCH)
  • Nodes have access to internet
  • Default queue* (3 days, max 50 running jobs/user)
  • PostP queue* with GPUs (6 hours)
  • Generic resource*: gpu

HMEM

This cluster was decommissioned in July 2020.

Hosted at the Université catholique de Louvain, it mainly comprises 12 fat nodes with 48 cores (four 12-core AMD Opteron 6174 processors at 2.2 GHz). 2 nodes have 512 GB of RAM, 7 nodes have 256 GB and 3 nodes have 128 GB. All the nodes are interconnected with a fast Infiniband QDR network and have a 1.7 TB fast RAID setup for scratch disk space. All the local disks are furthermore gathered in a global 12 TB BeeGFS filesystem.

Suitable for:

Large shared-memory jobs (100+GB of RAM and 24+ cores), 15 days max.

Resources

  • Home directory (50GB quota per user)
  • Working directory /globalfs ($GLOBALSCRATCH)
  • Local working directory /scratch ($LOCALSCRATCH)
  • Nodes have access to internet
  • Low, Middle, High queues* (15 days max, 40 running jobs per user max)
  • Fast queue* (24 hours, no access to $GLOBALSCRATCH)

LUCIA

Hosted at, and operated by, Cenaero, it features a total of 38,400 cores (AMD Milan) with up to 512 GB of RAM, 200 NVIDIA Tesla A100 GPUs, interconnected with an HDR Infiniband network, and having access to a fast 2.5 PB GPFS (Spectrum Scale) parallel filesystem.

Suitable for:

Massively parallel jobs (MPI, several hundred cores) with many communications and/or a lot of parallel disk I/O, 2 days max.

Resources

  • Home directory (200 GB quota per user)
  • Working directory /gpfs/scratch
  • Project directory /gpfs/projects
  • Batch queue + GPU queue (whole node allocation)

Access/Support:

SSH to frontal.lucia.cenaero.be (port 22) with the appropriate login and id_rsa.ceci file, from a CÉCI SSH gateway.

DOC: https://doc.lucia.cenaero.be/overview/
ABOUT: tier1.cenaero.be

SUPPORT: https://support.lucia.cenaero.be

Server SSH key fingerprint: (What's this?)
  • ED25519: SHA256:iO2HH1V1uHUGMEEj2yvSx2TfVUNhUwqdtqdIi31jxEA
  • ECDSA: SHA256:a5Zv6m0RJsJR4CLDmva2RrUWQea+aUC3/RWyeLYJPdg

ZENOBE

Hosted at, and operated by, Cenaero, it features a total of 13,536 cores (Haswell and IvyBridge) with up to 64 GB of RAM, interconnected with a mixed QDR/FDR Infiniband network, and having access to a fast 350 TB GPFS parallel filesystem.

Suitable for:

Massively parallel jobs (MPI, several hundred cores) with many communications and/or a lot of parallel disk I/O, 1 day max.

Resources

  • Home directory (50 GB quota per user)
  • Working directory /SCRATCH
  • Project directory /projects
  • Large queue (1 day max walltime, 96 CPUs minimum and 4320 CPUs maximum per job, whole node allocation)
  • Default queue (no time limit but jobs must be restartable)

Access/Support:

SSH to zenobe.hpc.cenaero.be (port 22) with the appropriate login and id_rsa.ceci file.

QUICKSTART: www.ceci-hpc.be/zenobe.html
DOC: tier1.cenaero.be/en/faq-page
ABOUT: tier1.cenaero.be

SUPPORT: it@cenaero.be

Server SSH key fingerprint: (What's this?)
MD5: 47:b1:ab:3a:f7:76:48:05:44:d9:15:f7:2b:42:b7:30
SHA256: 8shVbcnKHt861M4Duwcxpgug6l8mjj+KZu/lmYyYgpY

© CÉCI.