Status

Nic5 | Hercules2 | Dragon2 | Lemaitre3 | Lemaitre4

Common storage | Login management page

Future events

2024-09-01 Lemaitre3: Deactivation of the full system, which will become completely unavailable.

2024-07-29 Dragon1/2: Maintenance week with cleaning of the global scratch file systems.

Current issues

None. If you notice something wrong, please notify us.

Past events

2024-07-01 Lemaitre3: Cleaning of the global scratch, deactivation of Slurm, and freezing of the home directories (made read-only).

2024-07-01 Lemaitre4: Maintenance week. Some short service disruptions are to be expected from time to time.

2024-06-24 07:00 NIC5: Start of the urgent unplanned maintenance; NIC5 is unavailable until 13:00. Because network problems were disrupting access to /home or /CECI on some compute nodes, the cluster had to be drained over the weekend so that it would be empty of jobs on Monday morning for a reboot of the InfiniBand switches. NIC5 was back at 13:00, as forecast.

2024-06-10 Hercules2: Planned maintenance week

2024-05-13 09:30 NIC5: The second /scratch server is up, and the faulty disk has been replaced and is slowly rebuilding. To ensure data safety, the size and number of jobs per user are strictly limited until tonight.

2024-05-12 23:30 NIC5: One of the two /scratch file servers is down. Data are safe and available, but performance is degraded. Job submission is temporarily suspended.

2024-04-04 14:00 Hercules2: Due to a power outage, the GPU nodes on Hercules2 are unavailable. They are expected to be back in service in the next few days.

2024-04-08 16:00 Hercules2: Hercules2 is back in service.

2024-04-04 14:00 Hercules2: Due to a power outage, Hercules2 is down. The service is expected to resume on Monday, April 8.

2024-04-05 09:00 UNamur CÉCI gateway: The UNamur CÉCI gateway is back online.

2024-04-04 14:00 UNamur CÉCI gateway: Due to a power outage, the UNamur CÉCI gateway is down.

2024-03-19 Lemaitre3 and Lemaitre4: Planned power cut

2024-01-29 Manneback: Planned maintenance week (New date!)

2024-02-19 Lucia: Planned maintenance (7:00-19:00)

2024-01-31 Lemaitre3: Planned power outage (7:00-19:00)

2023-10-12 07:00 NIC5: The scheduled maintenance went well and ended sooner than expected.

2023-10-12 08:36 NIC5: The CECI common file system gateway of NIC5 has been rebooted. Access to all /CECI partitions has been restored.

2023-10-12 00:49 NIC5: The CECI common file system gateway of NIC5 failed. As a consequence, access to all /CECI partitions was lost. Jobs using one of these partitions may have failed.

2023-10-02 10:00-12:00 NIC5 and CECI websites: Inaccessible due to a networking issue.

2023-09-24 11:00 Hercules: Home filesystem back online.

2023-09-23 13:56 Hercules: Home filesystem unavailable, preventing login.

2023-09-20 14:45 Lemaitre3: The BeeGFS global scratch /scratch is back online after replacement of the failing hardware

2023-09-20 13:20 Lemaitre3: The BeeGFS global scratch /scratch is currently unavailable.

2023-09-17 17:45 Lemaitre3 and gwceci.cism.ucl.ac.be: Network connectivity has been restored.

2023-09-16 16:45 Lemaitre3 and gwceci.cism.ucl.ac.be: UCLouvain HPC infrastructure inaccessible due to a networking issue.

2023-09-05 16:04 Hercules2: Workaround implemented to mitigate the slowdowns

2023-09-05 16:04 Hercules2: Cluster stability issues detected due to a defective network device

2023-08-10 11:08 NIC5: NIC5 is up and running again

2023-08-10 09:00 NIC5: Login node memory replacement and reboot

2023-08-06 16:04 NIC5: Hardware memory problem detected on the login node

Legend

Everything is running as expected.

The system status is degraded. Some functionality might be missing or perform less well.

The system is unavailable; we are working to make it functional again.

The system is undergoing planned maintenance operations.

The system is no longer maintained.

Beginning of the event/issue

Resolution of the event/issue

Information and status update

Future announcements and "save the date" info

© CÉCI.