Survey 2020: Summary of findings

From November 30 to December 21, 2020, CÉCI users were invited to respond to an anonymous satisfaction survey.

More than 92 users responded to the survey, out of approximately 693 users active in the past few months. They originated from all CÉCI universities, with very diverse research interests in science and engineering. We thank those users for the time they took to fill in the survey.

The text below offers a summary of all comments and suggestions made in the responses. This other document offers a synthetic view of the responses.

Content

Support and Documentation

Several respondents indicated that the documentation is sometimes not up to date. Others mentioned that some topics are missing from it. One respondent stated that the VSC documentation is much better.

It is true that the documentation is sometimes out of date with respect to the actual, current state of the clusters. We have an ongoing project aiming at automating the update of the documentation as far as cluster configurations are concerned. Currently, for instance, the list of software modules installed on the clusters is built automatically from the module lists. It is only a first step, but this project does not have the highest priority for the moment. As for the topics covered in the documentation, our policy is to focus mainly on information that is specific to the CÉCI infrastructure, and on information that is important but difficult to find online. We will not rewrite a full Slurm documentation, for instance, but we try to be very detailed about the process for creating an account and making the first SSH connection. Of course, the documentation can always be improved; do not hesitate to contact your local CÉCI admin if you have suggestions.

One respondent reported that the support teams are sometimes slow to respond to questions.

It often happens that system administrators must prioritize their tasks, because time is scarce for everybody. Tasks that impact many users (e.g. replacing a failing hard disk in time) are prioritized over those that will help only one user (e.g. installing a very specific piece of software). Also, it is important to help the system administrators help you. Please provide as much information as you can about your issue. Most importantly, be explicit about your login name, the cluster you are referring to, the error message you are faced with, etc. The more information you give, the less time the admins must spend discovering it, and the sooner they can answer you. For that matter, using the support form is always a good idea.

Hardware

One respondent requested that we offer hardware more similar to what is available on Lumi

That is indeed something we are concerned with, but it is very difficult to do in practice due to the regulations on public procurement. Because of those regulations, the information about the type of hardware installed on Lumi was kept private for a long time and available only to directly involved parties. Furthermore, also due to the regulations, we cannot directly buy the same hardware and have to publish an open call for tenders. However, the University of Antwerp will make AMD MI100 GPUs available for development purposes. Access may be granted to CÉCI users upon request in the future.

Scheduling

One respondent asked to have the details of the queue priority algorithm made available to all users.

We already have a page dedicated to priorities in the documentation. It contains the information needed to understand the linked pages of the Slurm documentation, which explain everything in great detail.
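As a starting point, Slurm also lets you inspect the priority factors (age, fair-share, job size, partition, etc.) of your own pending jobs directly on the cluster; the exact set of factors and their weights depend on how each cluster is configured:

sprio --long --user $USER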

Several respondents requested that more jobs could be submitted at once, and larger job arrays be allowed.

There are not many restrictions on the clusters limiting the number of jobs that can be submitted, but wherever such a limit is in place, it aims at making access to resources as fair as possible for all users and/or at discouraging usages for which the clusters were not intended. Limitations on the number of jobs of the same user running at the same time follow the same logic, and are especially necessary on the clusters where the maximum wall time is large.
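As an illustration of how to work within such limits, a job array with a throttle submits many tasks while capping how many of them run at the same time; the range, throttle value and script name below are just placeholders, and the maximum array size can be checked on each cluster:

# check the maximum array size allowed on the cluster
scontrol show config | grep -i maxarraysize
# submit 500 tasks, but let at most 20 of them run simultaneously
sbatch --array=1-500%20 job.sh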

Multiple users complained about having to wait too long in the queues.

The waiting time shrinks when more resources are available and when the maximum wall time is shorter. So, to reduce the waiting time, make sure to acknowledge the use of the CÉCI infrastructure in your publications so we can convince the funding authorities to allocate budget for cluster replacements and enhancements.

Also make sure to use the debug partitions, where available, for short tests. On the debug partitions, the wall time is very short and, consequently, so is the waiting time.
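As a quick sketch (the partition name, time limit and script name are placeholders; sinfo lists the partitions actually available on each cluster), submitting a short test there looks like:

# list the partitions available on the cluster
sinfo --summarize
# submit a 10-minute test to the debug partition
sbatch --partition=debug --time=10:00 test_job.sh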

Some clusters also have floating reservations that act as a virtual debug queue and enable short jobs to start quickly by making sure a subset of resources is always available for them.

Multiple respondents complained that the maximum wall time is too short

As discussed numerous times, short maximum wall times are necessary to reduce waiting time and ensure fair resource sharing.

Containers

One respondent asked if it was possible to install multiple versions of Singularity on the cluster

Singularity uses low-level system features, and several of its upgrades include security-related fixes. Once we have decided that an upgrade is required on the clusters, we cannot keep old versions with known, publicly disclosed security bugs on the system.

One respondent regretted that Singularity was not available on all clusters

Singularity is currently installed on all clusters where the operating system makes it possible. So currently only NIC4 does not have it installed.

One respondent requested that users be able to build Singularity containers on the clusters

Enabling the users to build Singularity containers by themselves on the clusters requires some sort of privilege elevation that is difficult to control precisely and has, in the past, introduced security issues within Singularity. The fakeroot feature can be an option, but setting it up on multiple clusters with central user management is vastly different from setting it up on a personal laptop, and making the configuration automatic is not a trivial task. We will investigate this.
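In the meantime, a common workflow is to build the image on a machine where you do have the necessary rights (your own Linux workstation, for instance) and copy the resulting image to the clusters; all file, login and host names below are placeholders:

# on a machine where you have the rights (sudo, or --fakeroot where it is configured)
sudo singularity build my_container.sif my_recipe.def
# copy the image to a cluster, then run it there
scp my_container.sif mylogin@mycluster.example.org:
singularity exec my_container.sif my_program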

Multiple respondents explained that the major roadblock for them is making their software better suited to clusters (parallel, checkpoint-able, etc.)

The clusters are indeed designed in a way that software must take into account to fully benefit from the computational power and to optimize the fair sharing of the machines. Do not hesitate to consult the system administrators for hints on how to adapt your software if needed.

One respondent asked how to know when an interactive job would start

You can get the expected start time for your jobs with

squeue --start --user $USER

and of course you can use the --mail-type and --mail-user options to be notified by email when the job starts. But please note that the information given by Slurm is only an estimate based on the current state of the cluster. The job could start sooner if, for instance, another job finishes much earlier than its requested duration, and it could start later if a higher-priority job (for instance from a user who has never used the cluster before) is submitted in the meantime.

The --begin option can also be useful for interactive jobs, to make sure for instance that a job you submit in the evening does not start before the morning of the next day. That option can be modified with scontrol once the job is submitted.
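For example (the job ID and script name are placeholders), delaying a job until the next morning and releasing that constraint later could look like:

# submitted in the evening, the job will not start before 8:00 the next morning
sbatch --begin=08:00 job.sh
# change your mind later and let it start as soon as possible
scontrol update JobId=123456 StartTime=now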

Make sure also to use a terminal multiplexer such as tmux or screen to submit interactive jobs from the frontend rather than from your laptop, to prevent your job from being cancelled if you lose the connection to the clusters (Wi-Fi issue, laptop put to sleep, etc.)
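A minimal sketch of that workflow (the resource values are placeholders) would be:

# on the frontend: open a tmux session (re-attach later with: tmux attach -t work)
tmux new -s work
# inside the tmux session: request an interactive shell on a compute node
srun --time=2:00:00 --mem-per-cpu=4G --pty bash

If the connection from your laptop drops, the tmux session, and the interactive job running inside it, keep running on the frontend.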

One respondent explained that they do not know in advance how much RAM or how much time a given job will require

Unfortunately, specifying resources is what it takes to ensure fair sharing of those resources, and it is entirely in the hands of the users. Making sure your job can be checkpointed and restarted is important. Analysing the accounting information given by Slurm in terms of elapsed time and used memory is also important to better specify the resources for future, similar jobs.
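As a starting point (the job ID is a placeholder), the elapsed time and the memory actually used by a finished job can be compared with what was requested using:

sacct -j 123456 --format=JobID,JobName,Elapsed,Timelimit,ReqMem,MaxRSS,State

If MaxRSS is far below ReqMem, or Elapsed far below Timelimit, you can safely tighten the request for the next, similar job.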

One respondent explained that copying folders and files with MobaXterm was not easy

MobaXterm indeed has a small file-browser window where you can drag and drop folders. It is very convenient for copying a file or a directory to the clusters, but not that well suited to synchronising files that you frequently modify on your laptop.

The MobaXterm file browser, though, is not the only solution for copying files to the clusters. You can use MobaXterm to start a local Bash shell and then use the tools available in Linux, such as rsync, to your advantage. You can also install other tools such as FileZilla, which offers more options for transferring files, and you can use a command-line text editor such as vim, emacs or nano on the cluster directly to modify small files.
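For instance, from the local shell that MobaXterm provides (the login and host names below are placeholders), a directory can be pushed to a cluster, and kept in sync later, with:

rsync -av my_project/ mylogin@mycluster.example.org:my_project/

Subsequent runs of the same command only transfer the files that changed.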

One respondent complained that moving large chunks of data from the group directory to the global scratch and back took a lot of time and energy

Yes, properly managing data requires planning and attention, but that enables optimal use of the resources. That being said, if you have multiple jobs reading the same data from the global scratch, it makes sense to keep that data on the scratch for the whole duration of the job campaign. And if you have one job that will be reading input data sequentially, it makes sense to read that file directly from the group directory.

One respondent mentioned the availability of GPUs as a limiting factor

Vega's future replacement is expected to offer multiple GPUs, following the survey that was conducted a couple of years ago. Unfortunately, no funding is currently available for the renewal of Vega. We are working on it with our funding authorities (FNRS and the Walloon Region).

Job policies

One respondent stated that if there were machines available on which one could work without the need to go through a scheduler, their scientific productivity would increase.

The job scheduler does indeed prevent you from starting your job whenever you want, but in exchange it grants you exclusive access to the resources you requested. By contrast, an interactive machine that does not run a job scheduler will let you start whenever you want, but as soon as you are not the only one using it, you get side effects and conflicts: one job is using all the memory, abandoned processes are still using CPUs, one user is monopolizing all the resources, etc. As soon as you need to share a machine, this happens, and the solution is the job scheduler. And if you do not want to share, you need to buy and administer your own hardware.

Multiple respondents complained about the long waiting time for jobs to start.

This is a recurrent topic and the answer will be the same as in previous years. The best way to reduce the waiting time is to add more resources. That is something on which users have more leverage than system administrators: you can lobby your academic supervisor to use their influence, together with the CÉCI Bureau, to get more budget from the different funding agencies.

Assuming non-extensible resources, queueing theory teaches us that the only way to reduce the waiting time is to have a higher turnover, meaning a shorter maximum wall time. But then other users will be unhappy because their jobs cannot fit into the maximum wall time.

From a user's point of view, always keep in mind that the tighter the resources you request, the sooner your job is likely to be scheduled, and do not forget to look for 'debug' partitions when they exist: waiting time on those partitions is often much lower (because the maximum wall time is much lower).

One respondent requested a more direct connection to the compute nodes

The compute nodes are organised into a local, private network and have no direct inbound access outside of the frontend/login nodes. This is standard practice, both for technical and for security reasons.

That does not mean that it is not possible to access compute nodes from your laptop, to run interactive software for instance; SSH tunnels make that possible. See this documentation for more details.
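As a sketch (the node name, port and host are placeholders), forwarding a port from your laptop to a compute node through the frontend looks like:

ssh -L 8888:node042:8888 mylogin@frontend.example.org

You can then point your local browser or client to localhost:8888 to reach the application running on the compute node.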

Software

One respondent suggested that all modules be compiled with the same compiler (for Fortran for instance)

The modules installed on the clusters are organised into "releases", and all modules belonging to the same release are built with the exact same compiler. See the documentation for more information.
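For instance (the release name below is only an example; module av shows what actually exists on each cluster), loading a release first makes sure that every module you load afterwards was built with the same toolchain:

module load releases/2020b
module av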

One respondent complained that Matlab is not installed on the CÉCI clusters

Please refer to the responses of previous years for further details but, in short, commercial licenses are not shared among universities and follow purely university-specific policies.

One respondent requested to have the latest versions of OpenMPI and GCC on NIC4

NIC4 is an old cluster and, until recently, efforts were geared towards bringing NIC5 into production rather than updating the software on NIC4. What is more, it can be very difficult to build up-to-date versions of software on an old software infrastructure without reinstalling the whole cluster from scratch.

Also do not forget that you can install such software by yourself in your home directory.
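As a rough sketch for a classic autotools-based package (the version numbers and paths are placeholders):

# from the unpacked source directory, install into your home
./configure --prefix=$HOME/software/openmpi-4.1.0
make -j 4 && make install
# make your shell and job scripts pick up that version first
export PATH=$HOME/software/openmpi-4.1.0/bin:$PATH
export LD_LIBRARY_PATH=$HOME/software/openmpi-4.1.0/lib:$LD_LIBRARY_PATH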

One respondent complained about the VASP installation on the Lemaitre3 and Manneback (UCLouvain) clusters

VASP is indeed a piece of software that is a bit more difficult to include in a cluster-wide EasyBuild setup, and the UCLouvain team is working towards solving the remaining issues.

One respondent requested that the latest version of Abinit be installed on the CÉCI clusters

This is something we are working on: we have met with the Abinit maintainers to make sure Abinit is fully compatible with the EasyBuild framework we use to deploy software on the clusters. However, we do not monitor the new releases of the scientific software installed on the clusters, so if you need a more recent version, please notify the system administrator of the cluster you are using.

Storage

One respondent suggested that the CÉCI storage be made available on the Tier-1 cluster

The replacement of Zenobe is expected to be connected to the CÉCI storage.

One respondent suggested that group leaders should be able to manage and customize access rights within a group

The options regarding access rights to data on disk are limited by the UNIX permission structure. Groups should be organised based on the needs for sharing/protecting files, and not on an administrative structure (e.g. a department).
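As a basic illustration with standard UNIX tools (the group and directory names are placeholders), a group leader can already open a directory to the members of a group while closing it to everybody else:

# give the group read/write access and keep others out
chgrp -R my_ceci_group shared_data/
chmod -R g+rwX,o-rwx shared_data/

Anything finer-grained than that (per-user rights within the same group, for instance) quickly hits the limits of the UNIX model, hence the advice above.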

One respondent noticed that sometimes software compiled on one cluster cannot be used on another

All clusters run the same operating system (CentOS) on the same type of CPU architecture (x86_64), so basic software will run anywhere. But indeed, the CPU micro-architectures differ, and software that was compiled optimally for a given micro-architecture can perform sub-optimally, or even fail, on a different one. That is why, for the software that we provide, we make sure to compile it for every CPU micro-architecture that is available. When compiling software that will run on multiple CPU micro-architectures, there are three options (the first two are illustrated just after the list):

  1. compile with the options that will allow running on any CPU, something you should only do for non-compute intensive software e.g. your favourite text editor
  2. compile multiple versions and use environment variables to choose the correct version based on the current cluster
  3. compile into so-called "fat" binaries that contain the code optimized for multiple CPU microarchitectures in the same executable.
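To give a flavour of the first two options with GCC (the file and cluster names are placeholders):

# option 1: a generic x86_64 build that runs on any cluster (only for non compute-intensive tools)
gcc -O2 -march=x86-64 -o mytool mytool.c
# option 2: one binary optimized for the CPU of the cluster it is compiled on
gcc -O2 -march=native -o mytool_nic5 mytool.c

The third option relies on compiler-specific support for building several code paths into a single binary (for instance the -ax family of options of the Intel compilers).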

One respondent complained that data were lost in the process of decommissioning Vega.

Data loss is a serious issue, and we take all the preventive measures that are possible within the budget limits to avoid it. On a typical large-scale cluster such as NIC5 or Lemaitre3, buying the next level of high-availability technology for data storage (for the home directories, for instance), compared to what we choose right now, would mean giving up possibly a quarter to a third of the compute nodes to stay within the same budget. For hardware that has gone beyond its normal lifetime, we furthermore have no money left to replace failing parts. That is why we clearly state in all documentation that we cannot offer backups and that the clusters may not be used as primary storage. Data, code, configuration files, etc. must all be copied somewhere else: on your workstation, on department hardware, in the cloud, etc.

Two respondents complained that downloading large data from the clusters is often complicated and never finishes

First of all, make sure you use rsync to download files, as rsync makes it easy to resume interrupted downloads (both for directories with many files and, with --inplace or --append, for large files).
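For example, to pull results from a cluster in a way that can be interrupted and resumed (the login, host and directory names are placeholders):

# --partial keeps partially transferred files so the next run resumes them
rsync -av --partial --progress mylogin@mycluster.example.org:results/ ./results/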

Secondly, it can be a good idea to move data that resides in the global scratch of a remote cluster to the common CÉCI TRSF filesystem, and then copy the data from the cluster of your own university. That way you benefit from the fast inter-university network on which the common filesystem is built, and the "last mile" will be "local" and less prone to network errors.

One respondent noted that the common filesystem is sometimes buggy

The common filesystem is a complex infrastructure that uses local caches and buffers to hide the latency induced by the wide-area spread of the filesystem. The drawback is that writing to the same file from multiple clusters can lead to problematic situations and should not be attempted. Also, the quotas are difficult to handle in the infrastructure, and users who go over quotas can prevent a local buffer from properly synchronizing with the central storage.

One respondent suggested that CÉCI offer common disk space for groups. Others suggested that the CECI home directories should have more disk space.

We will take this opportunity to remind everyone that your group can request group storage on the common filesystem.

But remember that temporary files should go to the scratch filesystems. See the documentation.

Tier-1 and Tier-0

One respondent asked that weekly reports for the Tier-1 usage be sent to project members rather than only to project principal investigator (PI).

The current reporting for the Tier-1 infrastructure was developed in-house by Cenaero and will be replaced, along with the Tier-1 infrastructure itself, by a solution provided directly by the hardware vendor. We will pass that request on to the team in charge of the procurement.

One respondent expressed concerns about the gradual move towards GPUs and the risk of not having enough CPUs anymore

The move towards GPUs is indeed difficult to avoid, as more and more software nowadays is able to use the enormous power they offer. GPUs also perform very well in benchmarks, which has the side effect that funding agencies tend to prefer them, as they give more visibility in the rankings. At CÉCI, though, we still consider CPUs the main provider of computing power in our infrastructure, and at the same time we try to provide training and resources for those who need to make the move to GPUs.

© CÉCI.