Survey 2014: Summary of findings

From June 4th to June 22nd, 2014, CÉCI users were invited to respond to an anonymous satisfaction survey.

The main questions were:

  • How did you learn about us?
  • Was it easy to create an account, to connect?
  • What do you need? (hardware/software/policy)
  • What is your typical job?
  • What would your dream job be?

The form ended with a free text field where users could leave suggestions or remarks.

Nearly 75 users responded to the survey, out of the approximately 350 who were active on the clusters earlier that year. They came from all CÉCI universities, with very diverse research interests. Half of them are Linux users; the other half are Windows or Mac users.

The present document offers a summary of all comments and suggestions made in the responses; a companion document offers a synthetic view of the responses.

Connecting to the CÉCI clusters

Some respondents complained that not being able to connect from 'outside' is a limitation

This matter was discussed last year and the position of the CÉCI on this security matter has not changed. It is even reinforced by the latest issues discovered with SSL and Bash: none of the CÉCI clusters were compromised by the exploits using those vulnerabilities, precisely because they are not accessible from 'outside'. Last year, an effort was made so that each university site would offer a tutorial on how to connect from outside, but it appears the information is too scattered.

Consequently, we will:

  • Gather instructions for each university on one page of the ceci-hpc.be website
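
For reference, connecting from outside typically goes through a university SSH gateway. Below is a minimal sketch of an SSH configuration doing so; the gateway and cluster host names, as well as the user names, are placeholders to be replaced by the values given in your university's instructions.

    # Excerpt from ~/.ssh/config -- host names and user names are placeholders.
    Host ceci-cluster
        HostName cluster.example.be
        User mycecilogin
        IdentityFile ~/.ssh/id_rsa.ceci
        # Tunnel the connection through the university SSH gateway.
        ProxyCommand ssh -q -W %h:%p myuniversitylogin@gateway.example.be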

Some respondents suggested that we include instructions on how to use the keys in the emails that deliver them.

Those emails actually contain a link pointing to the corresponding question in the F.A.Q., but from there, users must click on the tutorial corresponding to the operating system they use. That may not be clear.

Consequently, we will:

  • Add a sentence explicitly inviting users to follow the links for instructions on how to install the keys
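
As an illustration, installing and using the key on a Linux or Mac machine boils down to the following commands; file, host and user names are placeholders.

    # Move the key received by e-mail into ~/.ssh and restrict its permissions.
    mv ~/Downloads/id_rsa.ceci ~/.ssh/
    chmod 600 ~/.ssh/id_rsa.ceci
    # Connect to a cluster with that key (host and user names are placeholders).
    ssh -i ~/.ssh/id_rsa.ceci mycecilogin@cluster.example.be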

One respondent complained that it was not easy to transition from one university to another.

This was because the account is linked to the user's email address, which changes when moving from one university to another. A manual operation by a system administrator on the database was needed to allow moving users to keep their login.

Since then, the software has been upgraded, and users can now change their reference email address using the web interface.

Documentation

One respondent complained that the proposed example scripts were poorly commented and unclear

The respondent complained that the scripts differed from one cluster to another and that no clear syntax was provided for writing a custom script.

Let us recall that submission scripts are shell scripts and obey the syntax of the chosen shell.

Since then, the Slurm tutorial available on the website offers many submission scripts for different uses, and a submission script wizard has been developed to help cope with the Slurm options and the differences between the clusters.
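
For reference, a submission script is an ordinary shell script in which the Slurm options appear as #SBATCH comments. A minimal sketch could look like this; the resource values, module name and program name are purely illustrative.

    #!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --time=01:00:00        # maximum run time (hh:mm:ss)
    #SBATCH --ntasks=1             # number of tasks
    #SBATCH --mem-per-cpu=1024     # memory per core, in MB

    module load somesoftware       # load the software you need (name is illustrative)
    ./my_program input.dat         # the actual computation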

One respondent suggested that more information about the basics of cluster computing be offered

The website assumes a basic knowledge of cluster computing; notions like scratch space, queues, etc. are taken for granted. This makes it difficult for an inexperienced user to 'jump in'.

Consequently, we will:

  • Try to add a help section for beginners on the website
  • Review the basics that are currently detailed in the training sessions

Storage and file access

Many respondents requested a common central place to store their data

The idea is to make files from one cluster available to the other clusters, so the users have a common environment on all clusters.

The CÉCI is currently working towards that, but it requires heavy investment in network connectivity, hardware and software to obtain a solution that is efficient and reliable, with no single point of failure. Funding has been obtained and will be put to use in 2015.

Several respondents requested backups

Backups of job outputs and/or home directories might be available at each site, but there is no plan to provide systematic backups for users. The reason is that no long-term storage is installed on the CÉCI clusters, and using the fast Lustre or Fraunhofer filesystems for backup would be a blatant waste of resources. Each university might offer a solution for on-site backup, but given that the hardware must be paid for, it may not be free.

Consequently, we will:

  • Write documentation on how to get storage space for backups at each university
  • Write documentation on how to implement backups
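
In the meantime, a simple way to keep a copy of important results is to pull them from the cluster to your own machine, or to a storage space provided by your university, for instance with rsync. The user name, cluster name and paths below are placeholders.

    # Copy the 'results' directory from the cluster to a local backup folder over SSH.
    rsync -avz mycecilogin@cluster.example.be:results/ ~/backup/cluster-results/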

One respondent asked how to open files from the clusters in Nautilus

Nautilus is GNOME's file manager. To use a graphical file manager, you can use the SSHFS tool to mount your home directory from the cluster locally on your laptop. That tool may not be widely known among users.

Consequently, we will:

  • Write documentation on how to use SSHFS
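
As a sketch of what that documentation will cover, mounting the cluster home directory on a Linux laptop with SSHFS looks roughly like this; host and user names are placeholders.

    # Mount the remote home directory on a local mount point.
    mkdir -p ~/cluster-home
    sshfs mycecilogin@cluster.example.be: ~/cluster-home
    # Browse the files with Nautilus or any other file manager, then unmount.
    fusermount -u ~/cluster-home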

One respondent complained that differing quota limits across the clusters can lead to surprises

The reason why quotas are set on the home directories is that the storage space must be fairly shared among the users. If a job needs more space than the quota allows, it should write to the scratch filesystem and not to the home filesystem. No quota is set on the scratch filesystem, but you must clean it after your job is done.

Two clusters, Vega and Hercules, are equipped with very large storage systems, so their system administrators have decided not to implement quotas. The others typically enforce a 20 GB limit. That information is available on the cluster page of the CÉCI website.

Consequently, we will:

  • Make sure the message of the day reminds the users of the quota limitations on the cluster
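
In the meantime, you can check your current usage against the quota yourself. The exact command depends on the filesystem used for the home directories; the two commands below are illustrative examples.

    quota -s                   # classic filesystems with user quotas enabled
    lfs quota -u $USER $HOME   # Lustre filesystems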

Software

One respondent complained that the installed software base should be larger and that modules should be common to all clusters

Having identical modules on all clusters is something we are working towards, while at the same time remaining as backward compatible as possible so that existing submission scripts keep working. Integrating all the software takes time, and the situation should improve when the common filesystem is installed.

As always, if a piece of software is missing on a cluster, do not hesitate to contact the local administrators to have it installed.

Consequently, we will:

  • Keep on working towards a uniform working environment for the users
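
As a reminder, the module command lets you see what is installed on each cluster and load what you need; the module name below is illustrative.

    module avail          # list the software modules installed on this cluster
    module load gcc       # load a module (the name is illustrative)
    module list           # show the modules currently loaded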

One respondent requested that EasyBuild be installed on the clusters

Many CÉCI clusters actually have EasyBuild installed, but not always in a user-visible way. EasyBuild is not yet mature enough, but we are in contact with its developers and use it when it is efficient to do so.
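
Where EasyBuild is exposed to users, building a package typically boils down to a single command; the module and easyconfig names below are placeholders.

    module load EasyBuild             # where an EasyBuild module is provided
    eb some-software-1.0.eb --robot   # build the package and its missing dependencies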

One respondent requested that a Map-Reduce framework be available on the clusters

The Slurm developers are working towards integrating a map-reduce framework in Slurm, but in the meantime, we cannot install one globally on a cluster without interfering with the Slurm installation. Note that a Map-Reduce framework can be implemented with MPI; see also this article.

Job scheduling

Many respondents requested that the job array functionality be available on all clusters

Slurm offers job arrays only from version 2.6.0 onwards. At the time of writing, versions prior to 2.6.0 are installed on Vega, NIC4 and Dragon1. The main obstacle to installing the latest version of Slurm on all clusters is that the clusters managed by the Bright Cluster Manager software (the software that handles all aspects of the cluster: deployment, provisioning, monitoring, etc.) cannot use a version other than the one shipped with Bright Cluster Manager without breaching their support contract. We are talking with Bright's representatives to find an agreement.

Note, however, that job arrays might not fit with the preferred usage of the cluster, especially for NIC4.

Consequently, we will:

  • Keep on working towards upgrading the Slurm version on all clusters
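
For reference, on the clusters running Slurm 2.6.0 or later, a job array is requested with the --array option. A minimal sketch, with illustrative file and program names:

    #!/bin/bash
    #SBATCH --job-name=array-example
    #SBATCH --time=00:10:00
    #SBATCH --array=1-10           # run ten instances of this script

    # Each instance receives its own index in SLURM_ARRAY_TASK_ID.
    ./my_program input_${SLURM_ARRAY_TASK_ID}.dat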

Many complained about the long waiting time in the queue

The clusters are shared among 350 users who all want their job to start immediately. That cannot happen given the available resources. The only way to get everyone's job to start soon is to enforce a high turnover by reducing the maximum run time. This is what is done on Lemaitre2 and NIC4. But on the clusters where week-long jobs are allowed, turnover is very low and, as a direct consequence, waiting times in the queue are very long.

Also make sure to estimate the resource usage (run time, memory) as precisely as possible; leaving the default value for the --time parameter will prevent your job from starting as soon as it could, because Slurm will not be able to backfill it. Requesting full nodes is also a strategy that leads to longer waiting times. For MPI jobs, let Slurm scatter your processes at will; in most cases, the time lost to higher communication latencies is smaller than the time gained by Slurm being able to schedule the job sooner. For shared-memory jobs, requesting two cores fewer than available, with a proportionally smaller amount of memory, often allows jobs to start much sooner, because the number of jobs that must finish to free resources for your job is potentially much lower.
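
As an illustration of the advice above, the following #SBATCH lines request a realistic run time and let Slurm place the tasks wherever it finds room; the values are illustrative.

    #SBATCH --time=02:00:00        # realistic estimate; the default value hinders backfilling
    #SBATCH --ntasks=16            # tasks rather than whole nodes, so Slurm can scatter them
    #SBATCH --mem-per-cpu=2048     # memory estimate per core, in MB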

Some complained about the time limits enforced on the clusters

The clusters are shared among 350 users who all want their job to run indefinitely. That cannot happen given the available resources. The only way to let everyone's job run for a long time is to impose very long waiting times on the other jobs. This is what is done on Hercules and Dragon1. But on the clusters where turnover is high and waiting time is reduced by design, the maximum allowed running time for a job cannot be large.

Some respondents complained that sometimes the queue is filled by jobs from one single user

The priority on the clusters is governed by fairshare, i.e. each user's past usage of the cluster relative to that of the other users. When one user submits a large number of jobs, the priority of the pending ones decreases as the running ones consume CPU hours. Don't forget that 100% of the jobs that ran on the cluster had been submitted. Submitting a job and then cancelling it because it does not start soon enough, or not submitting a job because the output of squeue is lengthy, is a strategy that leads nowhere.

Some respondents requested more information on how priorities are computed

The priority system is described in the document linked from Question 1.4, 'How are computing resources used and shared?', of the F.A.Q.
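
In practice, you can inspect the figures the scheduler uses directly on each cluster; the exact output depends on the cluster configuration.

    sshare -u $USER    # your fairshare and the past usage it is based on
    sprio -l           # breakdown of the priority of the pending jobs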

One respondent requested that information about the load of the clusters be made available on the website

For now, that is not available through the website. To get an overview of the load of a cluster, you can simply SSH to it and use the sinfo command, or the sload command.
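
For instance (the output columns depend on the cluster configuration):

    sinfo -o "%P %a %D %C"    # partitions, availability, node count, CPUs (allocated/idle/other/total)
    squeue -u $USER           # your own pending and running jobs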

Note, however, that the load of a cluster is a very poor indicator of how soon your job could start if submitted on that cluster. The load says nothing about how soon resources will be freed by finishing jobs, how many queued jobs have a higher priority than yours, how the free slots are distributed, or how much memory is available.

You could be reading a load of 97% with 10 CPUs left and still have your 128-CPU job start within seconds because a 256-CPU job is about to finish. Or you could be reading a load of 50% and have your job queued for a long time because, for instance, 95% of the memory is in use.

The only thing you can infer from the load is that if it is lower than 5%, chances are that your job will start soon. But the load is never lower than 5%.

The only way to know for sure is to submit the job and let it wait in the queue.

One respondent suggested that the load of the cluster be displayed to the user at login time

That information is available on some clusters already, but not on all of them.

Consequently, we will:

  • Keep on working towards a uniform working environment for the users

One respondent requested that the summary email contain accounting information about the job once it has finished

Such an email is available on some clusters already, but not on all of them.

Consequently, we will:

  • Keep on working towards a uniform working environment for the users
© CÉCI.