Survey 2017: Summary of findings

From June 16th to July 30th, 2017, CÉCI users were invited to respond to an anonymous satisfaction survey.

The main questions were:

How did you learn about us?
Was it easy to create an account and to connect?
In how many publications have you already acknowledged the use of CÉCI clusters?
What was the main problem you faced when you renewed your account?
How interesting to you was the reply to the previous survey posted on the CÉCI website?
What do you need? (hardware/software/policy)
What is your typical job?
What would your dream job be?

The form ended with a free text field where users could leave suggestions or remarks. We received 22 comments, questions or suggestions, and we contacted those who left an email address.

More than 80 users responded to the survey, out of approximately 490 users active in the past few months. They came from all CÉCI universities, with very diverse research interests in science and engineering. We thank those users for the time they took to fill in the survey.

The present document offers a summary of all comments and suggestions made in the responses. This other document offers a synthetic view of the responses.

Acknowledgement in publications
Documentation
Common storage
Support
Resources
Job scheduling

Acknowledgement in publications

This is the second time we have asked this question in the survey. More than half of the respondents have cited the CÉCI in at least one publication, for a total of 198 acknowledgements. That is a 30% increase over last year's total.

Acknowledgements are the most direct way to show the usefulness of the clusters. These testimonials are very useful for obtaining the funding that ensures the continuity of the project and gives researchers access to computing power.

Documentation

Some respondents complained about the explanations on the FAQ page being insufficient or too long

The documentation is seen as not rich enough by some and too long by others. The CÉCI documentation is usually written according to the so-called inverted pyramid principle used in journalism: the most important information is at the top of each article and the details are at the bottom, so the reader can decide when to stop reading.

We are continuously working on improving and extending the support documentation, and a new CÉCI Support website has been available since the beginning of this year.

This new CÉCI documentation website contains all the technical information previously in the FAQ, with more details and a better structure to organize the different topics covered. The website also supports keyword searches across all pages: see the Search docs box at the top of the left frame. Try performing a search for 'ssh'.

Some Windows users complained about the difficulty of setting up the SSH client

The problems mentioned were about converting the CÉCI SSH private key to the correct format, or having to deal with many different applications to set up the login environment.

We are aware of the difficulties of working in a Windows environment and have been working on an alternative to the previous Xming+PuTTY+WinSCP suite. The support page now provides a detailed guide for configuring the MobaXterm application on Windows.

We suggest that all Windows users give it a try, since this single tool provides a complete environment for connecting to the clusters and for copying files to and from them. In addition, no conversion is required to use the CÉCI private key that you receive by email.

One respondent mentioned that finding information about 'lm9' was not easy

No cluster named lm9 is part of the CÉCI, so it is expected that the CÉCI documentation contains no information about it. The list of machines the CÉCI is responsible for can be found on the CÉCI clusters page.

Several respondents asked for training, support and tutorials for new users.

Each year around October, the CÉCI organizes a training session at the UCL for all users, and especially for beginners. The FAQ, and the tutorials referred to therein, are intended to help users during their first steps on the clusters.

Common storage

This year we finally started supporting a common storage system that is visible to all compute nodes of all clusters. Several users expressed their satisfaction with this new option, which they are already using.

Some users were not aware of the common storage feature.

The CÉCI support website gives a detailed explanation of how to use the common storage.

In addition, this year's CÉCI training session Introduction to data storage and access will explicitly cover how to make use of the common storage.

To summarize, the common space can be accessed from any of the clusters of the CÉCI universities through the $CECIHOME environment variable. You can list its contents with the command:

ls $CECIHOME

In the long term, this filesystem will be the default home on the clusters, but at the moment, you will need to copy files there with the cp command explicitly.
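
As an illustration of the explicit copy step, the sketch below wraps cp in a small helper that refuses to run when $CECIHOME is not available. The function name copy_to_common and the subfolder argument are hypothetical conveniences, not part of the CÉCI tooling.

```shell
# Hypothetical helper (not a CÉCI command): copy files or directories
# into a subfolder of the common storage pointed to by $CECIHOME.
# Usage: copy_to_common <subfolder> <path>...
copy_to_common() {
    if [ "$#" -lt 2 ]; then
        echo "usage: copy_to_common <subfolder> <path>..." >&2
        return 2
    fi
    # Refuse to run where the common storage is not available.
    if [ -z "${CECIHOME:-}" ] || [ ! -d "$CECIHOME" ]; then
        echo "CECIHOME is not set or is not a directory" >&2
        return 1
    fi
    dest="$CECIHOME/$1"
    shift
    mkdir -p "$dest" || return 1
    # -r copies directories recursively, -p keeps timestamps and permissions.
    cp -rp -- "$@" "$dest"/
}
```

For example, copy_to_common project_a results/ input.dat would place both items under $CECIHOME/project_a, where project_a, results/ and input.dat are placeholder names.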

There is a 100GB quota on the main storage. To see your current usage, run the quota command from one of the clusters. If this command does not list the central storage, first list its contents with ls; it should appear afterwards.

We refer you again to the documentation for all the details, and for information about $CECITRSF, the extra common space for fast transfers between the scratch partitions of the clusters.

One user mentioned that moving files between clusters with scp is more efficient than copying to the common space

While this might be valid for small files, it is certainly not the case for big files (~1 TB) when using the $CECITRSF space.

One respondent mentioned not being able to use the common storage due to a buggy 'quota exceeded' error

The CÉCI common storage is a fairly complicated system with many parameters and degrees of freedom (see the answer below for more details). Quotas in particular are complicated to set due to the asynchronous nature of the replication. We had to adapt the quotas a few times in the early weeks to cope with problematic situations. They should now be more stable in that respect. We thank everyone for their patience.

One user mentioned being confused by the different information obtained with 'du -sh' for the same file on different clusters

This is a very interesting point, as different results from du -sh can be obtained not only on a complex solution such as the CÉCI common storage, but also for the same files stored on standard partitions with different file systems.

The solution implemented for the CÉCI common filesystem is based on the GPFS cluster filesystem developed by IBM. In addition, to keep the data on the common partitions synchronized among the different geographical locations of the CÉCI clusters, it makes use of the Active File Management (AFM) implementation. Within this setup a global namespace is defined to share a single filesystem among the different gateway nodes which are physically deployed on each of the CÉCI universities.

If, for example, you copy a 6MB file from your nic4 home folder to $CECIHOME and then run du -sh, you will obtain:

$ cp file_6MB.dat $CECIHOME/
$ du -sh $CECIHOME/file_6MB.dat
6.0M   $CECIHOME/file_6MB.dat

To avoid the problems inherent to the latency of wide-area interconnects, the other gateways in the AFM setup are served only the metadata of files created on one of them. That is to say, your file_6MB.dat file will actually be copied, synced and stored on both main storages at ULiège and UCL, but the other gateways will only know that the file exists on the common space; the actual contents will be transferred only on demand.

Then, if after the previous steps on nic4 you log in to dragon1 and run du -sh, you will see:

$ du -sh $CECIHOME/file_6MB.dat
0     $CECIHOME/file_6MB.dat

If you then access the file on dragon1 (for example by opening it with an editor, or by running cat or less on it), the actual data contained in the file will be transferred from one of the main storages to the gateway. Afterwards you should get:

$ du -sh $CECIHOME/file_6MB.dat
6.0M   $CECIHOME/file_6MB.dat

To summarize, you should never rely on du -sh to verify whether a file is properly stored or copied, and this holds in general for any kind of filesystem. A more appropriate check is to compute a hash of the file and verify that you get the same output on the different gateways:

$ md5sum $CECIHOME/file_6MB.dat
da6a0d097e307ac52ed9b4ad551801fc  $CECIHOME/file_6MB.dat
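
The same check can be scripted when the original file and its copy are both visible from one cluster, for instance a file in your home folder and its copy in $CECIHOME. The helper below is a minimal sketch; the name same_checksum is hypothetical. To compare across clusters, run md5sum on each of them and compare the printed hashes by eye or by script.

```shell
# Hypothetical helper: succeed only if two readable files have the same MD5 hash.
# Usage: same_checksum <file_a> <file_b>
same_checksum() {
    # Fail early if either file cannot be read.
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Reading from stdin keeps the file name out of the md5sum output.
    sum_a=$(md5sum < "$1" | cut -d ' ' -f 1)
    sum_b=$(md5sum < "$2" | cut -d ' ' -f 1)
    [ -n "$sum_a" ] && [ "$sum_a" = "$sum_b" ]
}
```

A typical call would be same_checksum file_6MB.dat "$CECIHOME/file_6MB.dat" && echo "copy verified". Note that hashing a file reads its full contents, so on a gateway that only holds the metadata this also triggers the on-demand transfer.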

If for some reason you want to know the approximate space taken by a file or directory on $CECIHOME, add the --apparent-size option to du:

$ du -sh --apparent-size $CECIHOME/file_6MB.dat
6.0M   $CECIHOME/file_6MB.dat

The output should be nearly the same on all the clusters, regardless of whether the files have been accessed. Note that the man page for du describes the tool as "du - estimate file space usage".

If you want to know how much space you are currently using in the $CECIHOME area, you should always use the quota command. It should report the same usage for /CECI/gateway/home on all the clusters. If it does not, please submit a ticket on the CÉCI Support page.
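
The gap between allocated blocks and apparent size can be reproduced on an ordinary filesystem with a sparse file. This is only an analogy for a file whose contents have not yet been fetched by a gateway, not the AFM mechanism itself. The sketch assumes GNU coreutils (truncate, du -b) and uses a throwaway temporary file.

```shell
# Create a 6 MiB sparse file: it has a size, but no data blocks are written.
demo_file=$(mktemp)
truncate -s 6M "$demo_file"

# Blocks actually allocated on this filesystem
# (typically far below 6 MiB for a sparse file).
du -sh "$demo_file"

# Size the file reports to applications, independent of allocated blocks.
du -sh --apparent-size "$demo_file"

# Same apparent size in bytes, convenient for scripts
# (-b is shorthand for --apparent-size --block-size=1).
apparent_bytes=$(du -b "$demo_file" | cut -f 1)
echo "$apparent_bytes"

rm -f "$demo_file"
```

The first du figure depends on the filesystem, while the apparent size is the same everywhere, which is why the latter is the safer number to compare between clusters.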

Support

Some users complained about their jobs running or in queue being killed

Jobs are killed by a sysadmin only when they represent a potential problem for keeping the cluster running. They can also be killed automatically due to entering some error state.

In any case, when there is an issue of this kind, please immediately contact the local system administrator of the cluster where your job was killed, to understand why that action was required.

It might not be your fault, but in case it was, it is important to understand what happened to avoid falling into that problem again.

One respondent mentioned that getting access to zenobe takes a very long time

Zenobe is a very large machine (Tier-1 level, as opposed to Tier-2 level for the CÉCI clusters) so getting access to it requires a few administrative steps like submitting a project. This is a decision that was made by the funding agency in coordination with the Vice-Rectors of our universities. But when a project already exists, adding a user to a project is done within 24 hours on average.

If you have no answer after that time frame, you can always contact Cenaero again, as well as the sysadmin of your local university and the CÉCI logisticiens.

One respondent who moved between CÉCI universities mentioned there was no procedure to change emails when the old one has already expired

The procedure to change the email address indeed requires both the old and new addresses to be usable. As a measure to protect the logins from identity theft, in the event your old email address has expired, you need to contact the system administrators of your former and new universities to provide proof that you have indeed changed email addresses.

One user complained about a required software installation taking too long

Users must keep in mind that some system administrators have many duties other than taking care of the CÉCI clusters and are not backed up by a team. Thus, they must prioritise their tasks according to the impact each task has on the usability of the cluster. Feel free to contact them by phone to get a more precise answer if you feel you have been waiting for too long.

One respondent complained about receiving an inappropriate response to simple questions

The CÉCI team gives high priority to maintaining useful and practical documentation about the usage of the clusters, and to explaining how the different components work so that they can be used efficiently. Users who have questions about these topics must first go through the documentation to check whether they are already answered there before contacting the system administrators.

One respondent asked for Matlab on the clusters

A training session is dedicated to using Matlab on the clusters by means of the Matlab Compiler, which avoids the licensing issue. You can take a look at last year's slides, or still join the 2017 edition.

Resources

Some respondents complained about the different configuration of libraries and modules between clusters

Standardizing the software modules is indeed something we are working towards. Now that the common storage is installed, we are closer to providing a solution of this kind. A CECI/Soft partition has been created that will be used to store all the modules and compiled software, in order to provide a homogeneous configuration on all the clusters.

Some users asked for software to be updated more frequently

Once the software modules are standardized, their maintenance will be centralized, which will make it easier to keep the software stack up to date.

One user asked specifically for gcc and gfortran to be updated more often. We will try to update the forthcoming CÉCI soft space at least once per year. However, if for some reason you need a specific version of a code or compiler, you can request it from the local system administrator, or follow the instructions in the documentation to compile software from source on your own.

One user requested the possibility to compile Fortran CUDA codes

CUDA for Fortran is only supported by the PGI Fortran compiler. This compiler is bundled in a commercial suite and thus is not available on all the clusters. Among the CÉCI clusters which have GPUs available, Dragon1 from UMons is the only one with a recent version of the PGI compilers. Loading the module pgi/17.7 makes available a pgfortran compiler that should be able to compile current versions of Fortran CUDA codes.

To run the job, remember to request the GPU generic resource by adding the following line to the Slurm script:

#SBATCH --gres="gpu:1"
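
Putting the module and the GPU request together, a job script could look like the sketch below, written here as a heredoc so it can be generated from the command line. The file names cuda_fortran_job.sh, saxpy.cuf and saxpy, as well as the job name and time limit, are placeholders. The -Mcuda flag is the PGI option that enables the CUDA Fortran extensions, and any other options required by the cluster (memory, partition) are left out.

```shell
# Write a minimal job script; job name, time limit and file names are placeholders.
cat > cuda_fortran_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=cuf-test
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --gres="gpu:1"

# Load the PGI compilers mentioned above (available on Dragon1).
module load pgi/17.7

# -Mcuda enables the CUDA Fortran extensions in pgfortran.
pgfortran -Mcuda -o saxpy saxpy.cuf
./saxpy
EOF

# On the cluster, submit it with:
#   sbatch cuda_fortran_job.sh
```

Compiling inside the job keeps the example short; for larger codes, compiling once beforehand and only running the executable in the job is usually preferable.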

One user asked for implementing a uniform greeting message on all the clusters

We are working on it, but in the end the final configuration of a cluster is the responsibility of the local team. Do not hesitate to ask them directly if there is specific information you would find useful to have displayed.

One respondent asked for more resources for the frontends, as they are sometimes clogged by users running many things on them

The only resource-intensive task that should take place on the frontends is compiling code from source. On rare occasions several compilations may run at the same time, making the frontend temporarily a bit unresponsive.

Other than that, scripts running on the frontends should be restricted to very basic tasks that run in seconds. Otherwise, one of the fast or postprocessing queues should be used to run them. If you notice a heavy, resource-consuming script that has been running for hours on a frontend, please contact the local system administrator to see which action should be taken.

One respondent asked for scratch space on zenobe nodes for intensive I/O runs

We will transmit this request to the Tier-1 users' committee.

One user requested the possibility to deploy virtual machines on the clusters

This topic is being reviewed at the moment, as it seems clear that HPC and Cloud are converging. Unfortunately, no infrastructure is currently available for that. Lemaitre3, the next CÉCI cluster, to be installed at the UCL, will have tools to run Linux containers in jobs, but not (yet) virtual machines.

Some respondents asked for bigger and faster scratch storage up to 15TB

As part of the scheduled upgrades of the CÉCI infrastructure, the next cluster, Lemaitre3, to be installed at the UCL at the beginning of 2018, will offer 600TB of very fast parallel scratch storage.

Scheduling

One respondent mentioned that a user monopolized half of lemaitre2 for two weeks and that balancing is better implemented on nic4

As lemaitre2 is often heavily loaded, it is nearly impossible for a single user to grab a large portion of the cluster. The rare situation of a single user occupying half of lemaitre2 could only arise, for instance, after a cluster restart, such as after a maintenance period.

In that case, it can last only for 3 days, which is the maximum running time allowed on lemaitre2. After that, the fairshare of the user will drop and the priority of further jobs will decrease substantially. On lemaitre2 we prefer having no limits over hard limits that would prevent jobs from starting when the cluster is not used.

Nic4 is more balanced because it is the cluster with the lowest maximum running time of all, and therefore has the highest turnover.

One user requested to keep working on having a unique scheduler for all clusters

This is a work in progress, and something that we will start to implement and test as newer generations of clusters are rolled out at the CÉCI.

Several respondents asked for longer time limits for the jobs.

This is a question that is often asked. Some of the clusters are configured to favour jobs that scale well, i.e. where you can trade job wall time for number of CPUs, because a lot of money has been spent on a very fast interconnect (Infiniband).

Users must also take into account the fact that long jobs are incompatible with short waiting times. One user requested to set the wall time to less than 20h to increase turnover. The current limits are chosen to accommodate as well as possible the very different requirements of all CÉCI users (see the section about max wall time).

Users are encouraged to try checkpointing software such as http://dmtcp.sourceforge.net. A training session is dedicated to it for CÉCI users. It will not be organized this year, but you can take a look at the slides available online.
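
DMTCP checkpoints unmodified programs transparently. When a job is driven by a loop that you control, a simpler application-level scheme can achieve a similar effect. The sketch below is not DMTCP: it only illustrates the general idea of saving progress so that a resubmitted job resumes where the previous one stopped. The function name, the checkpoint file and the notion of a "step" are all hypothetical.

```shell
# Hypothetical application-level checkpointing (not DMTCP):
# the last completed step is saved to a file so a resubmitted job can resume.
# Usage: run_with_checkpoint <checkpoint_file> <total_steps>
run_with_checkpoint() {
    ckpt_file=$1
    total_steps=$2
    step=0
    # Resume from the saved step if a checkpoint exists.
    if [ -f "$ckpt_file" ]; then
        step=$(cat "$ckpt_file")
    fi
    while [ "$step" -lt "$total_steps" ]; do
        step=$((step + 1))
        # ... one unit of real work would go here ...
        # Write to a temporary file first, then rename it, so an
        # interrupted write never leaves a truncated checkpoint.
        echo "$step" > "$ckpt_file.tmp" && mv "$ckpt_file.tmp" "$ckpt_file"
    done
    echo "finished at step $step"
}
```

With this pattern, a long computation can be split over several jobs that each fit within the wall time limit, every job calling the same function with the same checkpoint file.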