RIGI: An Introduction

System Overview

For more information please refer to the CSCS User Web Portal

Accessing the System

The cluster OS is based on a CentOS (V4.6) Linux distribution.

  • % ssh -X <username>@rigi.cscs.ch

  • sftp and ncftp are also available.

User Environment Setup

The user environment on most CSCS computing resources is defined via the module command mechanism. This approach adds some complexity to setting up a correct programming and linking environment, but it has several important advantages, in particular:

  • Independence/Stability with respect to new releases of compilers or libraries. This matters particularly when validating the software is not trivial, or when the application needs to remain reliable and stable over time.

  • Flexibility: Possibility to keep/use different releases of compilers, libraries or tools.
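
As an illustration of the module mechanism, a typical session lists the installed modules, loads one release, and checks the result. This is only a sketch: "gcc" is a placeholder module name and load_toolchain is a hypothetical helper, so check the output of module avail for the names actually installed on Rigi.

```shell
#!/bin/bash
# Sketch of a typical module session.
# "gcc" is a placeholder module name; load_toolchain is an illustrative
# helper, not a command provided by CSCS.
load_toolchain() {
    local name=${1:-gcc}
    if ! type module >/dev/null 2>&1; then
        echo "module command not found in this shell" >&2
        return 1
    fi
    module avail          # list the modules that are installed
    module load "$name"   # add the chosen release to the environment
    module list           # confirm what is now loaded
}
```

Calling load_toolchain with a different name selects another release, which is how several versions of a compiler or library can be kept side by side.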

Filesystems

The following file systems are of interest to Rigi users. I/O can account for a significant share of a job's run time, and I/O speed differs considerably from one file system to another, so choosing the right file system for each task can noticeably reduce the overall job execution time.

Summary of Available FS

  FS                 size          quota            speed  backup  NOTES
  /users/$USER 1)    ~200GB        ~10GB            +      daily   Users home data, GPFS shared
  /local/scratch 2)  ~200GB        none             ++     none    Local disk to each node
  /scratch           ~13TB         none             +      none    NFS shared from each node
  /project           ~42TB         ~10TB x project  ++     none    GPFS shared CSCS wide
  /archive           unlimited 3)  ~10TB x project  =      daily   NFS shared CSCS wide

1) Home data are shared between several CSCS systems, i.e. archive/bar/rigi/Horus.

2) Useful as a temporary FS for any serial job; it needs to be cleaned before the end of the job.

3) The size of the archive grows with the amount of archived data.
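
Note 2 above can be sketched as a small helper that runs a serial step inside a private directory on the node-local disk and removes that directory afterwards, whatever the exit status. SCRATCH_BASE and run_in_scratch are illustrative names, not CSCS conventions, and the fallback to /tmp is only there so the sketch also runs outside the cluster.

```shell
#!/bin/bash
# Run a command in a private directory under the node-local scratch area
# and delete that directory when the command ends.
# SCRATCH_BASE and run_in_scratch are illustrative names.
SCRATCH_BASE="${SCRATCH_BASE:-/local/scratch}"
# Fall back to a generic temporary directory when not on a compute node.
[ -d "$SCRATCH_BASE" ] && [ -w "$SCRATCH_BASE" ] || SCRATCH_BASE="${TMPDIR:-/tmp}"

run_in_scratch() {
    local workdir status
    workdir=$(mktemp -d "$SCRATCH_BASE/${USER:-job}.XXXXXX") || return 1
    # Subshell: the caller's working directory is left untouched.
    ( cd "$workdir" && "$@" )
    status=$?
    rm -rf -- "$workdir"   # clean up before the job ends
    return "$status"
}

# Usage: the command sees the scratch directory as its current directory.
run_in_scratch pwd
```

The helper returns the exit status of the command it ran, so a job script can still detect a failed step after the cleanup.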

Cleaning policies

For the time being no cleaning policies are enforced; however, they might be introduced later if needed.

Working with the archive

Because the CSCS archive is tape-based, using it correctly can improve job throughput. Basic documentation can be found in the CSCS Archive User Guide.

Interactive usage

Important

While we tolerate interactive use of the cluster for small tests and for development purposes, we kindly ask you to take a few simple measures to minimize the impact of such jobs on other users.

There are basically two ways to do this:

  1. Lowering the script priority with the nice command (ideal for testing purposes):
    % nice +18 <full-script-command>

  2. Requesting an interactive shell on a separate node via the PBS command qsub -I (ideal for running jobs that frequently require interaction):

    • Interactive job using up to 256MB:
      % qsub -I -l select=1:ncpus=1:mem=256mb

    • Interactive job using the X interface:
      % qsub -I -V -l select=1:ncpus=1:mem=256mb

    • Interactive job on a specific host:
      % qsub -I -l select=1:ncpus=1:mem=256mb:host=compute-0-25

You should get a shell prompt on one of the nodes and an entire CPU for yourself:

  % qstat -a -u $USER
  Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
  --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
  49620.rigi.cscs <$USER>  express  STDIN       20437   1   1  256mb 02:00 R 00:00
Warning
  • Do not keep a CPU reserved when it is not in use.

  • Pay attention to the amount of requested memory (the script will fail if the amount is too small).
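
The nice +18 form shown in the first option above is the syntax of the csh/tcsh built-in. In bash or another POSIX shell, the external nice utility takes the increment through the -n option instead. A minimal sketch, where low_prio is only an illustrative helper name:

```shell
#!/bin/bash
# Run a command at reduced priority with the POSIX nice utility.
# low_prio is an illustrative helper; 18 mirrors the value used above.
low_prio() {
    nice -n 18 "$@"
}

low_prio echo "running at low priority"
```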

Batch Queuing System

The batch queuing system is based on PBS Pro. Additional information can be found on the CSCS User Web Portal.

MyJob.sh
  #!/bin/bash
  #PBS -N MyJob-12
  #PBS -m abe
  #PBS -M Your_Email@Address
  #PBS -l select=4:ncpus=1:mpiprocs=1:mem=512mb
  #PBS -l walltime=01:00:00
  #PBS -q feed@rigi.cscs.ch
  #PBS -r n
  #======START=====
  echo start

  WORKDIR="$HOME/MvapicExample"
  echo "- WORKDIR $WORKDIR"
  # Stop early if the working directory is missing.
  cd "$WORKDIR" || exit 1

  echo "- Current PBS_NODEFILE:"
  echo "$PBS_NODEFILE"
  cat  "$PBS_NODEFILE"

  echo "- which mpirun:"
  which mpirun

  # -np must match select x mpiprocs requested above (4 x 1).
  CMD="mpirun -np 4 -hostfile $PBS_NODEFILE $WORKDIR/hello_world"
  echo "- Executing: $CMD"
  $CMD

  echo stop
  #=====END=====
Submitting the job and checking the job status:
  % make hello_world
  % qsub MyJob.sh
  % qstat -a
  % qstat -a -u $USER
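
The submit-and-check steps above can also be scripted. The sketch below keeps the job identifier that qsub prints and polls qstat until the job has left the queue. It assumes that qstat exits with a non-zero status for a job that is no longer known to the server, which may not hold if job history is enabled; submit_and_wait and POLL_SECONDS are illustrative names.

```shell
#!/bin/bash
# Submit a job script and wait until it is no longer in the queue.
# submit_and_wait and POLL_SECONDS are illustrative names.
# Assumption: qstat returns non-zero once the job has left the queue.
submit_and_wait() {
    local script=$1 jobid
    jobid=$(qsub "$script") || return 1   # qsub prints the job identifier
    echo "submitted $jobid"
    while qstat "$jobid" >/dev/null 2>&1; do
        sleep "${POLL_SECONDS:-30}"
    done
    echo "$jobid is no longer in the queue"
}
```

A call such as submit_and_wait MyJob.sh then blocks until the job has finished or has been removed.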

Special needs: Large memory request (max available)

Let’s assume a job needs all the available memory on a node. Request the maximum available memory and make sure the job is the only one running on the node.

Warning
  • ~0.5GB should be left for the OS and FS buffer caches.

Job header keywords (assuming a node with 4CPU/8GB RAM)
  #PBS -l select=1:ncpus=1:mem=7500MB
  #PBS -l place=scatter:excl
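
The memory value in these keywords follows from the node size minus the reserve mentioned in the warning. A minimal sketch of that arithmetic, assuming the 8 GB node is counted as 8000 MB (the variable names are illustrative):

```shell
#!/bin/bash
# Derive the memory request from the node size, leaving room for the OS
# and FS buffer caches. Figures are the ones assumed in the text:
# 8 GB per node (taken as 8000 MB) and about 0.5 GB reserved.
NODE_MEM_MB=8000
RESERVED_MB=500
REQUEST_MB=$((NODE_MEM_MB - RESERVED_MB))

echo "#PBS -l select=1:ncpus=1:mem=${REQUEST_MB}MB"
echo "#PBS -l place=scatter:excl"
```

On nodes with a different amount of RAM, only NODE_MEM_MB needs to change.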

Special needs: All the processors of a node

Let’s assume a job needing all the processors of a node.

Job header keywords (assuming a node with 4CPU/8GB RAM)
  #PBS -l select=1:ncpus=4:mem=512MB
  #PBS -l place=pack

Special needs: No email notification

Job header keywords to be removed
  #PBS -m abe
  #PBS -M Your_Email@Address

PBS Pro User Guide

For more information please refer to the PBS Pro User Guide

Development Environment, Scientific Libraries and Applications

Basic documentation on the Development Environment, Scientific Libraries and Applications available on Rigi can be found on the CSCS User Web Portal.

Parallel execution

Basic documentation with examples of using MPI on Rigi can be found on the CSCS User Web Portal under Configuring and Using Mvapich or Configuring and Using HPmpi.
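
In a job script, the process count passed to mpirun can be taken from the node file instead of being hard-coded, so that it always matches the resources requested from PBS. The sketch below assumes the node file holds one line per MPI process; count_procs is an illustrative helper and ./hello_world a placeholder executable.

```shell
#!/bin/bash
# Count MPI processes from the node file that PBS writes for the job.
# count_procs is an illustrative helper; outside a job PBS_NODEFILE is unset.
count_procs() {
    local nodefile=${1:-${PBS_NODEFILE:-}}
    [ -r "$nodefile" ] || { echo "no readable node file" >&2; return 1; }
    # One line per MPI process; blank lines are not counted.
    grep -c . "$nodefile"
}

# Inside a job this prints the mpirun command line that would be used.
if NP=$(count_procs 2>/dev/null); then
    echo "mpirun -np $NP -hostfile $PBS_NODEFILE ./hello_world"
fi
```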