Linux Cluster Tutorial

Prof. Dr. Matthias Gobbert, Dr. Stefan Kopecz

Table of Contents

Connecting to the cluster
Tutorial 1: Compile and run in serial
Tutorial 2: Compile and run in parallel
Tutorial 3: A first project
Additional information

Connecting to the cluster

You can connect to the cluster using ssh as shown below. Replace "kopecz" with the user name of your UniAccount.
kopecz@euler[~]>> ssh kopecz@its-cs1.its.uni-kassel.de
kopecz@its-cs1.its.uni-kassel.de's password: 
Last login: Mon Feb 22 13:10:16 2016 from euler.mathematik.uni-kassel.de
#####################################################################################

            L I N U X   C L U S T E R   * * *   U N I   K A S S E L

*************************************************************************************

module avail 	-> zeigt alle verfuegbaren Module an
module list	-> zeigt alle aktivierten Module an
module load	-> laedt ein Modul
module unload   -> entlaedt ein Modul

*************************************************************************************

module avail	-> shows all available modules
module list	-> lists loaded modules
module load	-> loads a module
module unload	-> unloads a module

#####################################################################################

Currently Loaded Modulefiles:
  1) pgi/14.3                    7) jdk/1.8u65                 13) qepcad/B.1.69
  2) gcc/4.9.0                   8) R/2.15.1                   14) saclib/2.2.5
  3) intel/14.0.3                9) x10/2.1.2                  15) simplify/1.18
  4) Maple/18                   10) mpi/openmpi/1.8.1/gcc-4.9  16) sage/4.8
  5) Matlab/R2015b              11) polymake/2.9.9             17) CalculiX/2.5
  6) Mathematica/10.2           12) abaqus/6.11-3              18) nag/6.0
kopecz@its-cs1:/home/users/0026/kopecz> 
After a successful login you should see a list of currently loaded modules. If the module "mpi/openmpi/1.8.1/gcc-4.9" is not among them, you have to load it manually.
kopecz@its-cs1:/home/users/0026/kopecz> module load mpi/openmpi/1.8.1/gcc-4.9 
kopecz@its-cs1:/home/users/0026/kopecz> 
If you are logging in for the first time, you should perform some basic checks. The command pwd prints the full path of the current working directory.
kopecz@its-cs1:/home/fb17/kopecz> pwd
/home/fb17/kopecz
You can list the content of the current folder using ls. If you haven't created anything yet, your working directory should be empty.
kopecz@its-cs1:/home/fb17/kopecz> ls
kopecz@its-cs1:/home/fb17/kopecz> 
The system's default shell can be displayed as follows.
kopecz@its-cs1:/home/fb17/kopecz> echo $SHELL
/bin/bash
Here "bash" is short for Bourne Shell.

Tutorial 1: Compile and run in serial

First we create a directory "hello_serial" and change into it.
kopecz@its-cs1:/home/users/0026/kopecz> mkdir hello_serial
kopecz@its-cs1:/home/users/0026/kopecz> cd hello_serial
The sample code we will use is hello_serial.c. The program prints "Hello world from" followed by the hostname. The code can easily be downloaded with wget.
kopecz@its-cs1:/home/users/0026/kopecz/hello_serial> wget http://www.mathematik.uni-kassel.de/~kopecz/cluster/hello_serial.c
--2016-02-19 14:39:07--  http://www.mathematik.uni-kassel.de/~kopecz/cluster/hello_serial.c
Resolving www.mathematik.uni-kassel.de... 141.51.166.147
Connecting to www.mathematik.uni-kassel.de|141.51.166.147|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 184 [text/plain]
Saving to: “hello_serial.c”

100%[==============================================================>] 184         --.-K/s   in 0s      

2016-02-19 14:39:07 (30.0 MB/s) - “hello_serial.c” saved [184/184]
Having downloaded the file into the "hello_serial" directory, listing the directory's content should now show the C file.
kopecz@its-cs1:/home/users/0026/kopecz/hello_serial> ls
hello_serial.c
kopecz@its-cs1:/home/users/0026/kopecz/hello_serial>
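For reference, the downloaded program could look roughly like the following minimal sketch; the actual hello_serial.c may differ in its details.
/* hello_serial.c -- minimal sketch; the actual downloaded file may differ */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char hostname[256];

    /* query the name of the machine the program is running on */
    gethostname(hostname, sizeof(hostname));
    printf("Hello world from %s\n", hostname);

    return 0;
}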

Compilation of the code

Now we can compile the code as follows. The option "-o hello_serial" tells the compiler to create an executable named "hello_serial".
kopecz@its-cs1:/home/users/0026/kopecz/hello_serial> gcc hello_serial.c -o hello_serial
kopecz@its-cs1:/home/users/0026/kopecz/hello_serial> 
Executing ls once more now also shows the executable.
kopecz@its-cs1:/home/users/0026/kopecz/hello_serial> ls
hello_serial  hello_serial.c
kopecz@its-cs1:/home/users/0026/kopecz/hello_serial> 

Running the code

Now we're ready to run the code.
kopecz@its-cs1:/home/users/0026/kopecz/hello_serial> ./hello_serial 
Hello world from its-cs1.its.uni-kassel.de
kopecz@its-cs1:/home/users/0026/kopecz/hello_serial> 
The output shows "Hello world from" and the hostname as expected.

Tutorial 2: Compile and run in parallel

First we create a directory "hello_parallel" and change into it.
kopecz@its-cs1:/home/fb17/kopecz> mkdir hello_parallel
kopecz@its-cs1:/home/fb17/kopecz> cd hello_parallel
The sample code we will use is hello_parallel.c. The program prints, for each process, its ID and the total number of processes in the job, together with the name of the processor it runs on. The code can easily be downloaded with wget.
kopecz@its-cs1:/home/fb17/kopecz/trap> wget www.mathematik.uni-kassel.de/~kopecz/cluster/hello_parallel.c
--2012-04-17 15:10:51--  http://www.mathematik.uni-kassel.de/~kopecz/hello_parallel.c
Resolving www.mathematik.uni-kassel.de... 141.51.166.147
Connecting to www.mathematik.uni-kassel.de|141.51.166.147|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 484 [text/plain]
Saving to: "hello_parallel.c"

100%[=============================================================================================================>] 484         --.-K/s   in 0s      

2012-04-17 15:10:51 (107 MB/s) - "hello_parallel.c" saved [484/484]
Having downloaded the file into the "hello_parallel" directory, listing the directory's content should now show the C file.
kopecz@its-cs1:/home/fb17/kopecz/hello_parallel> ls
hello_parallel.c
kopecz@its-cs1:/home/fb17/kopecz/hello_parallel> 
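For reference, an MPI program of this kind could look like the following minimal sketch; the actual hello_parallel.c may differ in its details.
/* hello_parallel.c -- minimal sketch; the actual downloaded file may differ */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* ID of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes in the job */
    MPI_Get_processor_name(processor_name, &namelen);

    printf("Hello from process %03d out of %03d, processor name %s\n",
           rank, size, processor_name);

    MPI_Finalize();
    return 0;
}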

Compilation of the code

Now we can compile the code as follows. Note that MPI programs are compiled with the wrapper compiler mpicc rather than with gcc directly. The option "-o hello_parallel" tells the compiler to create an executable named "hello_parallel".
kopecz@its-cs1:/home/fb17/kopecz/hello_parallel> mpicc hello_parallel.c -o hello_parallel
kopecz@its-cs1:/home/fb17/kopecz/hello_parallel> 
Executing ls once more now also shows the executable.
kopecz@its-cs1:/home/fb17/kopecz/hello_parallel> ls
hello_parallel  hello_parallel.c
kopecz@its-cs1:/home/fb17/kopecz/hello_parallel> 

The batch script

The batch system used on the cluster is SLURM. Before we can run the code in parallel using the sbatch command, we have to create a script which provides the information needed by sbatch to execute the code properly. We use the script mpihello.slurm, which is printed below.
#!/bin/sh                                                

# job name:                                                                 
#SBATCH --job-name=Hello_pa 
                                               
# write output and errors to files:                                     
#SBATCH --output=slurm.out                                                     
#SBATCH --error=slurm.err                                                    

# requested wall clock time in HH:MM:SS:         
#SBATCH --time=00:05:00                                                         

# requested memory per process in MB:                                           
#SBATCH --mem-per-cpu=100                                                      

# requested partition:                                                          
#SBATCH --partition=minijobs

# requested number of nodes and processes per node:         
#SBATCH --nodes=2                                           
#SBATCH --tasks-per-node=2                               

# use exclusive option only for performance studies!                      
#SBATCH --exclusive                                 

# start program:                                             
mpirun ./hello_parallel
The following list explains the options used in this script as well as some other useful options.
--job-name: The job name is displayed in the NAME column of squeue's output. See the discussion of squeue below.
--output: The program's output is written to the file specified here, in this example to "slurm.out" in the current directory.
--error: Any error messages are written to this file, in this example "slurm.err".
--time: Set a limit on the total run time of the job allocation. The time format used here is "hours:minutes:seconds", so in this tutorial the job is expected to finish within 5 minutes. For other possible time formats see the man page of sbatch.
--mem-per-cpu: Minimum memory required per allocated CPU in megabytes (MB). Here we request 100 MB for each CPU.
--partition: Request a specific partition for the resource allocation. Valid partition names for this cluster are "minijobs" and "public". See also the discussion of sinfo and the --constraint option below.
--nodes: Request the number of nodes to be allocated for the job. Using the above script, two nodes will be allocated.
--tasks-per-node: Specify the number of tasks to be launched per node. With this script two tasks will be launched per node.
--exclusive: Guarantee exclusive access to the allocated nodes, which is essential for performance studies. It should be avoided for production runs as long as the job does not require a node's total memory.
--constraint: This option is important when computing on the public partition, which contains nodes of different architectures, namely nodes with 4 cores and nodes with 12 cores. To ensure that your job only runs on nodes with 12 cores, add --constraint=12cores to your script; otherwise use --constraint=4cores.
--exclude: From time to time one of the cluster's nodes may be broken and cause jobs that include it to abort. In such a case (or for any other reason) you can define a set of nodes to be excluded from the job's allocation. For instance, --exclude=its-cs102,its-cs110 excludes the nodes its-cs102 and its-cs110.

Running the code

Now we are ready to run the code.
kopecz@its-cs1:/home/fb17/kopecz/hello_parallel> sbatch mpihello.slurm
Submitted batch job 424
After successful execution the two files "slurm.err" and "slurm.out" should have been created. Please wait a few seconds if these files do not appear immediately. If you still don't see them, read the information on the squeue command given below.
kopecz@its-cs1:/home/fb17/kopecz/hello_parallel> ls
hello_parallel  hello_parallel.c  mpihello.slurm  slurm.err  slurm.out
We can display the content of these files using the cat command.
kopecz@its-cs1:/home/fb17/kopecz/hello_parallel> cat slurm.err
kopecz@its-cs1:/home/fb17/kopecz/hello_parallel> cat slurm.out
Hello from process 002 out of 004, processor name its-cs101.its.uni-kassel.de
Hello from process 000 out of 004, processor name its-cs100.its.uni-kassel.de
Hello from process 001 out of 004, processor name its-cs100.its.uni-kassel.de
Hello from process 003 out of 004, processor name its-cs101.its.uni-kassel.de
Since no error occurred during the computation, "slurm.err" is empty. The file "slurm.out" contains the output of our program. In the script "mpihello.slurm" we requested 2 nodes with 2 processes each for the job. The output shows that two processes were running on each of the nodes "its-cs100" and "its-cs101", as desired.

Attention: You may encounter error messages like "slurmd[its-cs193]: Unable to get current working directory: No such file or directory" in slurm.err. The ITS states that these error messages can be safely ignored.

Tutorial 3: A first project

In realistic projects the code is spread across more than one file: there are separate files for functions and headers, as well as makefiles that facilitate the compilation of the code. In this tutorial you should create a new directory
kopecz@its-cs1:/home/users/0026/kopecz> mkdir hello_send_recv
kopecz@its-cs1:/home/users/0026/kopecz> cd hello_send_recv/
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv>
and download the source files nodes_used.c, nodes_used.h, hello_send_recv.c and the Makefile.

The code provides a helper function "log_processes" in "nodes_used.c", which can be used to log the names of the nodes running the computation. The program "hello_send_recv" does so and creates a file called "nodes_used.log" for this purpose; a sketch of such a helper is shown after the compilation step below. The code can easily be compiled using make.
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> make
mpicc -g -O3   -c nodes_used.c -o nodes_used.o
mpicc -g -O3    -c -o hello_send_recv.o hello_send_recv.c
mpicc -g -O3   nodes_used.o hello_send_recv.o -o hello_send_recv -lm 
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv>
Listing the content of the directory shows that two object files (*.o) were created, as well as the executable "hello_send_recv".
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> ls
hello_send_recv    hello_send_recv.o  nodes_used.c  nodes_used.o
hello_send_recv.c  Makefile           nodes_used.h
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> 
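The helper function log_processes could be implemented along the following lines. This is a hypothetical sketch that gathers all processor names on process 0 and writes one line per process to the log file; the actual nodes_used.c, and the declaration in nodes_used.h, may differ.
/* nodes_used.c -- hypothetical sketch of the log_processes helper */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include "nodes_used.h"   /* assumed to declare: void log_processes(const char *filename); */

void log_processes(const char *filename)
{
    int rank, size, namelen, i;
    char name[MPI_MAX_PROCESSOR_NAME] = {0};
    char *all_names = NULL;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);

    /* collect all processor names on process 0 */
    if (rank == 0)
        all_names = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
    MPI_Gather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
               all_names, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
               0, MPI_COMM_WORLD);

    /* process 0 writes one line per process to the log file */
    if (rank == 0) {
        FILE *fp = fopen(filename, "w");
        if (fp != NULL) {
            for (i = 0; i < size; i++)
                fprintf(fp, "Process %04d out of %04d running on processor %s\n",
                        i, size, all_names + i * MPI_MAX_PROCESSOR_NAME);
            fclose(fp);
        }
        free(all_names);
    }
}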
To run the code in parallel we use the scripts mpihello_send_recv.slurm and mpihello_send_recv2.slurm.
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> sbatch mpihello_send_recv.slurm 
Submitted batch job 5459501
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> 
After the job has finished we see that a file "nodes_used.log" was created in addition to "slurm.out" and "slurm.err".
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> ls
hello_send_recv    Makefile                   nodes_used.c    nodes_used.o
hello_send_recv.c  mpihello_send_recv2.slurm  nodes_used.h    slurm.err
hello_send_recv.o  mpihello_send_recv.slurm   nodes_used.log  slurm.out
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> 
The files "slurm.out" and "slurm.err" should be empty. However, it is likely that you will encounter an error message in "slurm.err".
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> cat slurm.err
slurmd[its-cs199]: Unable to get current working directory: No such file or directory
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> cat slurm.out
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> 
According to the ITS, this kind of error message can be safely ignored.

More interesting is the content of the file "nodes_used.log".
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> cat nodes_used.log 
Process 0000 out of 0012 running on processor its-cs199.its.uni-kassel.de
Process 0001 out of 0012 running on processor its-cs199.its.uni-kassel.de
Process 0002 out of 0012 running on processor its-cs199.its.uni-kassel.de
Process 0003 out of 0012 running on processor its-cs199.its.uni-kassel.de
Process 0004 out of 0012 running on processor its-cs199.its.uni-kassel.de
Process 0005 out of 0012 running on processor its-cs199.its.uni-kassel.de
Process 0006 out of 0012 running on processor its-cs199.its.uni-kassel.de
Process 0007 out of 0012 running on processor its-cs199.its.uni-kassel.de
Process 0008 out of 0012 running on processor its-cs199.its.uni-kassel.de
Process 0009 out of 0012 running on processor its-cs199.its.uni-kassel.de
Process 0010 out of 0012 running on processor its-cs199.its.uni-kassel.de
Process 0011 out of 0012 running on processor its-cs199.its.uni-kassel.de
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> 
We see that the job used 12 processes on a single node. This is in accordance with the batch script mpihello_send_recv.slurm, in which --nodes=1 and --tasks-per-node=12 were specified (a sketch of this script is given below).
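The script mpihello_send_recv.slurm itself is not printed here. Based on the template mpihello.slurm above and the settings just mentioned, it could look roughly like the following sketch; the job name, wall clock time, memory and partition are assumptions and may differ in the actual script.
#!/bin/sh

# job name (assumed):
#SBATCH --job-name=Hello_sr

# write output and errors to files:
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err

# requested wall clock time in HH:MM:SS (assumed):
#SBATCH --time=00:05:00

# requested memory per process in MB (assumed):
#SBATCH --mem-per-cpu=100

# requested partition (assumed):
#SBATCH --partition=minijobs

# requested number of nodes and processes per node:
#SBATCH --nodes=1
#SBATCH --tasks-per-node=12

# start program:
mpirun ./hello_send_recv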

Finally, we run the code with the batch script mpihello_send_recv2.slurm, which specifies --nodes=4 and --tasks-per-node=12. Thus we expect a job started with this script to run on 4 nodes with 12 processes each.
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> sbatch mpihello_send_recv2.slurm 
Submitted batch job 5459502
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> cat nodes_used.log 
Process 0000 out of 0048 running on processor its-cs202.its.uni-kassel.de
Process 0001 out of 0048 running on processor its-cs202.its.uni-kassel.de
Process 0002 out of 0048 running on processor its-cs202.its.uni-kassel.de
Process 0003 out of 0048 running on processor its-cs202.its.uni-kassel.de
Process 0004 out of 0048 running on processor its-cs202.its.uni-kassel.de
Process 0005 out of 0048 running on processor its-cs202.its.uni-kassel.de
Process 0006 out of 0048 running on processor its-cs202.its.uni-kassel.de
Process 0007 out of 0048 running on processor its-cs202.its.uni-kassel.de
Process 0008 out of 0048 running on processor its-cs202.its.uni-kassel.de
Process 0009 out of 0048 running on processor its-cs202.its.uni-kassel.de
Process 0010 out of 0048 running on processor its-cs202.its.uni-kassel.de
Process 0011 out of 0048 running on processor its-cs202.its.uni-kassel.de
Process 0012 out of 0048 running on processor its-cs203.its.uni-kassel.de
Process 0013 out of 0048 running on processor its-cs203.its.uni-kassel.de
Process 0014 out of 0048 running on processor its-cs203.its.uni-kassel.de
Process 0015 out of 0048 running on processor its-cs203.its.uni-kassel.de
Process 0016 out of 0048 running on processor its-cs203.its.uni-kassel.de
Process 0017 out of 0048 running on processor its-cs203.its.uni-kassel.de
Process 0018 out of 0048 running on processor its-cs203.its.uni-kassel.de
Process 0019 out of 0048 running on processor its-cs203.its.uni-kassel.de
Process 0020 out of 0048 running on processor its-cs203.its.uni-kassel.de
Process 0021 out of 0048 running on processor its-cs203.its.uni-kassel.de
Process 0022 out of 0048 running on processor its-cs203.its.uni-kassel.de
Process 0023 out of 0048 running on processor its-cs203.its.uni-kassel.de
Process 0024 out of 0048 running on processor its-cs204.its.uni-kassel.de
Process 0025 out of 0048 running on processor its-cs204.its.uni-kassel.de
Process 0026 out of 0048 running on processor its-cs204.its.uni-kassel.de
Process 0027 out of 0048 running on processor its-cs204.its.uni-kassel.de
Process 0028 out of 0048 running on processor its-cs204.its.uni-kassel.de
Process 0029 out of 0048 running on processor its-cs204.its.uni-kassel.de
Process 0030 out of 0048 running on processor its-cs204.its.uni-kassel.de
Process 0031 out of 0048 running on processor its-cs204.its.uni-kassel.de
Process 0032 out of 0048 running on processor its-cs204.its.uni-kassel.de
Process 0033 out of 0048 running on processor its-cs204.its.uni-kassel.de
Process 0034 out of 0048 running on processor its-cs204.its.uni-kassel.de
Process 0035 out of 0048 running on processor its-cs204.its.uni-kassel.de
Process 0036 out of 0048 running on processor its-cs205.its.uni-kassel.de
Process 0037 out of 0048 running on processor its-cs205.its.uni-kassel.de
Process 0038 out of 0048 running on processor its-cs205.its.uni-kassel.de
Process 0039 out of 0048 running on processor its-cs205.its.uni-kassel.de
Process 0040 out of 0048 running on processor its-cs205.its.uni-kassel.de
Process 0041 out of 0048 running on processor its-cs205.its.uni-kassel.de
Process 0042 out of 0048 running on processor its-cs205.its.uni-kassel.de
Process 0043 out of 0048 running on processor its-cs205.its.uni-kassel.de
Process 0044 out of 0048 running on processor its-cs205.its.uni-kassel.de
Process 0045 out of 0048 running on processor its-cs205.its.uni-kassel.de
Process 0046 out of 0048 running on processor its-cs205.its.uni-kassel.de
Process 0047 out of 0048 running on processor its-cs205.its.uni-kassel.de
kopecz@its-cs1:/home/users/0026/kopecz/hello_send_recv> 

Additional information


For comments and questions concerning this tutorial send a message to kopecz@mathematik.uni-kassel.de.
© Stefan Kopecz. Last modified Feb 22, 2016.