The Hercules Cluster in Garching
The hercules cluster consists of 184 compute nodes and a 250 TB GPFS file system. To access the cluster, please apply for an account at http://www.rzg.mpg.de/userspace/forms/onlineregistrationform. You will receive an email with instructions once your request is approved by our computer department.
From the MPIfR network, you can log on using:
ssh <user-name>@hercules01.bc.rzg.mpg.de
hercules02 and hercules03 serve as standbys for hercules01. Note that your user name on hercules may differ from your MPIfR user name.
Using Kerberos authentication
From your MPIfR machine, get a valid Kerberos ticket by invoking a new Kerberos environment:
ramesh@pc20181 ~ $ kpagsh
ramesh@pc20181 ~ $ kinit rameshk@IPP-GARCHING.MPG.DE
rameshk@IPP-GARCHING.MPG.DE's Password:
ramesh@pc20181 ~ $
Once the above is done (with rameshk replaced by your user name), from the same terminal, you can log on to hercules without a password.
ramesh@pc20181 ~ $ ssh rameshk@hercules01.bc.rzg.mpg.de
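The two steps above can be combined so that a ticket is only requested when none is valid. This is a minimal sketch, assuming it is run inside the Kerberos environment opened with kpagsh and that your klist supports the silent "-s" test; the function name hercules_login is hypothetical.

```shell
# Log on to hercules, obtaining a Kerberos ticket first if none is valid.
# Usage: hercules_login <hercules-user-name>
hercules_login() {
    user=$1
    # "klist -s" prints nothing and reports only through its exit status
    if ! klist -s 2>/dev/null; then
        kinit "${user}@IPP-GARCHING.MPG.DE" || return 1
    fi
    ssh "${user}@hercules01.bc.rzg.mpg.de"
}
```

For example, "hercules_login rameshk" asks for the password only if the ticket is missing or has expired.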
Data transfer to Hercules
There are several ways to transfer data to hercules; some use the fast link between Bonn and Garching and some do not. If you don't need a fast transfer, you can simply copy your data "the normal way":
rsync -Pav <data> <user-name>@hercules01.bc.rzg.mpg.de:<destination-dir-hercules>
NOTE: this method will prompt you for your MPCDF password!
If transfer speed is of the essence, there are two approaches to copying data to hercules. The easiest is to copy your data to the hercules gateway in Bonn (hgw or herculesgw), into the directory /media/hercules. Your own directory is /media/hercules/u/<user-name>. From there your data will automatically sync to your home directory on hercules. Please note that only users with a hercules account are allowed to log in to herculesgw, and that data can only be copied to your own home directory. Because storage space in your home directory on hercules is limited, this method is only suited to smaller data sets.
The second method transfers data directly to hercules via the hercules gateway. To use the 10GbE line to Garching, transfer data from any of the local machines connected to the 10GbE network (e.g. miraculix/miraculix2/verleihnix/archivesrv). On any of these machines, create an ssh tunnel to hercules using:
ssh -f -N -L <local-port>:hercules01.bc.rzg.mpg.de:22 <user-name>@hgw
where <local-port> is a free (unused) port on the machine on which you run this command. To check whether the port is already in use:
lsof | grep <local-port>
NOTE: if you get a "bind: Cannot assign requested address" ERROR, force ssh to use ipv4 with an additional "-4" option.
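The port check can also be automated. The sketch below asks lsof only about listening TCP sockets on one port, which avoids accidental matches of the port number elsewhere in the lsof output. The function names and the starting port 2222 are hypothetical, and lsof may not see sockets owned by other users, so treat the result as a hint rather than a guarantee.

```shell
# Succeeds if something is listening on the given local TCP port.
port_in_use() {
    lsof -nP -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

# Print the first free port at or above the given start (default 2222).
find_free_port() {
    port=${1:-2222}
    while port_in_use "$port"; do
        port=$((port + 1))
    done
    printf '%s\n' "$port"
}
```

The printed value can then be used as <local-port> in the tunnel command above.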
Now you can transfer data through this port using:
rsync -Pav -e "ssh -p <local-port>" <data> <user-name>@localhost:<destination-dir-hercules>
To simplify this copy process, add the following to your ~/.ssh/config file:
Host htun
    Hostname localhost
    HostKeyAlias htun
    User <user-name>
    Port <local-port>
This addition allows you to copy data to hercules with a much simpler command, similar to a standard data transfer:
rsync -Pav <data> <user-name>@htun:<destination-dir-hercules>
Once all data has been copied to hercules, it is advisable to close the ssh tunnel again. To do so, log in to the machine on which you opened the tunnel and identify the PID (second column) of the open tunnel using:
ps -aef | grep -i ssh
Then close/kill the tunnel with:
kill -9 <PID>
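As an alternative to looking up the PID, the tunnel can be opened with an ssh control socket and closed again through that socket once the copy has finished. This is a sketch under assumptions: the function name tunnel_rsync and the socket path are hypothetical, while the host names and rsync options are the ones used above.

```shell
# Copy <data> to hercules through a temporary tunnel and close it afterwards.
# Usage: tunnel_rsync <user-name> <local-port> <data> <destination-dir-hercules>
tunnel_rsync() {
    user=$1 port=$2 data=$3 dest=$4
    sock="${TMPDIR:-/tmp}/htun-$user-$port.sock"   # hypothetical socket path

    # -M/-S start the tunnel as a control master reachable via the socket
    ssh -f -N -M -S "$sock" -L "$port:hercules01.bc.rzg.mpg.de:22" "$user@hgw" || return 1

    rsync -Pav -e "ssh -p $port" "$data" "$user@localhost:$dest"
    status=$?

    # Ask the master to exit instead of killing it by PID
    ssh -S "$sock" -O exit "$user@hgw"
    return "$status"
}
```

The function returns the exit status of rsync, so a failed transfer is still visible to the caller even though the tunnel has been closed.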
NOTE: for password-less data transfer you need to add your public ssh key from the hercules gateway to hercules. Log in to the gateway and type "ssh-keygen -t rsa", choose a file name in which the private and public key should be stored, and just hit <Enter> when you are asked for a passphrase. Then add the newly created public key to hercules using "ssh-copy-id <user-name>@hercules01.bc.rzg.mpg.de". If successful, you should now be able to log in to hercules from the hercules gateway without having to give a password. You can then also transfer data via the hercules gateway without a password, using the method given above.
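The key setup from the note can be scripted on the gateway. This is a sketch, not a required procedure: the function name and the default key file name id_rsa_hercules are hypothetical, and the key is created without a passphrase, as the note suggests.

```shell
# Create a passphrase-less RSA key (if missing) and install it on hercules.
# Usage: setup_hercules_key <user-name> [key-file]
setup_hercules_key() {
    user=$1
    key=${2:-$HOME/.ssh/id_rsa_hercules}   # hypothetical file name
    # -N "" sets an empty passphrase, -f names the key file
    [ -f "$key" ] || ssh-keygen -t rsa -N "" -f "$key" || return 1
    ssh-copy-id -i "$key.pub" "$user@hercules01.bc.rzg.mpg.de"
}
```

If the key is not stored under one of the default names, ssh has to be told about it, for example with "ssh -i <key-file>" or an IdentityFile entry in ~/.ssh/config.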
Support
For help on modules/software/cluster, please email Christian (christian.guggenberger@rzg.mpg.de) or Markus (mjr@rzg.mpg.de).
VNC
To use VNC with the Hercules cluster, see:
http://www.mpcdf.mpg.de/services/network/vnc/vnc-at-the-mpcdf
Installing Python libraries
To install standard Python libraries, see:
http://www.mpcdf.mpg.de/about-mpcdf/publications/bits-n-bytes?BB-View=192&BB-Document=150
Sample script
Here's an example snippet which can be submitted with qsub:
### shell
#$ -S /bin/bash
### join stdout and stderr
#$ -j y
### change to current work dir
#$ -cwd
### do not send email reports
#$ -m n
### request parallel env with 8 cpus
#$ -pe openmp 8
### wallclock 2 hours
#$ -l h_rt=7200
### virtual limit per job 20GB
#$ -l h_vmem=20G
date
The CPU count specified with "#$ -pe openmp XYZ" can be varied from 1 to 24. "#$ -pe openmp" can be omitted, but then one CPU is assumed.
h_rt is mandatory and can be as much as 12 days (288:00:00).
h_vmem is optional; if not present, 7G is set as default.
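The limits stated here can be checked before a job is submitted. The helper below is only an illustration: the function and variable names are hypothetical, and it encodes nothing beyond the 1 to 24 CPU range and the 12-day h_rt maximum quoted in this section.

```shell
# Limits quoted in this section: 1-24 CPUs, h_rt up to 12 days.
MAX_CPUS=24
MAX_HRT=$((12 * 24 * 3600))   # 12 days = 288 hours, expressed in seconds

# Usage: check_job_request <cpus> <h_rt-in-seconds>
check_job_request() {
    cpus=$1 hrt=$2
    if [ "$cpus" -lt 1 ] || [ "$cpus" -gt "$MAX_CPUS" ]; then
        echo "cpus must be between 1 and $MAX_CPUS" >&2
        return 1
    fi
    if [ "$hrt" -lt 1 ] || [ "$hrt" -gt "$MAX_HRT" ]; then
        echo "h_rt must be between 1 and $MAX_HRT seconds" >&2
        return 1
    fi
}
```

For the sample script, "check_job_request 8 7200" succeeds, whereas a request for 25 CPUs is rejected with a message on stderr.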
Currently, no CPU binding is enforced; in other words, if users use more CPUs (e.g. create more threads) than requested, they will steal CPU time from other jobs.
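Because nothing stops a job from starting too many threads, the job script itself should keep the thread count within the request. A minimal sketch for OpenMP programs, assuming the batch system exports NSLOTS inside the job (standard Grid Engine behaviour, but not confirmed here for hercules):

```shell
# NSLOTS holds the slot count granted by "#$ -pe openmp N"; default to 1
# when it is unset, matching the one-CPU default for jobs without -pe.
export OMP_NUM_THREADS="${NSLOTS:-1}"
echo "running with $OMP_NUM_THREADS thread(s)"
```

These lines would go into the job script before the actual program is started.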