The Computing Service's Linux farm is accessible to all users who have an LNF AFS account.
The nodes accept only SSH connections coming from nodes inside the laboratory network.
To connect to the farm nodes, use the alias lxcalc.lnf.infn.it.
Hardware
5 x HP ProLiant DL360 (1U slot), each with:
- CPU: 2 x Intel Xeon 2.8 GHz / 512 KB L2 cache
- RAM: 3 GB
- HDD: 1 x 18 GB, 15,000 rpm, SCSI U320
- LAN: 1 x Gigabit Ethernet 10/100/1000
Software
Scientific Linux 3.04

ASIS, ATLAS, BLAS (high-performance library by K. Goto), Blacs, Boost, CERNLIB, CMT, Clhep, F90gl, F95, Fluka, Ftncheck, Garfield, Geant4, Glut, Glut-3.7.1, Gnuplot4, Grace, Intel C/C++ Compiler, Intel Debugger, Intel Eclipse, Intel Fortran Compiler, Intel JRockit JVM, Intel MKL (Math Kernel Library), Kdiff3, Mathematica, Mercury, Mesa, Mpich, Nail, OpenDx, Pgplot, Plusfort, Ppower4, Root, Scalapack

- OpenPBS 2.3.13
  4 queues:
    small     max.cput = 1 h
    medium    max.cput = 8 h
    long      max.cput = 24 h
    verylong  max.cput = 72 h
  max_user_run = 2
  max_group_run = 6
  resources_max.file = 2047 MB
  resources_max.vmem = 2047 MB

- Mathematica 5
  For the graphical version, add fontserver.lnf.infn.it to the font servers of your X11 server/emulator and run `mathematica`.

- Geant4.6.2
  Before using the CERN Geant4 libraries, run `geant4.env.setup`, which sets the required environment variables.

- Temporary storage areas available:
    /scratch/nfs/<groupname>/<username>    ($scratchnfs)
    /scratch/local/<groupname>/<username>  ($scratchlocal)
    /tmp                                   (quota 30 MB)
– qdel
Deletes a previously submitted job from its queue:
[dmaselli@lxcalc4:~]> qdel <JobID>
– qstat
Displays the status of the submitted jobs:
[dmaselli@lxcalc4:~]> qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
431.lxcalc1      test             dmaselli                0 E long
432.lxcalc1      test             dmaselli                0 R verylong
433.lxcalc1      test             dmaselli                0 R verylong
434.lxcalc1      test             dmaselli                0 R small
435.lxcalc1      test             dmaselli                0 R verylong
436.lxcalc1      test             dmaselli                0 R verylong
437.lxcalc1      test             dmaselli                0 Q verylong
438.lxcalc1      test             dmaselli                0 Q verylong
439.lxcalc1      test             dmaselli                0 Q verylong
The S field shows the job status, which can be:
E – Exiting: the job has finished running and will soon be removed from the queue.
R – Running: the job is running.
Q – Queued: the job is queued and waiting for the scheduler to start it.
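The status letters can be tallied with a few lines of shell. This is a minimal sketch, assuming the default qstat layout shown above (two header lines, status in the fifth column); count_by_state is a hypothetical helper name, not a PBS command.

```shell
#!/bin/sh
# count_by_state: hypothetical helper, reads default `qstat` output on stdin
# and prints how many jobs are in state E, R and Q.
# Assumption: two header lines, job status in the 5th whitespace-separated field.
count_by_state() {
    awk 'NR > 2 && NF >= 6 { n[$5]++ }
         END { printf "E=%d R=%d Q=%d\n", n["E"], n["R"], n["Q"] }'
}

# Sample input taken from the qstat listing in this page.
# On the farm the call would be: qstat | count_by_state
count_by_state <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
431.lxcalc1      test             dmaselli                0 E long
432.lxcalc1      test             dmaselli                0 R verylong
437.lxcalc1      test             dmaselli                0 Q verylong
438.lxcalc1      test             dmaselli                0 Q verylong
EOF
```

With the four sample rows this prints E=1 R=1 Q=2.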
To get detailed information about the jobs:
[dmaselli@lxcalc4:~]> qstat -f
Job Id: 440.lxcalc1.lnf.infn.it
    Job_Name = test
    Job_Owner = dmaselli@lxcalc4.lnf.infn.it
    job_state = R
    queue = verylong
    server = lxcalc1.lnf.infn.it
    Checkpoint = u
    ctime = Mon Sep 13 13:21:28 2004
    Error_Path = lxcalc4.lnf.infn.it:/scratch/nfs/calcolo/dmaselli/test.err
    exec_host = lxcalc3/0
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = abe
    mtime = Mon Sep 13 13:21:29 2004
    Output_Path = lxcalc4.lnf.infn.it:/scratch/nfs/calcolo/dmaselli/test.log
    Priority = 0
    qtime = Mon Sep 13 13:21:28 2004
    Rerunable = False
    Resource_List.cput = 72:00:00
    Resource_List.file = 2047mb
    Resource_List.nodect = 1
    Resource_List.nodes = Linux
    Resource_List.vmem = 512mb
    session_id = 14031
    Variable_List = PBS_O_HOME=/afs/lnf/user/d/dmaselli,
        PBS_O_LANG=en_US.iso885915,PBS_O_LOGNAME=dmaselli,
        PBS_O_PATH=.:/afs/lnf/user/d/dmaselli/bin:/usr/lnf/bin:/usr/afsws/bin:
        /usr/afsws/etc:/usr/kerberos/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/cus
        tom/openssh/bin:/usr/X11R6/bin:/usr/local/bin:/usr/sbin:/usr/local/bin/
        X11:/cern/pro/bin:/usr/sbin:/usr/lnf/root/bin,
        PBS_O_MAIL=/var/mail/dmaselli,PBS_O_SHELL=/bin/tcsh,
        PBS_O_HOST=lxcalc4.lnf.infn.it,
        PBS_O_WORKDIR=/scratch/nfs/calcolo/dmaselli,PBS_O_QUEUE=default
    comment = Job started on Mon Sep 13 at 13:21
    etime = Mon Sep 13 13:21:28 2004

Job Id: 441.lxcalc1.lnf.infn.it
    Job_Name = test
    Job_Owner = dmaselli@lxcalc4.lnf.infn.it
    job_state = R
    queue = verylong
    server = lxcalc1.lnf.infn.it
    Checkpoint = u
    ctime = Mon Sep 13 13:21:30 2004
    Error_Path = lxcalc4.lnf.infn.it:/scratch/nfs/calcolo/dmaselli/test.err
    exec_host = lxcalc2/0
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = abe
    mtime = Mon Sep 13 13:21:30 2004
    Output_Path = lxcalc4.lnf.infn.it:/scratch/nfs/calcolo/dmaselli/test.log
    Priority = 0
    qtime = Mon Sep 13 13:21:30 2004
    Rerunable = False
    Resource_List.cput = 72:00:00
    Resource_List.file = 2047mb
    Resource_List.nodect = 1
    Resource_List.nodes = Linux
    Resource_List.vmem = 512mb
    session_id = 14049
    Variable_List = PBS_O_HOME=/afs/lnf/user/d/dmaselli,
        PBS_O_LANG=en_US.iso885915,PBS_O_LOGNAME=dmaselli,
        PBS_O_PATH=.:/afs/lnf/user/d/dmaselli/bin:/usr/lnf/bin:/usr/afsws/bin:
        /usr/afsws/etc:/usr/kerberos/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/cus
        tom/openssh/bin:/usr/X11R6/bin:/usr/local/bin:/usr/sbin:/usr/local/bin/
        X11:/cern/pro/bin:/usr/sbin:/usr/lnf/root/bin,
        PBS_O_MAIL=/var/mail/dmaselli,PBS_O_SHELL=/bin/tcsh,
        PBS_O_HOST=lxcalc4.lnf.infn.it,
        PBS_O_WORKDIR=/scratch/nfs/calcolo/dmaselli,PBS_O_QUEUE=default
    comment = Job started on Mon Sep 13 at 13:21
    etime = Mon Sep 13 13:21:30 2004
With specific options, qstat also shows information about the queues (-q and -Q) or about the server (-B). Examples:
[dmaselli@lxcalc4:~]> qstat -Q
Queue            Max Tot Ena Str Que Run Hld Wat Trn Ext Type
---------------- --- --- --- --- --- --- --- --- --- --- ----------
verylong           6   0 yes yes   0   0   0   0   0   0 Execution
long               8   0 yes yes   0   0   0   0   0   0 Execution
medium            10   0 yes yes   0   0   0   0   0   0 Execution
small             14   0 yes yes   0   0   0   0   0   0 Execution
default            0   0 yes yes   0   0   0   0   0   0 Route

[dmaselli@lxcalc4:~]> qstat -B
Server           Max Tot Que Run Hld Wat Trn Ext Status
---------------- --- --- --- --- --- --- --- --- ----------
lxcalc1.lnf.infn   0   0   0   0   0   0   0   0 Active
– pbsnodes
Shows the status of the cluster nodes; it is especially useful for system administration. Example:
[dmaselli@lxcalc4:~]> pbsnodes -a
lxcalc1
     state = free
     np = 2
     properties = Linux,lxcalc1
     ntype = cluster

lxcalc2
     state = free
     np = 2
     properties = Linux,lxcalc2
     ntype = cluster

lxcalc3
     state = free
     np = 2
     properties = Linux,lxcalc3
     ntype = cluster

lxcalc4
     state = free
     np = 2
     properties = Linux,lxcalc4
     ntype = cluster

lxcalc5
     state = free
     np = 2
     properties = Linux,lxcalc5
     ntype = cluster

axcalc
     state = free
     np = 6
     properties = AIX,axcalc
     ntype = cluster
NOTE: The STDOUT and STDERR files, the executable and the data to be processed must not reside in the AFS area. Put a copy of the executable and of the data in the NFS scratch area (/scratch/nfs/<groupname>/<username>) before submitting the job.
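Copying the files to the scratch area can be scripted. The following is only a sketch: stage_to_scratch is a hypothetical helper (not a tool installed on the farm), and the directory and file names in the commented usage are made up for illustration.

```shell
#!/bin/sh
# stage_to_scratch DEST FILE...
# Hypothetical helper: create DEST if needed and copy the given files into it,
# preserving permissions so that executables stay executable.
stage_to_scratch() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        cp -p "$f" "$dest/" || return 1
    done
}

# Usage on the farm (assumption: $scratchnfs points at
# /scratch/nfs/<groupname>/<username>; myrun, test.exe, input.dat are examples):
#   stage_to_scratch "$scratchnfs/myrun" ./test.exe ./input.dat
#   cd "$scratchnfs/myrun" && qsub test.pbs
```

Submitting from inside the scratch directory also keeps relative -o and -e paths out of AFS, since they are resolved against the working directory.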
A job can also be submitted by typing its commands directly on the command line:
qsub -
<commands>
Press CTRL-D to end the command input.
However, this procedure is strongly discouraged.
Jobs are usually submitted to PBS with:
qsub <file.pbs> [-l (Linux|AIX)]
where <file.pbs> is a job script that contains a list of qsub directives followed by the actual job instructions.
The -l <architecture> option is optional; it is needed only when you want to send a job from the Linux farm to axcalc (AIX), or vice versa.
By default, the job runs on a machine with the same architecture as the one from which it was submitted.
– How to create a job script for PBS –
The first line of the script, before any #PBS directive, must be #!<shell>, where <shell> is the full path of the chosen shell.
For example:
#!/bin/sh
Directives can be given up to the first command of the script; a directive placed after it is ignored. Here is a quick overview of the most important parameters you can set (for the complete list, run man qsub):
To assign a name to the job:
#PBS -N <job name>
By default the job has the same name as the PBS file.
To assign a name to the error file:
#PBS -e <path>
The path is not absolute; it is relative to the working directory.
If this parameter is not set, the file gets the standard name: <job name> followed by the job identification number.
Similarly, to assign a name to the output file:
#PBS -o <path>
If this parameter is not set, the file gets the standard name: <job name> followed by the job identification number.
To run a job in interactive mode:
#PBS -I
If this option is selected, the job runs in interactive mode: standard input, standard output and standard error are connected through qsub to the console from which the job was submitted.
To keep the output or error files on the execution node:
#PBS -k <argument>
The k stands for "keep". It establishes whether standard output or standard error should be retained on the node that executes the job. The possible arguments are e (standard error only), o (standard output only), eo or oe (both) and n (neither, the default). The files are saved with the standard name.
To specify the resources required by the job:
#PBS -l <resource_list>
For example, to specify the CPU time required by the job, the script contains:
#PBS -l cput=01:00:00
Other values can be set in the same directive, separated by commas. If a resource is given with no value, it is set to infinite. Example:
#PBS -l cput
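When choosing a cput value it can help to compare it with the max.cput limits of the four queues listed in the Software section (small 1 h, medium 8 h, long 24 h, verylong 72 h). The sketch below does only that arithmetic; cput_to_seconds and smallest_queue are hypothetical helper names, and the code does not describe how PBS itself routes a job.

```shell
#!/bin/sh
# cput_to_seconds HH:MM:SS
# Hypothetical helper: convert a cput value to seconds.
cput_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# smallest_queue HH:MM:SS
# Print the smallest queue whose max.cput covers the request.
# Limits are the ones listed for this farm: 1 h, 8 h, 24 h, 72 h.
smallest_queue() {
    s=$(cput_to_seconds "$1")
    if   [ "$s" -le 3600 ];   then echo small
    elif [ "$s" -le 28800 ];  then echo medium
    elif [ "$s" -le 86400 ];  then echo long
    elif [ "$s" -le 259200 ]; then echo verylong
    else
        echo "request exceeds the 72 h limit" >&2
        return 1
    fi
}

# A request of 1 h 01 min no longer fits the 1 h limit of "small".
smallest_queue 01:01:00
```

The last line prints medium, because 3660 seconds is above the 3600-second limit of the small queue.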
To specify whether to send an e-mail notification:
#PBS -m <mail_options>
The possible options are:
a – mail is sent if the job is aborted.
b – mail is sent when the job begins execution.
e – mail is sent when the job ends.
n – no mail is sent.
The default option is “a”.
To indicate where to send the email notification:
#PBS -M <user_list>
If you give more than one e-mail address, separate them with commas. By default, the mail is sent to the job owner, that is, the user who submitted the job.
To specify the shell to use:
#PBS -S <path_list>
It tells PBS where to find the shell, usually /bin/sh.
If it is not specified, PBS uses the shell of the user on the execution node.
To specify the name of the job owner:
#PBS -u <user_list>
Here too, the default is the name of the user who submitted the job.
(E.g.: /scratch/nfs/calcolo/dmaselli/test.exe)
Example:
Input file (short.pbs)
#!/bin/sh
#The shell has just been defined (it is a bash)
#PBS -S /bin/sh
#PBS -M dmaselli@lnf.infn.it
#PBS -m e
#PBS -l cput=01:01:00
#PBS -o risultato
#PBS -e errori
#comment
#This is a comment; the next line is a command
echo ""
DATE=`date`
#Be careful to use the right quotes (backticks) and not to put spaces around the = sign!!!
echo "$DATE"
sleep 5
echo "Here are some interesting things you may want to know"
echo "This job has been identified as $PBS_JOBID and is named $PBS_JOBNAME"
echo "it was initially placed in the queue $PBS_O_QUEUE"
echo "and it was executed in the queue $PBS_QUEUE"
echo "It was submitted from the machine: $PBS_O_HOST"
echo "It was executed on the machine: `hostname`"
date
echo ""
#PBS -o risultato This directive is ignored
Output file 1
Mon Mar 24 16:46:40 CET 2003
Here are some interesting things you may want to know
This job has been identified as 56.lxcalc3.lnf.infn.it and is named short.pbs
it was initially placed in the queue default
and it was executed in the queue long
It was submitted from the machine: lxcalc5
It was executed on the machine: lxcalc3
Mon Mar 24 16:46:45 CET 2003
Output file 2
Mon Mar 24 16:48:07 CET 2003
Here are some interesting things you may want to know
This job has been identified as 57.lxcalc3.lnf.infn.it and is named short.pbs
it was initially placed in the queue default
and it was executed in the queue long
It was submitted from the machine: lxcalc4
It was executed on the machine: lxcalc2
Mon Mar 24 16:48:12 CET 2003
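The last line of short.pbs is a #PBS directive that is ignored because it comes after the first command. As a way to check a script before submitting it, the sketch below lists only the directives that precede the first command. active_directives is a hypothetical helper, and the scanning rule it applies (skip comments and blank lines, stop at the first other line) is an assumption modelled on the behaviour described in this page, not the qsub implementation.

```shell
#!/bin/sh
# active_directives FILE
# Hypothetical helper: print the #PBS lines found before the first command.
active_directives() {
    awk '
        /^#PBS/    { print; next }   # directive before any command: keep it
        /^#/       { next }          # shebang or ordinary comment: skip
        /^[ \t]*$/ { next }          # blank line: skip
                   { exit }          # first real command: stop scanning
    ' "$1"
}

# Small demonstration script written to a temporary file.
job=$(mktemp)
cat > "$job" <<'EOF'
#!/bin/sh
#PBS -N demo
# an ordinary comment
#PBS -l cput=01:00:00
echo hello
#PBS -o late.log
EOF
active_directives "$job"
rm -f "$job"
```

The demonstration prints the -N and -l lines only; the -o line after echo hello is not listed.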
Parallel MPI Jobs in PBS
MPI is supported on our Linux farm via MPICH. Compilers, libraries and include files are in /usr/lnf/farmsw/mpich/
When you submit MPI jobs to PBS, use mpiexec in the PBS script, NOT mpirun (or mpich.mpirun).
To specify the number of nodes and of CPUs per node, add them to the qsub command line. For example, to run a job on 3 nodes with 2 CPUs per node:
qsub -l nodes=3:Linux:ppn=2 <script-pbs>
Do not put these directives in the PBS script.
Sample PBS Script:
#!/bin/sh
### Job name
#PBS -N testing
### Declare job non-rerunable
#PBS -r n
### Output files
#PBS -e /scratch/nfs/calcolo/dmaselli/MPI/test.err
#PBS -o /scratch/nfs/calcolo/dmaselli/MPI/test.log
### Mail to user
#PBS -m ae
#
### This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
#
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This jobs runs on the following processors:
echo `cat $PBS_NODEFILE`
### Define number of processors
NPROCS=`wc -l < $PBS_NODEFILE`
echo This job has allocated $NPROCS nodes
#
### Run the parallel MPI executable
mpiexec /scratch/nfs/calcolo/dmaselli/MPI/mpi.exe
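The sample script counts the lines of $PBS_NODEFILE, which gives the number of allocated processors rather than the number of distinct hosts. The sketch below reports both. It assumes the node file lists each host once per allocated CPU, so a request of nodes=3 with ppn=2 would give six lines; summarize_nodefile is a hypothetical helper, and a temporary file stands in for $PBS_NODEFILE so that the example runs outside PBS.

```shell
#!/bin/sh
# summarize_nodefile FILE
# Hypothetical helper: print "<processes> <distinct hosts>" for a node file.
summarize_nodefile() {
    procs=$(wc -l < "$1" | tr -d ' ')
    hosts=$(sort -u "$1" | wc -l | tr -d ' ')
    echo "$procs $hosts"
}

# Stand-in for $PBS_NODEFILE (assumed content for nodes=3:Linux:ppn=2).
nodefile=$(mktemp)
printf '%s\n' lxcalc1 lxcalc1 lxcalc2 lxcalc2 lxcalc3 lxcalc3 > "$nodefile"
summarize_nodefile "$nodefile"
rm -f "$nodefile"
```

For the stand-in file this prints 6 3, that is, six processes on three hosts. Inside a job script the call would be summarize_nodefile "$PBS_NODEFILE".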