Global view
Launching a computation on the platform means submitting a “job” to one of the available queues. This involves the following steps (the first two are sketched after the list):
- Cluster connection
- Data transfer
- Batch script creation
- Job submission
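As an illustration, a minimal sketch of the first two steps, assuming a hypothetical login node address cluster.example.org (replace it with the actual MatriCS login host and your own paths):
# Connect to the cluster over SSH (hypothetical hostname)
ssh your_login@cluster.example.org
# Copy input data to the cluster, then fetch results back (run from your workstation)
scp input_data.tar.gz your_login@cluster.example.org:~/
scp your_login@cluster.example.org:~/results.tar.gz .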
Nodes
Waiting queues (Partitions)
Commands for managing your “jobs”
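For day-to-day job management, the standard Slurm commands below are typically used (listed here as a reminder; see their man pages for all options):
sinfo                      # list partitions and node states
squeue -u $USER            # list your pending and running jobs
scontrol show job <jobid>  # detailed information about one job
scancel <jobid>            # cancel a job
sacct -j <jobid>           # accounting information for a finished job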
Single-core job example: monocore.slurm
Request one compute core on a node and 5 MB of memory for 10 minutes. An email is sent at each stage of the job’s life.
Create an sbatch file named monocore.slurm:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00
#SBATCH --mail-type=ALL
#SBATCH --job-name=my_serial_job
#SBATCH --output=job_seq-%j.out
#SBATCH --mail-user=your.email@your.domain
#SBATCH --mem=5M
time sleep 30
hostname
Job submission
The command “sbatch monocore.slurm” will put the job in the default queue because no queue is specified in the file. The job will run as soon as the resources are available.
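For example, from the directory containing the file:
sbatch monocore.slurm     # prints "Submitted batch job <jobid>"
squeue -u $USER           # check the job state (PD = pending, R = running)
cat job_seq-<jobid>.out   # read the output once the job has finished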
Sbatch options
- #SBATCH --partition= : partition name
- #SBATCH --job-name= : job name
- #SBATCH --output= : file in which the standard output will be saved
- #SBATCH --error= : file in which the errors will be stored
- #SBATCH --input= : file name of the standard input
- #SBATCH --open-mode= : "append" to append to an existing file, "truncate" to overwrite it
- #SBATCH --mail-user= : your@mail
- #SBATCH --mail-type=<BEGIN,END,FAIL,TIME_LIMIT,TIME_LIMIT_50,...> : events for which an e-mail is sent
- #SBATCH --sockets-per-node= : 1 or 2
- #SBATCH --threads-per-core= : number of threads per core; not usable on the MatriCS platform, as the nodes are not multithreaded (ask us if it is needed)
- #SBATCH --cores-per-socket= : number of cores per socket
- #SBATCH --cpus-per-task= : number of CPUs for each task
- #SBATCH --ntasks= : number of tasks
- #SBATCH --mem-per-cpu= : RAM per core
- #SBATCH --ntasks-per-node= : number of tasks per node
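As an illustration, a job header combining several of these options (the partition name, file names and e-mail address are placeholders, not actual MatriCS values):
#!/bin/bash
#SBATCH --partition=<partition_name>
#SBATCH --job-name=demo
#SBATCH --output=demo-%j.out
#SBATCH --error=demo-%j.err
#SBATCH --open-mode=truncate
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=100M
#SBATCH --mail-user=<your@mail>
#SBATCH --mail-type=END,FAIL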
SBATCH environment variables
- SLURM_JOB_ID : job id
- SLURM_JOB_NAME : job name
- SLURM_JOB_NODELIST : list of the nodes used
- SLURM_SUBMIT_HOST : server from which the job was submitted
- SLURM_SUBMIT_DIR : directory from which the job was submitted
- SLURM_JOB_NUM_NODES : number of nodes requested
- SLURM_NTASKS_PER_NODE : number of cores requested per node
- SLURM_JOB_CPUS_PER_NODE : number of threads per node
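These variables can be read inside the batch script, for example:
echo "Job ${SLURM_JOB_ID} (${SLURM_JOB_NAME}) submitted from ${SLURM_SUBMIT_HOST}"
echo "Running in ${SLURM_SUBMIT_DIR} on nodes: ${SLURM_JOB_NODELIST}"
echo "${SLURM_JOB_NUM_NODES} node(s), ${SLURM_NTASKS_PER_NODE} task(s) per node"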
MPI job example: jobMPI.slurm
Request 2 nodes with 16 cores each and 8 MB on each for 10 minutes.
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:10:00
#SBATCH --job-name=my_mpi_job
#SBATCH --output=mpi_job-%j.out
#SBATCH --mem=8M
#SBATCH --mail-type=ALL
#SBATCH --mail-user=laurent.renault@u-picardie.fr
ml gnu12 openmpi4
mpiexec time sleep 30
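In a real job, the line "mpiexec time sleep 30" is replaced by your own MPI executable. A minimal sketch, assuming a hypothetical source file hello_mpi.c built with the same gnu12/openmpi4 modules:
mpicc hello_mpi.c -o hello_mpi   # hypothetical MPI source, compiled once beforehand
mpiexec ./hello_mpi              # Slurm supplies the 2 x 16 tasks requested above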
OpenMP job example: job_openMP.slurm
Request one task with 8 CPUs on a single node and 96 MB of memory for 4 hours.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00
#SBATCH --job-name=my_openmp_job
#SBATCH --mem=96M
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_program
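The script is submitted like the previous examples; OMP_NUM_THREADS then matches the 8 CPUs reserved with --cpus-per-task:
sbatch job_openMP.slurm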
GPU usage
To use GPUs, add the parameter --gres=gpu:X, where X is the number of GPUs.
Here is an sbatch script, “mon_script.sh”, requesting 2 GPUs and 28 cores (bigpu partition).
#!/bin/sh
#SBATCH --job-name=tensor
#SBATCH --partition=bigpu
#SBATCH --gres=gpu:2
#SBATCH --time=0:10:00
#SBATCH --mail-type=ALL
#SBATCH --output=job-%j.out
#SBATCH --mem=60G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=28
hostname
python hello.py
- To submit the job, use the following command:
sbatch mon_script.sh
- Interactive session example:
srun --ntasks=1 --mem=4G --gres=gpu:1 --time=1:00:00 --partition=bigpu --pty /bin/bash
- The nvidia-smi command shows the GPU usage.