CHEAT SHEET ON RUNNING ON SUPERCOMPUTERS


== BEHEMOTH (SDSU) ==

To compile, make sure

/home/cjohnson/mpich-install/bin

is in your path.  To run, do

mpiexec -n 32 (EXECUTABLE) < input

The -n flag sets the number of MPI processes to launch (32 here).
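A minimal sketch of putting the MPICH directory on your path, assuming a bash-like shell. The variable name MPICH_BIN is only illustrative; put the same lines in your shell startup file if you want them applied at every login.

```shell
# Prepend the MPICH install directory to PATH (sh/bash syntax).
# MPICH_BIN is an illustrative variable name, not something the system defines.
MPICH_BIN=/home/cjohnson/mpich-install/bin

case ":$PATH:" in
    *":$MPICH_BIN:"*) ;;              # already on the path, nothing to do
    *) PATH="$MPICH_BIN:$PATH" ;;     # otherwise put it first
esac
export PATH

# Report which mpiexec (if any) the shell will now pick up.
command -v mpiexec || echo "mpiexec not found on PATH"
```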


== NERSC (LBL) ==


-- Edison --

Basic information on Edison

Edison is a Cray XC30 supercomputer with 5576 compute nodes. Each node has two 12-core chips, for a total of 24 cores per compute node.

Each node has 64 GB of memory, or about 2.67 GB per core.
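The per-core figures follow directly from the node counts quoted above. A quick check in the shell (variable names are illustrative):

```shell
# Edison figures quoted above
nodes=5576
cores_per_node=24        # 2 chips x 12 cores
mem_per_node_gb=64

# Total cores in the machine and memory available to each core
total_cores=$((nodes * cores_per_node))
mem_per_core=$(awk -v m="$mem_per_node_gb" -v c="$cores_per_node" \
    'BEGIN { printf "%.2f", m / c }')

echo "total cores:     $total_cores"
echo "memory per core: $mem_per_core GB"
```

Keep the memory-per-core number in mind when choosing how many MPI processes to place on a node: fewer processes per node means more memory for each.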


Always run from the project directory, never from your home directory!

cd /project/projectdirs/m2594/

(Create a subdirectory named after your username so that you do not overwrite other people's work.)
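One possible way to set up that per-user directory is a small helper like the one below. The function name make_workdir is hypothetical, and it assumes $USER (or id -un) gives the username you want for the directory name.

```shell
# make_workdir BASE
# Create BASE/<username> if it does not exist and print its path.
# (Hypothetical helper; the name is not part of any system tool.)
make_workdir() {
    base="$1"
    workdir="$base/${USER:-$(id -un)}"
    mkdir -p "$workdir" && printf '%s\n' "$workdir"
}

# Usage on Edison (not executed here):
#   cd "$(make_workdir /project/projectdirs/m2594)"
```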

Help pages on setting up batch scripts

Sample batch script:

#!/bin/bash -l
#SBATCH -p debug
#SBATCH -A m2594
#SBATCH -N 128
#SBATCH -t 0:29:00
#SBATCH -J ne30test
#SBATCH --mail-type=END
#SBATCH --mail-user=cjohnson@mail.sdsu.edu

export OMP_NUM_THREADS=12
srun -n 256 -c 12 ./bigstick-mpi-omp.x < ne30.input

Other options:

# Use the regular batch queue instead of debug:
#SBATCH -p regular

# Premium priority; costs 2x as much but will run sooner:
#SBATCH --qos=premium


The sample script above submits a job to Edison's debug queue under account m2594, on 128 compute nodes, running 256 MPI processes with 12 OpenMP threads per process.
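The numbers in the sample script are tied together: the nodes times cores per node should equal the MPI processes times threads per process. A sketch of that bookkeeping, assuming you want to fill every core and using illustrative variable names:

```shell
nodes=128            # value given to #SBATCH -N
cores_per_node=24    # Edison: 2 chips x 12 cores
threads=12           # OMP_NUM_THREADS, also the value given to srun -c

# MPI processes that fit on one node, and in the whole job
tasks_per_node=$((cores_per_node / threads))
tasks=$((nodes * tasks_per_node))

# Print the matching srun options
echo "srun -n $tasks -c $threads"
```

If you change the node count or the thread count, recompute the srun -n value the same way.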

The file ne30.input must already exist; all files must be in the directory from which you submit.

To submit batch scripts:

sbatch myscript

When you submit it, you get a job number, such as 2969008.

To check on the queue

squeue -u username

To delete a queued job

scancel jobnumber
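If you want to keep the job number for later squeue or scancel calls, one option is to pull it out of the line sbatch prints, which normally has the form "Submitted batch job 2969008". The helper name extract_jobid below is hypothetical.

```shell
# Read sbatch output on stdin and print only the job number.
# (Hypothetical helper; relies on the usual "Submitted batch job N" message.)
extract_jobid() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Demonstration using the example job number from above
jobid=$(echo "Submitted batch job 2969008" | extract_jobid)
echo "$jobid"

# Usage on the cluster (not executed here):
#   jobid=$(sbatch myscript | extract_jobid)
#   squeue -j "$jobid"
#   scancel "$jobid"
```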

Here is information on wait times (must log on using your NIM password)

-- Cori --


== ALCF (Argonne) ==

Compiling and linking on ALCF machines

Always run from the project directory, never from your home directory!

cd /projects/AstroSym/

ALCF computers use the Cobalt job submission system:

General reference on Cobalt

To submit jobs

qsub -t 1:30:00

To check status

qstat -u [username]

To delete a job

qdel [JobID]

To check allocation used

cbank

cbank -u [username]

cbank -p [projectname]


-- Mira --

Key characteristics:

  • Nodes: 49,152 with 16 cores/node = 786,432 cores
  • Memory: 16 GB RAM per node = 1 GB/core
  • To login:

    ssh [username@]mira.alcf.anl.gov

    Sample submission:

     qsub -t 0:59:00 -n 512 --mode c2 --env OMP_NUM_THREADS=8 -i ni56.input bigstick-mpi-openmp.x

    -t = wall-clock time limit

    -n = number of nodes

    --mode cN = N MPI processes per node (c2 above means 2 per node)

    --env OMP_NUM_THREADS=8 sets an environment variable; in general, (MPI processes per node) x (threads per process) = 16

    -i = input file name (read as if typed at the keyboard)
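The flags in the sample submission can be checked against the rule that MPI processes per node times threads per process should equal 16. A small sketch, with illustrative variable names:

```shell
nodes=512            # -n
ranks_per_node=2     # --mode c2
threads=8            # OMP_NUM_THREADS
cores_per_node=16    # Mira

# Total MPI processes in the job and cores occupied on each node
total_ranks=$((nodes * ranks_per_node))
cores_used=$((ranks_per_node * threads))

echo "total MPI processes: $total_ranks"
echo "cores used per node: $cores_used of $cores_per_node"

# Warn if the combination does not fill the node exactly
if [ "$cores_used" -ne "$cores_per_node" ]; then
    echo "warning: ranks/node x threads does not equal $cores_per_node"
fi
```

To try a different split (for example --mode c4), change ranks_per_node and adjust threads until the check passes.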

    To check on jobs

    qstat -u [username]

    To delete a job from the queue

    qdel [jobID]   (you can find the job ID using qstat)

    Information on submitting jobs

    Graphical representation of queued and running jobs on Mira (very useful)

-- Cetus --

Submission is the same as for Mira.

Graphical representation of queued and running jobs on Cetus (very useful)