Tag: Slurm
-
Using Prism
Prism is equipped with several powerful servers built specifically to accelerate AI/ML/DL workloads with GPUs. This cutting-edge platform is easy to access, and the preinstalled software and libraries provide foundational tools that help scientists get the most out of their workflows. (Table columns: Environment, System, Sockets, CPU Cores per socket, Total CPU Cores, CPU Memory, NVMe Storage (TB), GPUs and GPU…)
-
Slurm on ADAPT
To distribute user jobs fairly across shared resources, some of our VM clusters on ADAPT are equipped with Slurm. With Slurm, users can run both interactive and non-interactive jobs on specified resources without worrying about interference from other users’ workloads. When resources aren’t readily available, Slurm will also queue your jobs…
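As a concrete illustration of the two job styles (a minimal sketch; the job name, resource counts, and time limit are placeholders, not ADAPT-specific values), an interactive session is requested with `salloc`, while a non-interactive job is a batch script handed to `sbatch`:

```shell
#!/bin/bash
# Minimal non-interactive job script -- save as myjob.sh and submit with:
#   sbatch myjob.sh
# For an interactive session instead, request resources directly:
#   salloc --ntasks=1 --time=01:00:00
#SBATCH --job-name=example   # name shown in the queue (placeholder)
#SBATCH --ntasks=1           # request a single task
#SBATCH --time=01:00:00      # wall-clock limit of one hour
hostname                     # replace with the real workload
```

Slurm queues the batch job until resources are free, so the script runs unattended once scheduled.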
-
Miscellaneous Topics
Modules: Learn how to use the module command to set up your Discover cluster environment with available compilers, interpreters, and other software packages. Cron on Discover: Automate running your tasks at specific time intervals on the Discover cluster using cron.
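To make the two topics concrete (the module name, path, and schedule below are illustrative placeholders, not Discover defaults), a typical module workflow and a crontab entry look like:

```shell
# Environment modules: list, load, and inspect packages.
#   module avail            # show available compilers and libraries
#   module load comp/gcc    # load one of them (name is illustrative)
#   module list             # show currently loaded modules

# Cron: edit your table with `crontab -e`, then add an entry such as
# the one below, which runs a script at 02:30 every day.
# min hour day month weekday  command
30 2 * * * /home/username/scripts/nightly_task.sh
```

The five leading fields of a crontab line are minute, hour, day of month, month, and day of week; `*` means "every".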
-
Monitoring Jobs on Discover Using Slurm
Query jobs using squeue: To see the status of your job, “squeue” queries the current job queue and lists its contents. Useful options include: -a, which lists all jobs; -t R, which lists all running jobs; -t PD, which lists all pending (non-running) jobs; -p datamove, which lists all jobs in the datamove partition; -j…
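Put together, the options above translate into invocations like the following (a sketch that requires a live Slurm cluster to run; the job ID is a placeholder):

```shell
squeue -u $USER          # only your own jobs
squeue -a                # all jobs in the queue
squeue -t R              # running jobs only
squeue -t PD             # pending (not yet running) jobs only
squeue -p datamove       # jobs in the datamove partition
squeue -j 123456         # a specific job ID (123456 is illustrative)
```

Options can be combined, e.g. `squeue -u $USER -t PD` to see only your own pending jobs.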
-
Discover CSS Access through Slurm
CSS read-only access on Discover is provided on a subset of Discover’s Slurm-managed compute nodes. These are limited to Scalable Unit 16, which includes two node types: 676 CPU-only nodes with the Intel “Cascade Lake” CPU architecture, and twelve nodes with AMD “Rome” CPUs combined with NVIDIA A100 GPUs, and Scalable Units 17 and 18,…
-
System Status
Discover Job Status: Due to changes in Discover’s reporting processes, system hardware, and resource allocation, the information on the jobmon page is no longer accurate, so we have removed it while we investigate a more scalable and flexible solution. In the interim, you may use the following command to get a rough idea of when…
-
Multiple Jobs per Node
Background: The number of CPUs (aka “cores”) per node among Discover’s processor architectures has continued to increase over time, with current Milan processors having 128 cores. Skylake and Cascade Lake nodes offer 40 and 46 cores per node, respectively. Many NCCS users have legitimate use cases for significantly lower core counts per job (particularly for…
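A low core-count job of this kind can be expressed with per-task CPU and memory requests, so the scheduler is free to pack several such jobs onto one node when node sharing is enabled (a sketch; the core count, memory size, time limit, and program name are all placeholders):

```shell
#!/bin/bash
# Small job intended to share a node: it asks for 4 of a node's cores
# (e.g. 4 of a Milan node's 128) rather than the whole node.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4    # only a small slice of the node's cores
#SBATCH --mem=8G             # request memory explicitly when sharing a node
#SBATCH --time=00:30:00
srun ./my_program            # my_program is a placeholder workload
```

Requesting memory explicitly matters on shared nodes, since other users’ jobs consume the remainder of the node’s RAM.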
-
File System on Discover Cluster
The Discover cluster provides several different types of file systems: home, nobackup, and temporary/scratch. See the showquota documentation for information on how to monitor your storage usage.
File System | Type | Variable on Discover cluster | Default Quota | Backup Cycles
Home Directory | IBM GPFS | $HOME | 1 GB | Daily
Scratch | IBM GPFS | $NOBACKUP | 5 TB/300k inodes | No Backups
Scratch | local | $LOCAL_TMPDIR | node…
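The variables in the table are plain environment variables, so a script can refer to each file system without hard-coding paths (a sketch; `$NOBACKUP` and `$LOCAL_TMPDIR` are set on Discover, and `showquota` is the NCCS usage tool mentioned above):

```shell
# Check your storage usage with the NCCS tool described above:
#   showquota
# Each file system is exposed through an environment variable:
echo "home (backed up, small quota): $HOME"
echo "scratch (large, no backups):   $NOBACKUP"
echo "node-local scratch (in jobs):  $LOCAL_TMPDIR"
```

Large, regenerable data belongs under `$NOBACKUP`; `$LOCAL_TMPDIR` exists only on the compute node for the lifetime of a job.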
-
Discover GPU Partition
GPU Availability Within the Discover Cluster: Scalable Unit 16 (SCU16) makes GPU resources available within the NCCS Discover cluster’s gpu_a100 partition, which comprises 10 AMD nodes. Note: These nodes will be fully shared, with individual nodes running jobs belonging to multiple users. Each user is limited to a maximum of one node…
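A job targeting this partition names it explicitly and requests GPUs as a generic resource (a sketch; the partition name gpu_a100 comes from the text above, while the GPU count, task count, and time limit are placeholders):

```shell
#!/bin/bash
# Request one A100 on a shared node in the gpu_a100 partition.
#SBATCH --partition=gpu_a100
#SBATCH --gres=gpu:1         # one GPU; other users may hold the rest
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
nvidia-smi                   # confirm which GPU was allocated to the job
```

Because the nodes are shared, Slurm exposes only the allocated GPU(s) to the job, which is what `nvidia-smi` would reflect.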
-
Discover Quality of Service Details
Slurm’s Quality of Service (QoS) feature controls resource limits for every job in the Discover job queue. The QoSs available in the table below apply only to jobs submitted to the Slurm default partition. (For maximum adaptability of your job scripts, it is important that you not specify any partition if you wish to use the…
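Following that advice, a script selects a QoS but deliberately omits any `--partition` directive so the job lands in the default partition (a sketch; the QoS name "long", the time limit, and the program name are illustrative, not entries from the table):

```shell
#!/bin/bash
# Select a QoS; note there is intentionally no --partition line,
# so the job goes to the Slurm default partition.
#SBATCH --qos=long           # QoS name is a placeholder
#SBATCH --ntasks=1
#SBATCH --time=12:00:00      # must fit within the chosen QoS's limits
srun ./my_program            # placeholder workload
```

If the requested time or node count exceeds the QoS limits, Slurm rejects the job at submission rather than queuing it.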

