Using The Discover Cluster

The Discover Cluster Environment

The Discover cluster is the main compute cluster for batch jobs that require significant compute resources. It consists of several scalable compute units (SCUs) that offer a variety of processor types, with nodes dedicated to batch computing and to interactive data analysis.

Operating System

SUSE Linux Enterprise Server

Using Discover requires basic Linux skills. Before using Discover, make sure you are comfortable with the concepts and commands covered in these resources:

Introduction to Linux Tutorial
More Advanced Linux Tutorial
SATERN Linux shell-scripting course code: os_doss_a01_it_enus
Linux Handbook on file permissions and ownership

For fine-grained control of file and directory permissions, you might also need to understand Access Control Lists (ACLs) and consult the Discover systems team about your requirements.

Processor Architectures

Architecture	CPUs/GPUs	Memory per CPU/GPU	Memory per Node
Cascade Lake	48 CPUs (46 usable)	4.0 GB / CPU	190 GB
AMD Rome	48 CPUs + 4 GPUs	122 GB / GPU	498 GB
Milan	128 CPUs (126 usable)	4.0 GB / CPU	512 GB
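
The figures in the table can be turned into a quick capacity check when planning a job. The following is a minimal sketch that uses only the CPU counts and node memory listed above; the helper name `tasks_per_node` and the per-task memory values are illustrative assumptions, not NCCS tooling, and the memory actually available to jobs on a node may be lower than the listed total.

```python
# Node figures copied from the table above: usable CPUs and memory per node (GB).
NODES = {
    "Cascade Lake": {"usable_cpus": 46, "mem_gb": 190},
    "Milan": {"usable_cpus": 126, "mem_gb": 512},
}


def tasks_per_node(arch: str, mem_per_task_gb: float) -> int:
    """Return how many single-CPU tasks fit on one node.

    The result is limited by whichever resource runs out first:
    usable CPUs or node memory. (Hypothetical helper for illustration.)
    """
    node = NODES[arch]
    by_memory = int(node["mem_gb"] // mem_per_task_gb)
    return min(node["usable_cpus"], by_memory)


if __name__ == "__main__":
    # Example: tasks that each need an assumed 8 GB of memory.
    for arch in NODES:
        print(arch, tasks_per_node(arch, mem_per_task_gb=8.0))
```

With roughly 4 GB per task the CPU count is the limit on both node types; with larger per-task memory, node memory becomes the limit and some CPUs stay idle.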

Learn how to submit a Slurm job to Cascade Lake and Milan nodes

Shells

Bash is the default shell for all users on Discover. To change your default shell, contact NCCS Support.

The following shells are available on the Discover cluster. To check which shell is set as your login shell, inspect the $SHELL environment variable.

bash
csh
tcsh
sh or ksh
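
At the command line, `echo $SHELL` prints the value directly. For scripts that need to branch on the login shell, the same check might look like the sketch below. The function name `login_shell` and the `LISTED_SHELLS` set are illustrative assumptions; the set simply mirrors the list on this page. Note that $SHELL names the login shell, which is not necessarily the shell currently running.

```python
import os

# Shell names as listed on this page.
LISTED_SHELLS = {"bash", "csh", "tcsh", "sh", "ksh"}


def login_shell(env=None) -> str:
    """Return the base name of the shell named in $SHELL, or "" if unset."""
    env = os.environ if env is None else env
    return os.path.basename(env.get("SHELL", ""))


if __name__ == "__main__":
    name = login_shell()
    print(f"login shell: {name or 'unknown'}")
    print(f"listed on this page: {name in LISTED_SHELLS}")
```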

Log-in Information

Guidance on logging in to NCCS systems.

File System + Storage

The NCCS provides several types of file systems on the Discover cluster, including Home, Nobackup, Scratch, and Archive.

Compilation + Software

To accommodate the needs of a broad range of user groups, multiple versions of compilers from different vendors are provided on the Discover cluster.

File Transfers

How to perform secure file transfers between the Discover cluster and other systems.

JupyterHub

JupyterHub is a web-based portal that gives users Python, Octave, or interactive shell access to Discover for visualization, data processing and analysis, and general interaction with the cluster.

Running Jobs using Slurm

Slurm is the job scheduler and resource manager that organizes scientific computing jobs on the Discover cluster. This section explains how to submit, monitor, and cancel jobs, along with other common tasks.

Learn how to work with Slurm’s scheduling algorithm so that your job starts as soon as possible, by specifying accurate time limits and node requirements.
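
As an illustration of stating time limits and node requirements explicitly, the sketch below builds the text of a batch script. The `#SBATCH` options used (`--job-name`, `--time`, `--nodes`, `--ntasks-per-node`, `--constraint`) are standard `sbatch` options, but everything else is an assumption made for the example: the helper name `slurm_script`, the job name, the executable `./my_app`, and in particular the constraint value `mil`, which should be checked against the NCCS Slurm documentation before use.

```python
def slurm_script(job_name, walltime, nodes, ntasks_per_node, command,
                 constraint=None):
    """Build the text of a Slurm batch script with explicit resource limits.

    Hypothetical helper for illustration; it only formats text and does
    not submit anything.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={walltime}",  # wall-clock limit, e.g. HH:MM:SS
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
    ]
    if constraint:
        # Site-specific node feature name; the value is an assumption here.
        lines.append(f"#SBATCH --constraint={constraint}")
    lines += ["", command]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 126 tasks per node matches the usable CPU count listed for Milan nodes.
    print(slurm_script("demo", "00:30:00", nodes=2, ntasks_per_node=126,
                       command="srun ./my_app", constraint="mil"))
```

A time limit close to the job's real runtime generally gives the scheduler more opportunities to fit the job into gaps between larger reservations than a padded, worst-case limit does.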

Slurm best practices

Discover QoS

NCCS Task Farming

NCCS Task Farming (NCCS-TF) is a Python application that lets users execute independent tasks concurrently across nodes on multicore clusters. The package consists of a set of Python scripts that work together through two simple text-based interfaces. NCCS-TF requires no knowledge of the individual tasks, which may be serial or parallel, and makes no assumptions about the underlying applications; the tasks can even come from different applications. It can be viewed as a task-parallelism tool that runs multiple independent tasks concurrently.
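
The idea of task parallelism can be illustrated with a generic sketch. This is not NCCS-TF and does not use its interfaces: it is a single-node example built on the Python standard library, with hypothetical helper names `run_task` and `run_farm`, in which each task is an arbitrary command that the farm runs without knowing what it does.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_task(command):
    """Run one independent task and return (command, exit code)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return command, result.returncode


def run_farm(commands, max_workers=4):
    """Run independent commands concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_task, commands))


if __name__ == "__main__":
    # Four unrelated toy tasks; real tasks could be any executables.
    py = sys.executable
    tasks = [[py, "-c", f"print({n} ** 2)"] for n in range(4)]
    for command, code in run_farm(tasks):
        print(code, " ".join(command))
```

Threads are sufficient here because each worker only waits on an external process; the work itself runs in the child processes.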

Monitoring + Optimization

Use the following techniques and tools to analyze and optimize your programming workflows:

Memory Tools
Performance Tools
Storage Tools
Debugging Tools

GPU Partition

Documentation on accessing and using Discover GPUs.

Miscellaneous

The following page covers a variety of topics that help users tailor their Discover cluster environment to their requirements:

Using Modules to load appropriate compilers, libraries and other software.
Cron on the Discover cluster allows users to automatically run tasks at a specified time.
