Policies

HPC Etiquette

Login/Management Node

When you log in to one of our HPC systems, you land on what is known as the “management” node. This node is the entry point to the HPC system and lets you interface with the job scheduler, Slurm. You can also use the management node to edit files, create environments, and compile code. As a general rule, however, running simulation software on the management node is prohibited. On both HPC systems, a program called Arbiter monitors the system resources used on the login node. If you use too many CPU resources, an automated warning is sent to your Mines email address and your CPU usage is throttled. Once a cooldown period ends, your CPU allotment returns to normal.

Scratch vs Home Directory

Slurm

Walltime Policy

The standard maximum walltime is six days (144 hours):

#SBATCH --time=144:00:00
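As a rough illustration of the limit, the sketch below builds the directive from a requested number of hours and refuses anything above 144. The function name `walltime_directive` and the variable `MAX_WALLTIME_HOURS` are hypothetical helpers made up for this example; they are not part of Slurm or of any HPC@Mines tooling.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm or HPC@Mines tool): print a --time
# directive for a requested number of hours, rejecting requests above
# the 144-hour maximum described in this section.
MAX_WALLTIME_HOURS=144

walltime_directive() {
    local hours="$1"
    # Accept only non-negative whole numbers of hours.
    case "$hours" in
        ''|*[!0-9]*)
            echo "error: hours must be a non-negative integer" >&2
            return 1
            ;;
    esac
    if [ "$hours" -gt "$MAX_WALLTIME_HOURS" ]; then
        echo "error: ${hours}h exceeds the ${MAX_WALLTIME_HOURS}h maximum" >&2
        return 1
    fi
    # Slurm accepts walltime in hours:minutes:seconds form.
    printf '#SBATCH --time=%s:00:00\n' "$hours"
}

# Example calls: the first prints a directive, the second is refused.
walltime_directive 144
walltime_directive 200 || echo "request rejected"
```

Slurm also accepts a days-hours:minutes:seconds form, so `6-00:00:00` expresses the same six-day limit.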

This policy is strictly enforced by HPC@Mines. If the computational problem you are solving seems to require a walltime longer than 144 hours, we strongly encourage you to look for alternatives to simply extending the walltime. Below are two possible approaches.

Increase the amount of parallelism

Increasing the number of cores or nodes used by your job can often reduce the total walltime needed. If your code is a single-core workload, feel free to reach out to us for an HPC technical consultation on other workflow options.
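A minimal sketch of a job script that requests more than one task is shown below. The job name, the node and task counts, and the executable `./my_solver` are placeholders chosen for illustration, not values recommended by HPC@Mines; suitable counts depend on your code and on how well it scales.

```shell
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=parallel_example
# Assumed node count; adjust to what your code can actually use.
#SBATCH --nodes=2
# Assumed total number of tasks (for example, MPI ranks).
#SBATCH --ntasks=8
# Requested walltime, well under the 144-hour maximum.
#SBATCH --time=24:00:00

# Slurm sets SLURM_NTASKS inside a job; fall back to 1 when run elsewhere.
NTASKS="${SLURM_NTASKS:-1}"
echo "Running with ${NTASKS} task(s)"

# srun launches one copy of the program per task.
# ./my_solver is a placeholder for your own parallel executable.
if command -v srun >/dev/null 2>&1; then
    srun ./my_solver
else
    echo "srun not found: submit this script with sbatch on the cluster"
fi
```

Requesting more tasks only shortens the run if the program itself is written to work in parallel; a serial code will not run faster with a larger allocation.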

Incorporate checkpointing

Checkpointing is the process of periodically saving the state of a program's execution so that it can be resumed later. This is extremely helpful in limiting the damage to your calculation from an unexpected crash or error. By saving output periodically, or at a certain recurring point, and restarting the calculation from the saved output, you can avoid the catastrophic loss of a days-long compute effort. Using checkpointing to intentionally stop and restart a calculation at a reasonably estimated point is a recommended way to stay within the six-day maximum walltime.
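The idea can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the function `run_segment`, the file name `checkpoint.state`, and the step counts are all hypothetical, and a real application would save its own state (arrays, iteration counters, and so on) using whatever restart mechanism it provides.

```shell
#!/bin/bash
# Sketch of checkpoint/restart. All names and step counts are hypothetical.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-checkpoint.state}"
TOTAL_STEPS="${TOTAL_STEPS:-10}"
# Amount of work assumed to fit comfortably within one job's walltime.
STEPS_PER_RUN="${STEPS_PER_RUN:-4}"

run_segment() {
    local step=0 stop
    # Resume from the last saved step if a checkpoint exists.
    if [ -f "$CHECKPOINT_FILE" ]; then
        step=$(cat "$CHECKPOINT_FILE")
    fi
    stop=$(( step + STEPS_PER_RUN ))
    if [ "$stop" -gt "$TOTAL_STEPS" ]; then
        stop=$TOTAL_STEPS
    fi
    while [ "$step" -lt "$stop" ]; do
        step=$(( step + 1 ))
        # ... the real computation for this step would go here ...
        # Write to a temporary file, then rename, so that a crash in the
        # middle of a write cannot leave a corrupted checkpoint behind.
        echo "$step" > "${CHECKPOINT_FILE}.tmp"
        mv "${CHECKPOINT_FILE}.tmp" "$CHECKPOINT_FILE"
    done
    # Report the last completed step.
    echo "$step"
}
```

Each job would call `run_segment` once. If the step it reports is still below `TOTAL_STEPS`, the same script is submitted again and picks up where the previous job stopped.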

For more focused computational assistance with the situations above and other compute aspects of your research, the HPC@Mines team is available to provide personal, one-on-one assistance. Please submit a help request to start the process. We also suggest consulting members of your group or other peers who use similar codes or applications; their experience may get you answers more quickly.