Knowing your job efficiency

XDMoD.mines.edu

  • Management capabilities

    • Monitoring standard metrics: utilization

    • Metrics designed to identify underperforming system hardware and software

    • Reporting job-level performance data for every job running on the HPC system

  • A tool to use an allocation effectively and efficiently and to optimize HPC resources

  • Ability to monitor, diagnose, and tune system performance, and to measure the performance of all running applications

  • Easily obtain a detailed analysis of application performance to help optimize code

  • A diagnostic tool to facilitate HPC planning and analysis

  • Metrics to help measure scientific impact

  • Analyses of the operational characteristics of the HPC environment can be carried out at different levels of granularity

    • per job, per user, or system-wide
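To make the idea of granularity concrete, here is a minimal Python sketch. It assumes a hypothetical per-job record holding only the CPU time a job used and its allocated core-walltime (allocated cores multiplied by elapsed time), and it rolls job-level CPU efficiency up to the user and system-wide levels by weighting each job by its core-walltime. `JobRecord`, `efficiency`, and `by_user` are illustrative names, not part of XDMoD, and XDMoD may aggregate its metrics differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical, minimal per-job accounting record."""
    job_id: str
    user: str
    cpu_seconds_used: float       # total CPU time consumed by the job
    core_walltime_seconds: float  # allocated cores x elapsed wall-clock time


def efficiency(jobs):
    """CPU efficiency (%) over a set of jobs, weighted by core-walltime."""
    used = sum(j.cpu_seconds_used for j in jobs)
    allocated = sum(j.core_walltime_seconds for j in jobs)
    return 100.0 * used / allocated if allocated else 0.0


def by_user(jobs):
    """Group jobs by user and compute each user's weighted efficiency."""
    groups = defaultdict(list)
    for job in jobs:
        groups[job.user].append(job)
    return {user: efficiency(user_jobs) for user, user_jobs in groups.items()}


# Hypothetical records for illustration only.
jobs = [
    JobRecord("1001", "alice", cpu_seconds_used=351, core_walltime_seconds=1512),
    JobRecord("1002", "alice", cpu_seconds_used=7200, core_walltime_seconds=7200),
    JobRecord("1003", "bob", cpu_seconds_used=900, core_walltime_seconds=3600),
]

print({j.job_id: round(efficiency([j]), 2) for j in jobs})  # job level
print({u: round(e, 2) for u, e in by_user(jobs).items()})    # user level
print(round(efficiency(jobs), 2))                            # system-wide
```

Weighting by core-walltime means a large, idle allocation pulls the aggregate down more than a small one, which is usually what matters when the goal is to use an allocation well.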

Using the Efficiency Tab

XDMoD job efficiency: System-Wide Report

SLURM command ‘seff’

Using the SLURM command seff $JOBID

Job ID: xxxxxxx
Array Job ID: xxxxxx_0
Cluster: wendian
User/Group: username/usergroup
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 4
CPU Utilized: 00:05:51
CPU Efficiency: 23.21% of 00:25:12 core-walltime
Job Wall-clock time: 00:03:09
Memory Utilized: 973.22 MB (estimated maximum)
Memory Efficiency: 6.76% of 14.06 GB (1.76 GB/core)
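The percentages in this output follow from the other fields: 2 nodes × 4 cores × 189 s of wall-clock time gives 1,512 core-seconds (00:25:12), and 351 s of CPU time is 23.21% of that. Likewise, 14.06 GB spread over 8 cores is 1.76 GB/core, and 973.22 MB is 6.76% of the 14.06 GB requested. The Python sketch below re-derives both efficiency figures from the text output. It assumes the exact field layout shown above, times without a day prefix, memory used reported in MB against a request in GB, and 1 GB = 1024 MB; the helper names are hypothetical.

```python
import re

# Fields copied from the seff example above (job and user IDs omitted).
SEFF_OUTPUT = """\
Nodes: 2
Cores per node: 4
CPU Utilized: 00:05:51
CPU Efficiency: 23.21% of 00:25:12 core-walltime
Job Wall-clock time: 00:03:09
Memory Utilized: 973.22 MB (estimated maximum)
Memory Efficiency: 6.76% of 14.06 GB (1.76 GB/core)
"""


def hms_to_seconds(hms):
    """Convert HH:MM:SS to seconds (assumes no 'D-' day prefix)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def field(text, name):
    """Return the value that follows 'name:' at the start of a line."""
    match = re.search(rf"^{re.escape(name)}:\s*(.+)$", text, re.MULTILINE)
    if match is None:
        raise KeyError(name)
    return match.group(1).strip()


def recompute_efficiency(text):
    """Return (CPU efficiency %, memory efficiency %) derived from seff fields."""
    nodes = int(field(text, "Nodes"))
    cores_per_node = int(field(text, "Cores per node"))
    cpu_used = hms_to_seconds(field(text, "CPU Utilized"))
    wall = hms_to_seconds(field(text, "Job Wall-clock time"))

    # Core-walltime: every allocated core multiplied by the elapsed time.
    core_walltime = nodes * cores_per_node * wall
    cpu_eff = 100.0 * cpu_used / core_walltime

    # Assumes usage is printed in MB and the request in GB, with 1 GB = 1024 MB.
    mem_used_mb = float(field(text, "Memory Utilized").split()[0])
    requested = re.search(r"of ([\d.]+) GB", field(text, "Memory Efficiency"))
    mem_req_mb = float(requested.group(1)) * 1024
    mem_eff = 100.0 * mem_used_mb / mem_req_mb
    return cpu_eff, mem_eff


cpu_eff, mem_eff = recompute_efficiency(SEFF_OUTPUT)
print(f"CPU efficiency:    {cpu_eff:.2f}%")  # 23.21%
print(f"Memory efficiency: {mem_eff:.2f}%")  # 6.76%
```

A low CPU efficiency like this one usually means more cores were requested than the program keeps busy; a low memory efficiency means the memory request could likely be reduced.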

Selecting a Job ID in XDMoD.mines.edu

Displays detailed information on accounting data, the job script, the executable, and metrics

Example of an Rscript job using 3 nodes

XDMoD job efficiency: an Rscript job

In this 3-node job, all nodes have work for the first three hours; after that, each node runs out of work at a different time, so the workload is unbalanced across the three nodes. The total CPU efficiency is reported as 12.8%. The Rscript program should be examined to correct this load imbalance.
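A simplified model shows why an unbalanced workload drags down the total. This Python sketch assumes each node runs at full CPU utilization until it runs out of work and then sits idle, so the job's wall-clock time is set by the slowest node. The busy-time values are hypothetical, not taken from the Rscript job, and the max/mean imbalance ratio is one common metric, not necessarily what XDMoD reports.

```python
def node_efficiency_summary(busy_hours):
    """Summarize a multi-node job under a simple on/off model.

    busy_hours: hours each node spent working (hypothetical). Assumes 100% CPU
    use while a node is busy and 0% afterwards; the slowest node sets wall time.
    Returns (overall efficiency %, per-node efficiency %, max/mean imbalance).
    """
    wall = max(busy_hours)
    overall = 100.0 * sum(busy_hours) / (len(busy_hours) * wall)
    per_node = [100.0 * b / wall for b in busy_hours]
    mean = sum(busy_hours) / len(busy_hours)
    imbalance = wall / mean  # 1.0 means perfectly balanced
    return overall, per_node, imbalance


# Hypothetical: three nodes finish their share of the work at 3, 6, and 12 hours.
overall, per_node, imbalance = node_efficiency_summary([3.0, 6.0, 12.0])
print(f"overall {overall:.1f}%, per node {per_node}, imbalance {imbalance:.2f}")

# Same 21 hours of work spread evenly over the three nodes.
print(node_efficiency_summary([7.0, 7.0, 7.0]))
```

With busy times of 3, 6, and 12 hours the nodes are 25%, 50%, and 100% busy and the job is about 58% efficient; spreading the same 21 hours of work evenly (7 hours per node) brings the model to 100%. A real job can score far lower, as the 12.8% here does, because cores may also be underused while a node still has work.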

Example of Ansys Fluent (running interactively)

XDMoD job efficiency: an Ansys Fluent job

This job used 5 days of wall-clock time and has a reported utilization efficiency of 60.44%. The few drops seen in the graph probably occurred while the user was working in the interface and adjusting settings.
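One way to locate such drops is to scan the utilization time series for runs below a threshold. This Python sketch assumes evenly spaced samples expressed as percentages; the sample values and the 20% threshold are hypothetical, and XDMoD's own calculation of the reported efficiency may differ.

```python
def average_utilization(samples):
    """Time-averaged utilization (%) for evenly spaced samples in the range 0-100."""
    return sum(samples) / len(samples)


def find_drops(samples, threshold=20.0):
    """Return (start, end) index pairs of runs below threshold (end is exclusive)."""
    drops, start = [], None
    for i, value in enumerate(samples):
        if value < threshold and start is None:
            start = i  # a drop begins
        elif value >= threshold and start is not None:
            drops.append((start, i))  # the drop ended at the previous sample
            start = None
    if start is not None:
        drops.append((start, len(samples)))  # drop runs to the end of the series
    return drops


# Hypothetical hourly utilization samples (%), not taken from the Fluent job.
samples = [95, 96, 94, 5, 3, 97, 95, 10, 96, 98]
print(average_utilization(samples))
print(find_drops(samples))
```

The drop intervals can then be matched against what the user was doing at the time; for an interactive session, idle periods like these are expected rather than a sign of a code problem.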