[Colsa-premise-users] New Monitoring Capability on Premise

Westbrook, Anthony S Anthony.Westbrook at unh.edu
Tue Apr 11 13:36:58 EDT 2017


Hi Everyone -

In the past, a few people have noted that there is no way to view the specific processes and resources in use on individual Premise compute nodes while a job is running, akin to what the Linux "top" utility provides.  Slurm utilities like "sstat" can give average and max values for an entire job, but not real-time, process-level details.  Likewise, running "top" on Premise only shows the head node, not the compute nodes.

To address this need, we've developed a program named "slurm-monitor", now available on Premise in the linuxbrew/colsa module.  Usage is as follows:

"slurm-monitor <Job ID>"

Here "<Job ID>" is the ID of the Slurm job, which is reported at job submission and can be looked up at any time by running the "squeue" command.
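
As a rough illustration of where the job ID comes from, here is a minimal sketch that pulls it out of the "Submitted batch job <ID>" line that sbatch prints at submission.  The helper name "job_id_from_sbatch" and the script name "myjob.sh" are hypothetical, and the commented-out commands assume the module is loaded with the usual "module load" command; adjust to your own workflow.

```shell
#!/bin/sh
# Hypothetical helper: read sbatch's output on stdin and print the job ID,
# i.e. the fourth field of the "Submitted batch job <ID>" line.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# On Premise this might be used as follows (left commented out here, since
# these commands only work on the cluster; "myjob.sh" is a placeholder):
#   module load linuxbrew/colsa
#   job_id=$(sbatch myjob.sh | job_id_from_sbatch)
#   squeue -u "$USER"        # alternative: find the ID in your queue listing
#   slurm-monitor "$job_id"
```

If the job was submitted earlier, the ID shown in the JOBID column of the "squeue" output can be passed to slurm-monitor directly.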

slurm-monitor is a wrapper around "top": all of the information "top" normally provides is shown for a node running the specified job, with additional Slurm job details at the bottom of the screen.  If the job is running across multiple nodes, you can cycle through the monitored node using the "[" and "]" keys.  A screenshot of slurm-monitor is attached.

We also plan to incorporate additional usage information (GPU, disk, memory, etc.) into this program in the future.

Thanks -

Toni Westbrook
Computational Scientist
Research Computing Center, College of Life Sciences and Agriculture
University of New Hampshire
Office: 436 Gregg Hall





-------------- next part --------------
A non-text attachment was scrubbed...
Name: slurm-monitor.png
Type: image/png
Size: 79613 bytes
Desc: slurm-monitor.png
URL: <http://lists.sr.unh.edu/pipermail/colsa-premise-users/attachments/20170411/17bfbc4a/slurm-monitor-0001.png>

