[Trillian-users] trillian queue and runs

Maciolek, Mark Mark.Maciolek at unh.edu
Thu Mar 15 08:52:20 EDT 2018


Kate,

The alps process has once again failed to work properly. I have restarted it, and I recommend that any user with jobs in ‘H’ status remove those jobs and resubmit them.

44538.sdb         yout.bat         samark                   0 H workq
44545.sdb         perp_D_2.pbs     kgklein                  0 H workq
44546.sdb         PBS_job_script_  salme                    0 H workq
44547.sdb         PBS_job_script_  kvonkrusenstiern         0 H workq
44548.sdb         PBS_job_script_  kvonkrusenstiern         0 H workq
44551.sdb         PBS_job_script_  kvonkrusenstiern         0 H workq
44553.sdb         st.pbs           pai                      0 H workq
44556.sdb         ql.pbs           pai                      0 H workq
44558.sdb         PBS_job_script_  salme                    0 H workq
44559.sdb         L20cm            fs1036                   0 H workq
44560.sdb         PBS_job_script_  kvonkrusenstiern         0 H workq
44561.sdb         shockTest        mgorby                   0 H workq

If that does not succeed, I will need to reboot trillian.
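
For anyone with several held jobs, here is a minimal sketch of how the held job IDs for one user could be picked out of a listing like the one above. It assumes the six-column layout shown (job ID, name, user, time used, state, queue); the function name held_jobs and the SAMPLE text are illustrative only, and the script prints the qdel commands rather than running them.

```python
# Sketch: pick held ('H') jobs for one user out of qstat-style lines.
# Assumed column layout, matching the listing in this message:
# job id, job name, user, time used, state, queue.

SAMPLE = """\
44538.sdb         yout.bat         samark                   0 H workq
44545.sdb         perp_D_2.pbs     kgklein                  0 H workq
44547.sdb         PBS_job_script_  kvonkrusenstiern         0 H workq
44548.sdb         PBS_job_script_  kvonkrusenstiern         0 H workq
44553.sdb         st.pbs           pai                      0 H workq
"""


def held_jobs(listing, user):
    """Return the job ids in state 'H' that belong to `user`."""
    ids = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip headers, blanks, anything not in the assumed layout
        job_id, _name, owner, _time, state, _queue = fields
        if owner == user and state == "H":
            ids.append(job_id)
    return ids


if __name__ == "__main__":
    for job_id in held_jobs(SAMPLE, "kvonkrusenstiern"):
        # Printed, not executed; review the list before deleting anything.
        print("qdel", job_id)
```

After removing a held job with qdel, resubmit it with qsub and the original job script.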

Mark
--Mark Maciolek
Network Administrator
Morse Hall Rm 338
http://www.unh.edu/research/support-units/research-computing-center

From: Kate von Krusenstiern [mailto:kvonkrus at gmail.com]
Sent: Thursday, March 15, 2018 8:44 AM
To: Maciolek, Mark <Mark.Maciolek at unh.edu>
Subject: trillian queue and runs

Hi Mark,

I apologize if I'm just hammering a trillian issue you already know about, but I wanted to give you guys a heads up on an issue with the queue on trillian.

The queue doesn't seem to be registering finished runs, and thus is not starting the runs waiting in the queue. I noticed that my run (purposely capped at 96 hours) shows as running for 115 hours when I use apstat to check occupied nodes. When I checked the output of this run, it did in fact stop computing at 96 hours.

When I use qstat to check the active batch jobs, my job is not listed as running. qstat shows a total of 5 jobs running on 65 nodes, whereas apstat shows 11 jobs using all the nodes.

I know things have been busy with back-to-back nor'easters and spring break. I appreciate all you guys do to keep this supercomputer running. Hopefully this issue is something that can be resolved easily.

Thanks,
Kate von Krusenstiern



--
Kate von Krusenstiern
Center of Coastal and Ocean Mapping - University of New Hampshire
Graduate Research Assistant