[Trillian-users] trillian queue and runs
Mark Maciolek
Mark.Maciolek at unh.edu
Thu Mar 15 11:54:27 EDT 2018
Hi,
New jobs are still not starting properly, so I will be rebooting the system at 12:30 pm.
Mark
From: Trillian-users [mailto:trillian-users-bounces at lists.sr.unh.edu] On Behalf Of Maciolek, Mark
Sent: Thursday, March 15, 2018 8:52 AM
To: 'Kate von Krusenstiern' <kvonkrus at gmail.com>
Cc: 'trillian-users at lists.sr.unh.edu' <trillian-users at lists.sr.unh.edu>
Subject: Re: [Trillian-users] trillian queue and runs
Kate,
The ALPS process has once again failed. I have restarted it, and I recommend that any user with jobs in 'H' (held) status remove those jobs and resubmit them.
Job id     Name             User              Time Use S Queue
44538.sdb  yout.bat         samark            0        H workq
44545.sdb  perp_D_2.pbs     kgklein           0        H workq
44546.sdb  PBS_job_script_  salme             0        H workq
44547.sdb  PBS_job_script_  kvonkrusenstiern  0        H workq
44548.sdb  PBS_job_script_  kvonkrusenstiern  0        H workq
44551.sdb  PBS_job_script_  kvonkrusenstiern  0        H workq
44553.sdb  st.pbs           pai               0        H workq
44556.sdb  ql.pbs           pai               0        H workq
44558.sdb  PBS_job_script_  salme             0        H workq
44559.sdb  L20cm            fs1036            0        H workq
44560.sdb  PBS_job_script_  kvonkrusenstiern  0        H workq
44561.sdb  shockTest        mgorby            0        H workq
If that does not succeed, I will need to reboot trillian.
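For users following the remove-and-resubmit advice, a minimal Python sketch can pick out one user's held jobs from a qstat-style listing like the one above. It assumes the default six-column layout (job ID, name, user, time used, state, queue) and only prints `qdel` commands for review rather than running them; the function name `held_jobs` is hypothetical.

```python
from typing import List


def held_jobs(qstat_text: str, user: str) -> List[str]:
    """Return IDs of jobs owned by `user` whose state column is 'H' (held).

    Assumption: each job line has six whitespace-separated columns:
    job_id, name, user, time_used, state, queue (qstat's default layout).
    """
    held = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything malformed.
        if len(fields) != 6:
            continue
        job_id, _name, owner, _time_used, state, _queue = fields
        if owner == user and state == "H":
            held.append(job_id)
    return held


if __name__ == "__main__":
    sample = """\
Job id     Name             User              Time Use S Queue
44547.sdb  PBS_job_script_  kvonkrusenstiern  0        H workq
44553.sdb  st.pbs           pai               0        H workq
44560.sdb  PBS_job_script_  kvonkrusenstiern  0        H workq
"""
    for job_id in held_jobs(sample, "kvonkrusenstiern"):
        # Printed, not executed: review each command before running it.
        print(f"qdel {job_id}")
```

Each removed job would then be resubmitted with `qsub` and the user's original job script.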
Mark
--Mark Maciolek
Network Administrator
Morse Hall Rm 338
http://www.unh.edu/research/support-units/research-computing-center
From: Kate von Krusenstiern [mailto:kvonkrus at gmail.com]
Sent: Thursday, March 15, 2018 8:44 AM
To: Maciolek, Mark <Mark.Maciolek at unh.edu>
Subject: trillian queue and runs
Hi Mark,
I apologize if this is a Trillian issue you already know about, but I wanted to give you a heads-up about a problem with the queue on Trillian.
The queue doesn't seem to be registering finished runs, so it is not starting the runs waiting in the queue. When I use apstat to check occupied nodes, my run (purposely capped at 96 hours) shows as having been running for 115 hours. When I checked this run's output, it did in fact stop computing at 96 hours.
When I use qstat to check the active batch jobs, my job is not listed as running. qstat shows a total of 5 jobs running on 65 nodes, whereas apstat shows 11 jobs occupying all the nodes.
I know things have been busy with back-to-back nor'easters and spring break. I appreciate all you do to keep this supercomputer running. Hopefully this issue can be resolved easily.
Thanks,
Kate von Krusenstiern
--
Kate von Krusenstiern
Center for Coastal and Ocean Mapping - University of New Hampshire
Graduate Research Assistant