[Trillian-users] trillian issue

Mark Maciolek Mark.Maciolek at unh.edu
Thu Dec 6 14:37:46 EST 2018


Hi,

Any new jobs are going on hold or are not being queued. At 3:30pm I am going
to cancel any jobs running in the current queue:

qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
52224.sdb         trace_1          yabingwang        00:01:34 R workq
52225.sdb         trace_2          yabingwang        00:01:15 R workq
52228.sdb         trace_4          yabingwang        00:01:30 R workq
52232.sdb         trace_6          yabingwang        00:01:38 R workq
52237.sdb         trace8           yabingwang        00:01:37 R workq
52240.sdb         trace_12         yabingwang        00:01:20 R workq
52340.sdb         PBS_job_script_  salme             00:00:03 R workq
52361.sdb         trace_415        yabingwang        00:00:50 R workq
52367.sdb         2DC1.5E1.5       jhk1009           00:00:04 R workq
52368.sdb         yhout.bat        samark            00:00:01 R workq
52369.sdb         yhout.bat        samark            00:00:01 R workq

Then I will restart PBS again.

If that fails to allow new jobs to start, I will schedule a reboot of
trillian for 4PM.
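For anyone who wants to collect the IDs of the running jobs from a qstat
listing like the one above (for example, to review them before they are
cancelled), here is a minimal sketch. It assumes the default
whitespace-separated column layout shown above (Job id, Name, User, Time Use,
S, Queue) and that job names contain no spaces. The function name
running_job_ids is hypothetical and not part of PBS.

```python
def running_job_ids(qstat_text: str, queue: str = "workq") -> list[str]:
    """Return IDs of jobs in state R on `queue` from default `qstat` output.

    Assumption: six whitespace-separated columns per job row, as in the
    listing above. Job names containing spaces would break this split.
    """
    ids = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # The header ("Job id ... Time Use S Queue") splits into 8 fields
        # and blank lines into 0, so both are skipped here.
        if len(fields) != 6:
            continue
        job_id, _name, _user, _time_use, state, job_queue = fields
        # The dashed separator row has "-" as its state, so it is
        # filtered out by the state check as well.
        if state == "R" and job_queue == queue:
            ids.append(job_id)
    return ids
```

The resulting IDs could then be reviewed by hand and passed to qdel by an
administrator; the sketch itself does not cancel anything.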

Mark



-----Original Message-----
From: Trillian-users <trillian-users-bounces at lists.sr.unh.edu> On Behalf Of
Maciolek, Mark
Sent: Thursday, December 6, 2018 10:27 AM
To: 'trillian-users at lists.sr.unh.edu' <trillian-users at lists.sr.unh.edu>
Subject: [Trillian-users] trillian issue

Hi,

Jobs submitted on trillian have been sitting in the queue without running
since yesterday morning.

qstat -s shows this reason:

Not Running: Insufficient amount of resource arch

The only related log entries are in the PBS mom_logs:

20181205:12/05/2018 08:00:01;0080;pbs_mom;Node;alps_engine_query;ALPS ENGINE query failed with BASIL version 1.1.
20181205:12/05/2018 08:00:01;0002;pbs_mom;Node;alps_inventory;ALPS inventory request failed.
20181205:12/05/2018 08:10:01;0080;pbs_mom;Node;alps_engine_query;ALPS ENGINE query failed with BASIL version 1.1.
20181205:12/05/2018 08:10:01;0002;pbs_mom;Node;alps_inventory;ALPS inventory request failed.
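To check how often these ALPS failures recur over a day of mom_logs, a small
script can split each entry on ";" and count the failing calls. This is a
minimal sketch, assuming each entry has six ";"-separated fields (timestamp,
event code, daemon, object type, object name, message) as in the lines above,
and that the leading "20181205:" is a file-name prefix added when the logs
were searched. The names parse_line, alps_failures and LogRecord are
hypothetical.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class LogRecord(NamedTuple):
    when: datetime
    event: str      # event code kept as text, e.g. "0080" (assumed meaning)
    daemon: str     # e.g. "pbs_mom"
    obj_type: str   # e.g. "Node"
    obj_name: str   # e.g. "alps_inventory"
    message: str


# Optional "YYYYMMDD:" prefix, assumed to be the log file name added by grep.
_PREFIX = re.compile(r"^\d{8}:")


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one log entry; return None if it does not fit the assumed format."""
    line = _PREFIX.sub("", line.strip(), count=1)
    parts = line.split(";", 5)  # maxsplit keeps any ";" inside the message
    if len(parts) != 6:
        return None
    stamp, event, daemon, obj_type, obj_name, message = parts
    try:
        when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return LogRecord(when, event, daemon, obj_type, obj_name, message)


def alps_failures(lines: Iterable[str]) -> Counter:
    """Count entries whose object name starts with "alps_" and that report a failure."""
    counts: Counter = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec.obj_name.startswith("alps_") and "failed" in rec.message:
            counts[rec.obj_name] += 1
    return counts
```

Run against the four entries above, this would report two alps_engine_query
failures and two alps_inventory failures.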

The ALPS logs don't show any obvious issue.

I restarted PBS on trillian, which cancelled some jobs and restarted
others.

Will keep an eye on the logs for now.

Mark

--Mark Maciolek
Network Administrator
Morse Hall Rm 338
http://www.unh.edu/research/support-units/research-computing-center


_______________________________________________
Trillian-users mailing list
Trillian-users at lists.sr.unh.edu
https://lists.sr.unh.edu/mailman/listinfo/trillian-users



