[Trillian-users] trillian queue and runs

Mark Maciolek Mark.Maciolek at unh.edu
Thu Mar 15 11:54:27 EDT 2018


Hi,

 

New jobs are still not starting properly, so I will be rebooting the system at 12:30 pm.

 

Mark

 

From: Trillian-users [mailto:trillian-users-bounces at lists.sr.unh.edu] On Behalf Of Maciolek, Mark
Sent: Thursday, March 15, 2018 8:52 AM
To: 'Kate von Krusenstiern' <kvonkrus at gmail.com>
Cc: 'trillian-users at lists.sr.unh.edu' <trillian-users at lists.sr.unh.edu>
Subject: Re: [Trillian-users] trillian queue and runs

 

Kate,

 

The alps process has once again failed to work properly. I have restarted it, and I recommend that any user with jobs in ‘H’ status remove those jobs and resubmit them.

 

44538.sdb         yout.bat         samark                   0 H workq
44545.sdb         perp_D_2.pbs     kgklein                  0 H workq
44546.sdb         PBS_job_script_  salme                    0 H workq
44547.sdb         PBS_job_script_  kvonkrusenstiern         0 H workq
44548.sdb         PBS_job_script_  kvonkrusenstiern         0 H workq
44551.sdb         PBS_job_script_  kvonkrusenstiern         0 H workq
44553.sdb         st.pbs           pai                      0 H workq
44556.sdb         ql.pbs           pai                      0 H workq
44558.sdb         PBS_job_script_  salme                    0 H workq
44559.sdb         L20cm            fs1036                   0 H workq
44560.sdb         PBS_job_script_  kvonkrusenstiern         0 H workq
44561.sdb         shockTest        mgorby                   0 H workq

 

If that does not succeed, I will need to reboot trillian.
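
For anyone with several held jobs, a minimal Python sketch for picking them out of a listing like the one above might look as follows. It assumes the six-column layout shown there (job ID, job name, user, time used, state, queue) and job names without spaces; the function name held_jobs and the SAMPLE text are illustrative only. It prints qdel commands for review rather than running them.

```python
def held_jobs(listing, user=None):
    """Return job IDs whose state column is 'H', optionally for one user.

    Assumes six whitespace-separated columns per line:
    job ID, job name, user, time used, state, queue.
    """
    jobs = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 6:
            # Skip blank lines, headers, or anything not in the assumed layout.
            continue
        job_id, _name, owner, _time_used, state, _queue = fields
        if state == "H" and (user is None or owner == user):
            jobs.append(job_id)
    return jobs


# Illustrative excerpt of the listing; in practice, paste your own qstat output.
SAMPLE = """\
44538.sdb         yout.bat         samark                   0 H workq
44547.sdb         PBS_job_script_  kvonkrusenstiern         0 H workq
44548.sdb         PBS_job_script_  kvonkrusenstiern         0 H workq
44553.sdb         st.pbs           pai                      0 H workq
"""

if __name__ == "__main__":
    # Print the removal commands so they can be checked before being run by hand.
    for job_id in held_jobs(SAMPLE, user="kvonkrusenstiern"):
        print("qdel", job_id)
```

After removing the held jobs, each one would still need to be resubmitted with its original job script.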

 

Mark

--Mark Maciolek

Network Administrator

Morse Hall Rm 338

http://www.unh.edu/research/support-units/research-computing-center

 

From: Kate von Krusenstiern [mailto:kvonkrus at gmail.com] 
Sent: Thursday, March 15, 2018 8:44 AM
To: Maciolek, Mark <Mark.Maciolek at unh.edu>
Subject: trillian queue and runs

 

Hi Mark,

 

I apologize if I'm just hammering a trillian issue you already know about, but I wanted to give you guys a heads up on an issue with the queue on trillian.

 

The queue doesn't seem to be registering finished runs, and thus is not starting the runs waiting in the queue. When I use apstat to check occupied nodes, my run (purposely capped at 96 hours) is shown as having been running for 115 hours. When I checked the output of this run, it did in fact stop computing at 96 hours.

 

When I use qstat to check the active batch jobs, my job is not listed as running. The qstat output shows 5 jobs running on 65 nodes in total, whereas apstat shows 11 jobs using all the nodes.

 

I know things have been busy with back-to-back nor'easters and spring break. I appreciate all you guys do to keep this supercomputer running. Hopefully this issue is something that can be resolved easily.

 

Thanks,

Kate von Krusenstiern

 




 

-- 

Kate von Krusenstiern

Center of Coastal and Ocean Mapping - University of New Hampshire

Graduate Research Assistant
