[Trillian-users] scheduling issue is back

Mark Maciolek Mark.Maciolek at unh.edu
Tue Jul 25 12:06:09 EDT 2017


Hi,

Can users pai and fs1036 verify that their jobs are running properly?

The alps scheduling issue has returned after the latest reboot on Thursday
July 20th. 

fopen(/var/log/alps/apsched20170725) failed (Too many open files)
2017-07-25 10:49:23: Switching pid 7906 to /var/log/alps/apsched20170725
2017-07-25 10:49:23: fopen(/var/log/alps/apsched20170725) failed (Too many open files)

I restarted the alps process; the number of open files is climbing slowly,
currently at 33, and the max is 1024.
When working properly, it never has more than 15 files open.
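
For anyone who wants to watch the same numbers, here is a minimal sketch of
one way to read them on a Linux node. It assumes the open-file count is taken
from /proc/<pid>/fd and the limit from /proc/<pid>/limits; the function names
and the proc_root parameter are made up for illustration and are not part of
alps. Reading another user's /proc/<pid>/fd normally requires root.

```python
import os


def count_open_files(pid, proc_root="/proc"):
    """Return how many file descriptors the process currently has open."""
    # Each entry in /proc/<pid>/fd is one open descriptor.
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def max_open_files(pid, proc_root="/proc"):
    """Return the soft 'Max open files' limit for the process."""
    with open(os.path.join(proc_root, str(pid), "limits")) as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Fields: "Max", "open", "files", soft limit, hard limit, units
                return int(line.split()[3])
    raise ValueError("no 'Max open files' entry found")


def report(pid, proc_root="/proc"):
    """Format a one-line summary such as '33 of 1024 open files'."""
    return "%d of %d open files" % (
        count_open_files(pid, proc_root),
        max_open_files(pid, proc_root),
    )
```

Calling report() with the pid of the apsched process (as root) every few
minutes would show whether the count keeps growing toward the limit.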

Mark

Total placed applications: 5
  Apid ResId   User  PEs Nodes     Age State   Command
127733     7    pai 1728    54 115h42m   run   p3d.out
127735     9 fs1036  120    15 114h47m   run pStreamer
127737    10 fs1036  120    15 114h45m   run pStreamer
127739    11 fs1036   64     2 114h43m   run pStreamer
127741    12 fs1036   64     2 111h02m   run pStreamer

qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
36540.sdb         st.pbs           pai               00:00:48 R workq
36561.sdb         3Dtest           fs1036            00:00:06 R workq
36562.sdb         3Dtest           fs1036            00:00:07 R workq
36563.sdb         L20cm            fs1036            00:00:09 R workq
36564.sdb         L20cm            fs1036            00:00:09 R workq
36577.sdb         dc-1c6472.qsub   dcramer                  0 H workq
36579.sdb         dc-f67973.qsub   dcramer                  0 H workq
36583.sdb         plasmasphere_rc  jobejen                  0 H workq
36586.sdb         ulf1.qsub        dcramer                  0 H workq
36615.sdb         asym             jonngwx                  0 H workq
36616.sdb         L20cm            fs1036                   0 H workq
36617.sdb         yhout.bat        samark                   0 H workq
36618.sdb         y.MC.MB02        jxyu                     0 H workq


