[Trillian-users] scheduling issue is back
Mark Maciolek
Mark.Maciolek at unh.edu
Tue Jul 25 12:06:09 EDT 2017
Hi,
Can users pai and fs1036 verify that their jobs are running properly?
The alps scheduling issue has returned since the latest reboot on Thursday,
July 20th.
fopen(/var/log/alps/apsched20170725) failed (Too many open files)
2017-07-25 10:49:23: Switching pid 7906 to /var/log/alps/apsched20170725
2017-07-25 10:49:23: fopen(/var/log/alps/apsched20170725) failed (Too many open files)
I restarted the alps process, but the number of open files is climbing slowly;
it is currently at 33, and the maximum is 1024.
When working properly, it never has more than 15 files open.
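A minimal sketch of how that open-file count could be tracked, assuming a Linux node with /proc mounted and that the PID of the apsched process is known (reading another user's process generally requires root). The helper names (count_open_fds, parse_max_open_files, check_fd_usage) are hypothetical and not part of ALPS; the 1024 maximum is assumed to be the process's soft "Max open files" limit, and the default warning threshold of 15 is taken from the normal count mentioned in this message.

```python
import os
import sys
from typing import Optional


def count_open_fds(pid: int) -> int:
    """Return how many file descriptors process `pid` currently holds open.

    Linux-only: each entry in /proc/<pid>/fd is one open descriptor.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def parse_max_open_files(limits_text: str) -> Optional[int]:
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Columns after the label: soft limit, hard limit, units.
            soft = line[len(label):].split()[0]
            return None if soft == "unlimited" else int(soft)
    return None


def check_fd_usage(pid: int, warn_at: int = 15) -> str:
    """Report open descriptors against the soft limit.

    `warn_at` is an assumed threshold (15 = the normal count for apsched);
    anything above it is flagged as a possible descriptor leak.
    """
    with open(f"/proc/{pid}/limits") as f:
        limit = parse_max_open_files(f.read())
    used = count_open_fds(pid)
    status = "OK" if used <= warn_at else "WARNING: possible descriptor leak"
    return f"pid {pid}: {used} open fds (soft limit {limit}) {status}"


if __name__ == "__main__":
    # Hypothetical usage: python check_fds.py <apsched_pid>
    if len(sys.argv) == 2:
        print(check_fd_usage(int(sys.argv[1])))
    else:
        print("usage: check_fds.py <pid>", file=sys.stderr)
```

Run periodically (for example from cron), a count that keeps rising past 15 toward the limit would suggest apsched is leaking descriptors again, well before fopen() starts failing with "Too many open files" (EMFILE).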
Mark
Total placed applications: 5
  Apid ResId User    PEs Nodes     Age State Command
127733     7 pai    1728    54 115h42m run   p3d.out
127735     9 fs1036  120    15 114h47m run   pStreamer
127737    10 fs1036  120    15 114h45m run   pStreamer
127739    11 fs1036   64     2 114h43m run   pStreamer
127741    12 fs1036   64     2 111h02m run   pStreamer
qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
36540.sdb        st.pbs           pai              00:00:48 R workq
36561.sdb        3Dtest           fs1036           00:00:06 R workq
36562.sdb        3Dtest           fs1036           00:00:07 R workq
36563.sdb        L20cm            fs1036           00:00:09 R workq
36564.sdb        L20cm            fs1036           00:00:09 R workq
36577.sdb        dc-1c6472.qsub   dcramer                 0 H workq
36579.sdb        dc-f67973.qsub   dcramer                 0 H workq
36583.sdb        plasmasphere_rc  jobejen                 0 H workq
36586.sdb        ulf1.qsub        dcramer                 0 H workq
36615.sdb        asym             jonngwx                 0 H workq
36616.sdb        L20cm            fs1036                  0 H workq
36617.sdb        yhout.bat        samark                  0 H workq
36618.sdb        y.MC.MB02        jxyu                    0 H workq