[Zaphod-Users] qdel cannot kill jobs on the failed nodes
Saeid Jalali
s_jalali_a at yahoo.com
Wed May 23 10:01:13 EDT 2007
Dear all,
The same problem has occurred again, and I cannot kill the following crashed jobs.
qdel 41074
qdel: Server could not connect to MOM 41074.h101.cl.unh.edu
qdel 41075
qdel: Server could not connect to MOM 41075.h101.cl.unh.edu
Tod killed the last one, and I am grateful to him. However, this seems to be a frequently recurring problem. What can we do to avoid troubling the administrators every time one of our crashed jobs needs to be killed? If some of the nodes running a job fail for any reason, e.g., 2 of the 5 requested nodes, we cannot qdel the job. The remaining nodes, 3 in this example, then stay tied up by the crashed job. There should be a way to free those nodes. Unfortunately, the qdel command does not work in this case, and I would appreciate it if you could let the users of Zaphod know of another way to get rid of such a severe problem.
Thank you in advance for your contributions,
Yours,
S. Jalali.