[Zaphod-Users] qdel cannot kill jobs on the failed nodes

Saeid Jalali s_jalali_a at yahoo.com
Wed May 23 10:01:13 EDT 2007


  Dear all,
  The same problem has occurred again, and I cannot kill the following crashed jobs:
  qdel 41074
  qdel: Server could not connect to MOM 41074.h101.cl.unh.edu
  qdel 41075
  qdel: Server could not connect to MOM 41075.h101.cl.unh.edu
  The last one was killed by Tod, and I appreciate his help. However, this seems to be a frequently occurring problem. What can we do to avoid troubling the administrators every time our crashed jobs need to be killed? If some of the nodes running a job fail for any reason, e.g., 2 of the 5 requested nodes, we cannot qdel the job. In such a case the remaining nodes (3 in this example) stay tied up by the crashed job. There should be a way to free those nodes. Unfortunately, the qdel command does not work in this case, and I would appreciate it if you could let the users of Zaphod know another way to get rid of such a severe problem.
  
  Thank you in advance for your contributions,
  Yours,
  S. Jalali.
    
 

