[Zaphod-Users] node problems?

tom fogal tfogal at apollo.sr.unh.edu
Wed Nov 9 13:35:29 EST 2005


I'm constantly getting the following error within 10 minutes of a job
submission, usually within a few seconds:

mpiexec: Error: wait_tasks: tm_poll remote: tm: system error.

I checked all the nodes and it appears m145 is down.  I tried a simple
reboot to no avail; it looks like it might be getting stuck somewhere
during boot, because the Myrinet card never gets from the 'blinking
orange' initialization state to the 'green' ready state.

I'm going to temporarily remove m145 from the list of nodes PBS knows
about, so I can get my job going... assuming I can remember, or find
documentation on, which file that list is stored in.
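
For reference, a rough sketch of the "remove a node from the list" step.
On OpenPBS/Torque installs the node list is usually a plain-text file
named server_priv/nodes under the PBS home directory, but the location
on this cluster is an assumption, as is the one-entry-per-line layout
with the hostname as the first field. The helper name drop_node and the
sample entries are made up for illustration.

```python
def drop_node(nodes_text: str, node: str) -> str:
    """Return nodes_text without the entry whose hostname equals `node`.

    Assumes one entry per line, with the hostname as the first
    whitespace-separated field (e.g. "m145 np=2"). Other lines,
    including blank ones, are passed through unchanged.
    """
    kept = []
    for line in nodes_text.splitlines():
        fields = line.split()
        # Match on the whole hostname so "m145" does not also drop "m1450".
        if fields and fields[0] == node:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n" if kept else ""


if __name__ == "__main__":
    # Hypothetical file contents; the real file will differ.
    sample = "m144 np=2\nm145 np=2\nm146 np=2\n"
    print(drop_node(sample, "m145"), end="")
```

Keep a backup of the original file so m145 can be put back once it
boots again; the PBS server will probably need a restart to pick up
the change. If the goal is only to keep jobs off the node,
"pbsnodes -o m145" should mark it offline without editing any file.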

-tom

