[Zaphod-Users] Queue

Kai Germaschewski kai.germaschewski at unh.edu
Mon Nov 14 23:53:12 EST 2005


On Mon, 14 Nov 2005, Fekete, Balazs M. wrote:

> Lately, I am disappointed about zaphod. I submitted two jobs around
> noon, but neither of them got executed so far. I actually sent one of
> the jobs to our "junk yard" (a small cluster of five old desktop PCs
> with eight 500-800 MHz CPUs), which started immediately and finished in
> 4 hours.

I'm afraid zaphod being that busy is going to be the norm. At the moment,
however, there are usually a lot of Ethernet-only nodes available, so you
may want to recompile your code without Myrinet, which should give you a
much faster turnaround.

> I wonder if there is any tool to map the processor use at any time. I
> know, http://zaphod.sr.unh.edu/ganglia/index.php but I am not sure how
> to interpret those graphics. The gaps in the graphs are particularly
> disturbing.

Yeah, I'm not sure what's wrong with ganglia at this time. "showstate" is 
another nice command which shows you the current load.

At this time, however, as you noted, zaphod is dead, and I cannot get it
back to life even with our remote power-cycling feature. I've no idea why
it died or what's going on, but this crash seems different from the other
recent ones in that it still responded to pings but didn't do much else.

--Kai


