[Zaphod-Users] Queue

George Hurtt george.hurtt at unh.edu
Tue Nov 15 08:15:32 EST 2005


(1) I tried "showstate", and got this response...

ERROR:    cannot send request to server h101:42559 (server may not be running)
ERROR:    cannot request service (status)

(2) Is there a way to estimate how long a job will wait in the queue? Our
group has had jobs sitting there for over a day. Can each processor handle
more than one job?

Kai Germaschewski wrote:

>On Mon, 14 Nov 2005, Fekete, Balazs M. wrote:
>
>>Lately, I have been disappointed with zaphod. I submitted two jobs around
>>noon, but neither of them has been executed so far. I actually sent one of
>>the jobs to our "junk yard" (a small cluster of five old desktop PCs
>>with eight 500-800 MHz CPUs), where it started immediately and finished in
>>4 hours.
>
>I'm afraid zaphod being this busy is going to be the norm. Currently,
>however, there are usually a lot of Ethernet-only nodes available, so you
>may want to recompile your code without Myrinet support; that should give
>you a much faster turnaround.
>
>>I wonder if there is any tool to map processor usage at any given time. I
>>know about http://zaphod.sr.unh.edu/ganglia/index.php, but I am not sure
>>how to interpret those graphs. The gaps in the graphs are particularly
>>disturbing.
>
>Yeah, I'm not sure what's wrong with ganglia at this time. "showstate" is 
>another nice command which shows you the current load.
>
>At this time, however, as you noted, zaphod is dead, and I cannot get it 
>back to life even with our remote power-cycling feature. I've no idea why 
>it died or what's going on, but this crash seems different from the other
>recent ones in that the machine still responded to pings but didn't do
>much else.
>
>--Kai
>

