[Trillian-users] need to restart trillian

Mark Maciolek, 862-3050 maciolek at unh.edu
Tue Sep 17 13:27:45 EDT 2013


On 9/17/2013 8:07 AM, Mark Maciolek, 862-3050 wrote:
> On 9/16/2013 4:14 PM, Mark Maciolek, 862-3050 wrote:
>> hi,
>>
>> We had an issue this morning where we lost our chilled water for about
>> 10 minutes. This caused the air conditioner for the Cray to report an
>> issue that caused us to reboot it. About an hour later, our hourly check
>> on the trillian nodes reported one complete cabinet, c2, down. We have
>> tried every trick in the Cray book to bring it back online, and they
>> have all failed.
>>
>> We need to completely shut down the Cray, all three cabinets, and then
>> bring it all back online. This will take about an hour. Without it, you
>> are missing 44 compute nodes.
>>
>> We plan on doing this at 7 am on Tuesday, September 17, unless you want
>> it done sooner.
>>
>> Mark
>>
> hi,
>
> Cray had us open a new case for this issue, so the restart did not
> happen. We will update you as soon as we hear back from Cray.
>
> Mark
>
hi,

All 132 nodes are available again, other than the 7 already in use.

Mark

-- 
Mark Maciolek <mailto:Mark.Maciolek at unh.edu>
Network Administrator
Research Computing & Instrumentation
<http://www.unh.edu/research/support-units/research-computing-instrumentation>

