[premise-users] Premise downtime scheduled for Monday 12/16 through Wednesday 12/18

Robert Anderson rea at sr.unh.edu
Thu Dec 19 09:41:15 EST 2019


The Premise HPC reconfiguration is not yet complete.

The largest remaining task is to validate that all user data was moved 
successfully. All user data has been migrated from Lustre to BeeGFS 
storage, and we are now confirming that each copy is exact. This process 
started yesterday, and many group areas have already been scanned and 
confirmed to be exact copies.
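
For illustration only, a check of this kind can be sketched in Python as 
below. This is a minimal sketch and not the tool RCC is running: the 
function names are hypothetical, it compares file contents by SHA-256 only 
(not ownership, permissions, or timestamps), and it skips symbolic links.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Assumption: symlinks and special files are out of scope here.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(src_root, dst_root):
    """Return (missing, extra, changed) relative paths, each sorted.

    missing: present in src_root but not in dst_root
    extra:   present in dst_root but not in src_root
    changed: present in both, but with different contents
    """
    src = tree_digests(src_root)
    dst = tree_digests(dst_root)
    missing = sorted(set(src) - set(dst))
    extra = sorted(set(dst) - set(src))
    changed = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return missing, extra, changed
```

Calling compare_trees() on an old and a new group area (the paths are 
whatever applies to your group) returns three lists; all three being empty 
means every regular file has identical contents in both trees.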

The last remaining system task is to reimage the old Premise head/login node 
so that it serves only as a login node. This was started this morning and is 
not expected to take long to complete.

RCC believes that user data is the most important part of Premise, and we 
want to ensure an accurate copy has been made. Given the size of the data 
being checked, this process could take a long time for some groups.

At this point we are going to keep Premise offline until fewer group areas 
remain unconfirmed.

In the meantime, we will try to determine a way to safely allow the checked 
groups access to Premise. In the worst case, Premise could stay offline until 
Monday morning, but another email update will be sent this evening.

Sorry for the extra downtime.


On November 12, 2019 16:42:27 Robert Anderson <rea at sr.unh.edu> wrote:
> The Premise HPC cluster is in need of a scheduled downtime for 
> reconfiguration, migration to the new storage, and general system upgrades. 
> We hope that by scheduling a month out people can work around these dates, 
> and that Premise will be ready for the many jobs expected during the long 
> holiday break.
>
> Our plan is to shut down first thing Monday morning 12/16. The main upgrades 
> will occur on Monday. We will then work to migrate as much data as possible 
> from Lustre to the new BeeGFS storage. We have moved the data over multiple 
> times, but each copy goes stale as soon as your jobs write new output.
>
> Given the size of the data storage on Premise, there is little chance we can 
> complete all the migrations within a three-day window. We will start with 
> the smallest groups and provide a detailed status update for the larger 
> groups on the 18th.
>
> You can help to complete the storage migration by:
>
> 1. Cleaning up anything currently stored on Premise that you do not need. 
> This would be a great chance to ensure the 2nd copy of your data is complete.
>
> 2. "STATIC" If you store a lot of data on Premise, please let us know the 
> areas that we can copy now that will NOT have to be updated later. If you 
> have any large datasets that need to be moved but will not change, please 
> provide the path to them so we can copy them in the weeks before 12/14 and 
> NOT attempt to update them during this short window. If in doubt, provide 
> the path. We are not reformatting the old storage immediately, so it will 
> be available to us for a while AFTER this planned storage migration window.
>
> 3. "CRITICAL" On the other hand, if you have a critical area that you really 
> need, please provide the path(s) so that we can ensure it is moved during 
> the scheduled 3-day window. This only makes sense if your group has 
> multiple TB of storage on Premise. Smaller groups (<5TB) do not need to 
> specify their critical areas, since we will have time to easily move your 
> entire group area.
>
> Please email questions or responses to #2 "STATIC" & #3 "CRITICAL" above to: 
> RCCOPS at sr.unh.edu
>
> Thanks for your cooperation.
> --
> Robert Anderson <rea at sr.unh.edu>
> UNH RCC
