[premise-users] Premise Lustre update

Robert Anderson rea at sr.unh.edu
Mon May 16 18:51:42 EDT 2022


The Premise HPC Lustre storage resync command that started on Sunday 
completed around noontime today.

We spent the afternoon trying to resolve a mysql database error that is 
occurring on the Lustre Management node. A single uncommitted transaction 
keeps the database from coming up. All attempts to roll back that 
transaction have failed. We have backed up the mysql file system storage, 
and also dumped the data for all tables from within mysql. We're currently 
reloading all of the mysql data back into a clean database, which could take 
a few hours. Once it is restored, we expect mysql to start normally and to 
contain the prior data necessary to continue the Lustre boot process.

It's possible that we will run into a new issue after this restore is 
complete. But our current roadblock is mysql, and its data appears to be 
intact, so we are still optimistic. There is a chance that the server will 
be up later tonight if all goes well. If we do run into a new problem later 
this evening, we will stop for the night and try a new approach when we are 
fresh on Tuesday morning.


At some future point we may have to consider bringing Premise back up 
without the old Lustre storage. All of the users with home directories on 
BeeGFS and not using Anaconda would still be able to use the system without 
the old Lustre. We are not yet at that point, and will continue working 
towards 100% functionality.


Thanks for your patience.