[premise-users] Premise Lustre update
Robert Anderson
rea at sr.unh.edu
Mon May 16 18:51:42 EDT 2022
The Premise HPC Lustre storage resync command that started on Sunday
completed around noontime today.
We spent the afternoon trying to resolve a mysql database error that is
occurring on the Lustre Management node. A single uncommitted transaction
keeps the database from coming up. All attempts to roll back that
transaction have failed. We have backed up the mysql file system storage,
and also dumped the data for all tables from within mysql. We're currently
reloading all of the mysql data back into a clean database, which could take
a few hours. Once the data is restored, we expect mysql to start normally and
to contain the prior data necessary to continue the Lustre boot process.
It's possible that we will run into a new issue after this restore is
complete. But our current roadblock is mysql, and its data appears to be
intact, so we are still optimistic. There is a chance that the server will
be up later tonight if all goes well. If we do run into a new problem later
this evening, we will stop for the night and resume when we are fresh on
Tuesday morning.
At some future point we may have to consider bringing Premise back up
without the old Lustre storage. All of the users with home directories on
BeeGFS and not using Anaconda would still be able to use the system without
the old Lustre. We are not yet at that point, and will continue working
towards 100% functionality.
Thanks for your patience.