[Hpc-notice] HPC is back online; please resubmit your jobs

Casey Mc Laughlin cmclaughlin at fsu.edu
Sat Jun 21 16:58:09 EDT 2025


Hi HPC users,

Earlier today, we sent out a notice that our parallel storage system on the HPC (GPFS) was experiencing issues. Our Systems Team diagnosed and remediated the issue.

The InfiniBand subnet manager experienced a critical issue that disrupted management of the network fabric, resulting in a full network outage across the cluster. We have recovered the subnet manager, and services are being restored.

Unfortunately, the disruption caused most jobs to fail, so please resubmit, at your earliest convenience, any jobs that were running at the time. We apologize for the disruption.

Please let us know if we can assist by emailing support at rcc.fsu.edu.

Best regards,
The RCC Team