[Hpc-notice] HPC maintenance - Sunday 8pm - 9pm; emergency storage remount

Casey Mc Laughlin cmclaughlin at fsu.edu
Fri Feb 16 09:15:11 EST 2024


Hi HPC Users,

We will perform emergency maintenance from 8pm - 9pm on Sunday, February 18, so that our staff can remediate the slow storage performance users have been experiencing. This work is part of our ongoing effort, in collaboration with IBM support, to fix the performance of the GPFS filesystem and ensure optimal operation of our HPC resources.

Key Details:


  *   Action: We will implement configuration changes recommended by IBM and remount the GPFS filesystem across all nodes.
  *   Impact: All HPC compute nodes will be set to the "DRAIN" state, which temporarily prevents new jobs from starting. Jobs still running during the maintenance window may run into issues, and we cannot guarantee that those applications will execute properly.
  *   Timing: Sunday, February 18, 8pm - 9pm, chosen to minimize disruption to your work.

What You Need to Do:


  *   Please ensure that your work is saved and that running jobs are completed, where possible, or checkpointed before 7:30pm this Sunday.
  *   Plan for a temporary pause in job submissions and computations during this maintenance window.

We understand the importance of the HPC resources for your research and work, and we're committed to keeping this interruption as brief and smooth as possible. This maintenance is a crucial step towards resolving the slow GPFS performance and improving the overall efficiency of our systems.

If you have any concerns or need further information, please let us know: support at rcc.fsu.edu.

Thank you for your patience.

Best regards,
The RCC Team