From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Thu, 15 Feb 2024 18:04:55 +0000
Subject: [Hpc-notice] Slow filesystem access on the HPC

Hi HPC Users,

We've noticed an issue with our HPC parallel storage system. Our Systems Team is working with the vendor (IBM) to resolve it as quickly as possible. There has been no data loss, but the system is very slow.

We appreciate your patience.

Best regards,
The RCC Team

From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Fri, 16 Feb 2024 14:15:11 +0000
Subject: [Hpc-notice] HPC maintenance - Sunday 8pm - 9pm; emergency storage remount

Hi HPC Users,

We will perform emergency maintenance from 8pm to 9pm on Sunday, February 18, so that our staff can remediate the slow storage issue users have been experiencing. This is part of our ongoing effort, in collaboration with IBM support, to fix the performance of the GPFS filesystem and ensure optimal operation of our HPC resources.

Key Details:

* Action: We will implement configuration changes recommended by IBM and remount the GPFS filesystem across all nodes.
* Impact: All HPC compute nodes will be set to the "DRAIN" state, which temporarily prevents new jobs from starting. Jobs still running during the maintenance window may run into issues, and we cannot guarantee that those applications will execute correctly.
* Timing: Sunday, February 18, 8pm - 9pm, chosen to minimize disruption to your work.

What You Need to Do:

* Please save your work, and make sure that any running jobs that can finish are completed or checkpointed before 7:30pm this Sunday.
* Plan for a temporary pause in job submissions and computations during the maintenance window.
We understand the importance of the HPC resources to your research and work, and we're committed to keeping this interruption as brief and smooth as possible. This maintenance is a crucial step towards resolving the GPFS performance issues and improving the overall efficiency of our systems.

If you have any concerns or need further information, please let us know: support at rcc.fsu.edu.

Thank you for your patience.

Best regards,
The RCC Team

From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Mon, 19 Feb 2024 01:56:40 +0000
Subject: [Hpc-notice] HPC maintenance complete

Hi HPC users,

The emergency maintenance is now complete. Thank you for being patient while we prepared for and carried out the remediation.

We will be monitoring the system over the coming days to ensure that the measures we implemented have resolved the issue. If you notice any further problems, please let us know: support at rcc.fsu.edu.

Best regards,
Casey

--
Casey McLaughlin
Support Coordinator | Research Computing Center
Information Technology Services | Florida State University
p 850.644.6270 | w its.fsu.edu/research