From cmclaughlin at fsu.edu Mon Jan 2 16:39:33 2023
From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Mon, 2 Jan 2023 21:39:33 +0000
Subject: [Hpc-notice] Storage System Upgrade Update: HPC back online, Spear and Globus offline
Message-ID:

Hi HPC Users,

We are pleased to report that the following services are back online:

1. The High Performance Computing Cluster
2. Open OnDemand (https://ood.rcc.fsu.edu)
3. Customer VMs and hosted hardware

The following services need some additional work; we expect to have these online early to mid week:

1. Globus services
2. Spear servers

If you have any questions or concerns, please let us know: support at rcc.fsu.edu.

Best regards,
The RCC Team

From cmclaughlin at fsu.edu Thu Jan 5 16:34:13 2023
From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Thu, 5 Jan 2023 21:34:13 +0000
Subject: [Hpc-notice] Storage System Update: Globus and Spear online and available
Message-ID:

Hi HPC Users,

Thanks for your patience while we worked to get the Globus and Spear services back online. We are pleased to report that both are operational now.

Globus users, please note that we had to change the endpoint names. If you are an existing Globus user, you will need to update your client. The names have changed as follows:

* fsurcc#gpfs_home -> fsurcc#GPFS#home
* fsurcc#gpfs_research -> fsurcc#GPFS#research
* fsurcc#archival -> fsurcc#archival#1
* fsurcc#archival-2 -> fsurcc#archival#2

In addition, due to new security requirements from the vendor, you will need to link your RCC user credentials with your Globus account. Instructions for doing so are on our website.

We are also currently experiencing a latent issue where the GPFS file system periodically freezes for a few seconds at a time. With the new storage system running over InfiniBand RDMA fabric, we have noticed long waiting processes on some nodes.
There may be occasional interruptions while we work with the vendor to resolve this issue.

If you have any questions or concerns, please let us know: support at rcc.fsu.edu.

Best regards,
The RCC Team

From cmclaughlin at fsu.edu Mon Jan 23 10:53:28 2023
From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Mon, 23 Jan 2023 15:53:28 +0000
Subject: [Hpc-notice] Storage system issues affecting the HPC cluster
Message-ID:

Hi RCC Users,

We are experiencing some issues with the GPFS storage system that are affecting the entire cluster. We are working with the vendor and will post another update as soon as the issues are resolved.

Best regards,
The RCC Team

From cmclaughlin at fsu.edu Mon Jan 23 16:10:48 2023
From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Mon, 23 Jan 2023 21:10:48 +0000
Subject: [Hpc-notice] UPDATE (mostly resolved): Storage system issues affecting the HPC cluster
Message-ID:

Hi HPC Users,

As of around 2pm, the storage system has been stable. We have been working with the vendor to identify the issue that may have caused today's outage on the GPFS filesystem. We believe the issue has been resolved, but we are still working to restore the rest of the associated services.

HPC jobs that were submitted before the outage may have been affected, so we advise you to check on them and, if necessary, resubmit any that failed.

If you notice any further issues, please let us know: support at rcc.fsu.edu.

Thanks for your patience during this unexpected outage.

Best regards,
The RCC Team