From pvandermark at fsu.edu Tue Jun 3 11:16:53 2025
From: pvandermark at fsu.edu (Paul Van Der Mark)
Date: Tue, 3 Jun 2025 15:16:53 +0000
Subject: [Hpc-notice] FSU Network Upgrade Information
Message-ID:

Dear RCC users,

Information Technology Services (ITS) is performing a network upgrade. Upgrades will occur from 1-5AM ET on Thursday, June 5, at the Northwest Regional Data Center (NWRDC) and from 1-5AM ET on Thursday, June 12, at the Roderick K. Shaw Building (RSB). During the upgrade, the network may experience short outages, and individuals and devices connected to the network may temporarily lose access. For more information about the network upgrade, contact the ITS Service Desk at 850-644-4357 or its.fsu.edu/help.

This has the potential to impact all network connections to RCC resources, and you may experience a disconnection from the login nodes or Open OnDemand server. However, it will not affect any running jobs on our cluster, as they are on an internal network.

Best,
Paul

--
Paul van der Mark, PhD
Director, Research Computing Center
Information Technology Services
Florida State University
Phone: 850.644.0193
its.fsu.edu | rcc.fsu.edu
https://fsu.zoom.us/my/pvandermark

From cmclaughlin at fsu.edu Sat Jun 21 13:43:27 2025
From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Sat, 21 Jun 2025 17:43:27 +0000
Subject: [Hpc-notice] Parallel storage issues
Message-ID: <4A9E07B0-FAF5-4AFD-86EC-955377A7920D@fsu.edu>

Hi HPC users,

We are currently experiencing issues with our parallel storage system on the HPC (GPFS). Our systems team has been notified. We will provide updates as soon as we have more information.

Thanks for your patience.

Best regards,
The RCC Team

From cmclaughlin at fsu.edu Sat Jun 21 16:58:09 2025
From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Sat, 21 Jun 2025 20:58:09 +0000
Subject: [Hpc-notice] HPC is back online; please resubmit your jobs
Message-ID:

Hi HPC users,

Earlier today, we sent out a notice that our parallel storage system on the HPC (GPFS) was experiencing issues. Our Systems Team diagnosed and remediated the issue.

The InfiniBand subnet manager experienced a critical issue that disrupted the network fabric management. This resulted in a full network outage across the cluster. We've recovered the subnet manager and services are being restored.

Unfortunately, the disruption caused most jobs to fail, so you will need to resubmit any jobs you were running at your earliest convenience.

We apologize for the disruption. Please let us know if we can assist by emailing support at rcc.fsu.edu.

Best regards,
The RCC Team