[Hpc-notice] Reminder - HPC and Spear maintenance to occur NEXT WEEK - May 6 - May 12
Casey Mc Laughlin
cmclaughlin at fsu.edu
Wed May 1 15:01:09 EDT 2019
Hi RCC Users,
Starting on Monday at 7am, we will perform maintenance on our HPC and Spear clusters. During this time, the HPC and Spear will be unavailable. We will also be performing brief maintenance on our storage system.
The maintenance window will begin on Monday, May 6 at 7am and last for one week. All systems will be back online no later than Monday, May 13 at 9am.
Additionally, our storage systems (GPFS and Archival) will be offline from Monday, May 6 at 7am until 12pm (approx 4-5 hours). We will send a notice out as soon as the storage systems are available.
Some systems may be available earlier. We have timed the upgrade to occur between academic semesters in the hope of minimizing potential impact on research activities.
What we are doing
The 2019 software upgrade will allow us to accomplish the following:
* Upgrade over 500 software packages to new versions (list and details<https://rcc.fsu.edu/news/software-upgrade-coming-in-may>)
* Upgrade the Slurm scheduler to Version 18.08 (release notes<https://slurm.schedmd.com/news.html>)
* Run new benchmarks<https://rcc.fsu.edu/doc/hpc-benchmarks> on the HPC and post the results on our website
* Upgrade the hardware network configuration on portions of the HPC cluster
* Perform critical storage system maintenance activities
Services Affected
* GPFS and Archival storage will be unavailable briefly on Monday, May 6 from 7am until no later than 12pm.
* HPC and Spear will remain offline all week until Monday, May 13 at 9am.
On Monday, we will perform brief maintenance on our Archival and GPFS storage systems. We expect to have these services back online very quickly. Once they are restored, you will be able to read and write data via Globus<https://rcc.fsu.edu/doc/globus> and SFTP/RSYNC<https://rcc.fsu.edu/doc/data-transfer> for the remainder of the maintenance period.
The "SKY" VM cluster will not be affected and will remain online throughout the maintenance period.
Tentative Schedule
* Friday, May 3 - 9am
  * We will begin draining HPC compute nodes.
* Sunday, May 5 - 5pm
  * We will disable HPC job submission in Slurm. The cluster will stop accepting new jobs at this time. Already-running jobs will continue to run.
* Monday, May 6 - 7am - MAINTENANCE BEGINS
  * We will disable access to the following systems:
    * HPC Login nodes
    * Spear nodes
    * Export nodes (GPFS and Archival storage)
  * We will turn off and rebuild HPC login nodes and compute nodes. Any jobs running at this time will be cancelled.
* Monday, May 6 - 12pm (or earlier)
  * We will restore access to the Export nodes (GPFS and Archival storage).
* Saturday, May 11 - 9am
  * We will run benchmarks and tests on the HPC and Spear.
* Monday, May 13 - 9am
  * HPC and Spear will be back online.
If we are able to provide access to any service early, we will do so and notify RCC users.
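Because jobs still running when maintenance begins will be cancelled, it can help to check whether a job's requested walltime fits before the cutoff. The following is only a rough sketch, not an RCC tool: the function names are hypothetical, the two timestamps are copied from the tentative schedule above, and it assumes a job starts the moment it is submitted. It does not model node draining or queue wait time, so treat a "safe" answer as optimistic.

```python
from datetime import datetime, timedelta, timezone

# EDT (UTC-4), the time zone used in this notice.
EDT = timezone(timedelta(hours=-4), "EDT")

# Timestamps copied from the tentative schedule; they may change.
SUBMISSION_CLOSES = datetime(2019, 5, 5, 17, 0, tzinfo=EDT)   # Sunday, May 5 - 5pm
MAINTENANCE_BEGINS = datetime(2019, 5, 6, 7, 0, tzinfo=EDT)   # Monday, May 6 - 7am


def latest_safe_start(walltime: timedelta) -> datetime:
    """Latest start time at which a job of this length ends before maintenance."""
    return MAINTENANCE_BEGINS - walltime


def job_is_safe(submitted: datetime, walltime: timedelta) -> bool:
    """True if the job is submitted before submission closes and, assuming it
    starts immediately (an assumption), ends before maintenance begins."""
    if submitted > SUBMISSION_CLOSES:
        return False
    return submitted + walltime <= MAINTENANCE_BEGINS


if __name__ == "__main__":
    noon_sunday = datetime(2019, 5, 5, 12, 0, tzinfo=EDT)
    print(job_is_safe(noon_sunday, timedelta(hours=12)))  # True
    print(job_is_safe(noon_sunday, timedelta(hours=24)))  # False
    print(latest_safe_start(timedelta(hours=48)))
```

In practice, the walltime to plug in is whatever limit you request from the scheduler for the job.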
Summary
We will publish updates and schedule changes as we get closer to the maintenance window. In the meantime, we appreciate your patience and support. If you have any questions, issues, or requests, please let us know: support at rcc.fsu.edu<https://rcc.fsu.edu/support>.
Best regards,
The RCC Team
Research Computing Center
Information Technology Services | Florida State University
rcc.its.fsu.edu<https://rcc.fsu.edu>
<https://twitter.com/floridastateITS>