From cmclaughlin at fsu.edu Fri Jul 10 12:11:42 2020
From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Fri, 10 Jul 2020 16:11:42 +0000
Subject: [Hpc-notice] August Maintenance draft schedule released
Message-ID:

Hi RCC Campus Partners,

Due to the situation around COVID-19, we had to reschedule our maintenance, originally planned for May 2020, to August 2020. The situation is changing every day, but as of today, the maintenance is still scheduled for the week of August 3 - 7.

Affected services

The affected services include:

* all HPC and Spear services, including login nodes, parallel storage, and compute nodes,
* all Research Archival volumes,
* all VMs, including those that are hosted for customers

Services not affected include:

* Most data center hosting customers will remain online; we've already reached out and have been working with customers affected by the maintenance.

Scope of work

During this maintenance, we will upgrade all major software on the HPC and Spear. Notable highlights include:

1. upgrade the software that powers our parallel storage system (GPFS)
2. perform hardware maintenance on the Research Archival System
3. improve our power infrastructure
4. upgrade our scheduler software, Slurm, to the latest version (v20.02 as of the time of this article)
5. reorganize part of our network configuration and update firmware on all of our switches
6. update the software on our database server
7. optimize our HPC InfiniBand network

We originally reported that most services wouldn't be down for the entire week, but as we move closer to the scheduled maintenance date, we realize that is a practical improbability. We will, however, notify you if any services can resume sooner than we anticipate.

Draft schedule

We plan to send out daily notices throughout the week. This schedule is subject to change, and we will notify you if and when it does.

* Friday, July 31 at 9am
  * We will begin draining HPC compute nodes and disable new job submissions.
    This means that we will configure nodes to shut off one by one as all the jobs on each node complete.
* Monday, August 3 at 7am
  * We will disable access to the following systems and services:
    * HPC Login nodes
    * Spear nodes
    * Export nodes (GPFS and Archival storage) and Globus
  * Lenovo consultants will begin maintenance on the storage system software (GPFS and Archival) promptly at 7am. All users who wish to retrieve data from the system should do so by this time.
* Tuesday, August 4 at 9am
  * Conditioned Air and Power will arrive to perform work on Power Distribution Unit "D".
  * Affected colocation customers have already been notified, and we are working with individual campus units to minimize impact. Nevertheless, send us a message if you have any concerns or questions.
* Wednesday & Thursday, August 5 and 6
  * The above work will continue.
* Friday, August 7 at 5pm
  * We expect all systems will be back online by this time, but we will let you know if any residual issues remain.

Questions or issues?

If we are able to provide access to any service earlier than expected, we will do so and notify you. If you have any questions, issues, or requests, please let us know: support at rcc.fsu.edu.

Best regards,
The RCC Team

From cmclaughlin at fsu.edu Thu Jul 23 11:49:32 2020
From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Thu, 23 Jul 2020 15:49:32 +0000
Subject: [Hpc-notice] REMINDER: System maintenance to occur the week of August 3 - 7
Message-ID:

Hi RCC Campus Partners,

This is a reminder that, as of the time of this message, we are planning to perform system maintenance the week of Monday, August 3 through Friday, August 7. See below for details.

Affected services

The affected services include:

* all HPC and Spear services, including login nodes, parallel storage, and compute nodes,
* all Research Archival volumes,
* all VMs, including those that are hosted for customers

Services not affected include:

* Most data center hosting customers will remain online; we've already reached out and have been working with customers affected by the maintenance.

Scope of work

During this maintenance, we will upgrade all major software on the HPC and Spear. Notable highlights include:

1. upgrade the software that powers our parallel storage system (GPFS)
2. perform hardware maintenance on the Research Archival System
3. improve our power infrastructure
4. upgrade our scheduler software, Slurm, to the latest version (v20.02 as of the time of this article)
5. reorganize part of our network configuration and update firmware on our switches
6. update the software on our database server
7. optimize our HPC InfiniBand network

We originally reported that most services wouldn't be down for the entire week, but as we move closer to the scheduled maintenance date, we realize that is a practical improbability. We will, however, notify you if any services can resume sooner than expected.

Draft schedule

We plan to send out daily notices throughout the week. This schedule is subject to change, and we will notify you if and when it does.

* Friday, July 31 at 9am
  * We will begin draining HPC compute nodes and disable new job submissions. This means that we will configure nodes to shut off one by one as all the jobs on each node complete.
* Monday, August 3 at 7am
  * We will disable access to the following systems and services:
    * HPC Login nodes
    * Spear nodes
    * Export nodes (GPFS and Archival storage) and Globus
  * Lenovo consultants will begin maintenance on the storage system software (GPFS and Archival) promptly at 7am. All users who wish to retrieve data from the system should do so by this time.
* Tuesday, August 4 at 9am
  * Conditioned Air and Power will arrive to perform work on Power Distribution Unit "D".
  * Affected colocation customers have already been notified, and we are working with individual campus units to minimize impact. Nevertheless, send us a message if you have any concerns or questions.
* Wednesday & Thursday, August 5 and 6
  * The above work will continue.
* Friday, August 7 at 5pm
  * We expect all systems will be back online by this time, but we will let you know if any residual issues remain.

Questions or issues?

If we are able to provide access to any service earlier than expected, we will do so and notify you. If you have any questions, issues, or requests, please let us know: support at rcc.fsu.edu.

Best regards,
The RCC Team

From mgans at fsu.edu Fri Jul 24 09:06:06 2020
From: mgans at fsu.edu (Mitch Gans)
Date: Fri, 24 Jul 2020 13:06:06 +0000
Subject: [Hpc-notice] 7/24 Tropical Storm Gonzalo Update
Message-ID:

Greetings,

Tropical Storm Gonzalo has formed in the Atlantic. It is currently northeast of Venezuela and poses no immediate threat to the FSU community this weekend.

We have completed checks at the Sliger Building Data Center and are prepared. We will carefully monitor the storm's progress in the coming days and provide you with another update if the storm appears to be heading our way.

If you have any questions or concerns, please let us know: support at rcc.fsu.edu.

Best regards,
The RCC Team

Mitch Gans
Florida State University
2035 E. Paul Dirac Drive
Tallahassee, FL 32306-2760
mgans at fsu.edu
Office: (850) 644-8555
Cell: (850) 591-6193
Fax: (850) 644-8722

From pvandermark at fsu.edu Wed Jul 29 14:43:42 2020
From: pvandermark at fsu.edu (Paul Van Der Mark)
Date: Wed, 29 Jul 2020 18:43:42 +0000
Subject: [Hpc-notice] REMINDER: System maintenance to occur the week of August 3 - 7
In-Reply-To:
References:
Message-ID:

Dear RCC partners,

There is a possibility of a storm coming towards Tallahassee. Although we plan to continue with the system maintenance next week, we heard from Lenovo that, in the worst case, the company will not allow technicians to travel to Tallahassee. In that scenario, we will have to postpone our maintenance by one week, to Monday, August 10 through Friday, August 14.

This Friday we will have a better picture of how potential tropical cyclone #9 is developing and whether Lenovo will allow a technician to travel to Tallahassee.

Best regards,
The RCC Team

From mgans at fsu.edu Fri Jul 31 08:40:39 2020
From: mgans at fsu.edu (Mitch Gans)
Date: Fri, 31 Jul 2020 12:40:39 +0000
Subject: [Hpc-notice] 7/31 Hurricane Isaias Update
Message-ID:

Greetings,

Hurricane Isaias in the Atlantic is approaching the Bahamas, but NOAA forecast models at this time do not predict a threat to the FSU community.

We have completed checks at the Sliger Building Data Center and are prepared. We will carefully monitor the storm's progress in the coming days and provide you with another update if the storm appears to be heading our way.

If you have any questions or concerns, please let us know: support at rcc.fsu.edu.

Best regards,
The RCC Team

Mitch Gans
Florida State University
2035 E. Paul Dirac Drive
Tallahassee, FL 32306-2760
mgans at fsu.edu
Office: (850) 644-8555
Cell: (850) 591-6193
Fax: (850) 644-8722

From cmclaughlin at fsu.edu Fri Jul 31 12:11:09 2020
From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Fri, 31 Jul 2020 16:11:09 +0000
Subject: [Hpc-notice] REMINDER: Annual hardware maintenance starting MONDAY, August 3
Message-ID: <6D262E7B-68BB-4330-8400-7238D69939D8@fsu.edu>

Hi RCC Users,

We are moving ahead with plans for annual maintenance to occur next week (Aug 3 - 7). If we need to reschedule it for the following week, we will send out another message today or over the weekend.

Accordingly, we have disabled job submissions on the HPC. This will prevent you from submitting new jobs, but will not affect currently running jobs. We will turn the cluster completely off no later than 7:30am on Monday, which will effectively kill all running jobs.

For more details about the maintenance and schedule, refer to this news announcement: https://fla.st/3gDpaA4

If you have any questions or issues, please let us know by emailing support at rcc.fsu.edu.

Best regards,
The RCC Team