From pvandermark at fsu.edu Wed Apr 3 08:58:45 2019
From: pvandermark at fsu.edu (Paul Van Der Mark)
Date: Wed, 3 Apr 2019 12:58:45 +0000
Subject: [Hpc-notice] Network issues archival system
Message-ID: <1554296324.16456.65.camel@fsu.edu>

Dear RCC users,

We are having a minor network issue with our archival system: the system is up and running, but it is not accessible through its virtual IP. We hope to fix the issue this morning. Our apologies for any inconvenience.

Best,
Paul

From cmclaughlin at fsu.edu Wed Apr 3 12:49:52 2019
From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Wed, 3 Apr 2019 16:49:52 +0000
Subject: [Hpc-notice] HPC and Spear upgrade to occur May 6 - May 12
Message-ID:

This May, we will perform periodic maintenance on our HPC and Spear clusters. During this time, the HPC and Spear will be unavailable. We will also be performing brief maintenance on our storage system.

The maintenance window will begin on Monday, May 6 at 7am and last for one week. All systems will be back online no later than Monday, May 13 at 9am. Some systems may be available earlier.

We have timed the upgrade to occur between academic semesters in the hope of minimizing potential impact on research activities.

What we are doing

The 2019 software upgrade will allow us to accomplish the following:

* Upgrade over 500 software packages to new versions (list and details)
* Upgrade the Slurm scheduler to Version 18.08 (release notes)
* Run new benchmarks on the HPC and post results on our website
* Upgrade the hardware network configuration on portions of the HPC cluster
* Perform critical storage system maintenance activities

Services Affected

* GPFS and Archival storage will be unavailable briefly on Monday, May 6 from 9am until no later than 12pm.
* HPC and Spear will remain offline all week until Monday, May 13 at 9am.

On Monday, we will perform brief maintenance to our Archival and GPFS storage systems. We expect to have these services back online very quickly.
You will be able to read and write data via Globus and SFTP/RSYNC for the remainder of the maintenance period.

The "SKY" VM cluster will not be affected and will remain online throughout the maintenance period.

Tentative Schedule

* Friday, May 3 - 9am
    * We will begin draining HPC compute nodes.
* Sunday, May 5 - 5pm
    * We will disable HPC job submission in Slurm. The cluster will stop accepting new jobs at this time. Already-running jobs will continue to run.
* Monday, May 6 - 7am - MAINTENANCE BEGINS
    * We will disable access to the following systems:
        * HPC Login nodes
        * Spear nodes
        * Export nodes (GPFS and Archival storage)
    * We will turn off and rebuild HPC login nodes and compute nodes. Any jobs running at this time will be cancelled.
* Monday, May 6 - 12pm (or earlier)
    * We will restore access to the Export Nodes (GPFS and Archival storage)
* Saturday, May 4 - 9am
    * We will run benchmarks and tests on the HPC and Spear
* Monday, May 13 - 9am
    * HPC and Spear will be back online.

If we are able to provide access to any service early, we will do so and notify RCC users.

Summary

We will publish updates and schedule changes as we get closer to the maintenance window. In the meantime, we appreciate your patience and support.

If you have any questions, issues, or requests, please let us know: support at rcc.fsu.edu.

Best regards,
The RCC Team

Research Computing Center
Information Technology Services | Florida State University
w rcc.its.fsu.edu

From cmclaughlin at fsu.edu Thu Apr 4 10:45:28 2019
From: cmclaughlin at fsu.edu (Casey Mc Laughlin)
Date: Thu, 4 Apr 2019 14:45:28 +0000
Subject: [Hpc-notice] Archival storage online and available
Message-ID:

Hi RCC users,

We have patched the Archival Storage system, and you should now be able to read and write to it normally. Thank you for your patience while we worked on the issue.
If you have any issues, please let us know (support at rcc.fsu.edu).

Best,
The RCC Team

Research Computing Center
Information Technology Services | Florida State University
w rcc.its.fsu.edu

From pvandermark at fsu.edu Mon Apr 15 11:53:49 2019
From: pvandermark at fsu.edu (Paul Van Der Mark)
Date: Mon, 15 Apr 2019 15:53:49 +0000
Subject: [Hpc-notice] Slow login nodes during the weekend
Message-ID: <1555343628.28297.9.camel@fsu.edu>

Dear RCC users,

Over the weekend, we found that our login servers had become rather unresponsive. This was caused by a large number of user processes. We understand that it might sometimes be easier to run a small test on the login nodes, but keep in mind that these are shared resources and anything you do will impact the work of other users.

- Please use our cluster for jobs that take more than 4 cores or run longer than an hour.
- Running an rsync for data movement can be taxing on our login nodes too; we suggest you run rsync to our export.rcc.fsu.edu node and/or use Globus.
- We are working on our job submission script generator, so you can submit jobs directly from our website.
  In the meantime, please have a look at https://rcc.fsu.edu/submit-script-generator

Don't hesitate to ask for help with your Slurm job submission scripts; we are here to help you.

Thank you,
Paul

From pvandermark at fsu.edu Fri Apr 19 11:27:19 2019
From: pvandermark at fsu.edu (Paul Van Der Mark)
Date: Fri, 19 Apr 2019 15:27:19 +0000
Subject: [Hpc-notice] login node issues
Message-ID: <1555687638.28297.71.camel@fsu.edu>

Dear HPC users,

We are currently experiencing some issues with some of our login nodes and are actively working on it. We might have to reboot some of the "vm" login nodes during this period; we will send a console message 10 minutes before, so you will be able to save any files you have open.

Thank you,
Paul

From pvandermark at fsu.edu Fri Apr 19 12:02:08 2019
From: pvandermark at fsu.edu (Paul Van Der Mark)
Date: Fri, 19 Apr 2019 16:02:08 +0000
Subject: [Hpc-notice] login node issues
In-Reply-To: <1555687638.28297.71.camel@fsu.edu>
References: <1555687638.28297.71.camel@fsu.edu>
Message-ID: <1555689728.28297.72.camel@fsu.edu>

We also seem to have issues with our ticketing system. We will try to get both the ticketing system and some of the login nodes fixed as soon as possible.

On Fri, 2019-04-19 at 15:27 +0000, Paul Van Der Mark via Hpc-notice wrote:
> Dear HPC users,
>
> We are currently experiencing some issues with some of our login nodes
> and are actively working on it. We might have to reboot some of the
> "vm" login nodes during this period; we will send a console message 10
> minutes before, so you will be able to save any files you have open.
>
> Thank you,
> Paul
>
> _______________________________________________
> You received this message, because you have an account with the FSU
> Research Computing Center
> More information: http://rcc.fsu.edu/connect
>
> ** More News: http://rcc.fsu.edu/news
> ** Facebook: http://facebook.com/fsurcc
> ** Twitter: http://twitter.com/fsurcc