[Hpc-notice] 2021 Software upgrade August 16 - 20

Casey Mc Laughlin cmclaughlin at fsu.edu
Wed Jul 14 10:38:55 EDT 2021


Hi HPC & Spear users,


We are planning an upgrade to the HPC software stack during the week beginning August 16. Unlike previous software upgrades, we will perform a rolling upgrade in order to minimize downtime. This means that parts of the cluster will remain online while other parts are being upgraded.

Schedule

We plan to perform the maintenance during the week of August 16 - 20. We will send out an email and update the news article on our website<https://rcc.fsu.edu/news/2021-software-upgrade-august-16-20> once a detailed draft schedule is ready (estimated in 2-3 weeks).

What's New

We are making the following major changes to our software stack to increase stability, fix bugs, and enhance the usability of the HPC:

  1.  Upgrade CentOS from v7 to v8.3
  2.  Upgrade Slurm from v20.02 to v20.11 (release notes<https://slurm.schedmd.com/news.html>)
  3.  Upgrade software packages to newer versions (full list forthcoming)
  4.  Upgrade our Open OnDemand web portal from v1.7 to v2.0 (release notes<https://osc.github.io/ood-documentation/latest/release-notes/v2.0-release-notes.html>)
  5.  Make a major change in how we install new user software packages, such as NetCDF and R (details below)

Change to the upgrade process

Upgrading the cluster requires us to reinstall the operating system on every compute node. To minimize downtime during the upgrade week, we will reinstall only a small set of nodes at a time while keeping most nodes online. As soon as a set of nodes has been reinstalled, it will be put back into production and the next set will be reinstalled.
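To illustrate the rolling process described above, here is a minimal Python sketch, not RCC's actual tooling. The node names and the batch size are hypothetical; the real batches will follow the schedule RCC publishes. It shows that only the batch currently being reinstalled is offline, and that the cluster is temporarily split between nodes on the new stack and nodes still on the old one.

```python
from typing import Iterator, List


def rolling_batches(nodes: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most `batch_size` nodes to reinstall."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(nodes), batch_size):
        yield nodes[start:start + batch_size]


def simulate_rolling_upgrade(nodes: List[str], batch_size: int) -> int:
    """Walk through a rolling reinstall (assumes unique node names) and
    return the peak number of nodes that were offline at the same time."""
    upgraded = set()  # nodes already back in production on the new stack
    peak_offline = 0
    for batch in rolling_batches(nodes, batch_size):
        offline = set(batch)  # only this batch is being reinstalled
        old_stack = len(nodes) - len(upgraded) - len(offline)
        peak_offline = max(peak_offline, len(offline))
        print(f"reinstalling {batch}: {len(upgraded)} online on new stack, "
              f"{old_stack} online on old stack")
        upgraded |= offline  # batch returns to production before the next one starts
    assert upgraded == set(nodes)
    return peak_offline


if __name__ == "__main__":
    cluster = [f"node{i:02d}" for i in range(10)]  # hypothetical node names
    print("peak nodes offline:", simulate_rolling_upgrade(cluster, batch_size=3))
```

The trade-off is between batch size and duration: smaller batches keep more nodes online at any moment but stretch the mixed old/new period over more steps.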


Since every node must be reinstalled, jobs will be killed periodically throughout the week. This also means that, for a short time, the cluster will consist of two sets of nodes running two different software stacks: the old CentOS 7 build and the new CentOS 8 build. Users with more complicated scripts and workflows may want to wait until the upgrade is complete before submitting jobs, especially if their jobs use multiple queues/partitions.
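If a workflow must run while the cluster is mixed, a job can check which stack its node is on before doing any work. The Python sketch below is an assumption on our part, not an RCC-provided tool: it reads the standard Linux /etc/os-release file and assumes the nodes report ID="centos" and a VERSION_ID beginning with 7 or 8.

```python
import re
from typing import Dict, Optional


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines as found in /etc/os-release."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip('"').strip("'")
    return fields


def centos_major_version(text: str) -> Optional[int]:
    """Return the CentOS major version (e.g. 7 or 8), or None if not CentOS."""
    fields = parse_os_release(text)
    if fields.get("ID") != "centos":
        return None
    match = re.match(r"\d+", fields.get("VERSION_ID", ""))
    return int(match.group(0)) if match else None


if __name__ == "__main__":
    try:
        with open("/etc/os-release") as fh:
            major = centos_major_version(fh.read())
    except FileNotFoundError:
        major = None
    if major == 8:
        print("running on an upgraded (CentOS 8) node")
    elif major == 7:
        print("running on a not-yet-upgraded (CentOS 7) node")
    else:
        print("could not determine the CentOS version of this node")
```

A job script could use this check to exit early, or to pick stack-specific paths, when it lands on a node from the "wrong" stack. Waiting until the upgrade finishes remains the simpler option.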


The login nodes will remain available, and the Slurm job scheduler will continue to accept jobs throughout the upgrade. Our storage systems, GPFS and Archival, will also remain online throughout the upgrade process.

Changes to the HPC software stack

We are reorganizing our software libraries under a standard scheme. We already support multiple versions of some software on the HPC (e.g., R and MATLAB), but the folder structure has not been consistent, which can lead to confusion.


We encourage users to rely on environment modules<https://rcc.fsu.edu/docs/linux-modules> whenever possible; RCC staff keep them up to date with the correct library paths. If your code needs to link against libraries at compile time, we encourage you to use the pkg-config tool<https://rcc.fsu.edu/software/pkg-config> instead of specifying full paths in your Makefiles or scripts. Refer to the pkg-config documentation<https://rcc.fsu.edu/software/pkg-config> on our website for more details. pkg-config is already available on the HPC, so you can refactor your custom scripts right away.
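As an illustration of the pkg-config approach, here is a minimal Python sketch of a build helper. It asks pkg-config for a library's compiler and linker flags instead of hard-coding install paths. The package name "netcdf" and the use of gcc are assumptions; running `pkg-config --list-all` shows which packages are actually registered on a given node.

```python
import shlex
import shutil
import subprocess
from typing import List


def pkg_config_flags(package: str) -> List[str]:
    """Return compiler and linker flags for `package` as reported by pkg-config."""
    if shutil.which("pkg-config") is None:
        raise LookupError("pkg-config not found on PATH")
    result = subprocess.run(
        ["pkg-config", "--cflags", "--libs", package],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise LookupError(f"pkg-config has no entry for {package!r}: {result.stderr.strip()}")
    return shlex.split(result.stdout)


def compile_command(source: str, output: str, flags: List[str],
                    compiler: str = "gcc") -> List[str]:
    """Build a compile/link command; library flags go after the source file
    so that -l options are seen after the objects that need them."""
    return [compiler, source, "-o", output, *flags]


if __name__ == "__main__":
    try:
        flags = pkg_config_flags("netcdf")  # assumed package name
    except LookupError as err:
        print(f"skipping: {err}")
    else:
        cmd = compile_command("my_model.c", "my_model", flags)  # hypothetical source file
        print(" ".join(shlex.quote(part) for part in cmd))
```

Because the paths are looked up when the build runs, the same script keeps working after libraries move to the new, standardized locations.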

Questions?

If you have any questions or comments, please reach out to us: support at rcc.fsu.edu<mailto:support at rcc.fsu.edu>.

Best regards,
The RCC Team