[Hpc-notice] Sliger Friday Afternoon Update: HPC partially online, FSU & RCC will monitor Data Center temps over the weekend.

Casey Mc Laughlin cmclaughlin at fsu.edu
Fri Jun 4 17:10:50 EDT 2021


Hi RCC Users,

Here is the last update before the weekend (hopefully 🤞).

On the HPC, all owner nodes and GPU nodes are online and processing jobs submitted via Slurm. However, part of the cluster, including most of the general access nodes, will remain offline through the weekend.

Jobs submitted through the backfill2 queue will run, but they may have to wait longer than usual before they start. We are asking for your patience over the weekend, and we will hopefully be able to bring more of the cluster up on Monday.

Chilled water cooling has been the hold-up for most of the week. The contractors have one of the two primary chillers online and fully tested. However, we do not believe they will have the other one online by the end of the day. We will continue to run temporary cooling along with the chiller over the weekend for some redundancy.

With only one of the two chillers running, there is no backup in case of a failure. The HPC nodes can heat the room up to near-dangerous levels very quickly, and we don't want to take any risks over the weekend. We will therefore not bring the whole cluster up until we have reassurance that both chillers are functional.

Refer to the image below to see some of the wild temperature swings in the data center over these past few days:

[Image: data center temperatures over the past few days (attachment Outlook-shiifppf.png)]

If the contractors can get the second cooling unit online by early next week, we will power up the full HPC as early as Monday. Either way, you can expect another message from the RCC Staff no later than 2pm on Monday.

Have a good weekend, and once again, thanks very much for your patience. As usual, comments and inquiries should be sent via email to support at rcc.fsu.edu.

Best regards,

The RCC Team

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Outlook-shiifppf.png
Type: image/png
Size: 152214 bytes
Desc: Outlook-shiifppf.png
URL: <http://lists.fsu.edu/pipermail/hpc-notice/attachments/20210604/866f9c41/attachment.png>