[Hpc-notice] Partial HPC cluster failure
Casey Mc Laughlin
cmclaughlin at fsu.edu
Wed Oct 2 15:42:48 EDT 2019
Hi RCC Users,
We are experiencing an issue with a power distribution unit that serves several racks in the HPC cluster. Running jobs on the following racks are affected:
1. M32
2. I29
3. I30
4. I31
5. I32
6. I35
7. I36
Jobs in the following partitions are affected:
* backfill
* backfill2
* changlani_q
* coaps18_q
* eoas19_q
* fraser_q
* genacc_q
* hongli_q
* ktaylor_q
* mecfd18_q
* medicine_q
* quicktest
* rcc_internal
* sec4m_q
* stagg_q
* stata_q
* stroupe_q
* yin19_q
In addition, the InfiniBand switch is down, so jobs in other partitions may be affected as well.
The Systems Team has been deployed, and we hope to have this issue resolved soon.
In the meantime, status updates are available at https://fla.st/2oysWFq. Please direct inquiries to support at rcc.fsu.edu.
Best Regards,
The RCC Team