RX2540M5 server crashing/powering off with reason: "'PSU' HSC: An FET health fault has been detected"
We've run into a bad situation while migrating from our old RX4770M1 production servers to 14 NEW RX2540M5 servers for virtualization (VMware vSphere).
One to two weeks after the migration, the first server failed. In detail: the server (in production, hosting many VMs) powered itself off! Why? The iRMC log gave the reason quoted in the title. FTS Support told us it's a known issue and that they are working on it. They also said a workaround is to unplug the failed server for a few seconds, after which it won't fail anymore. We did this, but a few days later (2-3) the server failed again with the same error. Two days ago another one failed with this KNOWN issue! Within nearly two weeks, 2 out of 14 NEW servers have failed with an already known issue!
Today FTS Support and Engineering told us (put simply) that the cause is faulty handling of CPU states, and that a BIOS update is planned for this month (11/19).
It isn't a problem with additional I/O cards or anything like that; we only have two 10GbE NICs and an HBA installed, all of them from FTS as well.
Why isn't FTS alerting their customers?! Nobody can use servers like this in production!
Is anyone else having these problems?