------------------------------------------------------------------------
TL;DR:
- ECC does not seem to work, although the components are capable of
ECC. Can anyone with a Xeon or ECC-capable Pentium on a Fujitsu
D3417(-B2) confirm or disprove this phenomenon? Or share their
knowledge on how to prove that ECC functionality actually works?
- Is there a publicly accessible bug tracker for Fujitsu products?
I assembled a new server using the following components:
Board: Fujitsu D3417-B2
CPU: Intel Pentium G4560T (Kaby Lake, ECC-capable)
RAM: 1 x Samsung 8 GB DDR4, unbuffered, ECC
According to Intel ARK[1], the CPU is capable of ECC:
ECC Memory Supported: Yes
Reading the ECCDIS bit[2] in the CAPID0 register confirms this:
## Check for plausibility of register values
# setpci -s 00:00.0 0.w ## Offset 0: 2 Bytes Vendor ID. Default: 8086h
8086
# setpci -s 00:00.0 e4.l ## Offset E4h: Capabilities A (CAPID0). Default: 0h
60012061
## 60012061h = 0110 0000 0000 0001 0010 0000 0110 0001b
## ==> Bit 25 (ECCDIS) = 0 ==> ECC capable
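To avoid mistakes when expanding the register value by hand, the same check can be done with a few lines of Python. This is only a sketch: the bit position (25) is taken from the datasheet cited as [2], and the helper names `bit` and `ecc_capable` are made up for illustration.

```python
# Decode the ECCDIS bit from a CAPID0 value as printed by setpci.
ECCDIS_BIT = 25  # bit position of ECCDIS in CAPID0, per reference [2]


def bit(value: int, position: int) -> int:
    """Return the bit at `position` (0 = least significant) of `value`."""
    return (value >> position) & 1


def ecc_capable(capid0: int) -> bool:
    """ECCDIS = 0 means ECC has not been fused off in this CPU."""
    return bit(capid0, ECCDIS_BIT) == 0


capid0 = int("60012061", 16)  # hex string exactly as setpci prints it
print(f"{capid0:032b}")       # full 32-bit binary expansion
print("ECC capable" if ecc_capable(capid0) else "ECC disabled")
```

Note that this only shows what the CPU is capable of, not whether the BIOS has actually enabled ECC in the memory controller.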
Although the CPU supports ECC, there is not the slightest hint that
this kind of memory protection gets detected and actually used. There
is no way of configuring anything ECC related (background scrubbing
rate, chip kill etc.); the word "ECC" is not even mentioned anywhere
in the BIOS setup. According to the BIOS manual[3], there should be an
item called "Runtime Error Logging", right between "CPU Configuration"
and "Drive Configuration", but there is none to be seen (using current
BIOS version 1.8.0).
Alas, the appropriate Linux EDAC kernel modules for Intel only work
with Xeons, and the Intel documentation on this subject is unwieldy,
to say the least.
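For completeness: if an EDAC driver does bind to the memory controller (which, as said, may well not happen with this CPU), the kernel exposes error counters per memory controller under /sys/devices/system/edac/mc/. A minimal Python sketch to read them, assuming the usual `ce_count` (corrected errors) and `ue_count` (uncorrected errors) files; the function name `read_edac_counts` is made up for illustration:

```python
from pathlib import Path

# Standard location of the EDAC memory controller entries in sysfs.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_ROOT) -> dict:
    """Return {controller name: (corrected, uncorrected)} for each mcN found."""
    counts = {}
    if not root.is_dir():
        return counts  # no EDAC driver loaded, or sysfs not mounted
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controller registered.")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

An empty result does not prove that ECC is off; it only means that no EDAC driver registered a memory controller.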
So to be sure that memory protection actually works, I unceremoniously
insulated pins 1-3 (NC, VCC (present multiple times), DQ4) of a DIMM
socket using a thin piece of plastic foil (naturally observing ESD
protection measures), re-inserted the DIMM carefully, checked for
correct placement of the insulation, and powered up the system.
I expected:
- The machine starts up,
- the ECC mechanism corrects the 1-bit-error,
- there is at least 1 unambiguous entry in the SMBIOS event log (in
BIOS setup, under "Event Logs") for this artificially created and
automatically correctable 1-bit-error, and
- the Linux log files are flooded by MCE messages (Machine Check Exception)
Instead:
- The monitor stays black, and the machine emits a continuous sequence of
beeps of medium length, and
- after removing the piece of plastic foil, there is no additional
entry in the SMBIOS event log.
To me, this looks as if the ECC support on this board is dysfunctional.
Possible Causes:
- BIOS bug: ECC support only gets used within the Xeon family, not
universally with ECC-capable CPUs.
- Intel's data on the web as well as on silicon are wrong, and the
Pentium G4560T does not support ECC.
- The CPU is broken.
- The RAM is broken.
- The mainboard is broken.
- Everything works as expected. Execution needs to reach BIOS first,
which then has to activate ECC support to make ECC work at all.
Can anyone with a Xeon or ECC-capable Pentium on a Fujitsu D3417(-B2)
confirm or disprove this phenomenon? Or share their knowledge on
how to prove that ECC functionality actually works?
Additionally: Is there a publicly accessible bug tracker for Fujitsu
products?
Regards
peterlistig
[1] http://ark.intel.com/products/97465/Int ... e-2_90-GHz
[2] https://www.intel.com/content/www/us/en ... vol-2.html - chapter 3.39, page 97
[3] ftp://ftp.ts.fujitsu.com/pub/Mainboard- ... top_UK.pdf