
infinitypoint · Astronomer/Photographer

Discussion starter · #1 ·
A bit of background: I have 2 pairs of the G.Skill Trident Z 3200 DDR4 CL14 memory kits (F4-3200C14D-32GTZR) in a 4x16 GB setup. However, when I run HCI Memtest on them, both pairs fail at some point, even with each pair mounted separately. One of them fails much later than the other: running 10 instances of HCI Memtest, one pair finds an error pretty early on (<100% coverage), while the other fails anywhere between 370% and 700% coverage (I tested twice). Furthermore, these were tested using stock CPU settings, so no overclock was enabled. That means everything is on auto, including DRAM voltage, VCCIO, and VCCSA. The XMP profile is enabled of course, which boosts the DRAM voltage up to 1.35 V.



So I'm wondering: can any of those voltages affect RAM stability?


I'm currently testing the "better" pair of RAM using HCI Memtest in another machine; I'll update here when that's finished.



For reference here is my setup:

i7-8086K (stock settings)
ASRock Fatal1ty Gaming K6 Z370
G.Skill Trident Z 4x16 GB CL14 RGB (F4-3200C14D-32GTZR)
Thermaltake 240 mm AIO liquid cooler

Let me know if anyone needs any more info. Thanks!
 
1.35 V is the default voltage most factory-overclocked RAM kits run at, so nothing to worry about there.
4x16 GB of 3200 MHz at CL14 is a lot of fast RAM for the IMC to handle. Actually, it's the maximum supported by this socket, so you may face some issues trying to get it stable.
Try setting VCCSA and VCCIO to 1.20-1.25 V and DRAM voltage to 1.37 V, and see if you manage to pass HCI.
 
Discussion starter · #3 ·
Thanks, I will give that a try and let you know the result! On a related note, what is your opinion on the maximum VCCIO and VCCSA voltages for good longevity of the system? I saw someone comment that 1.2 V is the absolute max for VCCIO. Interestingly, the VCCIO voltage in my BIOS turns red at 1.15 V and above, and only goes up to 1.2 V (if I remember correctly). I did try increasing the DRAM voltage to 1.36 V before, with no improvement, so I will give 1.37 V a try.



Interim update on the HCI test on the other machine: so far so good, at around 1300% coverage with 0 errors. This is with all stock settings (it's my HTPC lol).
 
Hi,
1.35 V is pretty low; on XMP with auto settings that would be a minimum voltage.
How are you reading the voltage? I'm sure HWiNFO would show a maximum voltage close to 1.38-1.39 V.
 
Discussion starter · #5 ·
OK, here is the update (2nd time typing this because my session timed out...):


1) I tested the "better" pair of RAM sticks in my other machine (an HTPC). The motherboard used a default voltage of 1.384 V according to HWiNFO, and everything passed up to 1238% coverage over 2 instances (it's only a dual-core) on HCI.


2) I tested the "worse" pair of RAM in my original machine using only 2 instances of HCI, at a DRAM voltage setting of 1.37 V (I don't remember what the actual HWiNFO voltage was), VCCIO = 1.21 V, and VCCSA = 1.264 V (the last 2 are HWiNFO readings, using auto settings on the motherboard). This test also passed, up to around 1000% coverage.


3) I put all 4 sticks of RAM in my original machine at 1.384 V, VCCIO = 1.21 V, and VCCSA = 1.264 V, and the test failed twice (using 10 and 8 instances). Interestingly enough, both runs failed at the same percentage (72% coverage) for one of the instances.


I'm now at a loss... what could the problem be now?? Is it possible the motherboard is defective? Help!!


Also... what exactly does "coverage" mean for these tests? I assume that even though I can only assign ~3 GB per instance of the program, times 10 instances = 30 GB, it's somehow covering the entire 64 GB I have.
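My rough understanding (from reading around, not from HCI's docs, so take this with a grain of salt) is that each instance's percentage counts passes over its own allocation: 100% means that instance has tested its ~2.9 GB once, and Windows decides which physical pages those allocations actually land on. A quick back-of-the-envelope on what's under test at any one time:

# Back-of-the-envelope using the numbers from this thread
ALLOCATED=$((10 * 2900))    # MB allocated across the 10 HCI instances
INSTALLED=$((64 * 1024))    # MB installed
echo "${ALLOCATED} MB of ${INSTALLED} MB allocated for testing"    # 29000 of 65536
echo "roughly $((100 * ALLOCATED / INSTALLED))% of installed RAM"  # ~44%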



Note:
The "better" pair of RAM refers to the 2 sticks for which, when I test them on my original machine, the errors occur much later (370-700% coverage).
The "worse" pair of RAM refers to the 2 sticks for which the errors occur much sooner, typically at <100% coverage.
 
It's either a garbage motherboard, the IMC in your CPU, or sticks that require some timing tweaks when paired as four. If I understand correctly, you have two different packs of dual RAM sticks, so the third option is most likely.
 
Hi,
VCCSA (system agent) seems high, but I'm not well versed in your platform.
 
Discussion starter · #8 · (Edited)
@BroadPwns I don't think it's the CPU, because I've tested it with 2 different CPUs, both at stock clocks (sorry, I should've mentioned that earlier). You are right that they are 2 different pairs of memory, i.e. they came in 2 packs of 2 each instead of 1 pack of 4, so they aren't as "compatible" with each other. But, as I mentioned earlier, I still get errors even when I test each pair separately in my computer... so I think that rules out option 3. I suppose a defective motherboard could also be the problem... I've already had to replace it once for a different, but possibly related, reason. Basically, sometimes when I turned the computer on, it would power up for a split second, then turn off. It would repeat this 2-3 times in total, and then turn on with my BIOS settings reset (because it failed to boot). I replaced that defective board with my current one, which solved the problem, and that is the board I've been running all my tests on.

@ThrashZone yes, even I feel like that's a bit excessive, but those are the "auto" settings. It seems to be a common theme for many boards to pump way more voltage than needed. But I figured I would try it on auto, since it's higher than what @Cryptedvick suggested for the VCCSA/VCCIO voltages. I also don't understand why the VCCIO voltage turns red in the BIOS when I set it above 1.15 V, yet the auto setting puts it at 1.2 V.


I just received another set of identical RAM and will be testing those tonight... I'll update here with those results.
 
Discussion starter · #10 ·
OK, so here's an update.


Slapped all 4x16 GB of the new sticks in, and pretty much got insta-fails anywhere between 20-60% coverage, using 10 threads and a DRAM voltage of 1.35 V, with no effect from changing VCCSA or VCCIO. I tried booting into safe mode, running fewer instances (i.e. 6), and a few other DRAM voltages up to 1.39 V, all to the same effect.


By that point I was somewhat convinced it was a motherboard issue, since safe mode excludes any possible driver issues (which are supposedly unusual/rare anyway). So I then took out 2 of the sticks, tested the other 2, and reset all voltages back to nominal (DRAM = 1.358 V, VCCIO = 1.144 V, VCCSA = 1.216 V, all readings from HWiNFO). The crazy thing is, running 10 HCI instances (each allocated 2900 MB, so 29 GB tested out of 32 GB), after ~17 hours I've reached 1500% coverage with no errors. Again, this was one of the new pairs of RAM I received yesterday. I've never reached this much coverage on this many instances of HCI before. Even more mysterious: why do the other RAM pairs have issues but this one apparently does not? Previously, with another pair (testing just that pair), I got up to ~680% coverage before 1 error showed up, and on a second test an error showed up at 370% coverage (both using 10 instances of HCI).



I've read another thread on here about how HCI can hammer the CPU cache pretty hard, which can produce false positives. Also, that above ~45 C it's possible to get random RAM errors. My RAM has been running at average temps of 49 C and 47.7 C the whole time.


I'll be running Google's stressapptest (GSAT) next, now that I finally have WSL/Ubuntu running and updated.
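In case anyone wants to run the same thing, here's a minimal sketch of the setup under WSL/Ubuntu; stressapptest is in the Ubuntu repos, and the flags below are the same ones that appear in my logs (adjust -M to however many MB you can spare):

# Install the Ubuntu-packaged stressapptest inside WSL
sudo apt update && sudo apt install -y stressapptest

# ~29 GB under test for 2 hours (7200 s), CPU-stressful copy (-W),
# 12 CPU-stressing threads (-C 12), log to a file, stop on first error
stressapptest -M 29000 -s 7200 -W -C 12 -l GSATlog.log --stop_on_errors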
 
Discussion starter · #11 ·
OK, just finished a 2-hour run of GSAT with a solid pass; the log with the command is attached below.



So I guess I'm pretty confident this set of memory is good. Now I'm going to see what happens when I insert the "good" pair of RAM from my first set (see my posts above for an explanation) and use the exact same voltages.



Log: Commandline - stressapptest -M 29000 -s 7200 -W -C 12 -l GSATlog_20180919_1830.log --stop_on_errors
Stats: SAT revision 1.0.6_autoconf, 64 bit binary
Log: buildd @ lgw01-amd64-022 on Thu Apr 5 10:28:35 UTC 2018 from open source release
Log: 1 nodes, 12 cpus.
Log: Defaulting to 12 copy threads
Log: Prefer plain malloc memory allocation.
Log: Using memaligned allocation at 0x7f42ef7f1000.
Stats: Starting SAT, 29000M, 7200 seconds
Log: Region mask: 0x1
Log: Seconds remaining: 7189
...
Stats: Found 0 hardware incidents
Stats: Completed: 10221266.00M in 7199.75s 1419.67MB/s, with 0 hardware incidents, 0 errors
Stats: Memory Copy: 10221266.00M at 1419.77MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 0.00M at 0.00MB/s
Stats: Disk: 0.00M at 0.00MB/s

Status: PASS - please verify no corrected errors
 
I'm thinking this is more and more likely to be an IMC issue. As in, 4x16 GB is the max for the IMC, and it's also 3200 MHz at CL14. It's just not guaranteed to run at those clocks + timings with the maximum amount of RAM, no matter what kits you get. It all comes down to the IMC, really.

You could try manually setting the timings to 15-15-15 with VCCIO and VCCSA at 1.2-1.25 V and DRAM at 1.38 V first, and see if they pass.
If they do, it's pretty much clear that the IMC just can't take that much RAM at that speed (which it's not rated for, BTW).
 
Discussion starter · #13 · (Edited)
@Cryptedvick you may be right. HCI already failed with the 4 modules at about 26%. I closed out of that, and I'm now running another 2-hour GSAT test. So far it's about 45 minutes in with no issues (yet).


I did find this post from another thread to be interesting:
https://www.overclock.net/forum/180...1-memory/1644432-great-new-memory-stability-tester-ram-test-4.html#post26524004


Basically, he uses RAM Test to dial in the VCCIO and VCCSA voltages, and HCI to find a good cache voltage. The latter part got me thinking: what if I modify my cache voltage and/or cache multiplier? The cache speed is easy to change, but apparently CPU Vcore and cache voltage have to be the same on my board?


It's also interesting that someone thinks HCI can create false positives by putting a lot of stress on the CPU cache:
https://www.overclock.net/forum/180...1-memory/1644432-great-new-memory-stability-tester-ram-test-5.html#post26527747


Just purchased RAM Test, so I'm going to run that now...


Also, GSAT just finished a 2-hour pass on all 4 modules:
Log: Commandline - stressapptest -M 61000 -s 7200 -W -C 12 -l GSATlog_20180919_1830.log --stop_on_errors
Stats: SAT revision 1.0.6_autoconf, 64 bit binary
Log: buildd @ lgw01-amd64-022 on Thu Apr 5 10:28:35 UTC 2018 from open source release
Log: 1 nodes, 12 cpus.
Log: Defaulting to 12 copy threads
Log: Prefer plain malloc memory allocation.
Log: Using memaligned allocation at 0x7f98fb7f1000.
Stats: Starting SAT, 61000M, 7200 seconds
Log: Region mask: 0x1
Log: Seconds remaining: 7190
...
Log: Seconds remaining: 10
Stats: Found 0 hardware incidents
Stats: Completed: 9881748.00M in 7199.93s 1372.48MB/s, with 0 hardware incidents, 0 errors
Stats: Memory Copy: 9881748.00M at 1372.58MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 0.00M at 0.00MB/s
Stats: Disk: 0.00M at 0.00MB/s

Status: PASS - please verify no corrected errors
 
When I loaded my AMD FX chip with 4 sticks of fast DDR3 RAM, I was forced to increase the tREF RAM timing to stabilize it.


Do you have that timing option available? I think it's considered a "secondary" or "subtiming" on most motherboards.




I maxed mine out, then later on slowly tweaked some voltages, which allowed me to drop it back down to what was stable with 2 sticks.


Some motherboards might just generate too much noise to properly run 4 sticks at once; your nearly instant fails with 4 sticks, but much longer runs with only 2, point in that direction.
 
Discussion starter · #15 · (Edited)
@Cryptedvick it worked! OK, so when I first used RAM Test I was getting errors at 200-300%. I tried tweaking the VCCIO and VCCSA voltages with no effect. Then I tried relaxing my timings to 15-15-15-35 and made it to ~5700% in RAM Test after ~7.5 hours. Personally, that's good enough for me. So then I tightened them a little to 14-15-15-34, and it's been running stable in RAM Test up to 7200%! Since I'm currently waiting for my delidded CPU to ship back, I'm using another temporary CPU for the moment. Hopefully the RAM will remain stable after I overclock the 8086K to 5.1 GHz (I can already reach that speed, but the thermals are off the charts with the stock Intel pigeon poop :D).


@mattliston that's very interesting, I will give it a try when I get a chance. I'm pretty sure my board can change the tREF timing; there are tons of timing options in my BIOS. Edit: I have an option for tREFI under the third timings, which is the refresh interval (average periodic interval), with values between 1 and 65535.
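For reference, here's a back-of-the-envelope on what those tREFI values mean, assuming the JEDEC default refresh interval of 7.8 µs (my arithmetic, not from the board's manual). tREFI is specified in memory clock cycles, and at DDR4-3200 the memory clock is 1600 MHz:

# Default tREFI at DDR4-3200: 7.8 us x 1600 MHz
echo "default tREFI ~ $((78 * 1600 / 10)) clocks"             # 12480 clocks
# The BIOS maximum of 65535 clocks stretches the interval to about:
echo "65535 / 1600 ~ $((65535 / 1600)) us between refreshes"  # ~40 us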
 

Attachments

Hi,
Wow, your temps are pretty messed up for a little 4.3 GHz, especially your DIMM temps hitting 58 C :/
 
So I'm wondering: can any of those voltages affect RAM stability?
Yes.

The rest: you're pushing the IMC to its limits. Your temps do seem high; I've tortured my RAM at the maximum allowed voltages (DDR3, 1.9 V) and it got nowhere near the finger-burning temperatures the software is reporting for yours.
 
Hi,
Wow, your temps are pretty messed up for a little 4.3 GHz, especially your DIMM temps hitting 58 C :/

If that is at 4.3 GHz, I would like to see it at 5.1 GHz.

My RAM: G.Skill Ripjaws 3000 MHz, overclocked to 3400 MHz at 1.344 V, VCCSA 1.18 V and VCCIO 1.20 V, XMP profile disabled, and the max temps are not that bad.
 

Attachments

If that is at 4.3 GHz, I would like to see it at 5.1 GHz.

My RAM: G.Skill Ripjaws 3000 MHz, overclocked to 3400 MHz at 1.344 V, VCCSA 1.18 V and VCCIO 1.20 V, XMP profile disabled, and the max temps are not that bad.
Please don't spread inaccurate info. That screenshot just shows your idle RAM temps a few seconds after opening HWiNFO. Do a 9-hour torture test on your RAM and then get back with a screenshot. Even then, there are multiple variables that can influence RAM temperature: ambient temp, case size, number of fans, fan layout, CPU cooler type (AIO vs. air), module cooling, etc.

@infinitypoint, temps are reasonable for such an extensive torture test, so don't worry about it.
Out of curiosity, what was your ambient temp during the test?
What kind of case are you using?
How many fans do you have, and in what configuration?
Does your AIO cooler pull air into the case, or does it blow it out?
 
Discussion starter · #20 ·
@Cryptedvick OK, yeah, I figured. Ambient temps that day reached somewhere between 81-85 F, so it was pretty hot. I left my window open for the (little) breeze there was that day. I'm using a pretty old case, an Antec 1200 v3 I believe. I have a total of 9 fans on the case: 3 front 120 mm intakes (HDDs/SSDs are mounted in the front), 1 side 120 mm intake over the GPU, 4 in push-pull on the CPU radiator (240 mm, exhaust), and one top 200 mm exhaust fan. The case sits under my desk (it's a full tower, so it can't go on the desk haha).


@JackCY the CPU was running at 100% usage for 9 hours with a high ambient temp, which is why the CPU temp was so high :). Today the ambient temp was around 80 F on average, and my DIMMs' average temps are 40-42 C under normal use.
 