Overclock.net banner
1 - 20 of 46 Posts

Ichirou

Discussion starter · #1 ·
In a thorough TM5 test like with 1usmus' config, it's set to do three cycles of varied tests, which (for me) last about 20 mins each with short breaks in between.

I've tightened many timings where they would get through 1-2 cycles just fine, but then throw an error after that. Usually a single error, but sometimes a wall of them.

In the case of a single late error, does it mean that the timing is bad, or is it just that the RAM is overheating?

I actually ordered a RAM fan about a week ago, which is yet to arrive. I'm sure I'll find out then, but I'd just like to know whether cooling the RAM actually does anything.
 
But which error is it?
 
Discussion starter · #4 ·
Is this the v3 config? How much ram do you have? 32GB?

Usually yes, an error in a later test cycle either is trfc related or temperature related. Which test is the error stopping at?
Yeah, V3. 4x16 GB sticks. The errors are varied, so there's no one pattern to it. I think I see 5 and 8 most often though.
I have tRFC at auto for the time being since I'm well aware that it and tREFI are really sensitive to heat.
 
Would help if you posted the actual timings and voltages used, or even the hardware used.

A single error late in the test, with the RAM sticks running cool and nothing else wrong, is usually VCCSA or subtiming related.
Errors that slowly increase as the DIMMs heat up are temperature related (which can relate to a number of things).
 
Discussion starter · #6 · (Edited)
Ah yeah, sorry about that. The CPU is an i7-8086k @ 5 GHz all-core, 0 AVX offset, 4.7 GHz cache, 1.295V. The RAM is at its default 4,000 MHz @ 18-22-22-42.

[Attachments 2461815 and 2461816: screenshots of the current settings]


Those are safe timings right now, and tested quite thoroughly (a dozen TM5 passes without errors).
I get issues if I try to tighten further or deviate too much from those tRFC/tREFI settings though.

Also, if I try to boost my CPU core clock to 5.1 or 5.2, I need 1.4V Vcore to stabilize TM5 for some reason. Not sure what's going on there.
 
When you run 4 DIMMs, things always get complicated. Going with 4 dual-rank sticks is just asking for abuse.

Your error could be anything: VCCIO, VCCSA, a third timing, training, temps, even CPU Vcore or cache.
 
Discussion starter · #9 ·
Is tCKE = 0 because you disabled Powerdown mode with PPD = 0? If so, are you doing that in bios or with memtweakit?
I'm actually on an Intel, so I don't have a power down mode like AMD does. Or at least, I don't see one in the BIOS.
Hence, the tCKE setting is done in the BIOS. I started off with 4, then 1, then 0. RAM didn't seem to mind.

I actually corrupted my OS when I set tRCD+tRP to 20 (even though it passed TM5 twice), so I had to reset my PC.
After the reset, MemTweakIt stopped working with some error at launch saying something about "driver initialization failed." So I have no access to real-time timing tweaking anymore. Already tried reinstalling it several times.

Back when MemTweakIt did work, I tried setting tXP lower than 8, but to no avail. However, the results weren't certain since I think there may have been instability elsewhere. Hence, tXP will remain stuck at 8. My BIOS doesn't have a tXP setting exposed. It seems to be for higher end motherboard models only.

In other news, I had to revert my BIOS back to before I raised the core clock above 5 GHz, since I was getting TM5 errors even on stable settings.
I think it may have been because my RAM didn't flush completely. I had to completely unplug the power cord to clear it. TM5 passes again now.

Thus, I'm currently still using the settings as shown in the screenshots, although I lowered tREFI to 32237 just to be safe.
I'll test some other timings later and then try pushing the CPU clocks higher again. I actually haven't tried lowering Command Rate yet, so I'll try it.

Still waiting on my RAM fan to arrive in order to retest timings that gave late errors.
It's strange, because those particular timings basically never throw an error in the first 1-2 cycles. It's almost always the same time in a later cycle, albeit with different error codes. Since tRFC is at a safe setting, I can only imagine that it's a temperature problem.
 
So, what are your DIMM temperatures?
 
Discussion starter · #11 ·
I actually have no idea, and HWiNFO doesn't have a reading on it, so I assume the sticks don't report it.
I don't have any actual meters or anything to measure either. Not even a thermometer, which is what somebody suggested before lol

I can only imagine that, since Corsair always sells high frequency 1.5V kits with a fan included, they get hot even at their stock speeds.
Since my kit is just a baseline 4,000 MHz at 1.35V stock, it did not come with a fan. I did order one off of Amazon though which should hopefully arrive soon.
 
Discussion starter · #12 · (Edited)
So I decided to tighten tRDRD and tWRRD by 1 (I've already tested these two before and found inconclusive results; they sometimes passed, sometimes didn't), and did another TM5 test. I was AFK for the most part, but after the test completed, it gave me one error in test 4.

According to 1usmus's config file, Test 4 is function "MirrorMove128" with parameter "510". Is that indicative of a temperature issue?
The log file unfortunately doesn't show a timestamp, but I'm about 70% confident it's a late error anyway.

[Attachment 2461897]


Update: I experimented with some things, and I noticed that from time to time, TM5 may pass, but then throw errors after the test.
Should those be ignored? Are those errors happening because the program is winding down in an unnatural way after TM5 interrupts it?
Also, I noticed that if I just cut it down to two cycles instead of three, I pass a lot more. I imagine there is a temperature issue involved.
 
Discussion starter · #14 ·

This google docs sheet has explanations about what an error in each of the tests mean. Should help in troubleshooting.
Thank you for the reference. It took me a moment to figure out what you were referring to, but then I noticed the TM5 Error cell has a drop-down of each test.
This will be very useful.

I updated my previous post, but if you missed it, I've decided to reduce 1usmus' test to only two cycles, since the majority of errors are thrown in Cycle 3 and not really the first two.
Since the errors are varied and never consistently during one particular test, my interpretation is that it is a RAM temperature issue and not so much timing instability.

For ordinary use, is that safe? Or do I have to pass three cycles?
I've even had instances where past safe settings (ones that pass all the time) would occasionally throw errors in Cycle 3 without any changes made.
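
One way to put numbers on the "mostly Cycle 3, never the same test" impression is to jot down each error by hand and tally it. Below is a rough sketch of that idea, not anything TM5 provides: the `summarize_errors` helper is hypothetical, and the `observed` list is made-up placeholder data (not results from this thread), since the log has no timestamps and the cycle has to be noted manually.

```python
from collections import Counter


def summarize_errors(errors, total_cycles=3):
    """Tally hand-recorded (cycle, test_id) pairs.

    Returns the error count per cycle, the error count per test,
    and the share of errors that landed in the final cycle.
    """
    by_cycle = Counter(cycle for cycle, _ in errors)
    by_test = Counter(test for _, test in errors)
    late_share = by_cycle[total_cycles] / len(errors) if errors else 0.0
    return by_cycle, by_test, late_share


# Placeholder observations, invented purely for illustration:
# each entry is (cycle in which the error appeared, TM5 test number).
observed = [(3, 5), (3, 8), (2, 4), (3, 5), (3, 12)]

by_cycle, by_test, late_share = summarize_errors(observed)
print("errors per cycle:", dict(by_cycle))
print("errors per test: ", dict(by_test))
print(f"share in final cycle: {late_share:.0%}")
```

A high final-cycle share spread over many different tests would be consistent with heat build-up, while errors clustering on one test number would point more toward a specific timing. It is only a tally, so it cannot prove either cause on its own.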
 
The easiest way to check whether this is temperature related is to either raise or lower the temperature inside the case. You can raise it by running a GPU stress test (like in AIDA or OCCT) while the RAM test runs. If there is instability due to temperature, you'll notice it right away.

To reduce the temperature, you can point a fan at the RAM. Any fan works as long as it's blowing air on the RAM.
 
@Ichirou,

Not saying your problem is not temperature related, just curious as to why you believe it might be.

Are your other temperatures, especially ambient, exceedingly high?
 
Discussion starter · #17 ·
Okay, I did a test run with a small USB fan blowing directly on the RAM, and it seems to have passed a full test (with all of my uncertain timings put in as well). I haven't tested it more than once though.

I run the RAM in a mATX case with pretty low airflow. Since the errors (usually only one) mostly occur in the third cycle (or sometimes even never), I can only imagine that it's a temperature issue.

TM5 with 1usmus' config repeats the same cycle three times with breaks in between, so it doesn't make sense that it can pass a few cycles but fail a later one, since the instructions are the exact same. Hence, I figured it was the RAM heatsinks failing to completely dissipate the heat by the time the third cycle commences.
 
OK, I was not trying to be nosy, just curious, because you never stated your actual environmental temperatures.

Good to hear you have it solved.

Thanks
 
Discussion starter · #19 · (Edited)
Oh, you weren't nosy at all. It just never occurred to me that I should've tried something so simple.
Of course, one test isn't enough to prove anything, but it's a step in the right direction.

I'll keep tightening some more timings from here, but there isn't much more for me to tighten. Any suggestions?

Command Rate @ 1 wouldn't post, so that's that.
I could try lowering tRCD and tRAS again, but I risk OS corruption. Maybe I'll try it once my RAM fan arrives. Just maybe.
I should try to drop the tertiary _dr timings as low as possible since my RAM doesn't use them. I'll also try tightening the tertiaries further, but I think they won't post.
There's also of course tRFC and tREFI, but I'll leave that for after my RAM fan arrives.
I already tried tweaking the RTLs and IOLs, but they pretty much refuse to budge. I'll try again later.

[Attachment 2462003]


On a side note, what exactly happens when tRFC/tREFI becomes unstable? Does it corrupt the OS?
 
I find it interesting that you were able to set tCKE to 0 despite having no tXP or PPD setting visible in BIOS. I stumbled across this reddit thread here that had some people talking about PPD being added as a setting in top end Z490 motherboard beta bioses, but I can't seem to find what memory settings are exposed for your Asus Prime Z390A's BIOS online. Some people were able to disable it by using memtweakit with Asus' "Realtime Memory Tweaking" setting enabled, but I haven't had any luck myself with my ASRock Z170 OCF. My setting for tCKE cannot be set below 5 and my tXP setting cannot be dropped below 4 (no PPD setting exposed that I can find); hence my curiosity.

If Precharge Power Down is disabled, both tCKE and tXP become non-functional, from my understanding. It might be worth checking AIDA64's latency test to see if changing tCKE/tXP in BIOS or with memtweakit has any effect; otherwise it is hard to tell whether PPD is already disabled or the motherboard is silently overriding tCKE. Motherboards apparently do this for other timings like tRAS as well (although the performance impact is unknown AFAIK).

As for your question about tRFC/tREFI, see the Wikipedia article on memory refresh. tRFC is the refresh cycle time: how long a refresh command is given to restore the charge in the cells before the memory can be accessed again. tREFI is the refresh interval: how long the memory stays accessible between one refresh command and the next. If tRFC is too low, a cell's voltage might not reach the threshold to count as a logical 1 during recharging (or might reach it with too little margin for the chosen tREFI). If tREFI is too high, the cell's voltage will drop below that threshold before the next refresh arrives, which matters even more once tRFC has been lowered. A memory cell is essentially a capacitor, and its leakage current (the reason the voltage drops over time) increases with temperature, so the higher the temperature, the looser you must keep these timings to compensate.
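
To make those two timings a bit more concrete, here is a minimal sketch that converts them from clock cycles into time and estimates how much of each interval is spent refreshing. It assumes DDR4-4000 (so a 2000 MHz memory clock) and the tREFI of 32237 mentioned earlier in the thread; the tRFC value is a placeholder I picked for illustration, not a number taken from the screenshots, and the function names are my own.

```python
def cycles_to_ns(cycles: int, data_rate_mts: float) -> float:
    """Convert a timing given in memory-clock cycles to nanoseconds.

    DDR memory transfers data twice per clock, so the memory clock
    in MHz is half the data rate in MT/s.
    """
    clock_mhz = data_rate_mts / 2.0
    return cycles * 1000.0 / clock_mhz


def refresh_overhead(trfc_cycles: int, trefi_cycles: int) -> float:
    """Fraction of each refresh interval spent inside a refresh."""
    return trfc_cycles / trefi_cycles


DATA_RATE = 4000   # MT/s, the kit's rated speed
TREFI = 32237      # cycles, value mentioned in the thread
TRFC = 700         # cycles, hypothetical placeholder value

print(f"tREFI = {cycles_to_ns(TREFI, DATA_RATE) / 1000:.2f} us between refreshes")
print(f"tRFC  = {cycles_to_ns(TRFC, DATA_RATE):.0f} ns per refresh")
print(f"refresh overhead = {refresh_overhead(TRFC, TREFI):.2%}")
```

The overhead figure is why raising tREFI or lowering tRFC helps performance, and the microsecond figure is how long the cells have to hold their charge between refreshes, which is the part that gets harder as the sticks warm up.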

Now, a single bit flip from 1 to 0 might not sound too bad, but it is completely unrecoverable unless detected by some form of error correction. A single corrupted bit in sensitive data like an encryption key would (presumably) wreak havoc, while millions of bits could be corrupted with no observable effect if the data was redundant and about to be overwritten anyway. It is impossible to tell how much damage data corruption can cause until it has already happened, and it really isn't worth the risk. Back up your data and do your best to ensure memory stability before you use the system as a daily driver, to limit the likelihood of permanent data loss.
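
As a small illustration of how much one flipped bit can matter, the sketch below flips a single bit in a stand-in 32-byte "key" and compares SHA-256 digests before and after. The key is just `bytes(range(32))`, the bit position is arbitrary, and `flip_bit` is a helper written for this example, so treat it as a toy demonstration rather than a model of what unstable RAM actually does.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


key = bytes(range(32))          # stand-in for sensitive data
corrupted = flip_bit(key, 100)  # arbitrary bit position

print("bits changed in key:", differing_bits(key, corrupted))
print("original digest: ", hashlib.sha256(key).hexdigest())
print("corrupted digest:", hashlib.sha256(corrupted).hexdigest())
```

Only one bit of the input differs, yet the two digests have nothing in common, which is roughly what happens to anything derived from a key or checksum once the underlying data is silently corrupted.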
 