
·
Overclock the World
Joined
·
3,283 Posts

·
Premium Member
Joined
·
6,445 Posts
?
Registry patch, from L1, L2, L3 visible values
First = 512
Secondary = 4096
Third = 32768
I edited the registry in Windows 10 for my 5950X with its L1, L2, and L3 cache sizes, and I find it helps AIDA64 scores, even latency a bit.

I'm waiting for the fix patches in a week or two before moving to Windows 11 though.
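(For anyone curious what such a patch actually touches, here's a minimal Python/winreg sketch, not the poster's exact .reg file. Only "SecondLevelDataCache" is a historically documented value name; whether any current Windows build still reads it is exactly what's questioned further down the thread.)

Code:
# Minimal sketch: write the old "SecondLevelDataCache" DWORD, which expects
# the size in KB. The First/Third-level names circulating in tweak guides
# are omitted here because they don't appear in any official documentation.
# Run elevated; a reboot would be needed for any effect.
import winreg

MM_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"

def set_second_level_cache(kilobytes: int) -> None:
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, MM_KEY, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "SecondLevelDataCache", 0,
                          winreg.REG_DWORD, kilobytes)

set_second_level_cache(4096)  # the "Secondary = 4096" value quoted above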
 

·
Overclock the World
Joined
·
3,283 Posts
I'm waiting for the fix patches in a week or two before moving to Windows 11 though.
I'm waiting for confirmation before moving myself over again
Have to reinstall anyway here :)
Then I need to see how to configure opt-offl for Win11 ProWS
 

·
Iconoclast
Joined
·
31,568 Posts
I'm highly doubtful those registry entries are doing anything.

SecondLevelDataCache is something that started cropping up as a tuning recommendation in the 1990s and MS told people to stop using it about 15 years ago because it only works if the HAL can't detect cache size on parts with direct mapped caches, which haven't been used by any x86 architecture after the original P6/Pentium Pro, IIRC. I'm not sure any version of Windows ever even parsed those L1 and L3 entries and I'm pretty sure Windows stopped parsing "SecondLevelDataCache" after Windows XP.

Cache should be totally transparent to the OS.


 

·
Premium Member
Joined
·
6,445 Posts
I'm highly doubtful those registry entries are doing anything.

SecondLevelDataCache is something that started cropping up as a tuning recommendation in the 1990s and MS told people to stop using it about 15 years ago because it only works if the HAL can't detect cache size on parts with direct mapped caches, which haven't been used by any x86 architecture after the original P6/Pentium Pro, IIRC. I'm not sure any version of Windows ever even parsed those L1 and L3 entries and I'm pretty sure Windows stopped parsing "SecondLevelDataCache" after Windows XP.

Cache should be totally transparent to the OS.


If "the system attempts to retrieve the L2 cache size from the Hardware Abstraction Layer (HAL) for the platform. If it fails, it uses a default L2 cache size of 256 KB." is a bug with Windows and HAL then maybe manually setting it is why it helps?
 

·
Overclock the World
Joined
·
3,283 Posts
Waiting for @LxT1N to confirm whether this does anything at all & to show which version he/she runs
Funnily, if you put load on the OS, then the cache behaves correctly.
Something appears to hard-throttle it.
Could be resolved with current chipset updates, or it could all be nonsense.
On Win 10 I saw nothing, sadly

Waiting for a reply
 

·
Premium Member
Joined
·
6,445 Posts
Waiting for @LxT1N to confirm whether this does anything at all & to show which version he/she runs
Funnily, if you put load on the OS, then the cache behaves correctly.
Something appears to hard-throttle it.
Could be resolved with current chipset updates, or it could all be nonsense.
On Win 10 I saw nothing, sadly

Waiting for a reply
Microsoft is releasing an update soon which is supposed to fix the L3 cache issue and soon after AMD is releasing a chipset driver which fixes the preferred cores problem.
 

·
Iconoclast
Joined
·
31,568 Posts
If "the system attempts to retrieve the L2 cache size from the Hardware Abstraction Layer (HAL) for the platform. If it fails, it uses a default L2 cache size of 256 KB." is a bug with Windows and HAL then maybe manually setting it is why it helps?
As Microsoft has been stating for decades, this entry should only be relevant for direct-mapped caches. The HAL is also almost certainly not failing to identify the capabilities of processors where the correct driver has been loaded and the cache size is being listed accurately in Task Manager.

Even if, by chance, the L2 size wasn't being detected and Windows wasn't managing memory in a way that allowed optimal utilization, the impact should be minimal-to-nonexistent on the 8-way associative L2 current AMD parts have. I can't completely rule out the SecondLevelDataCache setting having an effect on some setups, but I'd need to see benchmarks that couldn't be put down to scheduling, turbo, or memory training variations. I'm also doubtful that combining L2 would even be the correct way to set this variable, even if it were being parsed and was actually influencing the memory manager.

Regardless, setting L1 & L3 cache sizes wouldn't make any sense. On this architecture, the L2 is the largest layer of the inclusive caches, so anything above or below it in the hierarchy is wholly transparent to the OS memory manager irrespective of any associativity. Perforce, everything in the L1 is in the L2, and the L3 is only filled with L2 evictions. Even if it's possible to specify these cache sizes via the registry--and I do not believe it is, because I cannot find any mention of those entries in any official or semi-official source--they shouldn't do anything.
 

·
Registered
Joined
·
823 Posts
Super busy the past day, but just quickly stopping by to post this: "First Windows 11 Patch Tuesday Makes Ryzen L3 Cache Latency Worse, AMD Puts Out Fix Dates" - lol @ MS/Windows 11.

In our own testing, a Ryzen 7 2700X "Pinnacle Ridge" processor, which typically posts an L3 cache latency of 10 ns, was tested to show a latency of 17 ns. This was made much worse with the October 12 "patch Tuesday" update, driving up the latency to 31.9 ns.
Don't mind us, we'll just make things... even worse, for the next week or so.
 


·
Registered
Joined
·
8 Posts
there is a parabolic dependence
Convert the ns value to ticks, round up to a multiple of 8, and add 8.

For 16-Gbit chips (Micron, Hynix, Samsung):
convert the ns value (from Reous' table) to ticks, round up to a multiple of 16, and add 16.
@anta777 I'm currently running the following:
[screenshot of current timings]


Do you recommend switching to tRFC 576 for 16Gb Micron ICs? What about tRFC2 and tRFC4 - should I run the usual tRFC/35*26 and tRFC/35*16?
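(Not speaking for anta777, but to make the quoted rule concrete, here's a small sketch of the ns-to-ticks conversion plus the "usual" tRFC2/tRFC4 ratios asked about. The 300 ns / DDR4-3800 numbers at the bottom are placeholder inputs only, not a recommendation.)

Code:
import math

def trfc_ticks(trfc_ns: float, data_rate_mts: float, multiple: int) -> int:
    """ns -> memory-clock ticks, then the quoted rule: round up to a
    multiple of `multiple` (8 for 8Gbit ICs, 16 for 16Gbit) and add it."""
    tck_ns = 2000.0 / data_rate_mts          # one memory clock in ns (DDR)
    ticks = math.ceil(trfc_ns / tck_ns)       # raw ns -> ticks
    return math.ceil(ticks / multiple) * multiple + multiple

def trfc2_trfc4(trfc: int) -> tuple[int, int]:
    """The 'usual' ratios from the question: tRFC/35*26 and tRFC/35*16."""
    return round(trfc / 35 * 26), round(trfc / 35 * 16)

# Placeholder example only: ~300 ns on a 16Gbit IC at DDR4-3800
t = trfc_ticks(300, 3800, multiple=16)
print(t, trfc2_trfc4(t))   # -> 592 (440, 271)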
 

·
Overclock the World
Joined
·
3,283 Posts
Yes yes
But the test for this reg patch aside

"Just a bug by Microsoft"
A bug that intentionally recognizes CPPC tags and intentionally throttles cores down until load is applied
A bug that intentionally does more work, instead of just resulting in "slower" cache rather than sleeping cache 🤭
I am waiting for this too - but I want to know more about this thing you posted. It did nothing for me on Win 10 and shouldn't do anything
The question is why it remains in the registry. MS is known to clean their stuff at least once a decade, and Win95 has been more than 1 & 1/2 decades :D
======================================
I gave Anta's team config a neutral 2 weeks now - likely more. Neutral testing on anything that is possible.
I think I'll abandon it permanently - nearly sure, but not entirely. I want to give it slightly more time ~ to be 100% confident not to use it.

The config is guaranteed to break beyond 1.6vDIMM.
It now makes sense why there was a warning not to run it over 1.56v.
It has no connection to the timings used, and so far I haven't been able to find any powering combination that makes it function (I really tried to make it work ~somehow~).
With all published fixes, % testing-time calculations, or cycle extensions, it is guaranteed to break at the start of the 2nd cycle. It doesn't matter if one cycle takes 28 min or 60 min. The design is at fault.

I think everything Anta777 wrote, and what was learned from SerJ about its design ~ it remains SerJ's design and holds truth.
But it is very clear to me now ~ that the two configs differ in how they function.
It wouldn't matter how "well" they function, but they are a completely different design and not comparable.
For a config where I can tell no difference in function & type of function ~ compared to GSAT, HCI or Karhu; something I cannot tell apart from all the other programs & that strains the CPU too much ~ so it can error from the CPU instead of from the RAM.
I see no reason to stick with it.

MemOC is frustrating that way, when it can be everything and nothing at the same time.
Why should I run this vs GSAT, for example? At least with GSAT there are no potential voltage-testing issues.
Anywho, it's too early to write a critique. Just want to say as a sneak peek ~ that the plans are nearly made. Unless I magically find a reason why I should prefer it.

So far, it looks like a program that tests discharge with quite high strain and has nearly no connection to anything timing related (learning when it errors and when it doesn't).
This is the conclusion I came to after 20 different types of attempts at getting along with this config and its testing methodology. Zero, well, 1.
If I pull voltage back, anything I throw at it passes (even at stock CPU freq). But it is too harsh on weak PCBs ~ to be a timings-only test. From such results & behavior I see no reason to even bother with it.
It doesn't find errors "faster". It doesn't show the reason for errors
(yes, that was how SerJ taught it, but the 1usmus config clearly behaves differently ~ undoubtedly & consistently).
It doesn't stress the CPU less than any other LinX, MemtestPro, HCI or GSAT test out there (which are suboptimal for testing RAM timings and rather test whole-fabric stability ~ so errors can be influenced by that too).
Sadly I cannot see, neutrally, why I should bother with it. I cannot see any use case ~ but I'll give it slightly more attempts, purely out of respect for the work.

Later the Arshia + Anta config will be tested; maybe that one is not broken when it comes to VDIMM amount ~ maybe it is different in testing methodology, we'll see.
And maybe Anta's personal configs are fine - but this version here is broken for high VDIMM. Now I understand the reason for the warning ~ yet not the reason for the clear issue.
* Oh, if all goes according to plan, I'll have a 5900X for a week to test with the 2x16GB 4000C16-16 DIMMs. I think it is now time (next week) :)
 

·
Registered
Joined
·
682 Posts
After 10 months of trying to get my 4x16GB 3200 CL14 kit to OC as high as possible with stability, this is where I have landed. I know of others with similar 3200 CL14 kits running at 3733, but for me, 3600 is the ceiling. Because of that ceiling, I've been focusing on tightening subtimings instead.

But nothing above 3600 will POST, not even at CL16 or CL18, with vSOC up to 1.2V and VDIMM up to 1.5V.

So I'm reaching out to Anta777 and Veii for any other suggestions. I've been slightly tightening certain subtimings based on the posts here, and even a change of 1 in those subtimings triggers an error in TM5. Am I missing something on one of the Rtts or DrvStr values that is limiting my ability to hit 3666? This is running at 1.465 VDIMM.

[screenshot of timings]
 

·
Overclock the World
Joined
·
3,283 Posts
But nothing above 3600 will POST, not even at CL16 or CL18, with vSOC up to 1.2V and VDIMM up to 1.5V.

So I'm reaching out to Anta777 and Veii for any other suggestions. I've been slightly tightening certain subtimings based on the posts here, and even a change of 1 in those subtimings triggers an error in TM5. Am I missing something on one of the Rtts or DrvStr values that is limiting my ability to hit 3666? This is running at 1.465 VDIMM.
If you touch anything, you will need to push tRDWR up.
My personal math says 9+; 8 likely only runs because everything is low.
For tCCD_ we have to depend on AMD's great training algorithm.

Anta's recommendation of tWR = tCL gives nice bandwidth numbers - but you have to pair that with tWTR_S 3 or lower.
Like 3-8.
tRRD_ you should push; based on my 16-16 and downward experiments, that is the key factor allowing me to run 15-15 at all, or 14-14 up to 4067MT/s on these "bad" 4000C19-19 kits.
Voltage aside, tRRD_ you should slow down.
You don't need fast transition timings and slow primaries. Primaries are worth more. Tertiaries are for later.

Running both SDs higher means, in my experience, more bandwidth, but they will overlap.
Higher capacity needs them at lower values.
tWTR_L directly affects Read bandwidth, & tWTR_S was Copy. tWR was also Read.
I can't remember my exact findings, but I've posted them here.

What matters for stability is the distance between both tRRD_ values.
What matters for bandwidth is each of the values (SCL, tWR, tWTR_S & _L).
tFAW doesn't matter that much, but you will lose Read bandwidth if tRRD_S is pushed. Yet this is crucial sometimes ~ depending on PCB and strain.

I've never played with such capacity, so I can't give much advice ~ but I'm pretty sure if these are 4x dual rank, you want to run the normal BankGroupSwap & at absolute worst double tWR if needed.
tCL = tWR remains a great lesson - while lowering to tWR 10, tRTP 5 can work out ~ matching it with tCL has first stability priority (I can confirm Anta's teaching).
======================
You surely want to push CsOdtDrvStr to at least 30 ohm, and see if you can get away with the rest being 20 ohm. This is trace-length dependent.
Such high capacity I expect should run 120-20-20-24, or 60-20-30-20, maybe even 60-20-40-20.
Mostly it's not any holes, just bad training. It still is not fixed.

If my predictions are accurate,
4x16 should need around 40-42 ohm, the one above 40 ohm procODT.
I do feel, though, that 6-3-3 is a bit too . . . weak for it.
If you still stay below 1.45v, give 7-3-2 a try.
Else bump to 1.48v and try 6-3-2, or even be stupid (try it, please) and see how 5-2-3 behaves (if it will post at all @ 3400 MT/s & higher). I fear such a WR_2 might mess things up, but it surely can help you here.

IOD shouldn't need much of a push, but what about
950-980-1060-1125mV
VDDP-CCD-IOD-SOC
SOC should be at least 1.1v GET; the rest are SET values.
======================
Short advice:
tWR 14
tWTR_L 8; if that causes issues, drop tWTR_S to 3 and tWTR_L to 8 ~ that has to run then. If that still has issues, drop both SD, DD sets to at least 1-4-4-1-6-6, or ignore it but give tRDWR +1.
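(To keep the above straight in one place, here's a throwaway sketch that just encodes a few of the rules of thumb from this post as checks. The field names and thresholds are my shorthand for illustration, not a tool the poster uses.)

Code:
# Encodes a few rules of thumb from the post above; illustrative only.
from dataclasses import dataclass

@dataclass
class Timings:
    tCL: int
    tWR: int
    tRDWR: int
    tWTR_S: int
    tWTR_L: int

def check(t: Timings) -> list[str]:
    notes = []
    if t.tWR == t.tCL and t.tWTR_S > 3:
        notes.append("tWR = tCL should be paired with tWTR_S of 3 or lower")
    if t.tRDWR < 9:
        notes.append("once other timings are touched, tRDWR 9+ is suggested")
    if t.tWTR_L < t.tWTR_S:
        notes.append("tWTR_L is normally kept at or above tWTR_S (e.g. 3-8)")
    return notes

print(check(Timings(tCL=14, tWR=14, tRDWR=8, tWTR_S=5, tWTR_L=8)))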
 

·
Registered
Joined
·
2,291 Posts
Anyone have any experience with pushing tRP and Y-Cruncher not staying stable?

So far tRP has not been influenced by increasing vDIMM, vSOC, vDDP, etc., or by changing Rtt, CAD bus, etc. values to the extent that Y-Cruncher will act consistently; i.e., sometimes it will crash after several cycles, sometimes in the first cycle, and so on.

I'm now trying 1.8v PLL...

I'm pushing tRP @ 12 with tRFC @ 228; relaxing tRFC does not bring stability either...
 

·
Registered
B550 AORUS MASTER, 5800X, 16GB (2×8GB) TEAMGROUP UD4-4000 DDR4 memory, XFX RX 5500 XT
Joined
·
220 Posts
Here are my latest efforts with the 5800X and the TEAMGROUP B-die

Pretty meh so far... I'll keep plugging away

[screenshots of timings and benchmark results]
 


·
Registered
Joined
·
152 Posts
What do you suggest changing in my current timings so I can get rid of this error and continue tweaking the rest of the timings further?
XMP was unstable => tRDWR from 8 to 10, 1.38v VDIMM (maybe tWTRS from 5 to 6), and RttPark RZQ/2 fixed it.
Later modifications as follows:
tRRDS/L 6,6 with tFAW 24 passed; 4,6,16 failed (don't remember the error, maybe error 0 with the 1usmus config), so I reverted back to 6,6,24.
tWR 16, tRTP 8 failed as shown in the screenshot.
 

Attachments
