
Skylake Overclocking Guide [With Statistics] - Page 838

post #8371 of 11364
OK, I will once I finish downloading, optimizing the OS, and making a proper image backup to guard against silent data corruption during my OCing adventures. It runs really cool. I might toy with sub-1.5 V to see if I can get 4.7 actually stable.

The cache frequency is also the MEMORY CONTROLLER frequency. Remember how people in the Haswell days were saying RAM past 2133 MHz made hardly any difference? Here's why: almost no one overclocks the memory controller. A stock 4 GHz uncore can saturate 2000 MHz dual-channel bandwidth, after which the returns diminish sharply. None of that has changed with DDR4, which is why Skylake's memory controller was upgraded. It can reliably match the core frequency, and a 1:1 ratio is efficient. It's the same reason the PS3's Cell processor matched its system RAM at 3200 MHz: the synchronization is ideal for efficiency per clock. With limited console hardware (and that iteration of Cell was actually pretty powerful for what it was, despite only having six usable cores), getting the most out of what you had mattered. If you do the math, a 4.8 GHz uncore will saturate 2400 MHz RAM. As you can imagine the scaling isn't great with DDR4, but underclocking the memory controller and then adding, say, the 3733 MHz RAM I have lying around is like trying to fit an aircraft carrier through a garden hose. Sadly I don't have a Deluxe from Asus to run it on. It's bottlenecking hard.

In what situations does it matter? Well, not many games use more than 8 threads. The more load the CPU is under, the more obvious it is. Multithreaded performance compared to my Haswell, despite a 200 MHz core deficit that obviously spans all physical and logical cores, is 12% better in CPU-Z. That's OC vs OC in the same OS. It has very little to do with Skylake's on-die architecture improvements, beyond the memory controller and the fact that it can actually use the DDR4 bandwidth. I'm willing to bet the benchmarks people base their conclusions on don't even use all 8 threads, and on top of that the program is context-switching the cache and using logical cores almost at random in place of what should be full cores. It's why people disable Hyper-Threading in some games to get more performance: they don't know what Process Lasso is, and they aren't putting the logical cores to work on background OS stuff instead of letting them get in the way. Professional reviewers with 8-core i7s don't even know this, and they run 8-threaded apps. It's like having a V8 but running on 6 cylinders out of ignorance.

In Crysis 3 (needs timer resolution tweaks and Process Lasso) I was CPU-bound at high resolution. Yes, at high resolution the CPU was the limiting factor. My overclocked graphics cards were sitting there twiddling their thumbs waiting on the 4.8 GHz Haswell at 3K. The reason is that AMD's DX11 driver is almost single-threaded in how it interfaces with the GPU. It's the bottleneck in my build, and it's caused by the API. Skylake, despite a pretty significant drop in core clock, still makes a 6% gain in single-threaded performance. You simply can't point to the architecture changes or the RAM alone; the memory controller is what makes the RAM a relevant part of the equation.
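For anyone who wants to sanity-check the DRAM side of that math, here's a minimal sketch in Python, pure arithmetic. The 8 bytes per transfer and dual-channel figures are standard DDR numbers; whether a given uncore clock really "saturates" a given RAM speed also depends on how many bytes the ring actually moves per cycle, which I'm not putting a hard number on here.

Code:
def ddr_bandwidth_gbs(mt_per_s, channels=2, bytes_per_transfer=8):
    # Standard DDR math: a 64-bit channel moves 8 bytes per transfer.
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9

def bytes_per_uncore_cycle(ddr_gbs, uncore_ghz):
    # Bytes the uncore would have to move each cycle just to keep up.
    return ddr_gbs / uncore_ghz

for speed in (2133, 2400, 3200, 3733):
    bw = ddr_bandwidth_gbs(speed)
    print(f"DDR4-{speed}: ~{bw:.1f} GB/s peak dual channel, "
          f"{bytes_per_uncore_cycle(bw, 4.8):.1f} B/cycle at a 4.8 GHz uncore")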

https://www.reddit.com/r/Amd/comments/3sm46y/we_should_really_get_amd_to_multithreaded_their/

I also own a GTX 980M, and it's pretty obvious the DX11 driver there is using 2 threads to talk to the GPU on a much weaker CPU. That's why I requested a specially binned chip to compensate, and I can tell you memory controller speed makes a reasonable difference under certain circumstances. To recap: fast RAM, multithreading, or a CPU bottleneck, period. Frame rates are more consistent in where they dip. I know we read online over and over that high resolution is GPU-bound territory. That's because most people don't have extremely powerful graphics cards exposing the lameness of DX11.
post #8372 of 11364
Quote:
Originally Posted by Lucifer1945 View Post

A stock 4 GHz uncore can saturate 2000 MHz dual-channel bandwidth, after which the returns diminish sharply. None of that has changed with DDR4, which is why Skylake's memory controller was upgraded. It can reliably match the core frequency, and a 1:1 ratio is efficient. It's the same reason the PS3's Cell processor matched its system RAM at 3200 MHz: the synchronization is ideal for efficiency per clock. With limited console hardware (and that iteration of Cell was actually pretty powerful for what it was, despite only having six usable cores), getting the most out of what you had mattered. If you do the math, a 4.8 GHz uncore will saturate 2400 MHz RAM.

But where are you getting these numbers? I don't see how anything about the PS3 is relevant to the Z87/Z97/Z170 platform.
Quote:
Originally Posted by Lucifer1945 View Post

It can reliably match the core frequency, and a 1:1 ratio is efficient.

This just sounds like OCD to me; there's nothing particularly special about having a 1:1 cache/core ratio.
X9969X (7 items)
CPU: 6950X @ 4.5GHz (Silicon Lottery 4.4 delid)
Motherboard: Asus Rampage V Extreme
Graphics: Nvidia Titan X Pascal
RAM: Corsair Dominator Platinum 16GB DDR4 @ 3200MHz
Hard Drive: Intel 750 1.2TB SSD
Power: Corsair AX1200i
Case: Corsair Obsidian Series 900D
post #8373 of 11364
"But using hyperthreading makes no sense in the sense that anybody that does chess generally does not benefit from hyperthreading and the inflated score is a misleading score."
-Darkwizzie

Source:
http://www.overclock.net/t/1568025/dglee-retail-skylake-i7-6700k-reviewed-finally/470

You should check out Process Lasso. It's made for servers, but it's also good for games. Try it out. Alien: Isolation, for example, is slower with Hyper-Threading. Making the change I'm talking about on my weaker 3.5 GHz notebook i7 is noticeable and measurable. Tragically there's no fixed benchmark that gives absolute parity run to run. That's because by default the game splits its load across 8 threads, so if you could magically max out the CPU you'd see about 50% on all 8 threads. Think about that for a second: logical cores are maybe 30% of the performance of full cores, and that's being generous. Disabling Hyper-Threading means the game will use only full cores instead of swapping between those and logical ones. The fact that it can't truly use all of the CPU, let alone efficiently (context switching), means you want to optimize what the game is using. Disabling Hyper-Threading is inferior to simply using a program to force the game onto full cores only and leaving the logical cores to handle background stuff. In fact, if you have cheap onboard sound like Realtek, for example, you can put those drivers on the logical cores with no hit to sound processing in titles like this.

The Windows thread scheduler isn't an extremely competent AI, and you can't count on developers to do a good job with it either, though I've seen it handled well in Doom on FX processors, which share cache, so it makes sense they'd automate that since those CPUs perform, ermmm, not so well. It's a 4-threaded application, so on an 8-core FX that makes sense: each core basically gets freer rein over its cache. Obviously not totally, since other threads the game isn't using still share the cache pool. It has many uses. Now you know.
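If you would rather script it than click through Process Lasso, here is a minimal sketch of the same idea in Python with psutil. The executable name and the even/odd core layout are assumptions; on most HT-enabled Intel desktops the logical CPUs pair up as physical core plus SMT sibling, but check your own topology before pinning anything.

Code:
import psutil

GAME_EXE = "AI.exe"        # hypothetical executable name; substitute your game's
PHYSICAL = [0, 2, 4, 6]    # assumed layout: one logical CPU per physical core
SIBLINGS = [1, 3, 5, 7]    # assumed layout: the Hyper-Threading siblings

for proc in psutil.process_iter(["name"]):
    try:
        if proc.info["name"] == GAME_EXE:
            proc.cpu_affinity(PHYSICAL)    # keep the game on full cores only
        elif proc.info["name"] == "audiodg.exe":
            proc.cpu_affinity(SIBLINGS)    # push audio/background work onto the siblings
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass

The point of Lasso over a one-shot script like this is that it keeps reapplying the rules automatically as processes come and go.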
Edited by Lucifer1945 - 8/9/16 at 7:23pm
post #8374 of 11364
Even Linus talked about the magical 1:1 ratio in one of his videos; I'd have to dig to find it. It's common sense, man. The concept is somewhat like G-Sync or FreeSync monitors: synchrony, on key, the cycles click in time with one another and nothing has to wait for the next refresh to continue working. Obviously there's no "performance" scaling beyond the refresh rate, but it's a similar concept, along with ultrapolling on certain peripherals. Polling rates range from 125 to 500 to 1000 Hz precisely because the mouse isn't updating in perfect sync with the monitor refresh rate. Look at RAM timings: it's orderly, one part does its thing, then WAITS on another series of things. If it's synchronized, like ASYNCHRONOUS compute in DX12 on Radeon GCN, it's more efficient, period. Nvidia doesn't truly have that feature; load-balancing preemption is still a serial, one-at-a-time method. They went that way so they didn't have to give up DX11 performance. Pascal is faking it, false advertising, like the GTX 970. It's not OCD; the PS3 was designed that way on purpose, so that it got the most out of what it was, like I originally posted. It's not hard to grasp. It was just an example of why it's a good idea to match the clocks when it makes sense, like on Skylake where it's actually possible without compromising on the core, depending on the chip in question. On Haswell this was simply impossible. As for the screenshots of 4.7 uncore unicorn chips, I would love to see how stable they actually were.
Edited by Lucifer1945 - 8/9/16 at 7:14pm
post #8375 of 11364
Quote:
Originally Posted by Lucifer1945 View Post

Even Linus talked about the magical 1:1 ratio in one of his videos; I'd have to dig to find it. It's common sense, man. The concept is somewhat like G-Sync or FreeSync monitors: synchrony, on key, the cycles click in time with one another and nothing has to wait for the next refresh to continue working. Obviously there's no "performance" scaling beyond the refresh rate, but it's a similar concept, along with ultrapolling on certain peripherals. Polling rates range from 125 to 500 to 1000 Hz precisely because the mouse isn't updating in perfect sync with the monitor refresh rate. Look at RAM timings: it's orderly, one part does its thing, then WAITS on another series of things. If it's synchronized, like ASYNCHRONOUS compute in DX12 on Radeon GCN, it's more efficient, period. Nvidia doesn't truly have that feature; load-balancing preemption is still a serial, one-at-a-time method. They went that way so they didn't have to give up DX11 performance. Pascal is faking it, false advertising, like the GTX 970. It's not OCD; the PS3 was designed that way on purpose, so that it got the most out of what it was, like I originally posted. It's not hard to grasp. It was just an example of why it's a good idea to match the clocks when it makes sense, like on Skylake where it's actually possible without compromising on the core, depending on the chip in question. On Haswell this was simply impossible. As for the screenshots of 4.7 uncore unicorn chips, I would love to see how stable they actually were.


I wouldn't take every word Linus says as gospel. I've caught him out a few times, but no one is perfect.

I remember with the nForce2, if you ran the RAM synchronously with your FSB it would yield better performance than running the RAM at faster speeds, but apart from that example I've not come across anything that behaves the same way, including other mobos from the same generation.

You don't really know how much work each MHz can handle for each part of the chip. Only the engineers do.

I think of it more like a series of pipes. So long as the pipe is wide enough to handle the flow, the MHz figures can be asynchronous.

For instance, a memory controller may only need half the MHz to cope with double the workload a CPU can feed it; it depends more on the circuits and chip design inside.
post #8376 of 11364
Yeah. He's said that when you run out of VRAM it goes to the hard disk, TWICE. He's blatantly trolling, but what he's saying about the ratio makes sense logically.
post #8377 of 11364
"I think of it more like a series of pipes. So long as the pipe is wide enough to hand the flow, the mhz figures can be asynchronous."

Exactly. When I wake up tomorrow I'll set it up and run benchmarks in various apps with the uncore underclocked and everything else identical, then at stock uncore, then at what I'm aiming for. I wish I had a Haswell for apples-to-apples comparisons: I could run everything at the same clock and, to the extent possible, the same latency and bandwidth on the RAM if I got creative. Tertiary timings, however... not so much. It would give an idea of what the architecture changes are actually doing, but again, the RAM can't be run exactly the same. The point is that with decent RAM the underclocked uncore is going to look mickey mouse next to stock, let alone well above stock, in key applications and in the ones suffering the most horrendous bottlenecks. It's obvious to me why, despite a 200 MHz core deficit, Skylake wrecks a 4.8 GHz Haswell once the uncore is mildly overclocked to its max. It's not just the DDR3-to-DDR4 difference; in fact the latency advantage goes to Haswell, but that DDR4 bandwidth is essentially worthless if you underclock the memory controller. You might as well run cheap DDR3L on some lame motherboard at that point.
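If anyone wants a quick way to eyeball the memory side between uncore settings, here's a crude sketch, just a numpy copy test in Python. It's not a substitute for AIDA64 or actual game runs, and the array size and run count are arbitrary choices.

Code:
import time
import numpy as np

def copy_bandwidth_gbs(size_mb=512, runs=5):
    # Copy a large array and report GB moved (read + write) per second.
    data = np.ones(size_mb * 1024 * 1024 // 8, dtype=np.float64)  # size_mb of float64s
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        copied = data.copy()
        best = min(best, time.perf_counter() - start)
        del copied
    return (2 * size_mb / 1024) / best  # read + write, in GB, over the best run

if __name__ == "__main__":
    print(f"~{copy_bandwidth_gbs():.1f} GB/s effective copy bandwidth")

Run it once per uncore setting with everything else identical; the absolute number matters less than how it moves.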

"For instance a memory controller may only need half the mhz to cope with double the workload a CPU can feed it, it depends more on the circuits and chip design inside."

I read somewhere how the non-enthusiast line chips function. I'm sure the quad-channel memory controllers handle it differently.
Edited by Lucifer1945 - 8/9/16 at 8:10pm
post #8378 of 11364
Quote:
Originally Posted by Lucifer1945 View Post

"But using hyperthreading makes no sense in the sense that anybody that does chess generally does not benefit from hyperthreading and the inflated score is a misleading score."
-Darkwizzie

Source:
http://www.overclock.net/t/1568025/dglee-retail-skylake-i7-6700k-reviewed-finally/470

You should check out Process Lasso...

Excellent little tip. Way back when, there was a way of forcing particular processes to bind to specific cores (can't remember how to do it now), and this does similar things and a whole lot more.

Having said that, it seems that for my main application Windows/the Asus microcode does as good a job as Lasso, because the speed is absolutely the same regardless of what priorities I set in Lasso. But this is not gaming. These are applications that are for the most part single-threaded, and if Windows/the microcode already allocate a full core and full priority, Lasso can't do any better; that seems to be what happens anyway. Cute utility, though.

HH
post #8379 of 11364
What is sleep anywho... I'm so dead. Yeah, on single-threaded applications it's essentially worthless.
post #8380 of 11364
Quote:
Originally Posted by Yuhfhrh View Post

But where are you getting these numbers? I don't see how anything about the PS3 is relevant to the Z87/Z97/Z170 platform.
This just sounds like OCD to me; there's nothing particularly special about having a 1:1 cache/core ratio.

Best not to enable him; some very imaginative posting going on.