Overclock.net banner
Status
Not open for further replies.

Strictly technical: Matisse (Not really)

332K views 678 replies 113 participants last post by  ver_21 
#1 · (Edited)
07/08/2019 6:33 PM (GMT) - Update on the bios issue on Crosshair VIII Hero motherboard ("the thing").

Earlier today I received a response to my inquiries from ASUS. The response was rather technical and I cannot go into the specifics of what exactly it involved.
However, it confirmed my suspicions of what actually caused the observed anomalies. Long story short: a mistake was made and it has affected the results of multiple reviewers, including my own. In my own case, I ended
up discarding my own affected multithreaded results altogether, before even releasing them. I'm still angry because a lot of my own and other people's work went to waste because of it. But like I said, mistakes do happen.
In this case all of the evidence and known facts suggest that this was indeed a mistake, caused by an extremely tight schedule and miscommunication between several different parties. In fact, all of the facts I can personally verify
indicate that despite the rather suspicious way this mistake happened, there never was any malicious intent involved.

ASUS also provided me with new bios versions for both the Crosshair VIII Hero and Formula boards, which correct the mistake made in the bios builds newer than the AMD-approved 0066 build.
Based on my own testing done on the 3900X SKU, the CPU now meets its specification in terms of the allowed power consumption (the same way the approved 0066 build did). The new build has not yet been validated, so
it will take some time until its changes are reflected in the builds available to the larger audience.

What kind of effects will the fixed bioses have then?

Based on my own testing (do note that silicon variation exists and that the sample size is one for 3900X):

- ~27W lower average package power consumption (VDDCR_CPU & VDDCR_SoC, i.e. the main power rails)
- 7°C lower temperature (tDie, while using DeepCool Assassin II cooler)
- < 90MHz average frequency loss across all twelve cores in MT workloads

The above figures were recorded during Blender 2.80b runs, but they should translate almost directly to Cinebench R20 NT as well (based on my experience).

The peak power difference between the faulty and the fixed bioses is around 35W (Prime95).

Although there is no question that a mistake was made, I'd still like to thank ASUS for two specific reasons: they didn't try to deny the existence of the issue (which, btw, is the usual response within the industry), and they fixed it immediately.
I also do feel bad for the bios engineer, who had to stay over(over)-time to get the bios build done. Thanks for that. I also have to feel bad for ASUS, because this mistake might have besmirched the reputation of their brand new Crosshair VIII-series motherboards.
And make no mistake, these are among the best, if not the best, X570 boards available on the market (a personal opinion).

At this point you should ask yourself whether ASUS paid me off.
Everyone can be bought, it's just a matter of the offered sum or bargain. Everyone claiming otherwise either lives in self-deception or, frankly, is a moron.
I myself could definitely be bought. And rather cheaply too, I think. The thing is that, at least as of writing this, nobody has even tried to do so.

Besides this statement, I also corrected an error AMD pointed out to me.
Although the 3900X CPU has a fused (factory programmed) Fmax ceiling of 4.65GHz, AMD only advertises a 4.60GHz maximum boost.
I must admit that I was initially surprised to see the 3900X having a 4.65GHz fused maximum boost limit, since AMD indeed only mentions 4.60GHz in their marketing materials.
Nevertheless, I have yet to reach the advertised 4.6GHz either, so in that regard the only thing which changes is the CPU falling 25MHz short instead of 75MHz short of its advertised frequency.
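
For anyone wanting to check the 25MHz / 75MHz figures, here is a minimal sketch of the arithmetic. It assumes the observed peak is the 4.575GHz that the two best cores of this 3900X sample reach; the function name is just a hypothetical helper.

```python
# Frequencies in MHz. The observed value is an assumption taken from this
# particular 3900X sample (two best cores peaking at 4.575GHz).
OBSERVED_PEAK_MHZ = 4575
ADVERTISED_BOOST_MHZ = 4600   # what AMD advertises
FUSED_FMAX_MHZ = 4650         # the fused (factory programmed) ceiling


def shortfall_mhz(target_mhz: int, observed_mhz: int) -> int:
    """How far the observed peak falls short of a given target."""
    return target_mhz - observed_mhz


print(shortfall_mhz(ADVERTISED_BOOST_MHZ, OBSERVED_PEAK_MHZ))  # 25
print(shortfall_mhz(FUSED_FMAX_MHZ, OBSERVED_PEAK_MHZ))        # 75
```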

-------------------------------------------------------------------

First and foremost, a word of warning. When reading ANY of the AMD Ryzen 3000-series "Matisse" launch-day reviews, the first thing you should do is navigate to the page which lists the hardware setups.
AMD supplied four different motherboards to the media, one each from ASRock, ASUS, GIGABYTE and MSI. In the case of the ASUS Crosshair VIII Hero Wi-Fi motherboard, the media was instructed to use the 0066 bios build,
which had been vetted and approved by AMD. However, newer bios builds were available and ASUS has also (allegedly) told the media to use those versions. What exactly has transpired here is still under investigation,
but regardless of the actual reasons behind it, the consequences might be rather significant. In practical terms, all reviews which were done on ASUS Crosshair VIII Formula or Hero motherboards using a bios build other than 0066 must
be considered invalid, at least partially. Reviews using other ASUS motherboard models (not provided by AMD) are under suspicion as well.


A few days ago, I noticed certain anomalies while measuring the power consumption of the different Matisse SKUs. Inspection of the power management parameters revealed no issues that could have explained those anomalies.
The external power measurements (VRM DCR) revealed that the CPU was consuming significantly more power than its power management should have allowed it to. I initially suspected that this was AMD's own doing, in an effort
to boost the performance of the new CPUs even further, but further investigation indicated otherwise.

AMD had no part in it, and the actions by ASUS are the sole reason behind it. The investigation revealed that ASUS is altering one or more power
management parameters of the CPU, causing it to believe it consumes less power than it actually does. As a result, the frequencies will be higher than the actual power budget would normally allow. Tricks like this are pretty much a common (mal)practice these
days; however, there is a good reason why this must be considered worse than the others: this "thing" is completely undetectable without external measurements and rather deep knowledge, and there is no way to disable it either.
Even a person such as myself, who can control most things on these platforms, cannot disable this "thing". As you may notice, at the moment I call this issue the "thing", since I'm giving ASUS the benefit of the doubt.

The release schedule of Ryzen 3000-series CPUs was rather ridiculous to begin with for two reasons. The retail (or PR, production ready) silicon has been available for at least two months, and relatively finished motherboard designs even longer than that.
Yet AMD had decided to enforce EXTREMELY strict control (NLTR, nothing leaves the room) over the silicon samples. I could have had several different X570 motherboard models months ago, but I managed to lay my hands on the first CPUs just three weeks ago (give or take). The actual CPU samples were distributed to the media just six days prior to the launch date.

Due to the extremely tight schedule, I have worked around 16 hours per day, for the last couple of days. There is nothing I hate more in this world than seeing my work being wasted.
This time a substantial part of it was wasted because of something I had no control over. Unless ASUS can clearly prove that this "thing" happened due to a human error and wasn't intentional, I have to reconsider my relations with them.
Mistakes do happen, but regardless of the actual reasons behind it, it definitely shouldn't have happened.

Although AMD instructed the media to use the approved 0066 bios build with the Crosshair VIII Hero, at the moment I have no idea how many of the reviewers ended up following those instructions and how many thought it would be a good idea to use the latest build (which, in the case of a new platform, it most often is). Potentially this "thing" might have caused significant financial losses as well, in terms of additional salaries required to get the products re-tested with proper settings.

So then, what is affected? Technically every scenario on every Ryzen 3000-series SKU which might be power limited. Purely single threaded workloads are fine, as are at least most of the pure gaming tests.
However, every multithreaded CPU workload / benchmark must be considered invalid if an ASUS Crosshair VIII Hero with any bios version other than 0066 was used as the platform.

I used the Crosshair VIII Formula for my tests, and since this model wasn't supplied to the media by AMD, there was no "official" (i.e. vetted and approved) bios build for it either.
In my case I ended up discarding all of my multithreaded results. Since the Ryzen 3000-series multithreaded results were invalid, there was no point in keeping the multithreaded results for the other platforms either.
Since single threaded workloads are never power limited, these results were fine. In case of testing the SMT-yield on different architectures, the power limits were disabled anyway to avoid any potential biasing, so these results are included as well.

I originally intended to provide a lot more, but unfortunately the reality is that there was never enough time to do it all. The various issues on several platforms and the "thing" (which was confirmed only yesterday) didn't help things either.
The issues with AGESA cross-compatibility also prevented testing the SMT-yield on Pinnacle Ridge. Because of that, I only provide the figures for Matisse, Coffee Lake Refresh and Skylake-X.

Test setups

AMD Ryzen 7 2700X (IPC / SMT = 3.800GHz fixed)
ASUS ROG Crosshair VIII Formula (Bios 0605, µCode 0x0800820D 4/16/2019, SMU 43.22)
2x16GB Corsair LPX 3333C16 running at 2666MHz 12-12-12-28 (IPC)
Deepcool Assassin II cooler
Windows 10 Education 10.0.18362.175

AMD Ryzen 7 3700X (IPC / SMT = 3.800GHz fixed, SKU-SKU ST up to 4.40GHz)
ASUS ROG Crosshair VIII Formula (Bios 0605, µCode 0x08701013 6/11/2019, SMU 46.37)
2x16GB Corsair LPX 3333C16 running at 2666MHz 12-12-12-28 (IPC & SMT), 3200MHz 14-14-14-32, 1:1 FCLK (SKU-SKU ST)
Deepcool Assassin II cooler
Windows 10 Education 10.0.18362.175

AMD Ryzen 9 3900X (SKU-SKU ST up to 4.6GHz)
ASUS ROG Crosshair VIII Formula (Bios 0605, µCode 0x08701013 6/11/2019, SMU 46.37)
2x16GB Corsair LPX 3333C16 running at 2666MHz 12-12-12-28 (IPC & SMT), 3200MHz 14-14-14-32, 1:1 FCLK (SKU-SKU ST)
Deepcool Assassin II cooler
Windows 10 Education 10.0.18362.175

i9-9900K (IPC / SMT = 3.800GHz fixed, SKU-SKU ST = 1C = 5.00GHz, 2C = 5.00GHz as defined by the fuses), ring offset -3.
ASUS ROG Strix Z390-E Gaming (Bios 1005, modified with µCode 0xBE 5/17/2019 includes all available mitigations, ME FW 12.0.40.1433)
All scenarios: 2x16GB Corsair LPX 3333C16, running at 2666MHz 12-12-12-28
Deepcool Assassin II cooler
Windows 10 Education 10.0.18362.175

i9-9920X (IPC / SMT = 3.800GHz fixed, SKU-SKU ST = 1-2C = 4.5GHz (TBM, SSE), 1-2C = 3.9GHz (AVX2), 1-2C = 3.7GHz (AVX512) as defined by the fuses), mesh 2.4GHz.
ASUS ROG Rampage VI Apex (Bios 1705, modified with µCode 0x2000005E 4/2/2019 includes all available mitigations, ME FW 11.11.65.1590)
All scenarios: 4x16GB Corsair LPX 3333C16, running at 2666MHz 12-12-12-28
Deepcool Assassin II cooler
Windows 10 Education 10.0.18362.175

The IPC

For the first time in over a decade, AMD has reached IPC parity with Intel.
Based on the results of 32 individual workloads, Zen 2 even manages to provide slightly higher average IPC than Coffee Lake-S Refresh.
Thanks to its AVX-512 resources, Skylake-X manages to stay ahead in this test suite, although not by a large margin.


Individual results: https://imgur.com/a/AonND9l

NOTE: The gallery link was updated on 7/9/2019 for the following reason: in the case of the tonemapping test, I had misunderstood the actual performance restrictions of the chain.
The original title of the tonemap chart was "ZIMG 2.91"; however, the author pointed out to me that ZIMG itself is not the bottleneck in this case. Therefore the title of the chart has been changed from ZIMG 2.91 to FFMpeg 4.14.
The results (in any regard) are unchanged. The original and mislabeled gallery can be seen here: https://imgur.com/a/LeuwqnD for reference purposes only.

"ER" (Extremities removed):

Pinnacle Ridge - Coffee Lake SR = Particle Force (Hi), Vampire Numbers (Lo)
Pinnacle Ridge - Skylake-X = Linpack (Hi), Vampire Numbers (Lo)
Pinnacle Ridge - Matisse = Particle Force (Hi), Vampire Numbers (Lo)

The SMT-yield


Individual results: https://imgur.com/a/bUgp153

SKU vs. SKU results



Individual results: https://imgur.com/a/y4HAZPF

NOTE: The gallery link was updated on 7/9/2019 for the following reason: in the case of the tonemapping test, I had misunderstood the actual performance restrictions of the chain.
The original title of the tonemap chart was "ZIMG 2.91"; however, the author pointed out to me that ZIMG itself is not the bottleneck in this case. Therefore the title of the chart has been changed from ZIMG 2.91 to FFMpeg 4.14.
The results (in any regard) are unchanged. The original and mislabeled gallery can be seen here: https://imgur.com/a/otWpc5H for reference purposes only.

"ER" (Extremities removed):

3700X-9900K = Stockfish (Hi), Vampire Numbers (Lo)
3700X-9920X = Linpack (Hi), Eigen (Lo)
3700X-3900X = Vampire Numbers (Hi), Lame (Lo)

A word regarding the "Auto Overclocking" feature...

The new "auto overclocking" feature, which is advertised with up to a 200MHz frequency increase, in reality does close to nothing, at least on the higher-end SKUs.
The lower-end SKUs, such as the Ryzen 5 3600, definitely get some advantage; however, the higher-end SKUs such as the 3700X and 3900X can be completely maxed out simply by increasing or removing the power limit (through PBO).
These SKUs are already clocked so high that further frequency improvements theoretically made possible by the "Auto OC" feature are disallowed by the silicon fitness monitoring feature (FIT), due to the required voltage for higher frequencies being too high. For instance,
on the 3700X test sample the best core of the CPU raises its frequency by 25MHz when the highest 200MHz option is selected. The remaining seven cores stay at their default frequency, which varies between 4.35GHz and 4.375GHz.
Meanwhile, on the 3900X, which has a stock max boost of 4.6GHz, there are no gains whatsoever. In fact, none of the cores within this CPU even reach the advertised 4.6GHz. The two best cores reach 4.575GHz, while the ten other cores reach a 4.325 - 4.4GHz peak. The variation between the different cores, even on the same piece of silicon, appears to be huge, which would indicate that the process isn't very mature at this point. Even AMD themselves state in their slides that the frequencies are limited by the voltage they can safely feed to the CPU.

The overclocking capabilities

Essentially, if we're talking about the higher-end SKUs, there is basically none.
Based on my experience, the best-case scenario on 6C CCDs (3600, 3600X and 3900X) is around 4.25GHz, at relatively safe voltage levels.
In the case of the 3900X, that assumes you can cool a chip with two of those 6C CCDs. On SKUs with 8C CCDs (3700X, 3800X and 3950X), the best case is around 4.15GHz. The 3950X is expected to be thermally limited, as a whole.
The biggest limit is the intensity (heat per area), and the second is the voltage you can safely feed to the silicon. For example, the 9900K, which has a reputation of being an inferno, has a theoretical intensity of ~1.15W/mm² when operating at 5.0GHz (200W @ 174mm²).
Meanwhile Matisse can easily reach an intensity of > 1.5W/mm² (120W+ @ 74mm²). The second issue is that beyond ~3.8GHz the V/F curve becomes extremely steep. According to FIT, the safe voltage levels for the silicon are around 1.325V in high-current loads
and up to 1.47V in low-current loads (i.e. ST), depending on the silicon characteristics. Because the stock boost operation is already limited by the silicon voltage reliability, the only way to eke out every last bit of all-core performance is using OC-Mode. Like on previous Ryzen generations, entering OC-Mode also means that you will lose the turbo boost (all cores operate at the same frequency). On the higher-end SKUs, the single threaded performance penalty from doing so will be massive. For example, on the 3900X you'd be trading an additional ~100MHz of all-core frequency for a loss of up to 450MHz in ST frequency. Personally, I advise against overclocking the higher-end SKUs at all, and suggest instead increasing the power limits and trying your luck with the "Auto OC" feature (which most likely isn't beneficial).
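
The intensity figures above are plain power-over-area divisions. A minimal sketch, using only the wattages and die areas quoted in this post (the function name is a hypothetical helper, and 120W is the lower bound of the "120W+" figure):

```python
def heat_intensity_w_per_mm2(power_w: float, die_area_mm2: float) -> float:
    """Heat intensity (power density) in W/mm^2."""
    if die_area_mm2 <= 0:
        raise ValueError("die area must be positive")
    return power_w / die_area_mm2


# Figures as quoted in the post.
i9_9900k = heat_intensity_w_per_mm2(200, 174)    # 5.0GHz, 174mm^2 die
matisse_ccd = heat_intensity_w_per_mm2(120, 74)  # 120W+ on a 74mm^2 CCD

print(f"9900K:   {i9_9900k:.2f} W/mm^2")     # 1.15
print(f"Matisse: {matisse_ccd:.2f} W/mm^2")  # 1.62
print(f"ratio:   {matisse_ccd / i9_9900k:.2f}x")
```

So even at the lower bound, the CCD has to shed roughly 40% more heat per unit of area than the 9900K does.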

The V/F testing was done using full resource utilization (FRU), meaning the stability was tested using 256-bit workloads.
Unlike Intel designs, Matisse does not feature an offset for 256-bit workloads. This means that to ensure the stability of the CPU cores in every scenario, they must be tested using this kind of workload.
On Matisse, the delta in power consumption between the scalar and 256-bit vector instructions is massive, as expected (37%). That being said, there seem to be other design related factors limiting the maximum achievable frequency.
Despite significantly lower power consumption and therefore also lower temperatures, stability even in pure scalar workloads could not be achieved at much higher frequencies compared to the FRU scenario.




Performance per Watt

As expected, Matisse provides significantly higher performance per watt than its competition, thanks to its leading edge 7nm manufacturing process. Some of you might notice that Matisse's power efficiency seems to peak at 3.5GHz, despite the fact that semiconductors do not behave like that. The reason behind this was revealed by Vmin testing, which clearly illustrated that Matisse lacks a fused V/F (voltage-frequency) curve below 3.4GHz. This means that at frequencies below 3.4GHz the voltage always stays at the same level as it is at 3.4GHz. The stock (fused) V/F curve appears to be extremely well optimized as well, leaving only the temperature factor on the table.
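
To illustrate why a voltage floor produces an efficiency peak near the floor frequency, here is a toy model. Everything numeric in it except the 3.4GHz clamp point is a made-up assumption (voltage floor, slope, switching constant, static power), and it uses the textbook approximation of dynamic power proportional to f·V² plus a constant static term. It is a sketch of the mechanism, not a model of the real silicon.

```python
# Toy model: all constants are hypothetical illustration values,
# except F_CLAMP_GHZ, which is the 3.4GHz point mentioned in the post.
F_CLAMP_GHZ = 3.4    # below this the voltage no longer drops
V_FLOOR = 1.00       # volts at and below the clamp (hypothetical)
V_SLOPE = 0.30       # volts per GHz above the clamp (hypothetical)
C_EFF = 10.0         # W per (GHz * V^2), switching constant (hypothetical)
P_STATIC_W = 15.0    # frequency-independent power (hypothetical, simplified)


def voltage(f_ghz: float) -> float:
    """Voltage follows the V/F curve above the clamp and stays flat below it."""
    if f_ghz <= F_CLAMP_GHZ:
        return V_FLOOR
    return V_FLOOR + V_SLOPE * (f_ghz - F_CLAMP_GHZ)


def power_w(f_ghz: float) -> float:
    """Dynamic power ~ C * f * V^2, plus a constant static term."""
    v = voltage(f_ghz)
    return C_EFF * f_ghz * v * v + P_STATIC_W


def perf_per_watt(f_ghz: float) -> float:
    """Performance is assumed to scale linearly with frequency."""
    return f_ghz / power_w(f_ghz)


freqs = [f / 10 for f in range(20, 43)]  # 2.0GHz .. 4.2GHz in 0.1GHz steps
best = max(freqs, key=perf_per_watt)
print(f"efficiency peaks at {best:.1f}GHz in this toy model")
```

Below the clamp the voltage cannot fall any further, so lowering the frequency only spreads the static power over less work and efficiency gets worse. Above the clamp the rising voltage dominates. The peak therefore lands at the clamp point in this simplified model.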



 
#4 · (Edited)
Awesome writeup and interesting analysis on the overclocking.

Is there a chance we could see some CCD latency testing and compare how data hopping from a core on one chiplet to the other compares to a single chiplet processor? I.e. how a theoretical 2CCX/2CCX + 2CCX/2CCX 8 core processor would fare against the 3700X's 4CCX/4CCX? (I know this is probably outside the scope of your analysis but it's worth a try since Matisse does change the layout entirely)
 
#6 ·
Awesome writeup and interesting analysis on the overclocking.

Is there a chance we could see some CCD latency testing and compare how data hopping from a core on one chiplet to the other compares to a single chiplet processor? Ie how a theoretical 2CCD/2CCD + 2CCD/2CCD 8 core processor would fare against the 3700X's 4CCD/4CCD? (I know this is probably outside the scope of your analysis but it's worth a try since Matisse does change the layout entirely)
I can try testing the latencies tomorrow, I should have a suitable tool somewhere.
 
#53 ·
Which reviewer? Do you have a link?
 
#10 ·
Great review, thank you.


As already said, this post provides more relevant info for us than most of the YouTube / standard press.


I wonder if I will need to get a custom loop for the planned 3900X/3950X, or if I can safely work with the stock cooler, since, as you and many other reviewers wrote, OC provides almost no benefit. And paying $400+ for a custom loop seems to be too steep a price for 'bout a 50MHz boost over the stock cooler.
 
#15 ·
Thanks Stilt, invaluable information. Looks like, for example, the 3800X will be hard to launch in any numbers, as the specs are too optimistic out of the box. The 3950X even more so, but at least its launch is officially delayed.

I think the only star in this launch is the 3600, great price/performance!
 
#19 ·
@The Stilt , interestingly a friend of mine who's been playing around with a 3600 and a couple of X570 mobos found that you really need to tune to get the IF speed to be half the memory speed, and it makes a big difference in latency and in read and write speed for the memory, and assists with overall performance.

Sent from my SM-G960F using Tapatalk
 
#20 ·
I'm not understanding this. Isn't that what we are trying to avoid by staying at or up to 3733MHz, or just using 3600MHz and lower?
 
#21 ·
Currently I have a pre-order for an R7 3700X; after reading the OP, I'm thinking of getting an R5 3600 instead, especially as all I do is tinker with the setup rather than really use it. The OP seems to suggest that the non-X R5 3600 has full PBO support, or am I reading between the lines? Is anyone able to share info on this? Or do I just order an R5 3600, which can be here tomorrow, and try it :D .
 
#24 ·
Yes, so we can read your findings soon! Then, sell it when the 16 core comes. :)
 
#23 ·
the safe voltage levels for the silicon are around 1.325V in high-current loads
and up to 1.47V in low-current loads
Really, take that advice to heart whenever you're going to put in a manual OC. I've seen people degrade their 2x00 series chips to bits, and even review websites sending out the wrong signal by putting in 1.4 / 1.45V vcore. It will degrade your chip faster than you think.

Just as with (my) 2700X: undervolting and good cooling is what gets the best out of XFR/PBO (4.1GHz all core and 4.35GHz single).
 
#28 · (Edited)
@finalheaven

Nice find/share :) (+rep).

@rdr09

I told the wife it was you who made me do it :p ...


Now to get ready to be further hen pecked when I tinker tomorrow ...
 
#31 ·
Regarding the die topology latencies on Matisse, here's what I measured on 3900X at stock.


The intra-CCX (within the same CCX) latency is static, at roughly < 32ns, while the inter-CCX (CCX to CCX) latency and inter-CCD (CCD to CCD) latencies depend on the fabric frequency (FCLK).
Which makes sense, since the CCXs on the same CCD are connected together through the fabric, as are the different CCDs (through the IO-die).
 
#32 ·
Are you seeing the same IF frequency cap at 1800MHz as pointed by some other reviews? Could this be a firmware limitation?
 
#33 ·
1800MHz+ should be doable on all CPU specimens.
I've done 1866 and IIRC 1900MHz as well without any issues.

However, the maximum achievable FCLK depends on the memory configuration as well.
1 DPC SR B-die (i.e. single sided sticks) will be hitting the peaks, and 1 DPC DR B-die will reach 1800MHz and not much higher.
According to some sources with 2 DPC DR sticks you are looking at 1600MHz max. FCLK (haven't tested myself).

Also, 2 CCD parts (3900X and 3950X) should be more picky about FCLK, due to the always present discrepancy in the CCD signaling.
Typically one CCD will prefer lower voltage than the other and so on.
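
For reference, a minimal sketch of how the FCLK figures in this post map to DDR4 speed grades, assuming the fabric runs 1:1 with the memory clock (which is half the DDR data rate). The function names are hypothetical helpers.

```python
def ddr_rate_for_fclk(fclk_mhz: float) -> float:
    """DDR4 data rate (MT/s) that keeps memory 1:1 with the given FCLK."""
    return 2 * fclk_mhz


def fclk_for_ddr_rate(ddr_rate_mts: float) -> float:
    """FCLK (MHz) needed to run 1:1 with the given DDR4 data rate."""
    return ddr_rate_mts / 2


# FCLK values mentioned in the post.
for fclk in (1600, 1800, 1866, 1900):
    print(f"FCLK {fclk}MHz <-> DDR4-{ddr_rate_for_fclk(fclk):.0f} at 1:1")
```

Under that assumption, 1800MHz corresponds to DDR4-3600, ~1866MHz to DDR4-3733 and 1900MHz to DDR4-3800, while a 1600MHz ceiling would mean DDR4-3200 at 1:1.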
 
#36 ·
Matisse introduced a new voltage adjustment, called cLDO_VDDG. VDDG is the fabric voltage.
By default it is 0.950V; however, some motherboards might increase it above the default level even at stock settings.

cLDO means the voltage uses a drop-out (LDO = low drop-out) regulator.
Most cLDO voltages are regulated from the two main power rails of the CPU. In case of cLDO_VDDG and cLDO_VDDP, they are regulated from the VDDCR_SoC plane.
Because of this, there are a couple of rules. For example, if you set the VDDG to 1.100V while your actual SoC voltage under load is 1.05V, the VDDG will stay roughly at 1.01V max.
Likewise, if you have VDDG set to 1.100V and start increasing the SoC voltage, your VDDG will rise as well. I don't have the exact figure, but you can assume that the minimum drop-out voltage (Vin-Vout) is around 40mV.
Meaning your ACTUAL SoC voltage has to be at least this much higher than the requested VDDG for it to take effect as requested.
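
The drop-out rule above can be written as a simple clamp. A minimal sketch, assuming the ~40mV drop-out estimate from this post is exact (it is only an approximation) and using hypothetical function names:

```python
DROPOUT_V = 0.040  # assumed minimum drop-out (Vin - Vout), estimate from the post


def effective_vddg(requested_vddg: float, actual_soc_v: float,
                   dropout_v: float = DROPOUT_V) -> float:
    """VDDG cannot exceed the actual SoC voltage minus the drop-out."""
    return min(requested_vddg, actual_soc_v - dropout_v)


def min_soc_for_vddg(requested_vddg: float,
                     dropout_v: float = DROPOUT_V) -> float:
    """Lowest actual SoC voltage at which the requested VDDG takes effect."""
    return requested_vddg + dropout_v


# Example from the post: 1.100V requested, 1.05V SoC under load.
print(f"{effective_vddg(1.100, 1.05):.3f}V")   # 1.010V
# Default VDDG with the default 1.100V SoC is not limited by the drop-out.
print(f"{effective_vddg(0.950, 1.100):.3f}V")  # 0.950V
```

Note that the SoC voltage to plug in is the actual one under load, so load-line droop has to be accounted for when picking the SoC setpoint.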

Adjusting the SoC voltage alone, unlike on previous gen. parts, doesn't do much, if anything at all.
The default value is a fixed 1.100V and AMD recommends keeping it at that level. Increasing the VDDG helps with the fabric overclocking in certain scenarios, but not always.
1800MHz FCLK should be doable at the default 0.9500V value, and for pushing the limits it might be beneficial to increase it to <= 1.05V (1.100 - 1.125V SoC, depending on the load-line).
 
#42 ·
I'm most interested to see if we can expect 3600MHz at 14-14-14-14 instead of the 3600MHz at 14-15-14-14 many of us have to run on our Ryzen 2000 CPUs.
 
#62 ·
this is still dropping my jaw, with my sad 3533cl16 b-die kit
 
#43 ·
I did a quick test on the 3900X, using the brand new AGESA 1.0.0.3 w/ A & B patches and the newest chipset-drivers:

Core Fmax listing prior to these changes:

Code:
Core 1 =  4575 - 4300MHz - 1.48750V (2*)
Core 2 =  4475 - 4350MHz - 1.48750V
Core 3 =  4525 - 4225MHz - 1.47500V
Core 4 =  4575 - 4275MHz - 1.47500V (1*)
Core 5 =  4475 - 4300MHz - 1.48750V
Core 6 =  4475 - 4300MHz - 1.48750V
Core 7 =  4375 - 4300MHz - 1.46250V (1*)
Core 8 =  4350 - 4300MHz - 1.46250V
Core 9 =  4350 - 4300MHz - 1.46250V
Core 10 = 4325 - 4275MHz - 1.48125V
Core 11 = 4325 - 4300MHz - 1.47500V
Core 12 = 4400 - 4300MHz - 1.47500V (2*)
Core Fmax listing after the changes: See above...

1* = The best core of the CCD, 2* the second best core of the CCD.

I'd say unless the factory V/F calibration is out of whack, which I really don't think it is, the only way AMD is going to increase the frequencies of the CPUs is most likely by
allowing them to run at higher voltages (i.e. sacrificing reliability). The voltage figure seen for each core is the voltage the CPU asks from the VRM controller through the SVI2 interface when the specific core is stressed.
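
For anyone who wants to crunch the listing above, here is a minimal parsing sketch. It makes two assumptions: that cores 1-6 sit on one CCD and cores 7-12 on the other (consistent with where the 1*/2* markers fall), and that the first frequency column is the per-core peak. The names are hypothetical.

```python
import re

LISTING = """\
Core 1 =  4575 - 4300MHz - 1.48750V (2*)
Core 2 =  4475 - 4350MHz - 1.48750V
Core 3 =  4525 - 4225MHz - 1.47500V
Core 4 =  4575 - 4275MHz - 1.47500V (1*)
Core 5 =  4475 - 4300MHz - 1.48750V
Core 6 =  4475 - 4300MHz - 1.48750V
Core 7 =  4375 - 4300MHz - 1.46250V (1*)
Core 8 =  4350 - 4300MHz - 1.46250V
Core 9 =  4350 - 4300MHz - 1.46250V
Core 10 = 4325 - 4275MHz - 1.48125V
Core 11 = 4325 - 4300MHz - 1.47500V
Core 12 = 4400 - 4300MHz - 1.47500V (2*)
"""

LINE_RE = re.compile(r"Core (\d+) =\s+(\d+) - (\d+)MHz - ([\d.]+)V")


def parse_listing(text):
    """Return (core, first_mhz, second_mhz, volts) tuples, one per core line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((int(m[1]), int(m[2]), int(m[3]), float(m[4])))
    return rows


cores = parse_listing(LISTING)
# Assumption: cores 1-6 belong to the first CCD, cores 7-12 to the second.
ccds = {
    "CCD0": [c for c in cores if c[0] <= 6],
    "CCD1": [c for c in cores if c[0] > 6],
}
summary = {}
for name, rows in ccds.items():
    peaks = [r[1] for r in rows]
    summary[name] = (max(peaks), min(peaks))
    print(f"{name}: highest {max(peaks)}MHz, lowest {min(peaks)}MHz, "
          f"spread {max(peaks) - min(peaks)}MHz")
```

With that split, the first-column spread is 100MHz within the first CCD and 75MHz within the second, and 250MHz between the best and worst cores of the whole CPU.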
 