Overclock.net › Forums › Industry News › Hardware News › [kitguru] AMD: We have taped out our first FinFET products

[kitguru] AMD: We have taped out our first FinFET products - Page 11

post #101 of 106
Quote:
Originally Posted by Fyrwulf View Post

AMD has already demonstrated an ability to design very high density processors, so I don't know that your logic holds up. Those same processors are also fairly energy efficient. The perfect example is the 7870K, which draws under 100 watts despite having a power hungry (and legitimate, unlike Intel) GPU on die.

Could you please point out which of AMD's CPU designs are "very high density"?

Stars (10h) Core (64 + 64 + 512) = 22.18mm² (GlobalFoundries 45nm)
Bobcat Core (32 + 32 + 512) = 8.78mm² (TSMC 40nm)
Stars (12h) Core (64 + 64 + 1024) = 16.48mm² (GlobalFoundries 32nm)
Bulldozer / Piledriver CU (32 + 64 + 2048) = 33.09mm² (GlobalFoundries 32nm)
Jaguar / Puma Core (32 + 32 + 512) = 6.50mm² (TSMC / GlobalFoundries 28nm)
Steamroller CU (32 + 96 + 2048) = 29.99mm² (GlobalFoundries 28nm)
Excavator CU (64 + 96 + 1024) = 20.52mm² (GlobalFoundries 28nm)

Nehalem Core (32 + 32 + 256) = 29.45mm² (Intel 45nm)
Westmere Core (32 + 32 + 256) = 17.90mm² (Intel 32nm)
Sandy Bridge Core (32 + 32 + 256) = 16.86mm² (Intel 32nm)
Ivy Bridge Core (32 + 32 + 256) = 12.58mm² (Intel 22nm)
Haswell Core (32 + 32 + 256) = 15.71mm² (Intel 22nm)
Broadwell Core (32 + 32 + 256) = 8.32mm² (Intel 14nm)

* Intel design sizes are based on estimation.

Each AMD compute unit (CU) has two integer units and two 128-bit FMACs.
The 256x8K (2048KB) L2 block takes 12.2mm² on 32nm and 11mm² on 28nm on these parts.

Based on my own measurements, the 7870K APU consumes 61W during Cinebench R15 at default settings. That's the power consumed by the CPU alone (measured with DCR current sensing over the inductor(s)). On this design the power consumption during Cinebench R15 represents roughly 79% of the maximum power consumption during maximum FPU stress (e.g. Prime95).
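
The two figures above imply a maximum CPU power draw. A quick sketch of that arithmetic in Python, assuming the 79% figure applies directly to the 61W measurement (the variable names are only illustrative, and the result is an estimate, not a measurement):

```python
# Measured CPU-only power during Cinebench R15 at default settings (from the post).
cinebench_power_w = 61.0
# Cinebench R15 is stated to draw roughly 79% of the maximum FPU-stress power.
cinebench_fraction_of_max = 0.79

# Implied maximum CPU power under a Prime95-style load (estimate only).
estimated_max_power_w = cinebench_power_w / cinebench_fraction_of_max
print(f"Estimated maximum CPU power: {estimated_max_power_w:.1f} W")
```

Under that assumption the CPU portion would peak somewhere around 77W.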

The 7870K scores 328pts in Cinebench R15 at stock frequency of 3.9GHz.
Meanwhile, the i5-4460 (4C/4T, 3.2GHz base, 3.4GHz turbo) scores 532pts in the same test. That's 62.2% more even though the clocks are 12.8% lower. That's 86% higher IPC*.

* ((532 / 3400) / (328 / 3900))
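
The footnote can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the i5-4460 holds its 3.4GHz turbo clock throughout the run, and it treats "score per MHz" as a stand-in for IPC. The function name is made up for illustration.

```python
def score_per_mhz(score: float, clock_mhz: float) -> float:
    """Benchmark score normalised by clock speed (a rough proxy for IPC)."""
    return score / clock_mhz


# Cinebench R15 scores and clocks quoted in the post.
amd_score, amd_clock_mhz = 328, 3900      # 7870K at stock 3.9GHz
intel_score, intel_clock_mhz = 532, 3400  # i5-4460, assumed to run at 3.4GHz turbo

score_advantage = intel_score / amd_score - 1
clock_deficit = 1 - intel_clock_mhz / amd_clock_mhz
per_clock_advantage = (
    score_per_mhz(intel_score, intel_clock_mhz)
    / score_per_mhz(amd_score, amd_clock_mhz)
    - 1
)

print(f"Score advantage:     {score_advantage:.1%}")
print(f"Clock deficit:       {clock_deficit:.1%}")
print(f"Per-clock advantage: {per_clock_advantage:.1%}")
```

The three printed values correspond to the 62.2%, 12.8% and 86% figures in the post.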

P.S. Before you come and tell me that the Cinebench versions are neutered on AMD CPUs, I can tell you they're not. I have manually patched the CPU dispatcher out of Cinebench myself, and doing so doesn't change the results either way. Besides, similar performance differences between AMD and Intel CPUs can be observed in open source software (e.g. x265) compiled with open source compilers.
Quote:
Originally Posted by Serios View Post

All I see are exaggerations.
How are people still using Phenom X6 CPUs OC'd to 4GHz, or 8350s OC'd to 4.7-5.0GHz?
According to your theories these would also be impossible CPUs to deal with.

It's not like Zen is an exclusively laptop CPU or something; it's a high power, high performance design.
14nm will allow AMD to increase efficiency and performance in comparison to Bulldozer without a doubt. An 8 core Zen CPU should be way more efficient than an 8350 or a Phenom X6 1090T. That is what everybody expects.

People using these CPUs are suffering from the poor performance, and have been for quite a while now. Why else do you think all this ranting exists? AMD needs to provide something usable in order to prevent Intel from asking black market prices for their stuff.

This isn't about whether the smaller 14nm node improves efficiency or not.
It's all about whether the process itself is even usable or suitable for a design which AMD needs to put on the market. It's a low power process, designed for network, mobile and storage chips, and somehow everyone expects it to come and rescue a large high power design. Why the hell do people think "High Performance, High Performance Plus and Super High Performance" nodes exist, and why do the designers pay a premium for them? Besides, Samsung 14nm isn't even a real 14nm process. Its real measurements make it closer to 17nm than 14nm.
Edited by The Stilt - 7/24/15 at 5:39am
post #102 of 106
Not totally correct.
Ivy Bridge, Sandy Bridge and Haswell are bigger.
http://forums.anandtech.com/showthread.php?t=2294334

I know that 2MB of L2 cache should take around 10mm^2 on 28nm. So 3.1mm^2 for the Jaguar core + ~2.5mm^2 for its 512KB of L2 cache ~ 5.6mm^2.
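
A rough sketch of how that figure comes together, in Python. It assumes the L2 area scales linearly with capacity and that each Jaguar core is credited with 512KB of L2, as in the (32 + 32 + 512) entry of the table in the earlier post; both are assumptions made for illustration.

```python
# Figures from the post: ~10 mm^2 for 2MB of L2 on 28nm, 3.1 mm^2 for the Jaguar core.
l2_area_for_2mb_mm2 = 10.0
jaguar_core_mm2 = 3.1

# Assumption: 512KB of L2 per core, with area scaling linearly with capacity.
l2_per_core_kb = 512
l2_share_mm2 = l2_area_for_2mb_mm2 * l2_per_core_kb / 2048

core_plus_l2_mm2 = jaguar_core_mm2 + l2_share_mm2
print(f"L2 share per core: {l2_share_mm2:.1f} mm^2")
print(f"Core + L2:         {core_plus_l2_mm2:.1f} mm^2")
```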

And now Cinebench does show you IPC (compilers are not the best for AMD).
IPC = FPU + integer performance of the CPU (yes, I know there are many other things, but let's keep it simple).

FPU performance is very low on Bulldozer; it is only as fast as Jaguar's FPU (clock for clock).

If AMD has an 86% slower FPU...
- Steamroller has an around 86% slower FPU.

But an overclocked Kaveri exceeds the performance of Ivy Bridge in some benchmarks.
http://blackholetec.com/main/article/amd-a10-7850k-kaveri-review-page-3

It's really hard to say that AMD needs +40% IPC. Even if they get +40% IPC they need compilers... they need to do a lot of things. With that budget... if they can make it, then here I come, Zen.

https://www.youtube.com/watch?v=0mr9UiBCGQI
A very nice example of how well FX can actually do in some cases that are not FPU limited. In Total War, when the i5 4460 dips under 26 FPS, the FX 8350 stays at 39-40 FPS.
Edited by Themisseble - 10/20/15 at 2:20pm
post #103 of 106
post #104 of 106
Quote:
Originally Posted by Themisseble View Post

https://www.youtube.com/watch?v=0mr9UiBCGQI
A very nice example of how well FX can actually do in some cases that are not FPU limited. In Total War, when the i5 4460 dips under 26 FPS, the FX 8350 stays at 39-40 FPS.

Scientifically not possible. FX CPUs need to run at roughly 5.7GHz to match a 3.3GHz Haswell.

Total War: Rome 2 doesn't seem to take advantage of more than 4 threads, so it's literally impossible for FX CPUs to match Haswell.


Quote:
Originally Posted by Themisseble View Post

Not totally correct.
Ivy Bridge, Sandy Bridge and Haswell are bigger.
http://forums.anandtech.com/showthread.php?t=2294334

Of course Jaguar will be smaller: that's 2 ALUs / 2x 128-bit FPUs against 4 ALUs and 3x 256-bit FPUs. That thing is empty as hell.
Quote:
Originally Posted by Themisseble View Post

FPU performance is very low on Bulldozer; it is only as fast as Jaguar's FPU (clock for clock).

And integer performance. Haswell has twice as many ALUs per thread, which should theoretically give a 100% increase in IPC.
post #105 of 106
Quote:
Originally Posted by Faithh View Post

Total War: Rome 2 doesn't seem to take advantage of more than 4 threads, so it's literally impossible for FX CPUs to match Haswell.

At least the minimums go up! Might be losing some fps on the high end and gaining on the low end?
post #106 of 106
Like I said in the original post, Intel numbers were based on estimation.

My AMD numbers are perfectly accurate.

On AMD designs made on 28nm process, the 256x8K chunk of L2 cache requires ~11.1mm² - 13.75mm² depending on design.

The Anandtech link contains somewhat inaccurate information, since Kabini has never been manufactured on a GlobalFoundries process.
16h Models 30-3Fh (Mullins) was the first Cat core ever manufactured at GlobalFoundries.

Ivy Bridge 3570K at 4.7GHz scores 7.62pts in Cinebench R11.5. The Godavari (Steamroller) based 7870K at the same clocks scores 4.5pts. That's a 69.33% (IPC) advantage for Ivy Bridge. In Cinebench R15 the situation is even worse for AMD.
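
Since both chips run at the same 4.7GHz here, the per-clock advantage is simply the ratio of the two scores. A short Python check of that figure (variable names are illustrative only):

```python
# Cinebench R11.5 scores at an identical 4.7GHz clock, as quoted in the post.
ivy_bridge_score = 7.62   # i5-3570K
steamroller_score = 4.5   # 7870K (Godavari)

# With equal clocks the score ratio is also the per-clock ratio.
ivy_bridge_advantage = ivy_bridge_score / steamroller_score - 1
print(f"Ivy Bridge advantage at equal clocks: {ivy_bridge_advantage:.2%}")
```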

There is nothing wrong with using ICL, which is the case with the Cinebench versions. Both AMD and Intel CPUs get the same amount of optimization and the same instructions used. Any binaries which are fully optimized for specific Intel µArchs (by ICL) cannot even run on AMD CPUs. The binary won't start unless CPUID returns "GenuineIntel" or the dispatcher is patched away manually.
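
The behaviour described above can be pictured with a toy model in Python. This is purely illustrative: the function and its parameters are hypothetical and do not reproduce ICL's actual dispatcher code. Only the two CPUID vendor strings are real.

```python
INTEL_VENDOR = "GenuineIntel"  # CPUID vendor string reported by Intel CPUs
AMD_VENDOR = "AuthenticAMD"    # CPUID vendor string reported by AMD CPUs


def start_intel_only_binary(vendor: str, dispatcher_patched: bool = False) -> str:
    """Hypothetical startup check of a binary built only for Intel µArchs.

    The binary starts if the vendor string matches, or if the vendor
    check has been patched out by hand.
    """
    if vendor == INTEL_VENDOR or dispatcher_patched:
        return "running"
    return "refused to start"


print(start_intel_only_binary(INTEL_VENDOR))
print(start_intel_only_binary(AMD_VENDOR))
print(start_intel_only_binary(AMD_VENDOR, dispatcher_patched=True))
```

Note that the model only covers the vendor check. A patched binary would still need the CPU to support every instruction it uses.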

You can replicate similar differences in other applications which are both open source and compiled with an open source compiler. In x265, for example, the differences are just as large as or even bigger than in Cinebench R15, which is the saddest of the usual benchmarks for AMD.

AMD doesn't need their own compiler; they just need to start making proper CPU designs.
Edited by The Stilt - 10/20/15 at 11:49pm