Overclock.net › Forums › Industry News › Hardware News › [OC3D] AMD's Zen will have a "greater than 40%" IPC improvement over Excavator, says Lisa Su

[OC3D] AMD's Zen will have a "greater than 40%" IPC improvement over Excavator, says Lisa Su - Page 53

post #521 of 841
Quote:
Originally Posted by Seraphic View Post

Is this the Flagship Zen FX CPU or the Flagship Zen Opteron?
It looks like an HPC APU with a 200-300W TDP, which is/was a 2017 product.

post #522 of 841
Quote:
Originally Posted by 2010rig View Post

I guess they need to provide the updated hardware to the 6 people who'll be buying these this month.

Me being one of them. I'm waiting patiently for my ASUS 970 Pro Gaming / Aura and FX-8300!!!
post #523 of 841
Quote:
Originally Posted by looncraz View Post

It is really hard to say what the impact of L3 removal would be without actual platform benchmarks, but usually it was only 3~5% on average for the construction cores - but there are specific cases (mostly in server/big data) where removing the L3 can cost 15% or more.

If that APU has HBM memory that is used to cache the same tasks, it may be a much lesser loss - or even a gain given the immense increase in capacity. The L3, in that case, may even act as a source of unwanted latency, which may be why removal would make more sense than inclusion.
See, I was asking because the HBM is likely going to be attached to the iGPU, which means any L2$ miss would result in a request going out through the CPU-GPU interlink, through the GPU memory controller, to the HBM, waiting for the HBM to service the request (as it is a form of DRAM), and then all the way back. There's the latency penalty from doing that, but there's the bandwidth penalty as well, as the interlink between CPU and GPU is only rumored to hit about 100GB/s, and CPU-GPU traffic is in turn going to take a chunk out of that. I'd rather be able to hit ~200GB/s and 10ns latency to the local L3 cache than <100GB/s and 20-30+ns latency to the HBM, even if the HBM has orders of magnitude more capacity.

Who knows, though, other than the AMD engineers who built the damn thing? I could be entirely wrong (and you right), and the HBM's relatively enormous capacity could be enough to null out the latency and bandwidth hits, or render them inconsequential.
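For what it's worth, the tradeoff can be sketched with the classic average-memory-access-time model. The 10ns L3 and 20-30ns HBM figures are the rumored numbers from this post; the L2 hit latency and miss rate are made-up placeholders, so treat this as a toy calculation, not a prediction:

```python
# AMAT (average memory access time) = hit time + miss rate * miss penalty.
# Compares an on-die L3 (~10ns) against servicing L2 misses over the
# interlink from HBM (~25ns, midpoint of the 20-30ns guess above).

def amat(hit_ns, miss_rate, miss_penalty_ns):
    """Hit time plus the miss rate weighted by the next level's latency."""
    return hit_ns + miss_rate * miss_penalty_ns

L2_HIT_NS = 4.0    # assumed L2 hit latency (placeholder)
MISS_RATE = 0.05   # assumed L2 miss rate (placeholder)

with_l3 = amat(L2_HIT_NS, MISS_RATE, 10.0)   # local L3 services the miss
with_hbm = amat(L2_HIT_NS, MISS_RATE, 25.0)  # miss goes over the interlink

print(f"AMAT with local L3: {with_l3:.2f} ns")   # 4.50 ns
print(f"AMAT with HBM only: {with_hbm:.2f} ns")  # 5.25 ns
```

The gap is small at a 5% miss rate and grows linearly with it, which is roughly why the average hit from dropping the L3 can look minor even when the per-miss penalty doubles.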
post #524 of 841
Quote:
Originally Posted by Scorpion87 View Post

Me being one of them. I'm waiting patiently for my ASUS 970 Pro Gaming / Aura and FX-8300!!!
Congratulations!! You're making history.

Thanks to your contribution, AMD is staying afloat.
post #525 of 841
Quote:
Originally Posted by Cyrious View Post

See, I was asking because the HBM is likely going to be attached to the iGPU, which means any L2$ miss would result in a request going out through the CPU-GPU interlink, through the GPU memory controller, to the HBM, waiting for the HBM to service the request (as it is a form of DRAM), and then all the way back. There's the latency penalty from doing that, but there's the bandwidth penalty as well, as the interlink between CPU and GPU is only rumored to hit about 100GB/s, and CPU-GPU traffic is in turn going to take a chunk out of that. I'd rather be able to hit ~200GB/s and 10ns latency to the local L3 cache than <100GB/s and 20-30+ns latency to the HBM, even if the HBM has orders of magnitude more capacity.

Who knows, though, other than the AMD engineers who built the damn thing? I could be entirely wrong (and you right), and the HBM's relatively enormous capacity could be enough to null out the latency and bandwidth hits, or render them inconsequential.

AMD doesn't tend to build fast caches. Their L3 would most likely have lower latency but less bandwidth. The onboard HBM makes the most sense as L4 since it's like having a faster system memory array. Their L3 doesn't contribute a lot in current designs though. You could probably remove it and just use HBM as L3 without any real problem.
post #526 of 841
Will it offer an improvement over my 3770K? If so, I will be on board. I liked AMD in the early 2000s, then it lost its footing. Normally I wouldn't think of buying anything but Intel, but then life happened. I am no longer single and have a child. My wife would probably divorce me if I spent $400 on a Skylake setup :)
post #527 of 841
I hope Zen does well; I've had an itch for years to switch back to AMD. I think the last chip I used from them was a Phenom II 720BE, and I only bought it to see if I could unlock the fourth core.
I'm just dreaming of an all-red team build again.
post #528 of 841
Quote:
Originally Posted by Cyrious View Post

See, I was asking because the HBM is likely going to be attached to the iGPU, which means any L2$ miss would result in a request going out through the CPU-GPU interlink, through the GPU memory controller, to the HBM, waiting for the HBM to service the request (as it is a form of DRAM), and then all the way back. There's the latency penalty from doing that, but there's the bandwidth penalty as well, as the interlink between CPU and GPU is only rumored to hit about 100GB/s, and CPU-GPU traffic is in turn going to take a chunk out of that. I'd rather be able to hit ~200GB/s and 10ns latency to the local L3 cache than <100GB/s and 20-30+ns latency to the HBM, even if the HBM has orders of magnitude more capacity.

Who knows, though, other than the AMD engineers who built the damn thing? I could be entirely wrong (and you right), and the HBM's relatively enormous capacity could be enough to null out the latency and bandwidth hits, or render them inconsequential.

The latency and bandwidth penalties could be made up for with capacity and streaming techniques for the target applications. It's one thing to be able to fit an 8MB dataset into a 100GB/s, 28-cycle-latency buffer and quite another to be able to fit a 1GB dataset into a 100GB/s, 48ns buffer (plus some interface overhead - probably ~15 cycles total, including memory controller commands).

BTW, I used Intel's superior cache numbers from Sandy Bridge (96GB/s, 28-cycle latency). AMD has never proven to be capable of that level of performance with their L3 caches, but we'll just assume they managed it for Zen.

Of course, we're only talking about large dataset workloads... which is exactly where AMD would likely target such a beast of an APU... and the only people willing to spend big money on something like this.

...

If we pretend the HBM is used as an L4, though, that means we will have a 28-cycle added penalty before being able to hit up the HBM. This would be worthwhile only some of the time - in certain latency-sensitive cases... however, those very same cases will also be hurt every time there is a miss. This would mean that the APU would need to search main memory and the HBM concurrently.

The next option is to allow full software control and allow systems to map part of the HBM as system memory and merely copy speed-sensitive data into that higher performance memory (likely 512GB/s). An operating system's file system cache would be a fantastic use for this - and would speed up every program that uses the file system. AMD's HSA could be a real boon for them... and this could be the first serious use of it we will see.
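The serial-versus-concurrent lookup question above can be put in rough numbers. The 28-cycle check and 48ns HBM latency come from this post; the clock speed, DRAM latency, and hit rates are assumptions I picked to make the sketch concrete:

```python
# Speculative latency model for HBM-as-L4: pay the tag-check penalty
# serially before falling through to DRAM on a miss, versus probing the
# L4 and main memory concurrently so a miss costs only the DRAM trip.

CYCLE_NS = 1.0 / 3.0           # assume a ~3GHz core clock
L4_CHECK_NS = 28 * CYCLE_NS    # 28-cycle added penalty from the post
HBM_NS = 48.0                  # HBM access latency from the post
DRAM_NS = 80.0                 # assumed main-memory latency

def serial_ns(hit_rate):
    # Tags checked first; a miss starts the DRAM access only afterwards.
    hit = L4_CHECK_NS + HBM_NS
    miss = L4_CHECK_NS + DRAM_NS
    return hit_rate * hit + (1.0 - hit_rate) * miss

def concurrent_ns(hit_rate):
    # Both lookups launched at once; a miss costs only the DRAM trip.
    hit = L4_CHECK_NS + HBM_NS
    miss = DRAM_NS
    return hit_rate * hit + (1.0 - hit_rate) * miss

for hr in (0.50, 0.90, 0.99):
    print(f"hit rate {hr:.0%}: serial {serial_ns(hr):5.1f} ns, "
          f"concurrent {concurrent_ns(hr):5.1f} ns")
```

Under these made-up numbers the concurrent probe only pulls ahead as the miss rate rises, which matches the intuition in the post: the serial check is fine when the L4 almost always hits, and hurts exactly the workloads that miss often.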
post #529 of 841
Quote:
Originally Posted by mr. biggums View Post

I hope Zen does well; I've had an itch for years to switch back to AMD. I think the last chip I used from them was a Phenom II 720BE, and I only bought it to see if I could unlock the fourth core.
I'm just dreaming of an all-red team build again.

Loved my little 720BE! Unlocked that fourth core and hit 3.4GHz on a crappy motherboard with a stock cooler. That was when I was actually broke, rather than just thinking I am because I'm unwilling to accept what my bank account says.
post #530 of 841
Quote:
Originally Posted by looncraz View Post

The latency and bandwidth penalties could be made up for with capacity and streaming techniques for the target applications. It's one thing to be able to fit an 8MB dataset into a 100GB/s, 28-cycle-latency buffer and quite another to be able to fit a 1GB dataset into a 100GB/s, 48ns buffer (plus some interface overhead - probably ~15 cycles total, including memory controller commands).

BTW, I used Intel's superior cache numbers from Sandy Bridge (96GB/s, 28-cycle latency). AMD has never proven to be capable of that level of performance with their L3 caches, but we'll just assume they managed it for Zen.

Of course, we're only talking about large dataset workloads... which is exactly where AMD would likely target such a beast of an APU... and the only people willing to spend big money on something like this.

If we pretend the HBM is used as an L4, though, that means we will have a 28-cycle added penalty before being able to hit up the HBM. This would be worthwhile only some of the time - in certain latency-sensitive cases... however, those very same cases will also be hurt every time there is a miss. This would mean that the APU would need to search main memory and the HBM concurrently.
Alright, point. The sheer size of the HBM memory would effectively negate the performance hit, except in areas where the code is time-sensitive.
Quote:
The next option is to allow full software control and allow systems to map part of the HBM as system memory and merely copy speed-sensitive data into that higher performance memory (likely 512GB/s). An operating system's file system cache would be a fantastic use for this - and would speed up every program that uses the file system. AMD's HSA could be a real boon for them... and this could be the first serious use of it we will see.
That could be very nice.