Overclock.net › Forums › Industry News › Hardware News › [OC3D] AMD's Zen will have a "greater than 40%" IPC improvement over Excavator, says Lisa Su

[OC3D] AMD's Zen will have a "greater than 40%" IPC improvement over Excavator, says Lisa Su - Page 72

post #711 of 841
Quote:
Originally Posted by Robenger

Where are your beautiful graphs you promised?

Only said that I should make them.

I would like to graph numerous benchmarks independently and merge my Intel and AMD data with proper clock-speed references... but, as you might imagine, that's quite time-consuming.
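For anyone attempting the same merge, the key step is dividing each score by the clock it ran at, so Intel and AMD results become per-clock (IPC-like) figures. Below is a minimal sketch. It assumes single-threaded results, a known sustained clock for each run, and higher-is-better scores. The `BenchResult` fields and the function names are hypothetical and do not come from any existing tool:

```python
from dataclasses import dataclass


@dataclass
class BenchResult:
    # Hypothetical record for one benchmark run (not from any real tool).
    cpu: str
    benchmark: str
    score: float      # higher is better
    clock_ghz: float  # sustained core clock during the run


def per_clock(result: BenchResult) -> float:
    """Score per GHz, a rough IPC proxy.

    Only meaningful for single-threaded runs at a known, fixed clock;
    turbo behaviour and memory speed can still skew it.
    """
    if result.clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return result.score / result.clock_ghz


def relative_ipc(a: BenchResult, b: BenchResult) -> float:
    """Per-clock advantage of `a` over `b` as a fraction (0.40 means 40%)."""
    if a.benchmark != b.benchmark:
        raise ValueError("only compare results from the same benchmark")
    return per_clock(a) / per_clock(b) - 1.0
```

Benchmarks where lower is better, such as render times, would need their scores inverted before they go through this.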
post #712 of 841
Quote:
Originally Posted by looncraz

Only said that I should make them.

I would like to graph numerous benchmarks independently and merge my Intel and AMD data with proper clock-speed references... but, as you might imagine, that's quite time-consuming.

We're tired of all the unfulfilled promises. Worst Valentine's Day present ever.
Big Timmah (13 items)
CPU: Ryzen 5 1600
Motherboard: Asrock x370 Killer SLI/AC
Graphics: Sapphire Radeon Nitro Fury
RAM: CORSAIR Vengeance LPX 16GB 3200MHz
Hard Drive: PNY 480GB SSD
Cooling: PH-TC12DX Black
OS: Windows 10 Pro
Monitor: LG 29inch Ultrawide
Keyboard: Corsair K70
Power: Thermaltake SMART M Series 850W
Case: NZXT S340 White Steel ATX Mid Tower Case
Mouse: Wireless Logitech thing
Mouse Pad: With a supple pad
post #713 of 841
Quote:
Originally Posted by The Stilt

- There are currently no Excavator based designs available which support DDR-2400
- AVX2 is completely useless on Excavator. It is slower than SSE2, SSE3, SSE4, XOP or AVX.

I think it is pretty safe to say that Excavator kills Kaveri & Piledriver in AVX2 since neither of those support AVX2 in the first place...

Bristol Ridge is to bring DDR4-2400 support.

Quote:
Originally Posted by looncraz

With Lisa Su's background and Jim Keller's statements, it seems that the 40% is a genuine single-threaded improvement. I do have a great deal of concern that SMT is part of that 40%, though. But that, then, would be called instruction throughput, the term AMD liked to throw around (aside from JF-AMD) during the lead-up to Bulldozer.

Either way, worst case is still Sandy Bridge IPC, which is plenty to become relevant again.

Sandy Bridge performance is still more than enough for most people, and if they're getting this out of a mainstream Zen APU, then even better. Those who seek better performance have the octa-core option.
Polaris Ib (16 items)
CPU: Intel Core i5-4670K
Motherboard: ASRock Z97 Extreme3
Graphics: Gigabyte Radeon R9 380X G1 Gaming (4 GB)
RAM: G.SKILL Ares DDR3-1866 (2 × 8 GB)
Hard Drive: Samsung 850 EVO, 250 GB [OS/Programs]
Hard Drive: Western Digital Black, 1 TB [Media]
Hard Drive: Seagate Barracuda, 1 TB [Web Development]
Optical Drive: LG GH24NSC0 CD/DVD
Cooling: Enermax ETS-T40-TB [Processor Cooler]
Cooling: 2 × Fractal Design Dynamic GP14 [Case Fans]
OS: Microsoft Windows 10 Pro
Monitor: 3 × ASUS VC239H, 23.6" 1080p IPS
Keyboard: Gigabyte GK-K7100
Power: EVGA SuperNOVA G2, 550 W
Case: Fractal Design Define R5, Black without Window
Mouse: Logitech M185
post #714 of 841
post #715 of 841

@looncraz Translation please!
post #716 of 841
Quote:
Originally Posted by Robenger

@looncraz Translation please!

LOL! I'll do my best.

First up, the patent represents a simple, but probably extremely effective, means to reduce power draw - and eliminate double decoding of instructions - in loops small enough to fit in the instruction cache. When a loop is detected (which is rather a simple affair), the next thing to do is to make sure you have all of the instructions you need to execute that loop in the instruction cache. If you do, you can turn off the decoders and run the loop [almost] entirely just in the execution units.
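To make the decoders-off idea concrete, here is a toy model that counts how many instructions would go through the decoders and how many would be replayed from a loop buffer. It is purely illustrative and does not reproduce the patent's actual mechanism. The trace format (one address unit per instruction), the "backward jump closes a loop" rule, and the `capacity` parameter are all assumptions:

```python
from typing import Iterable, Optional, Tuple


def count_decodes(trace: Iterable[int], capacity: int) -> Tuple[int, int]:
    """Toy loop-buffer model: return (decoded, replayed_from_buffer).

    Assumptions (illustrative only, not the patent's design):
    - `trace` lists instruction addresses in execution order, one unit each;
    - a jump to an address <= the previous one closes a loop [target, prev];
    - if the loop body fits in `capacity` entries it is captured, and further
      instructions inside that range skip the decoders;
    - leaving the range turns the decoders back on.
    """
    decoded = 0
    replayed = 0
    captured: Optional[Tuple[int, int]] = None
    prev: Optional[int] = None
    for pc in trace:
        if prev is not None and pc <= prev:  # backward branch: loop detected
            body = prev - pc + 1
            captured = (pc, prev) if body <= capacity else None
        if captured is not None and captured[0] <= pc <= captured[1]:
            replayed += 1  # served from the buffer, decoders idle
        else:
            captured = None  # outside any captured loop
            decoded += 1
        prev = pc
    return decoded, replayed
```

With the trace `[0, 1, 2, 3] * 3 + [4]` and `capacity=8` it returns `(5, 8)`, because only the first pass and the exit instruction reach the decoders. With `capacity=2` the loop doesn't fit and all 13 instructions are decoded.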

The awesome thing with that is that you have more power available for use during execution (probably *not* more than 10 watts at full clocks). In addition, the cache means you aren't stalling for instructions in loops, whereas the construction cores (and even Intel cores) may stall, leaving the execution units idle and hurting performance. It's a really awesome, but simple, idea.

The L0 ITLB is probably related to keeping memory address translations for the above loops, which decreases the load on the AGUs.
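The sketch below shows only the lookup order: a tiny L0 TLB is checked first, then a larger L1 TLB, and a page walk is the last resort. It rests on several assumptions: 4 KiB pages, LRU replacement, a plain dict standing in for the page table, and placeholder entry counts that are not Zen's real sizes. The class names are hypothetical, and the AGU side of the point above is not modelled:

```python
from collections import OrderedDict
from typing import Dict, Optional

PAGE_SHIFT = 12  # 4 KiB pages (assumption)


class LRUTLB:
    """Fully associative TLB with least-recently-used replacement."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.map: "OrderedDict[int, int]" = OrderedDict()  # vpn -> pfn

    def lookup(self, vpn: int) -> Optional[int]:
        if vpn in self.map:
            self.map.move_to_end(vpn)  # mark as most recently used
            return self.map[vpn]
        return None

    def fill(self, vpn: int, pfn: int) -> None:
        self.map[vpn] = pfn
        self.map.move_to_end(vpn)
        if len(self.map) > self.entries:
            self.map.popitem(last=False)  # evict least recently used


class InstructionTranslator:
    """Hypothetical L0 ITLB -> L1 ITLB -> page walk lookup chain."""

    def __init__(self, page_table: Dict[int, int], l0_entries: int = 8,
                 l1_entries: int = 64) -> None:
        self.page_table = page_table  # vpn -> pfn, stands in for a page walk
        self.l0 = LRUTLB(l0_entries)
        self.l1 = LRUTLB(l1_entries)
        self.stats = {"l0": 0, "l1": 0, "walk": 0}

    def translate(self, vaddr: int) -> int:
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        pfn = self.l0.lookup(vpn)
        if pfn is not None:
            self.stats["l0"] += 1
        else:
            pfn = self.l1.lookup(vpn)
            if pfn is not None:
                self.stats["l1"] += 1
            else:
                pfn = self.page_table[vpn]  # KeyError here models a page fault
                self.stats["walk"] += 1
                self.l1.fill(vpn, pfn)
            self.l0.fill(vpn, pfn)  # promote into the small, fast level
        return (pfn << PAGE_SHIFT) | offset
```

A tight loop that stays on one or two code pages would keep hitting the small L0 level and rarely touch the larger structures.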

I think checkpoint queue parity was described quite well; it's a 100% performance feature. Intel actually does something similar for bypassing pipeline stages. The best part is that this is the first hint we have that AMD may have kept a longer pipeline so that they can keep high clock speeds with high IPC (like Intel managed).

What's most important about this patent, though, is that it gives us a good idea of which 'leaked' information or slides are accurate. It also tells us that AMD is aiming very high, and that 40% may be a safer claim than it seemed.

For me, I think the most interesting aspect of all of this is just how much effort AMD has put into minimizing memory accesses. Everywhere we look, AMD has done something: multiple concurrent page walks (looking up an address in the operating system's page table), no fewer than four levels of cache, a dedicated path to the AGUs, and multiple translation lookaside buffers (TLBs). There's a lot in there that is especially good for SMT. My 15% SMT scaling estimate may have just been shattered, but there are many inhibitors to SMT performance. If AMD gets close to Intel's Hyper-Threading (30-40%), then we have a more serious market shakeup coming.
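The SMT scaling figures interact with the earlier worry that the 40% might already include SMT, and simple arithmetic shows how. This is a minimal sketch that assumes the gains multiply and treats the Excavator baseline as one thread per core. The function names are illustrative only:

```python
def core_throughput(single_thread: float, smt_scaling: float) -> float:
    """Relative throughput of one core running two threads.

    `smt_scaling` is the extra throughput the second thread adds, e.g. 0.15.
    """
    return single_thread * (1.0 + smt_scaling)


def single_thread_gain(headline_gain: float, smt_scaling: float) -> float:
    """Single-thread gain left over if a headline figure already includes SMT.

    Simplification: treats the baseline as a single thread per core.
    """
    return (1.0 + headline_gain) / (1.0 + smt_scaling) - 1.0
```

For example, if the 40% already contained 15% SMT scaling, the single-thread part would be 1.40 / 1.15 ≈ 1.22, or roughly 22%. If instead the 40% is pure IPC, 15% SMT scaling would put a core at about 1.61× an Excavator core's throughput.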
Edited by looncraz - 2/29/16 at 2:25pm
post #717 of 841
Quote:
Originally Posted by looncraz

LOL! I'll do my best.

[...]

So what you're saying is that it's going to be moar fasterer?
post #718 of 841
Quote:
Originally Posted by Robenger

So what you're saying is that it's going to be moar fasterer?

Yup, but seemingly targeting the very areas where Intel is strongest. It will be interesting to see how much of Intel's lead they can erase.
post #719 of 841
Okay, now I'm starting to get excited. I was going to buy Zen anyway, but this is just icing on the cake. I really, really want to know what they're going to do in VR and, especially, DX12. Has anybody noticed how well 8320s are doing in the SteamVR benchmark?
post #720 of 841
Quote:
Originally Posted by Fyrwulf

Okay, now I'm starting to get excited. I was going to buy Zen anyway, but this is just icing on the cake. I really, really want to know what they're going to do in VR and, especially, DX12. Has anybody noticed how well 8320s are doing in the SteamVR benchmark?

It's important to recognize that all of this is part of the same 40% AMD estimate; it's just supporting data on how they accomplished it.