[WCCF] AMD contracts TSMC to produce Zen @ 16nm instead of 14nm @ Globalfoundries (Updated) - Page 17

post #161 of 220
Quote:
Originally Posted by The Stilt View Post

There is no CMT (or any other penalty) present in Carrizo unless the two cores within a compute unit are both operating simultaneously.

That's not entirely true, at least from my understanding. There are multiple stages in the pipeline specifically in place to handle core assignment that would otherwise not be present in a standard core. This affects both the front-end and the back-end. The dispatch controller has an extra stage or two, for example, and then the WCC takes a few cycles. IIRC, there is about a five-cycle total penalty that is constantly present, and that can jump dramatically under contention. This is the greatest reason why the Bulldozer design has low single-threaded IPC, IMHO, followed very closely by the cache system (the WCC is a great optimization, hiding retirement contention quite effectively).
post #162 of 220
Quote:
Originally Posted by looncraz View Post

That's not entirely true, at least from my understanding. There are multiple stages in the pipeline specifically in place to handle core assignment that would otherwise not be present in a standard core. This affects both the front-end and the back-end. The dispatch controller has an extra stage or two, for example, and then the WCC takes a few cycles. IIRC, there is about a five-cycle total penalty that is constantly present, and that can jump dramatically under contention. This is the greatest reason why the Bulldozer design has low single-threaded IPC, IMHO, followed very closely by the cache system (the WCC is a great optimization, hiding retirement contention quite effectively).

Point being, the design as it is cannot go any faster no matter what you do :)
post #163 of 220
Quote:
Originally Posted by KyadCK View Post

In regards to the CMT penalty though, it was -20% on both cores when both cores in a Piledriver module are used at the same time.

When used near max capacity that is.
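
To put the quoted figure in numbers, here is a rough sketch. It takes the -20% per-core figure from the post above at face value and treats a lone core's throughput as 1.0, which is an arbitrary unit chosen for illustration. The function name is made up for this example.

```python
def module_throughput(per_core_penalty: float, cores: int = 2) -> float:
    """Combined throughput of one module, relative to a single core running alone.

    per_core_penalty is the fractional slowdown each core sees when all
    cores in the module are busy (0.20 is the figure quoted in the thread).
    """
    return cores * (1.0 - per_core_penalty)


both_loaded = module_throughput(0.20)
print(f"two loaded cores: {both_loaded:.2f}x a lone core")
print(f"scaling efficiency: {both_loaded / 2:.0%}")
```

So under that assumption a fully loaded module does about 1.6x the work of one core running alone, rather than 2x.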
post #164 of 220
Quote:
Originally Posted by Tivan View Post

Quote:
Originally Posted by KyadCK View Post

In regards to the CMT penalty though, it was -20% on both cores when both cores in a Piledriver module are used at the same time.

When used near max capacity that is.

Well yes, in a maxed scenario. I thought that's what we were talking about.

EDIT: Why does OCN remember the weirdest things in quotes even after you delete it from them?
post #165 of 220
Quote:
Originally Posted by The Stilt View Post

Point being, the design as it is cannot go any faster no matter what you do :)

There's not much room for improvement without doubling down on the design and duplicating so many internal components that you end up with two cores sitting side by side, with two independent execution pathways.

That said, suppose the execution cores were all jumbled into one pile (4x ALU, 4x AGU, and the full FPU), so that any thread can use any execution unit at any time. The only places you would care about which thread owns which operation are atomic operations, the branch predictor (which is not in-lined), and, just barely, a retirement ordering unit. Then all execution units are available to every thread, and you don't have anywhere near as many penalties.

Of course, we call that SMT.

In theory, this can be done with only one extra stage in the entire pipeline, and that would be in the reordering retirement unit (yes, I like to use lots of different names for the same thing, so sue me :P).
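
A toy way to see the difference between the two arrangements, not a model of any real core. It assumes independent operations, perfect issue, and one operation per unit per cycle. The 2 ALU + 2 AGU per-core split and the workload numbers are illustrative assumptions; only the pooled "4x ALU, 4x AGU" figure comes from the post above.

```python
from math import ceil


def cycles_partitioned(threads, alus_per_core=2, agus_per_core=2):
    """CMT-like: each thread can only use the units of its own core.

    threads is a list of (alu_ops, agu_ops) pairs, one pair per thread.
    """
    return max(
        max(ceil(alu / alus_per_core), ceil(agu / agus_per_core))
        for alu, agu in threads
    )


def cycles_shared(threads, alus=4, agus=4):
    """SMT-like: every thread can issue to any unit in one shared pool."""
    total_alu = sum(alu for alu, _ in threads)
    total_agu = sum(agu for _, agu in threads)
    return max(ceil(total_alu / alus), ceil(total_agu / agus))


# Hypothetical workloads: (ALU ops, AGU ops) per thread.
workloads = {
    "one ALU-heavy thread": [(8, 2)],
    "two balanced threads": [(4, 4), (4, 4)],
    "lopsided pair": [(8, 0), (0, 8)],
}

for name, threads in workloads.items():
    print(name, cycles_partitioned(threads), cycles_shared(threads))
```

In this toy model the shared pool only wins when the threads are unbalanced or when a single thread runs alone, which matches the single-thread argument made above.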

Zen will undoubtedly be going this way, and AMD has a lot of IP that could give them this superior arrangement with just a couple of years of design work (mostly on the logic front). Making the core wider will help with single threads and SMT scaling, provided AMD does some serious work on the FastPath code. They will need the ability to assign instructions to groups of units in a manner consistent with software priorities (which instruction types can be executed at once, vs which ones can wait for the others more often than not). This is exactly what Intel does with their architecture.

Indeed, AMD got a HUGE freebie from Intel on this: which instructions take precedence, and which can wait. That's because Intel published their findings from billions of dollars worth of research and even published a detailed list of what instructions go to what unified reservation station port.



AMD undoubtedly used a very similar arrangement.

EDIT:

The graphic doesn't show ports 2, 3, and 4, because they are dedicated to Load, Store Addr, and Store Data, respectively.

I'd be surprised if AMD just copied the design straight up. I'd expect them to try to differentiate somewhat, maybe with more ports to help emphasize floating point performance, or to help mitigate a weak point in their design.
Edited by looncraz - 9/27/15 at 11:31pm
post #166 of 220
Quote:
Originally Posted by looncraz View Post

AMD got a HUGE freebie from Intel on this: which instructions take precedence, and which can wait. That's because Intel published their findings from billions of dollars worth of research and even published a detailed list of what instructions go to what unified reservation station port.

:P Do you really believe it's a freebie?
Isn't it possible that the data Intel published is covered by the x86 licensing terms between the two companies, while also letting Intel's users know what their CPU is made of? Of course, nobody other than AMD would be allowed to use that info.
post #167 of 220
Quote:
Originally Posted by The Stilt View Post

We are only interested in Excavator's maximum throughput (i.e. CMT penalty excluded), since I cannot believe AMD used any other figure than that to base their "40% IPC improvement over Excavator" statement on. The CMT penalty obviously isn't going to be carried to Zen and therefore it must be excluded from the equation.

If AMD has in fact based their "40% IPC improvement over Excavator" statement on Excavator multithreaded performance (i.e. CMT penalized performance) then Zen isn't even worth discussing.

This is exactly what ileakstuff is insinuating, which led to my first post.
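
Why the choice of baseline matters, as a rough sketch. It borrows the -20% figure quoted earlier in the thread for Piledriver and applies it to Excavator purely as an assumption; nothing in the thread confirms Excavator's actual penalty. The function name is made up for this example.

```python
def zen_vs_unpenalized(claimed_gain: float, baseline_penalty: float) -> float:
    """Zen IPC relative to unpenalized Excavator.

    claimed_gain is the advertised improvement (0.40 for "40%").
    baseline_penalty is the hypothetical CMT slowdown baked into the
    baseline AMD measured against (0.0 means an unpenalized baseline).
    """
    return (1.0 + claimed_gain) * (1.0 - baseline_penalty)


clean = zen_vs_unpenalized(0.40, 0.0)
penalized = zen_vs_unpenalized(0.40, 0.20)  # 0.20 is an assumed figure

print(f"unpenalized baseline: +{clean - 1:.0%} over single-thread Excavator")
print(f"penalized baseline:   +{penalized - 1:.0%} over single-thread Excavator")
```

With those assumed numbers, a "40%" claim measured against a penalized baseline would shrink to roughly 12% over Excavator's single-thread IPC, which is the scenario being dismissed above.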
post #168 of 220
Quote:
Originally Posted by sumitlian View Post

:P Do you really believe it's a freebie?
Isn't it possible that the data Intel published is covered by the x86 licensing terms between the two companies, while also letting Intel's users know what their CPU is made of? Of course, nobody other than AMD would be allowed to use that info.

Yup, a complete freebie. Intel published the information freely, without patenting the arrangement (that I've found). That means AMD can use that information freely when prioritizing instructions. Of course, they also have their own data profiles, but being able to look at Intel's methods may allow them to focus on optimizing more where Intel is weak.

In the server world, for example, it is not unusual to pick a CPU that is slower at 90% of things, but that 10% area is just what you need, so you go with that option.

Not too unlike a gaming computer that will mostly run one game. You buy the video card that runs that one game best, even if that video card is slower in games you don't play... because you don't care about those.

It's valuable information Intel gave out freely, which is why they said they will no longer be doing that.
post #169 of 220
Quote:
Originally Posted by looncraz View Post

Yup, a complete freebie. Intel published the information freely, without patenting the arrangement (that I've found). That means AMD can use that information freely when prioritizing instructions. Of course, they also have their own data profiles, but being able to look at Intel's methods may allow them to focus on optimizing more where Intel is weak.

In the server world, for example, it is not unusual to pick a CPU that is slower at 90% of things, but that 10% area is just what you need, so you go with that option.

Not too unlike a gaming computer that will mostly run one game. You buy the video card that runs that one game best, even if that video card is slower in games you don't play... because you don't care about those.

It's valuable information Intel gave out freely, which is why they said they will no longer be doing that.

I hope it benefits AMD, as you are implying. To some degree I believe that too.
But if you look back, Intel's info has always deceived AMD, and it is still going on. Remember the FMA4 damage.

Maybe you've already read this on Agner Fog's site, but I'm quoting it anyway.
"In August 2007, AMD announced a future instruction set called SSE5 with a new coding scheme. The early disclosure of AMD's intentions was a break with the previous policy where both companies had kept their intentions secret as long as possible. Intel's reply came in April 2008 with an early (probably premature) disclosure of their planned AVX instruction set. Intel's AVX coding scheme was much more flexible and future-oriented than AMD's SSE5 scheme, as I argued in a public discussion forum. Most importantly, the AVX scheme has room for future extensions of the size of the SIMD vector registers, while the SSE5 scheme has little room for any future extensions. It was pretty obvious that Intel had won this time, and thanks to the early disclosure of Intel's AVX instructions, it was not too late for AMD to change their plans. In May 2009, AMD published a revision of their plans where they modified the coding scheme for better compatibility with AVX. In addition to a full support of AVX, the revised AMD plan contains most of the original SSE5 instructions under the new name XOP and with the new coding scheme. Unfortunately, Intel had changed their plans in the meantime! In December 2008, Intel published a revision of their plans which involved a change of the coding of the fused multiply-and-add (FMA) instructions. Now it was too late for AMD to change their design once more, so the first AMD processors with FMA will follow the premature Intel specification rather than Intel's later revision. It is difficult to obtain compatibility when you are following a moving target."

"It is difficult to obtain compatibility when you are following a moving target"
^ this is exactly what I believe too.

From wiki
"Commentators have seen this as evidence that Intel has not allowed AMD to use any part of the large VEX coding space. AMD has been forced to use different codes in order to avoid using any code combination that Intel might possibly be using in its development pipeline for something else. The XOP coding scheme is as close to the VEX scheme as technically possible without risking that the AMD codes overlap with future Intel codes. A similar compatibility issue is the difference between the FMA3 and FMA4 instruction sets. Intel initially proposed FMA4 in AVX/FMA specification version 3 to supersede the 3-operand FMA proposed by AMD in SSE5. After AMD adopted FMA4, Intel canceled FMA4 support and reverted to FMA3 in the AVX/FMA specification version 5"

AMD still seems to be doing the same thing (following Intel). Even if Zen fixes most of the bottlenecks Bulldozer has had, it will not be easy to keep up with Intel's continuous updates to its powerful FPU design and its compiler support. AMD either has to invent something better than the whole AVX thing (very unlikely right now, as they would have to find inefficiencies to improve on, and they don't have the time to do this) or get beaten every time.

Well, I really hope Zen comes out with an ISA that uses the same VEX coding scheme Intel is using at that time (I mean, follows it perfectly).
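
For anyone unsure what the FMA3 versus FMA4 difference in the wiki quote above amounts to, here is a minimal sketch of the operand semantics only. The register names and helper functions are made up for illustration, and plain Python arithmetic rounds twice, so this does not reproduce the single rounding of a real fused operation.

```python
def fma4(regs, dst, a, b, c):
    """FMA4 form (e.g. VFMADDPD dst, a, b, c): dst = a*b + c.

    Four operands, so all three sources survive the operation.
    """
    regs[dst] = regs[a] * regs[b] + regs[c]


def fma3_231(regs, dst, a, b):
    """FMA3 form (e.g. VFMADD231PD dst, a, b): dst = a*b + dst.

    Three operands, so the destination doubles as the addend and is overwritten.
    """
    regs[dst] = regs[a] * regs[b] + regs[dst]


# Hypothetical register file holding scalar values for simplicity.
regs = {"x0": 0.0, "x1": 2.0, "x2": 3.0, "x3": 10.0}

fma4(regs, "x0", "x1", "x2", "x3")
print(regs["x0"], regs["x3"])  # result lands in x0, addend x3 is untouched

fma3_231(regs, "x3", "x1", "x2")
print(regs["x3"])  # same result, but the original addend in x3 is gone
```

The two forms compute the same thing; the incompatibility is in the encoding and operand count, so a binary built for one will not run on a CPU that only implements the other.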
Edited by sumitlian - 9/28/15 at 4:17am
post #170 of 220
Quote:
Originally Posted by KyadCK View Post



EDIT: Why does OCN remember the weirdest things in quotes even after you delete it from them?

Drives me up a wall!
    