Steamroller? - Page 66

post #651 of 3492
Then post the slides you are talking about. I've never seen any reference to AVX2 and/or TSX compatibility from AMD.
Quote:
Originally Posted by Seronx View Post

There is none; it is a checklist for once Steamroller comes out. The Hot Chips slides referenced AVX2 and TSX/ASF, but nothing is guaranteed. These instructions might or might not be in Steamroller or Excavator.
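Once a chip is actually in hand, the "checklist" can be run from software rather than from slides. A minimal sketch, assuming a Linux box where /proc/cpuinfo lists feature flags: `avx2` is the AVX2 flag, and `hle` and `rtm` are the two parts of Intel's TSX. ASF has no flag here as far as I know, so it is not checked. The helper names `cpu_flags` and `report` are made up for this example.

```python
from pathlib import Path

# Linux flag names (assumption: x86 Linux, which labels the line "flags").
# "hle" and "rtm" together make up Intel TSX.
WANTED = ("avx2", "hle", "rtm")


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text):
    """Map each wanted flag name to True/False."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in WANTED}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    if path.exists():
        for name, present in report(path.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("no /proc/cpuinfo here; use a CPUID tool instead")
```

On other operating systems the same question is answered by any CPUID dump tool; the point is only that support is something you verify on shipping silicon.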
post #652 of 3492
Pardon my ignorance, but how much will the added MMX pipe contribute, and why? I don't think that has been discussed.
post #653 of 3492
One MMX pipe was removed, not added. One "MMX/FPSTO" pipe remains, with an added shuffle unit. Many instructions that were executed by the removed MMX pipe have now been moved to the FMACs or the "MMX/FPSTO" pipe.

It's unlikely that an application has some chaotic mix of MMX, SSE and AVX code, so this was a good idea, I guess.
Quote:
Originally Posted by F3ERS 2 ASH3S View Post

Pardon my ignorance, but how much will the added MMX pipe contribute, and why? I don't think that has been discussed.
post #654 of 3492
MMX isn't even really used anymore, is it? I thought it was succeeded by SSE and SSE2? So yeah, having excess MMX pipes would be kind of wasteful.
post #655 of 3492
Quote:
Originally Posted by MrJava View Post

I reached out to someone at AMD to clarify details within the bdver3 GCC machine descriptor file. Here's some text from the email that was sent to me.
Also, if the FPSTO pipe hasn't changed between Bulldozer and Steamroller, then it can only do one 128-bit store per cycle. This is still a bit of a bottleneck for store-heavy FP code.
Edit:
I can understand why 2 macro-ops per thread per cycle would be a good rate. Each core has two ALUs and two AGUs and the ALUs handle most types of instructions. So even in the worst case where instructions map to single macro-ops, each INT core can basically execute about 2 instructions per cycle, i.e. there is little point in feeding the INT cores more than 2 macro-ops per cycle.

Just wanted to repost with your edit, as that clears up the confusion about why they opted to make it 2 cycles instead of one.
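To put a rough number on the store bottleneck mentioned in the quoted post: a back-of-the-envelope sketch, assuming the FPSTO pipe is the only limit, that it sustains exactly one 128-bit store per cycle, and that a 256-bit AVX store occupies two of those 128-bit slots. The function name `store_cycles` is made up for illustration, and real code will be limited by other things as well.

```python
STORE_BITS_PER_CYCLE = 128  # one 128-bit store per cycle through FPSTO (figure from the post)


def store_cycles(num_vectors, vector_bits):
    """Lower bound on cycles to store num_vectors vectors of vector_bits each,
    looking at nothing but the store pipe."""
    total_bits = num_vectors * vector_bits
    return -(-total_bits // STORE_BITS_PER_CYCLE)  # ceiling division


if __name__ == "__main__":
    n = 1000
    for bits in (128, 256):
        print(f"{n} x {bits}-bit stores: at least {store_cycles(n, bits)} cycles")
```

Under these assumptions a loop that writes one 256-bit result per iteration cannot run faster than one iteration every two cycles, however quickly the FMACs produce the data.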
Quote:
Originally Posted by MrJava View Post

One MMX pipe was removed, not added. One "MMX/FPSTO" pipe remains, with an added shuffle unit. Many instructions that were executed by the removed MMX pipe have now been moved to the FMACs or the "MMX/FPSTO" pipe.

It's unlikely that an application has some chaotic mix of MMX, SSE and AVX code, so this was a good idea, I guess.

Thank you.
post #656 of 3492
Quote:
Originally Posted by EniGma1987 View Post

MMX isn't even really used anymore, is it? I thought it was succeeded by SSE and SSE2? So yeah, having excess MMX pipes would be kind of wasteful.

I think this is the best answer:

http://stackoverflow.com/questions/12938612/how-to-use-mmx-in-parallel-with-sse-operations
post #657 of 3492
Quote:
Originally Posted by EniGma1987 View Post

MMX isn't even really used anymore, is it? I thought it was succeeded by SSE and SSE2? So yeah, having excess MMX pipes would be kind of wasteful.

The 256-bit MMX unit in Bulldozer does integer SSE/AVX/etc. The MMX unit in Steamroller only does compares, shuffles, converts, stores, etc.

In Bulldozer, the MMX unit is the equivalent of the FADD(Integer) and FSTO unit in Stars.

Also
__fixing stuff__
Quote:
Originally Posted by Seronx View Post

So, you can only retire 2 micro-ops per core per cycle. The FPU is used by both cores so it can retire 4 micro-ops. So, AMD's retirement rate is half that of Intel for the GP x86 part and FPU x86 Accel part. If Steamroller is only a decode enhancement, you will still be behind by 60% in some benchmarks.
Quote:
Originally Posted by Seronx View Post

Real-world performance falls in line with 1 macro-op per core being run as well.
Bulldozer Dispatch/Retire => 1 macro-op group per core => 4/more* macro-ops in total per core. 8/more* micro-ops in total per core.
Steamroller Dispatch/Retire => 2 macro-op groups per core => 8/more* macro-ops in total per core. 16/more* micro-ops in total per core.

*Macro-op fusion and Micro-op fusion. Missed the word "group" on the commentary. Sorry, dudes.
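The arithmetic behind those two lines, as a small sketch. The constants are read straight off the post (4 macro-ops per group, and 2 micro-ops per macro-op as implied by "4 macro-ops, 8 micro-ops"); they are the poster's figures, not confirmed specifications, and the "or more" fusion cases are left out. `per_core_width` is a made-up helper name.

```python
MACRO_OPS_PER_GROUP = 4      # from the post: one dispatch/retire group carries 4 macro-ops
MICRO_OPS_PER_MACRO_OP = 2   # implied by the post: 4 macro-ops -> 8 micro-ops

# Macro-op groups per core, as claimed in the post.
GROUPS_PER_CORE = {"Bulldozer": 1, "Steamroller": 2}


def per_core_width(groups):
    """Return (macro_ops, micro_ops) per core for a given number of groups."""
    macro = groups * MACRO_OPS_PER_GROUP
    micro = macro * MICRO_OPS_PER_MACRO_OP
    return macro, micro


if __name__ == "__main__":
    for name, groups in GROUPS_PER_CORE.items():
        macro, micro = per_core_width(groups)
        print(f"{name}: {groups} group(s) -> {macro} macro-ops, {micro} micro-ops per core")
```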
Edited by Seronx - 9/3/13 at 1:37pm
post #658 of 3492
A Steamroller INT core can still only execute 2 instructions per cycle at most, and 4 macro-ops are dispatched every two cycles. Being able to retire more than 4 macro-ops per cycle is overkill.

Maybe the AGUs are more capable in Steamroller, but the family 15h SOG indicates that most instructions are executed in EX0/EX1, including MOVs.
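The argument above is a simple min() over pipeline stages. A toy sketch, assuming one macro-op per instruction, every macro-op needing an EX0/EX1 slot, and the numbers from this post (4 macro-ops dispatched every 2 cycles, 2 ALU pipes). The function names are made up, and this is not a model of the real scheduler.

```python
from fractions import Fraction

ALU_PIPES = 2  # EX0 and EX1


def avg_dispatch_rate(macro_ops, cycles):
    """Average macro-ops dispatched per cycle."""
    return Fraction(macro_ops, cycles)


def sustained_rate(dispatch_per_cycle, retire_per_cycle):
    """Throughput is capped by the narrowest of dispatch, execute and retire."""
    return min(dispatch_per_cycle, Fraction(ALU_PIPES), Fraction(retire_per_cycle))


if __name__ == "__main__":
    dispatch = avg_dispatch_rate(4, 2)
    for retire in (4, 8):
        print(f"retire width {retire}: sustained {sustained_rate(dispatch, retire)} macro-ops/cycle")
```

Both retire widths give the same sustained rate in this toy model, which is the sense in which retiring more than 4 per cycle would be overkill.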
Quote:
Originally Posted by Seronx View Post

The 256-bit MMX unit in Bulldozer does Integer SSE/AVX/etc. The MMX unit in Steamroller only does Compares, Shuffles, Converts, Stores, etc.

In Bulldozer, the MMX unit is the equivalent of the FADD(Integer) and FSTO unit in Stars.

Also
__fixing stuff__

Bulldozer Dispatch/Retire => 1 macro-op group per core => 4/more* macro-ops in total per core. 8/more* micro-ops in total per core.
Steamroller Dispatch/Retire => 2 macro-op groups per core => 8/more* macro-ops in total per core. 16/more* micro-ops in total per core.

*Macro-op fusion and Micro-op fusion. Missed the word "group" on the commentary. Sorry, dudes.
post #659 of 3492
Quote:
Originally Posted by Seronx View Post

The GP x86 cores and the FPU x86 accelerator execute micro-ops. You retire and dispatch macro-ops, which get converted into micro-ops by the scheduler. Each core only has one macro-op dispatch and retire engine in Bulldozer. So, you can only retire 2 micro-ops per core per cycle. The FPU is used by both cores so it can retire 4 micro-ops.

Intel can retire 2 fused micro-ops per thread if two threads are running, 4 fused micro-ops per thread if only one thread is running. Intel has fused micro-ops, which are basically the equivalent of AMD's macro-ops. So, AMD's retirement rate is half that of Intel for the GP x86 part and FPU x86 Accel part. If Steamroller is only a decode enhancement, you will still be behind by 60% in some benchmarks.

This is also why SuperPI is slow, not just because it's x87 or floating point.
    
post #660 of 3492
Quote:
Originally Posted by Demonkev666 View Post

This is also why SuperPI is slow, not just because it's x87 or floating point.

SuperPI is also slow because of a "bug", or assumed bug, in the CPU logic. For some reason x87 code gets executed very slowly. You can patch in a code update to the CPU that changes how x87 code is handled and speeds up execution by ~20%, without costing anything in performance or power consumption elsewhere.
http://www.xtremesystems.org/forums/showthread.php?286448-The-Book-of-Bulldozer-Revelations-Episode-2-%28SuperPI-x87%29&p=5195197&viewfull=1#post5195197
http://www.xtremesystems.org/forums/showthread.php?286448-The-Book-of-Bulldozer-Revelations-Episode-2-%28SuperPI-x87%29&p=5196111&viewfull=1#post5196111