
[OBR] Exclusive: APU A8 "Trinity" tested - Strong GPU, crappy CPU! - Page 8  

post #71 of 90
Quote:
Originally Posted by hajile View Post

I don't think that the numbers presented in the article are correct. AMD said a 50% GPU increase and a 25% CPU increase. The numbers he shows don't seem to match either of those in any way.

That is for Mobile

CPU Frequency is 25% higher and GPU frequency is 50% higher

Llano 4C GFlops = Trinity 4C GFlops
if the frequency is the same and that is how they do the math

1.5GHz = 35W Llano -> 17W Trinity
1.5 x 1.25 => 1.9GHz 35W Trinity

A8-3550MX is the highest-end Llano mobile
2.0GHz x 1.25 => 2.5GHz 45W Trinity

^-- or thereabouts

Desktop: CPU 20% Faster/ GPU 30% Faster

3.0GHz 100W x 1.20 => 3.6GHz Trinity 100W
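
The clock estimates above are plain multiplication, so they are easy to check. Here is a minimal sketch; the helper name `scaled_clock` is made up for illustration, and the uplift percentages are the rumored figures from this post, not confirmed specs.

```python
def scaled_clock(base_ghz, uplift):
    """Return a base clock scaled by a fractional uplift (0.25 means +25%)."""
    return base_ghz * (1.0 + uplift)

# (label, Llano base clock in GHz, rumored CPU uplift), all taken from the post
cases = [
    ("mobile 35W", 1.5, 0.25),
    ("mobile 45W", 2.0, 0.25),
    ("desktop 100W", 3.0, 0.20),
]

for label, base, uplift in cases:
    print(f"{label}: {base:.1f} GHz -> {scaled_clock(base, uplift):.3f} GHz")
```

The 35W mobile case comes out at 1.875GHz, which the post rounds to 1.9GHz.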

But there will be Trinity models with the GPU disabled that run at 3.8+GHz, or maybe models with a lower GPU clock.
Quote:
Originally Posted by hajile View Post

Only 3 registers are used (as one number comes straight from memory) and only three instructions are necessary. I haven't checked (I don't do much with x86 asm) but one needs to know the cycles per instruction to know the total time of one vs the other. If the ALU instructions all execute at the same rate, then the increase in speed is huge for the second method as MOV instructions are very expensive for floats.

I could see a 2x speedup with this kind of coding, but if 128-bit floats were used, AMD's split float units could double that for a total of 4x speedup overall.

Interesting...
I don't know much about coding, but I know some of the effects of certain ISAs.

I believe the ALUs execute at the speed of the MUL portion, as it is the slowest part of FMA.

Time to complete a full set:
FADD: the shortest time to execute
FMUL: longer than FADD
FMA: slightly longer than FMUL, but shorter than executing FADD+FMUL

3XXms -> 38Xms -> 4X0ms <-- off the top of my head

^ Those are the biggest variables (I don't know which benchmark or white paper they used, and I don't remember where I saw this).

Edit: I keep re-reading and seeing mistakes UGH!!!
Edited by Seronx - 1/23/12 at 8:23pm
AMD FX ~Seronx (16 items)
CPU: FX-9800P | Motherboard: Acer Wasp | Graphics: R7 M440 | RAM: SK Hynix HMA41GS6AFR8N-TF
Hard Drive: KINGSTON RBU-SNS8152S3128GG2 | Hard Drive: TOSHIBA MQ01ABD100 | Optical Drive: HL-DT-ST DVDRAM GUE1N | Cooling: Stock
OS: Microsoft Windows 10 Home Build 14393 | Monitor: Viewsonic XG2401 24 Hz-144 Hz | Keyboard: Ducky Channel Shine 3 | Power: Stock 65W
Case: Acer Exoskeleton | Mouse: Steelseries Rival 300 | Mouse Pad: Razer Megasoma | Audio: AMD-Realtek ALC255
post #72 of 90
Quote:
Originally Posted by Seronx View Post

That is for Mobile
CPU Frequency is 25% higher and GPU frequency is 50% higher
Llano 4C GFlops = Trinity 4C GFlops
if the frequency is the same and that is how they do the math
1.5GHz = 35W Llano -> 17W Trinity
1.5 x 1.25 => 1.9GHz 35W Trinity
A8-3550MX is the highest-end Llano mobile
2.0GHz x 1.25 => 2.5GHz 45W Trinity
^-- or thereabouts
Desktop: CPU 20% Faster/ GPU 30% Faster
3.0GHz 100W x 1.20 => 3.6GHz Trinity 100W
But there will be Trinity models with the GPU disabled that run at 3.8+GHz, or maybe models with a lower GPU clock.
Interesting...
I don't know much about coding, but I know some of the effects of certain ISAs.
I believe the ALUs execute at the speed of the MUL portion, as it is the slowest part of FMA.

I misunderstood.

Even so (and despite my sometimes bumbling brain), this still shows huge progress. A 4-module Bulldozer barely kept up with the six-core Thuban, but here we have a 2-module Piledriver keeping up with a quad-core Llano. No matter how you cut it, there has been quite a lot of improvement somewhere.

I guess that AMD planning to launch a Piledriver chip for servers shows that AMD knew a Bulldozer-based server chip couldn't compete (thus no chips), but AMD expects Piledriver to be able to compete, so they are finally making server chips.

All that's left is for AMD to get rid of its x86 float units and instead decode the x86 float instructions and send them to a GCN on-die GPU for processing.

edit: about the ALUs

You're right when you say that multiplication is expensive (it is extremely expensive compared to addition/subtraction, though still cheaper than the super-expensive division operations). If FMA4 executes its FMUL at the same rate as a regular FMUL, then FMA will still take more clocks, as it also has to do the add. The savings most likely come from not having to move the intermediate result into a register only to pull it out again a cycle later, having less code (the whole CISC thing), bypassing a register (for the third number), and obtaining a result that is more accurate because it is rounded only once (less rounding also saves time).
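
The accuracy point can be shown with a small sketch. This is not FMA4 itself: the function `fused_multiply_add` below is a made-up emulation that computes a*b+c exactly with Python's `fractions.Fraction` and rounds once at the end, which is how a fused multiply-add is specified to behave. The input values are picked by hand so the difference is visible.

```python
from fractions import Fraction

def fused_multiply_add(a, b, c):
    """Emulate FMA: compute a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def separate_multiply_add(a, b, c):
    """Unfused path: the product is rounded, then the sum is rounded again."""
    return a * b + c

# Hand-picked inputs: the exact product is 1 - 2**-60, which does not fit
# in a double and rounds to 1.0 before the add in the unfused path.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = separate_multiply_add(a, b, c)
fused = fused_multiply_add(a, b, c)
print("separate mul+add:", unfused)
print("fused (emulated):", fused)
```

With these inputs the separate multiply and add loses the small term completely, while the single-rounding version keeps it.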
Edited by hajile - 1/23/12 at 8:33pm
post #73 of 90
Quote:
Originally Posted by hajile View Post

All that's left is for AMD to get rid of its x86 float units and instead decode the x86 float instructions and send them to a GCN on-die GPU for processing.

HSAIL (used to be called FSAIL) Version 1.0

You can expect it in 2014

[image: chart, not preserved]

or maybe it is after 2015... but a lot of things can be offloaded to GCN (the same can be said of Fermi and Kepler as well)

2012 = CUDA/OpenCL (HSA) and C++ AMP with Windows 8
(Notice that CUDA is a lot further ahead than AMD HSA is)

I am prepped!
Edited by Seronx - 1/23/12 at 8:32pm
post #74 of 90
Quote:
Originally Posted by Seronx View Post

HSAIL (used to be called FSAIL) Version 1.0
You can expect it in 2014
[image: chart, not preserved]
or maybe it is after 2015... but a lot of things can be offloaded to GCN (the same can be said of Fermi and Kepler as well)
2012 = CUDA/OpenCL (HSA) and C++ AMP with Windows 8
(Notice that CUDA is a lot further ahead than AMD HSA is)
I am prepped!

This isn't what I was talking about (if I read the chart correctly). This chart shows that programs must still be specifically coded to make use of the GPU, and that any GPU can be used. In my scenario, everything is compiled to x86 code. When the code hits the processor, it is divided into four types: serial integer, parallel integer, serial float, parallel float.

Let's back up a little first. Since the late 90s, x86 CISC instructions have been dealt with in a very RISC-like manner by breaking them down. They are first broken down into macro-code, which contains two or three closely related instructions, and then are broken down into the individual instructions (micro-code) shortly before execution. The idea is that the decode unit breaks down the instructions and then intelligently sends all the float microcode to the GPU. As you can immediately see, only a tightly integrated (read: homogeneous) GPU of a very specific type could interpret this microcode, so non-AMD processors wouldn't work for this, though they could still execute standard GPU code via drivers (note: in this case, the integrated GPU wouldn't require drivers).

Traditionally, only parallel float is done on a GPU. This is because a GPU isn't very smart about dealing with complex code (this is what Fermi and GCN attempt to remedy). With a smart CPU doing the complex work, all the GPU needs to do is what it does best (crunch numbers quickly).

Serial integer computing would be handled by the CPU. Parallel integer would probably be handled by the CPU as well (though a GPU could handle it). For serialized floats, the CPU front-end would do all the hard work while the GPU does the grunt-work. For parallel floats, the CPU would just decode and then shoot the instructions off to the GPU to handle.
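
The split described above boils down to a four-way routing table. Here is a rough sketch of it, only to make the idea concrete. Nothing here models real hardware: the `Kind` enum, the `ROUTES` table and the role strings are all made up for illustration, and the parallel integer row follows the "probably CPU" guess.

```python
from enum import Enum

class Kind(Enum):
    SERIAL_INT = "serial integer"
    PARALLEL_INT = "parallel integer"
    SERIAL_FLOAT = "serial float"
    PARALLEL_FLOAT = "parallel float"

# Hypothetical routing table: kind -> (CPU front-end role, execution unit).
ROUTES = {
    Kind.SERIAL_INT: ("decode and schedule", "CPU"),
    Kind.PARALLEL_INT: ("decode and schedule", "CPU"),  # "probably" CPU; a GPU could do it
    Kind.SERIAL_FLOAT: ("decode and schedule", "GPU"),  # CPU does the hard work, GPU the grunt-work
    Kind.PARALLEL_FLOAT: ("decode only", "GPU"),        # CPU just decodes and hands off
}

def route(kind):
    """Return (front-end role, execution unit) for one class of micro-op."""
    return ROUTES[kind]

for kind in Kind:
    front_end, unit = route(kind)
    print(f"{kind.value}: CPU front-end does '{front_end}', executes on {unit}")
```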
Edited by hajile - 1/23/12 at 9:03pm
post #75 of 90
[image not preserved]

What about this?
(Note: This has been renamed to Heterogeneous System Architecture Intermediate Language)
post #76 of 90
Quote:
HD 7770 is supposed to be close to GTX 460 performance

No.
My System (13 items)
CPU: Phenom 9750 (stock) | Motherboard: MSI MS-7548 (Aspen) | Graphics: HD 6950 @ 971/1387 1.25v | RAM: 8GB DDR2
Hard Drive: 750GB | OS: Windows 7 64-bit | Monitor: ASUS VH238H 1920x1080 | Power: Seasonic X-650 Gold
Case: Rosewill Smart One | Mouse: Razer Naga | Mouse Pad: Razer Scarab
post #77 of 90
Quote:
Originally Posted by Homeles View Post

No.

HD 7770 is 896 SPs of GCN

896 x 2 x 0.9GHz = 1612.8 GFlops

GTX 460 = 907.2 GFlops

So, you are correct in "no."
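
The GFlops figures above use the usual peak formula: shader count x 2 flops per clock (one multiply-add) x clock in GHz. Here is a minimal sketch of that arithmetic. The function name `peak_gflops` is made up, the HD 7770 inputs are the rumored numbers from this post, and the GTX 460 inputs (336 cores at a 1.35GHz shader clock) are an assumption filled in to reproduce the 907.2 figure.

```python
def peak_gflops(shaders, clock_ghz, flops_per_clock=2):
    """Theoretical peak: each shader does one multiply-add (2 flops) per clock."""
    return shaders * flops_per_clock * clock_ghz

hd7770_rumored = peak_gflops(896, 0.9)  # figures as posted (rumor)
gtx460 = peak_gflops(336, 1.35)         # assumption: 336 cores, 1.35 GHz shader clock

print(f"HD 7770 (rumored): {hd7770_rumored:.1f} GFlops")
print(f"GTX 460: {gtx460:.1f} GFlops")
```

This is theoretical peak throughput only and says nothing about actual game performance.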
Edited by Seronx - 1/23/12 at 10:39pm
post #78 of 90
Hm... maybe A8 Trinity is close to GTX 460! Hard to say. I'm realistically expecting 5770 speeds, but hey, if it's faster it would be great to Crossfire with a 7770. And if it really was that fast... a $600 budget system with 6950 performance?!?

EDIT: But I'm pretty sure everything besides 7900 series uses VLIW4 (whichever one is in 6900 series currently, I get 4 and 5 confused).
 
Lanbox Lite (16 items)
CPU: i3-2310M | Graphics: Intel HD 3000 | RAM: 8GB DDR3 | Hard Drive: Samsung 830 64GB SSD
Hard Drive: 640GB Hitachi HDD | OS: Windows 7 | Monitor: 13.3" LCD | Case: Magnesium Alloy, 3.2lbs
CPU: Phenom II X4 955 BE | Motherboard: Gigabyte GA-MA78GPM-UD2H | Graphics: MSI Hawk R5770 | RAM: 3x2GB G.Skill DDR2 800 4-4-4-12
Hard Drive: 1TB Samsung Spinpoint F3 | Hard Drive: 250 GB WD Caviar Black | Optical Drive: Samsung 20X DVD-R/RW | Cooling: Thermaltake MaxOrb
Cooling: Noctua NF-B9-1600 | OS: Windows 7 Pro 64-bit | Monitor: BenQ E2420HD, 24" 1920x1080 | Power: TT Purepower 500W
Case: TT Lanbox Lite
 
post #79 of 90
It would be neat if it had 460 or 6870 performance, or maybe the 6930. I could probably save a little money.
JunkoXan's Build (15 items)
CPU: Intel 2700k (Engineering Sample) | Motherboard: Asus P67 Sabertooth | Graphics: Sapphire Dual-X 280x | RAM: G.Skill Sniper (2x4gb)
Hard Drive: Samsung F3 Spinpoint | Cooling: Cooler Masters 616+ | OS: Windows 7 | Monitor: Upstar 20" 1080p TV
Monitor: Dell Monitor 21" 1680x1050 | Keyboard: Dell Keyboard | Power: Antec High Current Gamer 900W | Case: Cooler Master HAF-XB
Mouse: E-3lue Mazer II 2600DPI Optical Mouse | Mouse Pad: Custom Made
post #80 of 90
GFlops is hardly an accurate measurement of GPU performance.
Misaka (18 items)
CPU: Intel Core i5-3570K | Motherboard: ASRock Z68 Extreme4 Gen3 | Graphics: Sapphire HD 7850 2 GB | RAM: Samsung DDR3 16 GB (30 nm)
Hard Drive: Crucial M4 128 GB | Hard Drive: WD Caviar Blue 1 TB | Optical Drive: Lite-on DVD Burner | Cooling: Thermalright Venomous X
OS: Windows 7 Professional x64 (Host) | OS: Crunchbang Linux x64 (Guest) | Monitor: HP 2311x | Keyboard: HP PS/2 Keyboard
Power: Rosewill Capstone 450 W | Case: Rosewill Challenger | Mouse: Logitech M570 | Audio: ASUS Xonar D1
Other: Hauppauge HVR-1250
This thread is locked  