
41 - 60 of 61 Posts

·
Registered
Joined
·
1,198 Posts
I have a feeling that it will have HBM2, not GDDR6, to achieve 16GB > the 3080's 10GB on the marketing side and to give the card the bandwidth it deserves..... the Infinity Cache thing seems like Fury VRAM management on steroids, so -> HBM again
 

·
Registered
Joined
·
386 Posts
It had nothing to do with being based on an MCM design. It had to do with the mounting pressure and the overvolting. Even reference RDNA cards had similar thermal struggles, which people likewise worked around by increasing the mounting pressure with washers, just like with the Radeon VII cooler.
Apologies, I was thinking of Vega. They were the ones where the GPU and the HBM chips were of differing heights, thus requiring AMD to use two different thickness thermal interface materials to ensure the whole MCM chip was making contact with the cooler.

When Asus followed AMD's mounting pressure guidelines, their temperatures were awful on the RX 5700 series cards.
So?

All that shows is Asus brought a product to market that was NOT tested adequately.

Sure, AMD's 'guidelines' may have been inadequate, but that's supposition. It also doesn't explain why other GPU manufacturers had more than adequately cooled 5700 cards at launch.

Other manufacturers managed to release 5700 cards during the launch window - within 1 month - with no cooling issues related to 'mounting pressure'. MSI had that crappy card with terrible/non-existent memory cooling, but we also had good cards like the Dragon and Pulse.

That one is squarely on Asus.


Seems like people are missing that Nvidia more than doubled the number of cores on the RTX 3090 compared to the Titan RTX (320W TGP) and only used 30W more.
No they didn't. They changed how a 'core' is calculated.....

I'm honestly not sure if you're trolling now....
 

·
sudo apt install sl
Joined
·
7,305 Posts
Apologies, I was thinking of Vega. They were the ones where the GPU and the HBM chips were of differing heights, thus requiring AMD to use two different thickness thermal interface materials to ensure the whole MCM chip was making contact with the cooler.



So?

All that shows is Asus brought a product to market that was NOT tested adequately.

Sure, AMD's 'guidelines' may have been inadequate, but that's supposition. It also doesn't explain why other GPU manufacturers had more than adequately cooled 5700 cards at launch.

Other manufacturers managed to release 5700 cards during the launch window - within 1 month - with no cooling issues related to 'mounting pressure'. MSI had that crappy card with terrible/non-existent memory cooling, but we also had good cards like the Dragon and Pulse.

That one is squarely on Asus.




No they didn't. They changed how a 'core' is calculated.....

I'm honestly not sure if you're trolling now....
You completely ignored the washer mod and the high amount of voltage AMD adds, and then call me a troll? No, they didn't change how a core is calculated. Nvidia added additional FP32 cores.


Edit: from the GA102 white paper, before more people start spreading misinformation.

2x FP32 Throughput

In the Turing generation, each of the four SM processing blocks (also called partitions) had two primary datapaths, but only one of the two could process FP32 operations. The other datapath was limited to integer operations. GA10X includes FP32 processing on both datapaths, doubling the peak processing rate for FP32 operations. One datapath in each partition consists of 16 FP32 CUDA Cores capable of executing 16 FP32 operations per clock. Another datapath consists of both 16 FP32 CUDA Cores and 16 INT32 Cores, and is capable of executing either 16 FP32 operations OR 16 INT32 operations per clock. As a result of this new design, each GA10x SM partition is capable of executing either 32 FP32 operations per clock, or 16 FP32 and 16 INT32 operations per clock. All four SM partitions combined can execute 128 FP32 operations per clock, which is double the FP32 rate of the Turing SM, or 64 FP32 and 64 INT32 operations per clock.
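The per-clock figures in that passage can be sanity-checked with a quick sketch; all constants below come straight from the quoted text.

```python
# Sanity check of the per-clock throughput described in the quoted
# whitepaper passage. All constants come straight from that text.

PARTITIONS_PER_SM = 4    # SM processing blocks ("partitions")
FP32_ONLY_PATH = 16      # datapath 1: 16 FP32 CUDA cores per partition
FP32_OR_INT32_PATH = 16  # datapath 2: 16 FP32 OR 16 INT32 per clock

def sm_peak_ops(int32_active: bool) -> dict:
    """Ops/clock for one GA10x SM, depending on whether the second
    datapath is issuing INT32 (it can do FP32 or INT32, not both)."""
    if int32_active:
        return {"fp32": PARTITIONS_PER_SM * FP32_ONLY_PATH,
                "int32": PARTITIONS_PER_SM * FP32_OR_INT32_PATH}
    return {"fp32": PARTITIONS_PER_SM * (FP32_ONLY_PATH + FP32_OR_INT32_PATH),
            "int32": 0}

print(sm_peak_ops(False))  # {'fp32': 128, 'int32': 0}
print(sm_peak_ops(True))   # {'fp32': 64, 'int32': 64}
```

Both cases land on the whitepaper's numbers: 128 FP32/clk, or 64 FP32 + 64 INT32/clk.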
 

·
Premium Member
Joined
·
10,765 Posts
Core is an ambiguous internal term that is a waste of time to argue about.
 

·
sudo apt install sl
Joined
·
7,305 Posts
Core is an ambiguous internal term that is a waste of time to argue about.
CUDA cores (FP32 cores) have had the same definition since Fermi. There is nothing to argue. Nvidia doubled the number of CUDA cores in GA102. Each SM now has 128 CUDA cores, compared to 32 in Fermi.
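For anyone checking the arithmetic behind those per-SM figures, here's a minimal sketch; the SM counts (16 SMs on a full GF100, 82 on the 3090's cut-down GA102) are my own reference figures, not numbers from this thread.

```python
# Total CUDA cores = SMs x cores-per-SM. The SM counts here are my own
# reference figures (full GF100: 16 SMs; the RTX 3090's GA102: 82 SMs),
# not numbers from this thread.

def total_cores(sms: int, cores_per_sm: int) -> int:
    return sms * cores_per_sm

fermi_gf100 = total_cores(16, 32)   # 512 CUDA cores
ampere_3090 = total_cores(82, 128)  # 10496 CUDA cores
print(fermi_gf100, ampere_3090)     # 512 10496
```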
 

·
Premium Member
Joined
·
10,765 Posts
CUDA cores (FP32 cores) have had the same definition since Fermi. There is nothing to argue. Nvidia doubled the number of CUDA cores in GA102. Each SM now has 128 CUDA cores, compared to 32 in Fermi.

WELL, if that's not a Standard then I don't know what one is!
 

·
WaterCooler
Joined
·
3,445 Posts
CUDA cores (FP32 cores) have had the same definition since Fermi. There is nothing to argue. Nvidia doubled the number of CUDA cores in GA102. Each SM now has 128 CUDA cores, compared to 32 in Fermi.
Sure, but since the architecture has changed, you can't compare a Turing core to an Ampere core 1-for-1 and extrapolate performance that way. Compute, maybe. But doubling the CUDA cores in Ampere doesn't double gaming performance over Turing, since the core design is fundamentally different.

Fermi to Kepler is another great example of this. Kepler got a lot more CUDA cores but they worked differently from a Fermi CUDA core.
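The Fermi-to-Kepler point is easy to show with numbers: Fermi's CUDA cores ran on a doubled "hot clock" that Kepler dropped, so Kepler needed far more cores for its throughput. A rough sketch with my own reference clocks and counts (GTX 580 vs GTX 680), assuming 2 FLOPs per core per clock (one FMA):

```python
# Why raw core counts don't compare across Fermi and Kepler: Fermi's
# CUDA cores ran on a doubled "hot clock" that Kepler dropped. Clocks
# and counts are my own reference figures, assuming 2 FLOPs/core/clock.

def peak_gflops(cores: int, clock_ghz: float) -> float:
    return cores * 2 * clock_ghz

gtx_580 = peak_gflops(512, 1.544)   # GF110, shader ("hot") clock
gtx_680 = peak_gflops(1536, 1.006)  # GK104, base clock

print(f"3x the cores, but only {gtx_680 / gtx_580:.2f}x the peak FLOPS")
```

Triple the cores buys only about double the peak FLOPS, which is why cross-generation core counts mislead.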
 

·
Registered
Joined
·
846 Posts
Fermi bloody melted its cores if you let it.
 

·
sudo apt install sl
Joined
·
7,305 Posts
Sure, but since the architecture has changed, you can't compare a Turing core to an Ampere core 1-for-1 and extrapolate performance that way. Compute, maybe. But doubling the CUDA cores in Ampere doesn't double gaming performance over Turing, since the core design is fundamentally different.

Fermi to Kepler is another great example of this. Kepler got a lot more CUDA cores but they worked differently from a Fermi CUDA core.
I never stated it doubles gaming performance. My argument was that the power went up due to doubling the number of CUDA cores, whereas someone stated that Nvidia is just calculating the cores differently, which isn't true.

The SM has many new changes, but the CUDA cores are very similar between Turing and Ampere aside from the increased IPC and reduced power consumption of each core. We can see the reduced power consumption in the fact that they fit more than double the Titan RTX's core count into the 3090 while only using 30W more.

Looking at the SM, they added a new datapath which lets developers use either the INT32 cores or the new FP32 CUDA cores. We'll have to wait for developers to start utilizing the new FP32 datapath before we see major gains in other titles.
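Using the power figures from this thread (320W TGP for the Titan RTX, the 3090 drawing 30W more) together with the published CUDA core counts, the per-watt arithmetic works out roughly like this. Treat the 320W TGP as the thread's figure, not an official spec.

```python
# Rough cores-per-watt arithmetic. Power figures are from this thread
# (Titan RTX at 320W TGP, 3090 drawing 30W more); core counts are the
# published totals for each card.

titan_rtx = {"cores": 4608, "watts": 320}
rtx_3090 = {"cores": 10496, "watts": 320 + 30}

core_ratio = rtx_3090["cores"] / titan_rtx["cores"]
watt_ratio = rtx_3090["watts"] / titan_rtx["watts"]
print(f"{core_ratio:.2f}x the cores for {watt_ratio:.2f}x the power")
# 2.28x the cores for 1.09x the power
```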
 

·
WaterCooler
Joined
·
3,445 Posts
I never stated it doubles gaming performance. My argument was that the power went up due to doubling the number of CUDA cores, whereas someone stated that Nvidia is just calculating the cores differently, which isn't true.
Sorry didn't mean to imply that you stated that. Just something I have seen floating around on the internet.

Are we confirmed that this is actually a doubling of CUDA cores? During the announcement, Jensen did state a doubling of instructions per clock cycle over Turing. Is it really just double the cores?
 

·
sudo apt install sl
Joined
·
7,305 Posts
Sorry didn't mean to imply that you stated that. Just something I have seen floating around on the internet.

Are we confirmed that this is actually a doubling of CUDA cores? During the announcement, Jensen did state a doubling of instructions per clock cycle over Turing. Is it really just double the cores?
Their GA102 white paper indicates they added an additional 64 CUDA cores per SM. When comparing GA100 against GA102, we can see GA100 doesn't have the additional CUDA cores.

GA100:
64 FP32 CUDA Cores/SM, 8192 FP32 CUDA Cores per full GPU

GA10x:
Each SM in GA10x GPUs contains 128 CUDA Cores, four third-generation Tensor Cores, a 256 KB Register File, four Texture Units, one second-generation Ray Tracing Core, and 128 KB of L1/Shared Memory, which can be configured for differing capacities depending on the needs of the compute or graphics workloads.
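If the whitepaper figures above are right, you can back out the SM counts directly. The 3090's 10496-core total is my own reference figure, not part of the quote.

```python
# Backing out SM counts from the whitepaper figures quoted above.
# The 3090's 10496-core total is my own reference figure, not quoted.

ga100_full_cores, ga100_cores_per_sm = 8192, 64  # from the quote
ga10x_cores_per_sm = 128                         # from the quote
rtx_3090_cores = 10496

print(ga100_full_cores // ga100_cores_per_sm)  # 128 SMs in full GA100
print(rtx_3090_cores // ga10x_cores_per_sm)    # 82 SMs in the 3090
```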
 

·
WaterCooler
Joined
·
3,445 Posts

·
Old and Crochity
Joined
·
5,244 Posts
Now we wait for AMD, since all Nvidia showed is that if you push more power through their architecture you get more performance. :) If AMD comes even close to the 3080 at 300W or less TDP, I'm in like sin.
 

·
WaterCooler
Joined
·
3,445 Posts
Now we wait for AMD, since all Nvidia showed is that if you push more power through their architecture you get more performance. :) If AMD comes even close to the 3080 at 300W or less TDP, I'm in like sin.
Leaning this way myself, but will also be curious to see the AMD software stack/drivers.
 

·
Premium Member
Joined
·
10,765 Posts
Looking at the SM, they added a new datapath which lets developers use either the INT32 cores or the new FP32 CUDA cores. We'll have to wait for developers to start utilizing the new FP32 datapath before we see major gains in other titles.
This is completely wrong. NVIDIA added FP32 to the INT32 cores (or whatever they are calling them today) because there was only about 33 percent usage on the INT32 side, so they wanted to increase the throughput of those cores; in effect they can now process FP32 when not doing INT32. If, as you say, developers need to increase FP32 (which is ridiculous on many levels), then the bottleneck would just move again to saturating those two types of cores, and they would have to add yet m0ar cores. Adding FP32 to the INT32 cores is an optimization move to balance the GPU into using that second datapath more frequently.

Which seems to be a smart move.
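Both readings of the design can be put into one toy model: if some fraction of the second datapath's issue slots carry INT32 work, the effective FP32 rate per SM varies between Turing's flat cap and Ampere's doubled peak. This is my own simplification, not a whitepaper formula.

```python
# Toy model of the disagreement above. Turing's second datapath is
# INT32-only, so FP32 tops out at 64 ops/clk per SM no matter the mix.
# On GA10x the second datapath runs FP32 whenever it isn't doing INT32.
# My own simplification, not a whitepaper formula.

TURING_FP32_PER_CLK = 64

def ampere_fp32_per_clk(int32_fraction: float) -> float:
    """Effective FP32 ops/clk per GA10x SM when `int32_fraction` of the
    second datapath's issue slots are taken by INT32 instructions."""
    return 64 + (1.0 - int32_fraction) * 64

for f in (0.0, 0.33, 1.0):
    print(f"INT32 fraction {f:.2f}: "
          f"Ampere {ampere_fp32_per_clk(f):.1f} vs Turing {TURING_FP32_PER_CLK}")
```

At a 33% INT32 mix Ampere still gets roughly 107 effective FP32 ops/clk per SM versus Turing's 64, which is the whole point of the shared datapath.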
 

·
sudo apt install sl
Joined
·
7,305 Posts
This is completely wrong. NVIDIA added FP32 to the INT32 cores (or whatever they are calling them today) because there was only about 33 percent usage on the INT32 side, so they wanted to increase the throughput of those cores; in effect they can now process FP32 when not doing INT32. If, as you say, developers need to increase FP32 (which is ridiculous on many levels), then the bottleneck would just move again to saturating those two types of cores, and they would have to add yet m0ar cores. Adding FP32 to the INT32 cores is an optimization move to balance the GPU into using that second datapath more frequently.

Which seems to be a smart move.
How about you read the white paper before calling someone wrong. I put a key word in bold for you.

NVIDIA Ampere GA102 GPU Architecture

Modern gaming workloads have a wide range of processing needs. Many workloads have a mix of FP32 arithmetic instructions (such as FFMA, floating point additions (FADD), or floating-point multiplications (FMUL)), along with many simpler integer instructions such as adds for addressing and fetching data, floating point compare, or min/max for processing results, etc. Turing introduced a second math datapath to the SM, which provided significant performance benefits for these types of workloads. However, other workloads can be dominated by floating point instructions. Adding floating point capability to the second datapath will significantly help these workloads. Performance gains will vary at the shader and application level depending on the mix of instructions. Ray tracing denoising shaders are a good example of a workload that can benefit greatly from doubling FP32 throughput.

All four SM partitions combined can execute 128 FP32 operations per clock, which is double the FP32 rate of the Turing SM, or 64 FP32 and 64 INT32 operations per clock.
 

·
Premium Member
Joined
·
10,765 Posts

·
sudo apt install sl
Joined
·
7,305 Posts
If, as you say, developers need to increase FP32 (which is ridiculous on many levels)
Yeah, so you're making my point.
How am I making your point? You said it was ridiculous, but the entire reason was developers increasing FP32 in their games.

However, other workloads can be dominated by floating point instructions. Adding floating point capability to the second datapath will significantly help these workloads. Performance gains will vary at the shader and application level depending on the mix of instructions. Ray tracing denoising shaders are a good example of a workload that can benefit greatly from doubling FP32 throughput.
 

·
Premium Member
Joined
·
10,765 Posts
How am I making your point? You said it was ridiculous, but the entire reason was developers increasing FP32 in their games.
I have no time for you to try to wriggle out of your incorrect statements today.

Maybe someone with more patience will explain, to those who don't know, that you are just trying to move the goal posts again when called out on your errors.
 

·
sudo apt install sl
Joined
·
7,305 Posts
I have no time for you to try to wriggle out of your incorrect statements today.

Maybe someone with more patience will explain, to those who don't know, that you are just trying to move the goal posts again when called out on your errors.

I didn't move any goal posts; you just won't give a direct answer because you were wrong. I had to pinpoint something I quoted earlier because you obviously didn't read it.
 