[RESULTS] Is that 8800 GTX really as bottlenecked as we thought? - Page 3

post #21 of 27
Thread Starter 
Quote:
Originally Posted by jrstang473 View Post
Like I said, I don't get it. My results are way different. Maybe the bottleneck is so big you just won't see an improvement till you get way up there.
The difference is the resolution at which the tests were run and the amount of AA/AF used. You ran the tests at the stock settings for 3DMark06, which are 1280*1024, 0xAA, 0xAF; I ran them at 1920*1200, 8xAA, 16xAF. Higher resolutions and higher AA/AF make the video card do more work, so the CPU is less often the part holding things back and the visible bottleneck shrinks.
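
A rough way to picture this, as a sketch only: assume the CPU needs a fixed time per frame, the GPU needs a time that grows with pixel count and AA cost, and the frame rate is set by whichever is slower. The function name, parameters and every number below are made-up illustrations, not measurements from these runs.

```python
def frame_rate(cpu_ms, gpu_ms_per_mpixel, width, height, aa_cost=1.0):
    """Toy model: the slower of CPU and GPU sets the frame time.

    cpu_ms            -- assumed CPU time per frame (ms), independent of resolution
    gpu_ms_per_mpixel -- assumed GPU time per million pixels (ms)
    aa_cost           -- assumed multiplier for AA/AF work (1.0 = none)
    """
    gpu_ms = gpu_ms_per_mpixel * (width * height / 1e6) * aa_cost
    frame_ms = max(cpu_ms, gpu_ms)
    limiter = "CPU" if cpu_ms >= gpu_ms else "GPU"
    return 1000.0 / frame_ms, limiter


# Hypothetical numbers: same GPU, a slow CPU (10 ms) vs a fast CPU (5 ms).
for cpu_ms in (10.0, 5.0):
    low = frame_rate(cpu_ms, 4.0, 1280, 1024)
    high = frame_rate(cpu_ms, 4.0, 1920, 1200, aa_cost=2.0)
    print(cpu_ms, low, high)
```

With these invented inputs the faster CPU changes the low-resolution result a lot and the high-resolution result not at all, which is the pattern described above.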

PS. For further explanation, read post 1.
post #22 of 27
Very interesting results

I'm really surprised to hear this and in some ways, quite relieved as this is attainable to us overclockers.

It's not all plain sailing though. It does mean you need to have at least bought a C2D, and better still, overclocked it, to get your money's worth from an SLI 8800GTX setup. A lot of people are still on single-core Athlons or the previous generation of dual cores, running low to medium resolution TFTs, and they will still be bottlenecked by their CPUs.

Good response from the thread as well, although I feel I should correct Manual again...sorry...

Quote:
Originally Posted by The_Manual View Post
A bottleneck will not always be present.
By its very definition, exactly the opposite holds true. There is always a bottleneck in a computer. A bottleneck is where the slowest part of the system stops another (faster) part from operating to its potential.

There will always be a piece of the system that is slower than the rest (it might still be blindingly fast compared to other systems, but it is still called the bottleneck of that system).

E.g. in the cpu system, the cache is the bottleneck as it is slower to access than the registers. Conversely, out of the cpu and ram system, the ram is the bottleneck as it is slower to access than the cache and registers. And again, out of the cpu, ram and hdd system, the hard drive is the bottleneck as it is again slower than the ram, cache and registers.
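
To put rough numbers on that hierarchy, here is a minimal sketch using the standard average memory access time (AMAT) formula, hit time plus miss rate times the cost of going one level further out. The hit times and miss rates are invented for illustration and are not figures for any CPU mentioned in this thread.

```python
def amat(levels, memory_ns):
    """Average memory access time for a cache hierarchy.

    levels    -- list of (hit_time_ns, miss_rate) tuples, L1 first
    memory_ns -- access time of the final level (e.g. RAM), in ns
    """
    total = memory_ns
    # Work inward from the slowest level: each level costs its own hit
    # time plus the fraction of accesses that fall through to the next.
    for hit_ns, miss_rate in reversed(levels):
        total = hit_ns + miss_rate * total
    return total


# Hypothetical figures: L1 = 1 ns with 10% misses, L2 = 10 ns with 50% misses,
# RAM = 100 ns.
print(amat([(1.0, 0.1), (10.0, 0.5)], 100.0))
```

Under these assumptions the slow outer levels dominate the average even though most accesses hit L1, which is the sense in which each slower level limits the faster ones.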
Desktop (13 items)
CPU: C2Q Q6600 G0
Motherboard: Asus P5K Deluxe
Graphics: EVGA 8800GTX ACS3 626/2000
RAM: Team Xtreem PC2-6400C3
Hard Drive: WD Raptor X, 2x36GB 15Krpm U320 SCSI, 400GB 7200.9
Optical Drive: Samsung 18x SATA DVDRW
OS: XP Pro SP2 / Vista 32-bit
Monitor: Dell 2007WFP
Power: Ultra X-Pro EE 600w
Case: Antec P180
Mouse: Razer Copperhead
post #23 of 27
Quote:
A bottleneck will not always be present.
That statement is absolutely correct, in all manners. Even if you state that there will always be a Von Neumann bottleneck present within a system, that statement is not apparent within all plausible simulations that could occur.

You are suggesting that if I execute a statement:

ADD BX, AX <RAW>> MUL CX, BX <<WAR>

I would have a bottleneck created within the thread poster's system configuration?
Answer: No. No hardware component or software abstraction layer will reduce the overall processing speed of this simple execution.

Quote:
E.g. in the cpu system, the cache is the bottleneck as it is slower to access than the registers. Conversely, out of the cpu and ram system, the ram is the bottleneck as it is slower to access than the cache and registers. And again, out of the cpu, ram and hdd system, the hard drive is the bottleneck as it is again slower than the ram, cache and registers.
As for your statement about cache being slower than the registers, in reality I disagree to a moderate level.

Cache, if full speed, operates with the same frequency as the processor core. The Cache Store Allocation unit also runs at this clock frequency. The speed of cache is identical to that of the CPU FPU Cores in reality, if running at full speed. Only a small latency will affect the speed of data transfer and execution from within the caching system.
Cache could only be classed, technically, as slower if there are multiple prediction breaches in the Microcode Pipeline. These breaches will cause breaks in the pipeline’s ability to execute data within cache. However the cache is not the part that is causing the system slowdown.

The bottleneck between the CPU and the RAM is (technically) called a Von Neumann bottleneck. For your information, by definition it is not always present. It only occurs when a large amount of data, and tags, is being addressed within the memory sub-system via the Northbridge in Intel computer systems.

My above example:

ADD BX, AX <RAW>> MUL CX, BX <<WAR>

This has absolutely no bottleneck, as the data to execute this code will be stored within the L1 cache always. Therefore this can be executed immediately by the FPU system to ascertain results without a possible bottleneck within components. The parts of the Micro-Processor that execute this code are of identical speed on the above test configuration system.
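
For anyone following the RAW/WAR labels, here is a small sketch of how register dependencies between two instructions are usually classified. It assumes the two-operand forms exactly as written in the post, with the first operand as both destination and source (real x86 MUL takes a single operand; IMUL has a two-operand form). The `hazards` helper and the tuple encoding are hypothetical, and the sketch only names dependencies; it says nothing about whether a given CPU actually stalls on them.

```python
def hazards(first, second):
    """Classify register dependencies of `second` on `first`.

    Each instruction is (dest, sources): a destination register name
    and the set of register names it reads.
    """
    d1, s1 = first
    d2, s2 = second
    found = []
    if d1 in s2:      # second reads what first writes
        found.append("RAW")
    if d2 in s1:      # second writes what first reads
        found.append("WAR")
    if d1 == d2:      # both write the same register
        found.append("WAW")
    return found


# Assumed encoding of the two instructions from the post.
add = ("BX", {"BX", "AX"})   # ADD BX, AX : BX = BX + AX
mul = ("CX", {"CX", "BX"})   # MUL CX, BX : CX = CX * BX (as written)
print(hazards(add, mul))     # ['RAW']
```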
post #24 of 27
Quote:
ADD BX, AX <RAW>> MUL CX, BX <<WAR>
I'm presuming that to be Assembly? Very interesting post, lots of engineering bits in there. Hopefully by the end of my major I'll be able to understand that completely =D

I do know that L2 cache can have fairly significant latency and it can have quite an impact on processing capability. This can be seen in AMD's new 65nm parts, which actually perform worse than their older counterparts because of the difference in L2 cache latencies.

I'm guessing that is because the same Boolean logic isn't possible with the change in size? That, or the pathways or gates have more significant delays at that size for whatever reason, be it layout or chemical.
Frag Machine (13 items)
CPU: e6300 L629A@3.00GHz
Motherboard: Gigabyte 965P-S3
Graphics: BFG 8800GTS 640MB
RAM: 2xOCZ DDR2 800 Plat Rev 2
Hard Drive: 2x250
Optical Drive: Lite On DVD-RW
OS: XP Home & Vista Ultimate
Monitor: Sony 24in CRT 19inw LCD 1920x1200 + 1440x900
Keyboard: Saitek Eclipse II
Power: Antec Trio 650W
Case: Antec P180
Mouse: Logitech G5
Mouse Pad: Steel Pad QcK+
post #25 of 27
Very good level of technical knowledge in your post again, Manual, but like dBs said...L1 and L2 cache incur latencies. Latencies that are not present when confined solely to register instructions like bitwise operations.

Also, there are even variations in the time taken to complete simple bitwise operations, which are faster than arithmetic operations like addition/subtraction which are in turn faster than multiplication/division operations (being that these are just a repetition of addition/subtraction operations), so even at this extremely low level, there is a ranking of the operations by their time taken to complete.

Back on topic...I hope you're not comparing the processing capabilities of the 8800GTX to a couple of very basic assembly commands. In terms of the initial topic, the 8800GTX may not be 'as bottlenecked' as initially thought, but just think where these reports initially came from - the Inq. Nuff said.
post #26 of 27
I just love it when these posts go all electrical engineering =D Inspires me.

I scoured the net last night trying to find where I had read it, but I remember a significant review of the 8800GTX that set out to identify just how much the processor bottlenecks it. It came to the conclusion that a C2D would bottleneck until about the 3GHz range (for both Allendale and Conroe models). Of course that was at near obscene resolutions, 1920 and above. Wish I could have found it last night; it was a great review, as I remember.
post #27 of 27
Quote:
L1 and L2 cache incur latencies. Latencies that are not present when confined solely to register instructions like bitwise operations.
Technically yes they do. The latency of the Level 1 cache is extremely small in normal circumstances. This is because a larger cache will increase the cache delay as there are more memory cells within the structure it will take longer to access. This can be compensated by increasing the number of clock cycles allocated for cache accessing and addressing.
In this given scenario, using the above code I have specified, we can say that the data is stored within:

Cache (Level 1):
Tag Array = 2
Data Array = C
Store Location = D

The location of this data is within the Level 1 store. It is also, in our case, the first piece of information stored within the cache. The processor in question, being a dual-core processor, is capable of diverting an acceptable number of clock cycles to compensate for the minuscule latency of the Level 1 cache. Therefore, in our scenario, the latency can be totally removed (as it can be for Level 1 cache when the CPU has processing power to spare). The level of associativity (Fully/Partially/Direct Mapped) plays no part in this issue.

Quote:
Also, there are even variations in the time taken to complete simple bitwise operations, which are faster than arithmetic operations like addition/subtraction which are in turn faster than multiplication/division operations (being that these are just a repetition of addition/subtraction operations)
Correct, however, factored into my information was the use of SSE class instructions.
Streaming SIMD (Single Instruction, Multiple Data) Extensions (SSE) are instructions capable of accelerating arithmetic instructions to the level of basic "bitwise" operations.
The code I previously specified can be accelerated to this speed, and is therefore of identical speed, if not faster than these instructions.

Quote:
I hope you're not comparing the processing capabilities of the 8800GTX to a couple of very basic assembly commands.
I am stating that there will not always be a bottleneck, regardless of the hardware/software component(s) used within a system.

Back to topic then