Overclock.net › Forums › Graphics Cards › NVIDIA › brettjv's Microstutter General Information Thread
brettjv's Microstutter General Information Thread - Page 3

post #21 of 37
This is a very informative thread. I didn't mention this before, but there is a correlation (not causation) between microstutter and DPC latency.

This website even has a tool & guide for troubleshooting Deferred Procedure Call (DPC) latency.
http://www.thesycon.de/deu/latency_check.shtml

I generally get the odd spike or two from my sound card, a Creative X-Fi Titanium Championship Series. No matter what I do, I get 500us spikes every 3 or 4 intervals, non-stop. There is nothing I can do about it short of disabling the sound card. You never know what's causing it until you troubleshoot, but for me it's within acceptable tolerance.

Every extra peripheral that uses DPCs (which is a lot of them) adds to the overall latency of the system.

I hope it will someday be helpful to someone diagnosing issues.

Here is my DPC latency. This is a very beefy rig (stable OC already accounted for):


I would expect nice, smooth, low latency, but that is not the case, and this is with the system just sitting idle on the desktop.

Another correlation, and not provable causation: there may be an inherent problem with Windows XP, Vista and 7, and that is their use of the High Precision Event Timer (HPET). Not many people know this, but the HPET is a hardware timer on the motherboard that the operating system can use as a timing source for scheduling events. When I say "inherent" problem, I technically misspeak, because it works day in, day out. However, on enthusiast gaming systems it has been rumored to hold performance back: the faster the GPUs get, the more throttled they become, supposedly because the HPET works within a static range of acceptable performance. Too fast and the system gets throttled; too slow and the software compensates.

None of this is printed anywhere, and it's not in any one forum; it has been pieced together by word of mouth, experience, and enthusiasts from all four corners of the interwebz. Some people go as far as disabling the HPET in Device Manager to fix microstutter; in fact, that was one of the MAIN fixes for microstutter on 5870/5850 cards in CrossfireX and TrifireX (which unfortunately didn't work for me).

To learn more about how it works, visit the basic wiki on it:
http://en.wikipedia.org/wiki/High_Precision_Event_Timer
Edited by RagingCain - 4/15/11 at 9:47am
post #22 of 37
Thread Starter 
Good stuff RC, thanks for contributing

Just curious ... if you look at Process Explorer when idle, what kind of CPU usage do you see for the 'Hardware Interrupts and DPC' task? Does the latency you have graphed there translate into something like a 'workload' on the CPU? Because I could see how the cyclical occupation of clock cycles on one's CPU could contribute to a cyclical fluctuation in render times from the GPUs (i.e. microstutter) ...
Edited by brettjv - 4/15/11 at 2:56pm
    
post #23 of 37
Just for amusement's sake, I tried DPC Latency Checker on my laptop.

It was painful to see (Figure 1).



Thing is, I ran through a few options. The first yellow peak to the left of the red is the integrated Bluetooth (which is off, but not disabled in hardware). The huge red peak was the integrated webcam (for which the drivers aren't even installed...). The taller green peak appears to be related to the wired network, as when I yank the network cable it goes soaring up to meet the heavens (Figure 2)...



...

Ordinarily, this laptop is extremely responsive. Sure, spec-wise it's no record breaker, but it's enough to work on comfortably. I would never have thought it was suffering from these apparent issues.

This is with the laptop completely idle: CPU usage fluctuates between 0 and 4%, but not in time with the peaks.

edit: Cheers for the internets. Never won them before.


Edited by Paradigm Shifter - 4/15/11 at 2:36pm
post #24 of 37
Thread Starter 
Quote:
Originally Posted by Paradigm Shifter
Just for amusement's sake, I tried DPC Latency Checker on my laptop.

It was painful to see (Figure 1).

edit: Cheers for the internets. Never won them before.
Yer welcome sir, thank you. And ... that's some ugly latency you got going there PS

That being said, I'm hoping to keep this a 'referenced for all eternity' kinda thread, so, no offense ... if you could somehow tie these results in to the topic of microstutter, i.e. why/how you feel these graphs could relate to the phenomenon ... it would please me
Edited by brettjv - 4/15/11 at 2:58pm
    
post #25 of 37
Quote:
Originally Posted by brettjv
Yer welcome sir, thank you. And ... that's some ugly latency you got going there PS

That being said, I'm hoping to keep this a 'referenced for all eternity' kinda thread, so, no offense ... if you could somehow tie these results in to the topic of microstutter, i.e. why/how you feel these graphs could relate to the phenomenon ... it would please me
No offence taken, my apologies.

Actually, I will attempt to make that bit relevant ASAP tomorrow, by posting what DPC Latencies look like from my Surround rig, and from another SLI rig.

I'd like DPC Latency logs from that other chap (FnkDctr?) who was complaining about microstutter, actually. That could be very interesting. Particularly DPC logs while gaming.

A sort of during/not-during dataset would be nice. The more information there is to work with, the better we can form a conclusion.
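
For anyone wanting to contribute a during/not-during dataset, here is a rough sketch of how two sets of readings could be compared once they are typed into lists. The function name `summarize`, the 500us threshold (borrowed from the spike size mentioned earlier in the thread) and the sample numbers are all placeholders of mine, not measurements from any system discussed here.

```python
from statistics import mean


def summarize(samples_us, spike_threshold_us=500):
    """Summarize a list of DPC latency readings given in microseconds."""
    spikes = [s for s in samples_us if s >= spike_threshold_us]
    return {
        "samples": len(samples_us),
        "mean_us": mean(samples_us),
        "max_us": max(samples_us),
        "spike_count": len(spikes),
        "spike_share": len(spikes) / len(samples_us),
    }


# Hypothetical placeholder readings, NOT taken from any rig in this thread.
idle = [60, 75, 510, 70, 65, 520, 80, 72]
gaming = [90, 110, 95, 540, 100, 105, 98, 102]

for label, readings in (("idle", idle), ("gaming", gaming)):
    print(label, summarize(readings))
```

If the spike count and maximum look about the same idle and in-game while the microstutter only shows up in-game, that would argue against DPC latency being the main driver, which is exactly the kind of comparison more data would allow.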
post #26 of 37
What is DPC latency?
post #27 of 37
Quote:
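Originally Posted by Kvjavs
What is DPC latency?
A deferred procedure call (DPC) is basically a way for a Windows driver to push high-priority work to the front of the execution queue so that it gets done as fast as possible, at the expense of the threads that were running. (Strictly speaking, a DPC is a kernel routine queued by a driver, typically from its interrupt handler, rather than a thread.)

From what I understand, DPC latency is how long a queued DPC has to wait before it actually runs, which also reflects how long the system takes to get back to normal after such an upset in the execution queue. High values are usually caused by badly coded drivers.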

...

OK, I've got DPC logs from an AMD-based nForce 980a GTX470 SLI system when idle and when gaming (with Just Cause 2).

Idle:



Gaming:



This system has no webcam, no wifi and no bluetooth. Yanking the network cable out the back did not cause a huge latency spike (it has an nVidia nForce Network Controller, rather than the Intel/Broadcom one in the laptop from the example a few posts earlier).

Most importantly, it has great latencies, and yet without Vsync it does appear to suffer from microstutter in several games. I don't care too much, since I can't deal with the tearing I see with Vsync off, so it's permanently forced on unless I'm benching.

Regardless, I'm not convinced DPC latency has any significant effect on the phenomenon known as microstutter. However, one system is not enough to tell; if anyone else can provide me with data, I would appreciate it.

post #28 of 37
Thank you brettjv. A very helpful and informative guide.
    
post #29 of 37
Thread Starter 
My Thinking on The Reward (from above)

<tl;dr>

I believe it's actually a bad sign when scaling is at or near 100%, because it means the driver is making no effort to produce evenly-spaced frametimes.

It seems to me, given the way that multiple GPUs would work w/one another using alternate frame rendering, that in order to generally produce an even distribution of frametimes, the driver would have to deploy some sort of 'sampling' algorithm to detect and analyze how long 'recently-produced' frames are generally taking.

And attached to this algorithm there would need to be some sort of 'waiting' mechanism, whereby under certain circumstances (latency increases, for example), one or both the cards could be instructed to sit idle for brief periods so that the frame timing between the two can be brought back into an evenly-spaced synchronization.

I think this process, since it involves slowing down/making cards wait, would inherently reduce the scaling.

Conversely, I think that in order to get 100% scaling (or anything approaching it) the driver has to ignore or not deploy any 'frame timing' algorithm that involves 'waiting'. I think really high scaling is a sign that both cards are being instructed by the driver to run 'full-out at all times', which, since they have the same amount of power, can easily cause them to fall into a frame-production cadence that looks like this:

FF*******FF********FF********FF

This representing a timeline moving left to right, where F represents the point at which a frame is presented to the user (such an unevenly-spaced distribution of frames makes a perfect example of microstutter).

This is in contrast to the pattern you'd want to see:

F****F****F****F****F****F****F

Notice also that the evenly spaced pattern only produces 7 frames in the same amount of time that the 'microstuttering' pattern produces 8 frames. And the only way the even pattern could be maintained is if the driver occasionally enforces waiting periods, because not all frames are going to take the same amount of time to render, and latencies will naturally vary.

Hence I believe really phenomenal scaling is a sign of the driver making no effort to space the frames out evenly.
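
As a quick sanity check on the two timelines above, here is a small sketch that counts the frames and measures the gaps between them. It assumes each character is one unit of time; the helper name `frame_gaps` is mine.

```python
def frame_gaps(timeline):
    """Distances between consecutive 'F' marks, in characters."""
    positions = [i for i, ch in enumerate(timeline) if ch == "F"]
    return [b - a for a, b in zip(positions, positions[1:])]


stutter = "FF*******FF********FF********FF"
even = "F****F****F****F****F****F****F"

for name, timeline in (("stutter", stutter), ("even", even)):
    gaps = frame_gaps(timeline)
    # frames shown, shortest gap, longest gap
    print(name, timeline.count("F"), min(gaps), max(gaps))
# stutter 8 1 9
# even 7 5 5
```

Both strings are 31 characters long, so the stuttering cadence does squeeze in one extra frame, but its gaps swing between 1 and 9 units while the even cadence sits at 5 throughout.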

</tl;dr>

<verbose excessive = "Y">

I'm going to be producing some crude drawings here to illustrate my thinking on this. It's not nearly as sophisticated of thinking as that which Paradigm Shifter shared with us above as I don't have the same level of knowledge as he does. It's entirely possible, in fact, that I'm totally 'missing something' and essentially talking out my rear ... but hopefully this will make sense and not fly totally in the face of reality

If we consider the output that happens with a single card, I would think that it would go something like the following. Before I show that, here's my Legend of symbols:

X = Beginning of frame rendering on 1st graphics card
Y = End of frame rendering on 1st graphics card
Z = Point at which the frame on 1st graphics card is presented to the gamer

A = Beginning of frame rendering on 2nd graphics card
B = End of frame rendering on 2nd graphics card
C = Point at which the frame from 2nd graphics card is presented to the gamer

The number after the letter represents the frame number. So: 'X1' means 'the point at which the graphics card begins to render frame 1'. Z2 means 'the point at which the 2nd frame prepared by the gfx card is displayed to the user'.

And this compound symbol: (Y1/X2) means 'graphics card finishes (that's the Y) rendering frame 1, and begins (that's the X) rendering frame 2'

Note that I am supposing that the concept of the differentiation between Y and Z actually 'exists', i.e. it is true that there is some period of time between when a gfx card finishes preparing the frame (at which time it is ready to begin work on the next frame) and when it is actually presented to the user. I suppose terms like 'buffer flipping' and 'latency' would come into play in terms of describing what's going on in this time period.

In any case, I have NO idea what that length of time IS, relative to the time period between X and Y, so I'm just going to show it here as being 1/2 the amount of time between X and Y. I'm sure it's much shorter than that, and is going to be a roughly 'static' number (i.e. it's probably like 1ms, no matter how long the frame took to render)... but I don't think it's super relevant how long it actually is for my purposes here.

Without further ado, I will diagram what I imagine a single card's output would roughly look like. Note that our 'time to render' is 10ms in this case; each + or () symbol represents one millisecond:

Figure 1
(Y0/X1)++++(Z0)++++(Y1/X2)++++(Z1)++++(Y2/X3)++++(Z2)++++(Y3/X4)++++(Z3)++++(Y4/X5)++++(Z4)

So if we just look at our actual frame output to the user, what we see is this:

Figure 2
++++(Z0)+++++++++(Z1)+++++++++(Z2)+++++++++(Z3)+++++++++(Z4)

In other words, every 10ms we see a frame. In this case, we had 5 total frames displayed in a 45ms timespan = (5/.045) = 111fps. So the time variability between the display of frames is dictated by how long the card takes to render each frame, i.e. the time gap between X and Y. A somewhat even distribution of frames over a given time span is imposed 'naturally' in this case, dictated by the power of the card vs. the difficulty of the rendering task.
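
The diagram notation can be read mechanically, which makes it easy to double-check the numbers. The sketch below is just one way to do it: it treats every + or () group as one millisecond, as stated above, and pulls out the points where a frame is shown. The names `present_times` and `duration_ms` are mine, not part of any tool.

```python
import re

# One token per millisecond: either a "+" or a parenthesised event group.
TOKEN = re.compile(r"\(([^)]*)\)|\+")


def present_times(diagram, prefixes=("Z", "C")):
    """Return the millisecond index of every presented frame in a diagram."""
    times = []
    for ms, match in enumerate(TOKEN.finditer(diagram)):
        group = match.group(1)  # None when the token is a plain "+"
        if group is None:
            continue
        if any(event.startswith(prefixes) for event in group.split("/")):
            times.append(ms)
    return times


def duration_ms(diagram):
    """Total length of the diagram in milliseconds (one per token)."""
    return sum(1 for _ in TOKEN.finditer(diagram))


single = "++++(Z0)+++++++++(Z1)+++++++++(Z2)+++++++++(Z3)+++++++++(Z4)"
shown = present_times(single)
gaps = [b - a for a, b in zip(shown, shown[1:])]

print(shown)                 # [4, 14, 24, 34, 44]
print(gaps)                  # [10, 10, 10, 10]
print(duration_ms(single))   # 45
print(round(len(shown) / (duration_ms(single) / 1000)))  # 111
```

Worth noting: the 111fps figure comes from counting 5 frames inside a 45ms window, while the steady gap of 10ms between frames corresponds to 100fps. Both ways of counting are used loosely in this post.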

So now let's imagine possible scenarios when we introduce a 2nd card into the equation.

Optimally, the 'best' display pattern (in terms of smoothness) we could hope for over this period of time when we introduce a second card is one that would look like this (to help illustrate, I'm using green for card 2 and red for card 1, along with the letter differentiations):

Figure 3
++++(Z0)++++(C0)++++(Z1)++++(C1)++++(Z2)++++(C2)++++(Z3)++++(C3)++++(Z4)

In other words, 9 frames over 45ms = (9/.045) = 200fps, or 80% scaling (an 89fps gain over the 111fps single-card figure).
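
Here is the scaling arithmetic spelled out as a sketch, using the same convention as the text (frames shown divided by the 45ms window). The helper names `fps` and `scaling` are mine. For comparison it also computes scaling from the steady gap between frames (10ms for one card, 5ms for two), which is my own addition and gives a different answer.

```python
def fps(frames_shown, window_ms):
    """Frames per second for a number of frames shown within a window."""
    return frames_shown * 1000.0 / window_ms


def scaling(single_fps, dual_fps):
    """Fractional gain of the two-card setup over the single card."""
    return (dual_fps - single_fps) / single_fps


# Window-counting convention: 5 frames vs 9 frames in 45 ms.
single = fps(5, 45)
dual = fps(9, 45)
print(round(single), round(dual), round(scaling(single, dual) * 100))
# 111 200 80

# Steady-state convention: one frame every 10 ms vs one every 5 ms.
print(round(scaling(1000.0 / 10, 1000.0 / 5) * 100))
# 100
```

So in this idealised diagram the answer is about 80% by window counting and 100% by steady-state gaps. The difference is only a counting artefact of the short 45ms window and doesn't change the argument that follows.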

So let's think for a moment about how we actually GET this 'even' result we're looking for.

In a very basic sense, we have a time frame (Figure 4) over which we can 'tell' the 2nd card to begin its rendering task (provided of course it's not already busy), relative to the first card. So we have this 10ms time range to work with on the 1st card; the trick is deciding where to have card 2 start (or do we just always force it to start asap?):

Figure 4
(X)++++++++(Y)
Where on this time continuum do we start the 2nd card on the next frame to produce even-distribution of frames to the user?

Obviously we don't want to start card 2 on its task at the point of (X), because then in the end we just end up with both cards duplicating the same work. And we wouldn't want to start card 2 on its task at point (Y) either, as then we'd get a scaling number of 0%, as this is the same point at which card 1 could begin a new rendering task even w/o card 2.

Looking at the basic question, simple logic would seem to indicate that, if we want the 'smoothest' frame distribution output, we would want the 2nd card to begin its rendering at the exact mid-point (5ms mark in figure 4) of the rendering process of the 1st card, correct? And likewise, card 1 would need to do the same for card 2 when its turn came back around, as they are alternating.
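
The mid-point rule is simple enough to write down. In this sketch the function name `start_offsets` is mine, and extending the rule to more than two cards (spreading the starts evenly across one render time) is my own assumption, not something claimed above.

```python
def start_offsets(render_ms, n_cards=2):
    """Evenly staggered start times (ms) for each card within one render period."""
    step = render_ms / n_cards
    return [card * step for card in range(n_cards)]


print(start_offsets(10))      # [0.0, 5.0]  -> card 2 starts at the 5 ms mark
print(start_offsets(12, 3))   # [0.0, 4.0, 8.0]
```

This only works as written when the render time is known in advance and stays constant, which is the catch discussed further down.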

Note that here, what is in red is basically a duplicate of Figure 1, but every 10ms I've inserted a Bx/Ax, representing a finish/start-next rendering task on the second card, starting at the centerpoint of the render task for card 1. I've also inserted Cx values to show where card 2's frames get displayed to the user. x = the frame number.

Figure 5
(Z0/B0/A1)++++(C0/Y1/X2)++++(Z1/B1/A2)++++(C1/Y2/X3)++++(Z2/B2/A3)++++(C2/Y3/X4)++++(Z3/B3/A4)++++(C3/Y4/X5)++++(Z4/B4/A5)++++(C4/Y5/X6)

Which, if you look at just the presentation of frames, will look like this:

Figure 6
(Z0)++++(C0)++++(Z1)++++(C1)++++(Z2)++++(C2)++++(Z3)++++(C3)++++(Z4)++++(C4)

Perfect, right?

Here's the problem: I've created a very 'Pollyanna' scenario here by doing the following:
1) Making the 'time to render' (X-Y) a constant, and 'knowable in advance' value.
2) Making the 'latency' (Y-Z) a constant, and 'knowable in advance' value.
3) Making the 'latency' a value which is exactly half (5ms) of the value of (X-Y, 10ms), when I also handily have two cards I'm working with.

However, in reality:
1) X-Y (render time) is naturally going to vary,
2) Y-Z is naturally going to vary, and is extremely unlikely to be 1/2 of the 'render time' for each card. In fact, it's certain to be a much smaller number than this.

In order to actually produce an output like the one we desire, seen in Figure 6, a couple things have to happen:

1) An algorithm must be deployed to quickly analyze the performance (i.e. how long is the period from (X) - (Z) and (A) - (C)) on recent frames, and then
2) A mechanism must be attached to the algorithm that allows for WAITING if necessary in order to produce a display of frames at an even pace.

Basically in real life we aren't going to have the Pollyanna scenario I described above, so there *will* be times where waiting, either before starting rendering of the frame (<--X) or before sending the frame off to be displayed (Y-->) would be required in order to produce an even cadence in the actual presentation of frames like we see in figure 6.

And the logical extrapolation from this is obvious: If a purposeful delay, to account for these ever-changing render times and latency, is deployed in this process, then the raw FPS performance numbers are going to necessarily suffer.

Alternatively, rather than utilizing a complex algorithm like I speak of to optionally introduce waiting into the process as needed, the driver programmer can simply set up this system to run 'full-out' on both cards at all times, and let the chips fall where they may.

But such a decision is almost certain to allow the two cards at times to fall into a presentation cadence that looks something like this:

(Z0)(C0)++++++++(Z1)(C1)

I.E. very nice FPS, but terrible microstutter.
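
To make the 'full-out' vs 'waiting' comparison concrete, here is a toy simulation. It is a sketch of the idea only, under assumptions that are all mine: frames alternate between the cards, a card starts its next frame as soon as its previous one is shown, card 2 starts 1ms behind card 1, and the 'waiting' rule holds each frame until at least (average recent render time / number of cards) after the previous frame. It is not how any actual driver is known to work.

```python
from statistics import mean


def simulate_afr(render_ms, paced, n_gpus=2, start_offset_ms=1.0, window=8):
    """Return the times (ms) at which frames are shown under a toy AFR model."""
    gpu_free_at = [g * start_offset_ms for g in range(n_gpus)]
    shown = []    # presentation time of every frame so far
    recent = []   # render times of the last `window` frames
    for i, render in enumerate(render_ms):
        gpu = i % n_gpus                      # alternate frame rendering
        show = gpu_free_at[gpu] + render      # moment the frame is finished
        if shown:
            show = max(show, shown[-1])       # frames are shown in order
            if paced and recent:
                target_gap = mean(recent) / n_gpus
                show = max(show, shown[-1] + target_gap)  # the 'waiting' step
        shown.append(show)
        gpu_free_at[gpu] = show               # card moves on once its frame is shown
        recent = (recent + [render])[-window:]
    return shown


def gaps(times):
    return [b - a for a, b in zip(times, times[1:])]


steady = [10.0] * 12
print(gaps(simulate_afr(steady, paced=False))[:4])  # [1.0, 9.0, 1.0, 9.0]
print(gaps(simulate_afr(steady, paced=True))[:4])   # [5.0, 5.0, 5.0, 5.0]
```

With perfectly steady 10ms render times the waiting costs almost nothing once the cards are staggered. In this model the waiting step can only ever push a frame later, never earlier, so with uneven render times the paced run finishes at the same time or later than the free-running one. That is the scaling cost being argued for here.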

So ... finally we come to the bottom-line ... I believe that in the cases where we see really phenomenal, 100%-ish scaling, it is most likely the case that the algorithms deployed to allow for waiting have been either disabled, or had a 'max wait time' cap put onto them.

I say this because, while it is theoretically possible in some cases to achieve very good scaling and still maintain a nicely consistent frame rate (perhaps in the 80% range, although your algorithms have to be incredibly accurate, and both latencies and work-loads will have to be very 'steady', to have a chance at this), there is just no way anything near 100% is going to happen w/o simply running both cards 'full-out' with no regard for an even distribution of frames to the end user.

</verbose>

Edited by brettjv - 4/18/11 at 12:40am
    
post #30 of 37
Thread Starter 
Wow, looks like I killed my thread w/my excessive verbosity.

I've produced a tl;dr version of my last post, and hidden the intimidatingly (dare I say, painfully) long dissertation on the details of why I think that 100% scaling is actually a bad thing in terms of microstutter.

Hopefully now our lively discussion will continue
    