...ok Part 2...GK 104 670 / 'Kepler', 7990 and efficiency scores
Karlitos - Please post all the PMs under the heading 'hey man' / May 28, 2013 - the one where you state "if you beat my score..." AND
...the first one was so shocking that I called an OCN editor that night wondering what to do...
..please post both PM series IN FULL - as you know that I have the counterparts.
Thank you in advance
Next: This exchange today is all over the place...and a lot of folks will have some egg on their face...
I spent literally several days pulling together the materials below because it is important that folks understand what is happening and why...this constant hounding doesn't help matters - technical analysis does.
In addition: When I post at HWBot, I use HWBot rules...when I post at Valley, I use the 'OP' for Valley, and when I post at 'Heaven' I use their posted OP rules...you folks are mixing everything together...which confuses even more people.
I have had the 7990s for only a few days and know why the Heaven shot can cause you a headache...that is a completely separate issue I will deal with by posting a second shot w/tessellation off (instead of on extreme)...
That just leaves me to add that the thing below is big enough and I am now tired enough that I cannot edit it down further...and there may be typos and such - sorry about that.
HERE YOU GO
...as promised some (though not all) tech tips for my quad 670ies...in the meantime I had some PMs which delayed things - instead of my quad 670ies (which scored way above any 680, 690 and most 7970ies, usually more or less matching 3 Titans) and which led to no end of speculation (and worse) in some quarters, now the problem seems to be the new Quad-fire 7990ies...which aren't even set up right yet as I had them for just a few days. I have shared 'some' though not all PMs with an editor when I first got them, but for now a bit of technical 'comparative analysis' should help folks along...
...the 7990ies sit in the same record breaking (per below) efficient system that enabled the quad 670ies to score so high (6217 last posted, and 6262 in their last - for now - private run), even at lower clocks than some cards 'below it' - which is the norm, not the exception, for my systems as you will see. What were folks expecting would happen when that '670ies' system met some yummy 7990ies ?
Anyhow, all this made it into a much bigger piece...throw in some corporate responsibilities and a few special celebrations, and it all took a bit longer than I originally anticipated.
As I said before in an earlier post, in spite of a rather unique 'thinking' in this thread as of late which in my opinion threatens to damage it, there is no way you can prove a negative...but combined with technical info, I hope we can make some good progress towards understanding some of the underlying technical bits.
I am also 'a bit more in your face' re some world records and so forth which I normally would express a bit more humbly. But they show an almost constant struggle...I hold multiple records at HWBot with the 670ies and now the 7990ies, with a good half of those achieved at (often far) lower clocks than surrounding scores...often, what I show you below was against competition running LN2 and so forth (I'm a water-cooler).
Yet my primary competitor is myself, whether here or elsewhere...there are scores I have not posted yet which would place me 'on top' in several OCN bench threads, but I haven't bothered yet...I 1st want to find the time and finish setting up the new cards - which includes unlocking them (they have a dual BIOS w/switch)
Relevant Background:
...as I am running 5x GTX 670ies and now 2x 7990 dual GPU cards, I don't think I classify as a typical 'fan boy' who needs to prove something about the 'green team' or the 'red team'...all those cards btw are spread over 3 primary and 5 secondary (3770 VM) machines...
I have been oc'ing systems for almost two decades just for fun and that actually led to a career in computing (I have a graduate degree in another analytical discipline). Once the 'computing bug' got a hold of me - young overclockers take note: making a virtue out of a vice - a new path started. I have headed a software development firm for 15 years now. In addition, I am an executive director of a large international NGO, overseeing high end computer systems serving over 30 million users. That does not make me all-knowing, but it does provide me with a good hardware and software background.
...a quick visit to what's in my sig about the 'Proto Ivy-e..].' build-log (just started and not yet seeded with several text segments) shows that there are two primary systems which are 'way different'...a place to really be creative, celebrate the sheer joy of tinkering and overclocking and try stuff way outside the box(es) that run quad Xeons and server clients in 160+ countries...the 670ies are part of that 'contraption' as are the '7990ies'
The outcome of these fun and games is here, filled with low clock / high efficiency runs
Tips on prepping 670ies
1st step - cooling and card prep
- I keep a very big and powerful water-loop for 'CPUs only' and certainly don't want to introduce between 800w and 1000w of additional heat energy into that, even though the loop could probably handle that.
- I move cards about and run various configs (SLI, tri-SLI, quad-SLI) between mobos...water-cooled cards would make that very cumbersome. However, I will likely add a w-c loop with custom blocks for the two 7990ies as that is not as involved as doing it for four cards.
- All 5 of the 670ies (4 Asus Direct CU ii, 1 Gigabyte WF3 OC) have custom PCBs and 'strong VRMs'...in order to utilize that more fully, I took apart all 5 to replace the 'TIM'. The first thing I noticed was that on 2 of the 5 cards, the hold-down screws of the cooler-to-PCB were not torqued the same way (some were loose) and the factory TIM pattern showed some unevenness on the die...
After having delidded Ivy CPUs before, I knew how important good TIM is and also how careful 're-mounting' had to be to get a nice, even temp reading. Please have a look at the temps in the lower right in this. ...took about 4 tries to 'get it right' and get the cores balanced when idling (load depends on, well, the specific load).
The TIM I use is Coollaboratory Ultra (liquid metal) which works tremendously well but has to be applied sparingly and cautiously as it is capacitive and conductive...furthermore, it likes to 'eat' unprotected aluminium...on the 4 Asus Direct CUiis, the copper heat pipes are flattened where they meet the GPU die, but in between are 'thin' stripes of aluminium...so I painted MX4 (my TIM of choice for non-liquid metal applications) on those thin stripes 1st - and so far, this attempt at isolating the aluminium from the CL-U seems to have worked, though I will check it again in a month or so. Bottom line: This operation lowered GPU temps by over 5 C under load.
- Referring once more to my 'proto Ivy-e' build, you'll see that the mobos are horizontally arranged / flat and the GPUs are 'standing up' instead of 'hanging down' (sorry for the crude description). That helps with the next step
...the 'hard' triple and quad SLI bridges are nicer looking and also have other advantages over the soft bridges, but one HUGE overriding disadvantage: They make the air-cooled cards choke each other, even with a proven nice solution such as offered by Asus via Direct CUii or Gigabyte via WF3 OC.
...I traded some CL liquid metal TIMs for a custom extra-long SLI flex bridge that increases the distance between the 1st and 4th card just enough to insert some rubber spacers between each card so that they can breathe - in addition to using some very powerful server fans in push-pull config 'in front of' and 'behind' the quad SLI config. Depending on the position of each card, that dropped temps a further 12 C to a staggering 20 C...
...If you add these two 'temperature control' operations together, I could drop the temps as much as 25 C - now, with Kepler, boost and throttling in 13 MHz increments starting at 71 C, this makes a huge difference in performance and Valley scores - and helps set up the next step re BIOS.
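To picture why a 25 C drop matters with Kepler, here is a minimal sketch. It takes the 13 MHz step and the 71 C starting point from the post; everything else is an assumption made for illustration, in particular the idea that one further bin is lost every `step_c` degrees (the value 10 is a hypothetical placeholder, not NVidia's documented behaviour) and the function name itself.

```python
BIN_MHZ = 13           # Kepler boost step size quoted in the post
THROTTLE_START_C = 71  # temperature at which throttling starts, per the post


def estimated_clock(max_boost_mhz, temp_c, step_c=10):
    """Estimate the boost clock at a given GPU temperature.

    ASSUMPTION: one bin is dropped at THROTTLE_START_C and one more for
    every further `step_c` degrees. `step_c` is a hypothetical value.
    """
    if temp_c < THROTTLE_START_C:
        return max_boost_mhz  # below the threshold: no bins lost
    bins_lost = 1 + (temp_c - THROTTLE_START_C) // step_c
    return max_boost_mhz - bins_lost * BIN_MHZ


# Same card, same BIOS, 25 C apart (illustrative numbers only).
hot = estimated_clock(1300, 85)
cool = estimated_clock(1300, 60)
print(hot, cool)
```

Under those assumptions the cooler card simply never leaves its top bin, which is the whole point of the two cooling steps.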
GPU BIOS tuning
...after doing some research, I decided on a 'regular' (915 base / 980 boost) Asus Direct CUii 670 as my 'initial' card for the then-new Ivys last October...in equivalent US dollars before taxes, I paid around $360 for it - I wanted a 'Kepler', but a quiet yet well-cooled one that did not suck up too much energy - and which could play games on my 27 inch Samsung LED single monitor w/all eye-candy while having a low-enough power consumption to allow for multiple cards later on. I found it to be a very nice card, and with a stock BIOS and using the included Asus GPU Tweak, it managed around 1215 or so as 'top speed' as measured with 3D11.
...a few months later, and after reading that NVidia had made great strides re 'micro-stutter' with their latest drivers, I took the 'plunge' into SLI and bought a second one (same box, same price) - and the 'pleasant trouble' started...
Before setting up SLI, I simply took the new card and put it in as a single where the 1st one had been to quickly test everything out - to my great surprise, without me touching anything, GPU Tweak showed '1137 MHz' as a boost value instead of 980 (before OC)...whaaaat ? I ran 3D11 and it showed a 'peak' MHz of 1359 (no oc, no crash).
...I had read about the Asus 670 Direct Cuii TOP (which I actually had considered) but also that it had some trouble as it was factory-clocked 'too far'- never mind that before my 1st purchase, I couldn't find it anymore in Canada / NCIX...that said, in many reviews I read, 670 TOP edition came awfully close to 680ies, sometimes even bettering them (a bit of an embarrassment to NVidia btw)
...what I think happened is that Asus pulled them off the market and re-labelled some as 'regular 670 Direct CUii' with a different non-TOP BIOS (later on, Asus came up with a slightly more expensive hybrid called the 'OC', clocked at 1058 I think). Needless to add that I went back after I realized what I had and bought two more...they too clocked on 'stock' boost much higher though not quite as high as the 2nd card (still, 1293 and 1306 stock boost is nothing to complain about).
MUST READ: http://www.overclock.net/t/1265110/the-gtx-670-overclocking-master-guide
That helped to really understand how Kepler boost works, and what to look out for, ie perfectly flat MHz lines during load and a low 'Power Usage' combined with a high 'GPU usage'.
I briefly tried some more extreme custom BIOS but didn't like to keep any of them, not least as the systems in question are quad-booted and when running Windows Server 2008 / SQL Enterprise, I want the cards to idle and power down. I ended up settling on 'KGB Bios editor' and wrote my own values for the otherwise stock Asus BIOS...PowerTarget of 150% (instead of the stock max 122%), and max GPU voltage of 1.215v - the hard limit w/Kepler 104s (though there is some question re that, per below).
I run two 'primary machines' (per sig), the delidded Ivy that can bench at up to 5.3 GHz (and CPUz Validate far beyond that), and the Sandy-E 3970X I take up to 5.3 GHz in benches (though prefer 5.250)...
However, as Valley is not (yet) an HWBot discipline, and as I reserve the final multiplier steppings for HWBot re points since it does stress even relatively 'low-v' chips, I have not run Valley faster than 5.1 with the Ivy and 5.2 with the Sandy-E...
Anyways, at that stage, my four Asus 670ies 'cracked' Valley's 6000 barrier, scoring 6036 at 5.1 GHz / Ivy. One thing was obvious, though:
The original 'slow' 670 was holding things back a bit. Since I have a lot of other machines in my home office that have no vid card at all, just iGPU, I decided to add another 670 and got a great price on a Gigabyte Windforce 670 OC (the other type of 670 card I had originally considered). 'Out-of-the-box' stock speed was 1346...however, it has better memory that can go much higher (570+) than the fastest Asus, and the Gigabyte has an 8+6 instead of a 6+6 Power Connection, so the Gigabyte became the 'lead card'.
I initially asked for help getting a custom BIOS, which worked fine, but then decided to rewrite the stock one myself in an attempt to get the first two cards to run at identical speeds - and that worked out after a few tries :-) All told, in the Ivy all this came to a top score in Valley of 6073 or so.
By that time, the Rampage IV / 3970X combo neared completion...per below, it has some very special bus and memory customizations, and using the above 670 combo and settings, it cracked 6100++ easily...BUT: It is also a 'power hog' re wattage...Sandy-Es can suck back far above 400 watts once past 5.1 GHz or so.
... and I noticed that w/4 cards (unlike 3), I started to get BSODs relating to GPU or CPU voltage at settings which I know to be otherwise OK...a quick math check underscored what I had suspected...the Corsair AX1200 was near its limits, once efficiency losses and peripherals are taken into account...
There are those who strictly advise against a 'dual PSU' set-up and things and I carefully considered that as things can indeed go 'bad'...but with a spare Corsair TX850 (also Gold 80+; like the AX1200) I decided to give it a try...the key seems to be to use:
a.) single rail PSUs and
b.) do some load balancing in addition to the obvious stuff such as powering two of the four GPUs.
That step eliminated the aforementioned BSODs that only occurred before (and only occasionally) when running 4 cards. Now, there is close to 2000 watts feeding that mobo if need be. It also allowed me to bump the Power Target some more on the GPUs...and so I crossed 6200 in 'Valley', posting 6217 or so.
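The 'quick math check' mentioned above can be sketched roughly as follows. The GPU and CPU figures are the ones quoted in this post; the peripherals figure is an assumption, and pooling both PSUs into one number is a simplification (in a real dual-PSU set-up each unit only feeds the components wired to it). The function and key names are made up for the example.

```python
def load_fraction(loads_w, psu_rated_w):
    """Return the total DC load as a fraction of rated PSU output."""
    return sum(loads_w.values()) / psu_rated_w


# Figures from the post, except where marked as assumptions.
loads = {
    "gpus_x4": 800,      # low end of the 800-1000 W quoted for the cards
    "cpu": 400,          # Sandy-E past 5.1 GHz ("far above 400 watts")
    "peripherals": 100,  # ASSUMPTION: pumps, fans, drives, mobo
}

single = load_fraction(loads, 1200)      # Corsair AX1200 alone
dual = load_fraction(loads, 1200 + 850)  # AX1200 plus the spare TX850
print(f"single PSU: {single:.0%} of rating, dual PSU: {dual:.0%} of rating")
```

Even with the low-end GPU figure, the single-PSU case lands above the unit's rating, which fits the occasional quad-only BSODs; with both units there is comfortable headroom.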
I actually did one more run w/670ies ('6262') I never posted because the '7990s arrived'...that step was following a hunch...I re-wrote the GPU BIOS and added another 0.0125v to the hard limit of 1.215v...against expectations (this shouldn't have worked), the Asus cards seem to take it (there must have been one more 'bit' enabled), though the Gigabyte did not.
I am not 100% certain about the Asus, but each of the three did pick up another Kepler step of 13 MHz...in the end, I dialled it back to 1.215v as I do not want to 'fry' the GPUs (though rumour has it that they can take up to 1.35v + - via external hard mods which this was not), and equally important, since the Gigabyte did not take it, now the primary cards were not at the same speed anymore...
Memory
...in the commercial world I am in, I/O throughput for systems with millions of concurrent users is a key item...there are many variables affecting that, for example your disk arrays and such, but one key item I concentrate on is memory - and lots of it.
...take the Xeon E7-8870...a 10 core / 20 thread monster with a 30 mb cache that can address 4096 GB (!) of memory...a bit more than what the typical OCN user runs...
...for the home-office machines, I obviously don't run that kind of thing but no more than 32GB (for now) of TridentX "2400" which in the Ivy has run as fast as 2600...I did not buy that kit in a store, but it was given to me by a friend last November (the same person who runs the 4K Software company I mentioned before in this thread). Once I decided to keep it, I of course paid for it (and given mem prices since then, I was quite lucky...).
...what exactly I do with that is a 'trade secret'...but in MemTweakit, I am now not too far from 60,000 with the latest apparently workable but not yet fully Intel XTU stress tested setting ('above 2400 MHz but below 2500 MHz'; fortunately, the 3970X IMC can do it)
...at times, I have used part of the RAM as a RAM Drive and used it for benchmarking, including Valley...though these days...one of the Quad Boot drives (Intel Series 520 SSD) actually has the 'thin' Windows 7 install on it, with another SSD hosting Valley et al (another HD carries the regular 'fat' Windows 7 install which includes SQL Enterprise variants, local-host web servers etc, ditto for yet another boot drive that has that but on Windows Server 2008)
...depending on the bench mark - I run various mem speeds, timings and ' + - ' BCLK...some benches like extra BCLK on memory, some don't. The point is that if you want to be fast on Valley, Heaven and so forth (or really, on anything), start with super-quick sub-systems...in a significant portion of the aforementioned HWBot records I hold, my GPU / VRAM clocks were significantly lower than what the competitors were running. This happens a lot - and I also have lost a few to guys running lower clocks but even better memory (on that front, Haswell might become a threat though for now, I just like the Haswell boards, not the first batch of actual CPUs...)
...again - it comes down to the bench...I have at least 3 different settings for '5.2' GHz for example (via strap, BCLK, multi) and sometimes one just has to run all of them with other variables held constant to figure out which works best for the test at hand.
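As an illustration of how several strap / BCLK / multi combinations can land on roughly the same CPU clock, here is a small sketch. It assumes CPU clock = BCLK x multiplier with BCLK = strap + a small offset; the strap values, offset range, multiplier range and tolerance are illustrative choices, not the actual settings used on these machines.

```python
from itertools import product


def candidate_settings(target_mhz, straps=(100, 125), bclk_offsets=range(0, 5),
                       multipliers=range(38, 54), tolerance_mhz=15):
    """List (strap, bclk, multiplier, cpu_mhz) combos near the target clock.

    ASSUMPTION: CPU clock = BCLK x multiplier, with BCLK = strap + offset.
    All default ranges are hypothetical and only serve the example.
    """
    results = []
    for strap, offset, multi in product(straps, bclk_offsets, multipliers):
        bclk = strap + offset
        cpu_mhz = bclk * multi
        if abs(cpu_mhz - target_mhz) <= tolerance_mhz:
            results.append((strap, bclk, multi, cpu_mhz))
    return results


for combo in candidate_settings(5200):
    print(combo)
```

Each combination that comes out also shifts memory and bus clocks differently, which is why the only way to know which one suits a given bench is to run them all with the other variables held constant.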
All that said, you still need a very good 'IMC' in your CPU to pull that off in the first place, but if you have such a CPU (sorry, but I got two of them - not even binned), memory tuning can really help get great scores. The quad TridentX kit I have would probably be sold today as a '2600' kit, but back then, they did not offer that.
Somewhat related to memory are latencies on things such as GPUs that utilize buses...the Max V E board has a PLX chip that does introduce a bit of latency, but in turn lets you run 4 cards at PCIe 3. The R-IV-E board, being X79, has enough PCIe lanes to run 4 cards at PCIe 3 (w/patch if you are running NVidia) without a PLX chip, but obviously not 4x 16x...unless :-)
...the above is very important re throughput - and here is a clip for 'Hothardware' that shows 2 7970GHz clocked faster than one 7990
The (quick) purchase of two 7990ies was actually a surprise - to me. I had considered adding 7970s to the Keplers to get 'the best of both worlds'...
...but then I really do like Titans and 780ies also...I read up on the Gigabyte 780 WF3 OC and that it was beating (stock) Titans in a fair number of benchies...as I was debating whether to buy 3 of the Giga 780 OCs or hold out for MSI 780 Lightnings, I came across a European thread where they had some interesting bits about Gigabyte's 7990s...they are clocked a bit higher to begin with, and have - so the rumour goes - similar new mem chips to the upcoming HD8000 series...as mine are still locked (not for long, what with dual BIOS switch :-) ) I am still 'stuck' at 1575 max but I know there is much more possible.
In addition, the thread stated (confirmed in a technical article elsewhere) that the version2 7990ies have the latest-gen PLX (much faster w/lower latencies) onboard creating 48 dedicated PCIe lanes between each GPU on one PCB...and that is how I ended up with 16x 16x 16x 16x - an advantage over 4 single 7970ies, though there are also drawbacks...still, also considering easier air-cooling and/or water-cooling with 2 video cards instead of 4, I am happy.
Given my experience with overclocking and also commercial systems, I find that there is one measure - if I had to choose just one - which really tells me 'how well' my subsystem tuning (RAM et al) is doing - PhysicsX in 3d11. It tells me 'how efficient' the I/O systems are...and whether my latest tinkering actually resulted in more efficiency...I have run some benchies at HWBot whereby my frames per second exceeded '2600 FPS'...but even with good eye sight, a marginal change of a few tens - is kind of hard to catch...
Not too long ago in another thread, people with hexacores started to post their PhysicsX results...I seem to have had the highest score, though what really interested me was the improvements I had wrung out of the (still relatively new) R-IV-E system compared to my own initial runs
A technical note: As also posted by others already, running multiple physical cards vs just one will lower your Physics score a bit...my own measures suggest as much as 250 to 300 points.
With that in mind, I still picked up almost 1000 points with memory and bus tuning over about a month of effort. Put differently, I saw a comparable post over the last 48 hours by another very successful HWBot poster with a similar (mobo, CPU) set-up...he needed almost an additional 200 MHz of CPU speed to match my 'single card' result above....and rightfully or wrongfully, I put that down to the unique subsystem set-up I have...
Area under the curve vs peak speed
...I already mentioned that it is not only possible but happens fairly regularly that lower GPU clocks beat higher GPU clocks...when I did that in at least 12 or so records at HWBot, it was not by choice...the other guys however were very often running 'LN2' and such on their cards. I certainly look forward to running MUCH higher clocks on my new dual 7990ies after I unlock them (within a couple of weeks) as high clocks are desirable as long as the cooling is there and the voltages are not too outrageous. Still, you can make up for lower clocks elsewhere via efficiency...
Now, a couple of the above factors (PSU, mem efficiency, bus set-ups and tuning) come together...needless to add, a big item relates to how fast you can run your CPU - especially with FOUR GPUs to make sure there are no bottlenecks feeding the GPUs.
Picture PrecisionX in your mind for a moment re the recorded data tab... You want your GPU GHz reading as a constant flat line 'at max' for as long as it is performing work. You also want a LOW Power Usage (indicating more headroom and no PSU limitation issues) and very high GPU usage (mid to high 90ies for ALL cards - best scenario is all 4 cards at the same high level of usage).
Also please consult the Valley OP for 'green and red graphs ' showing 'scalability'...as far as I know, I am the only one who posted Quad SLI 670 results...compare that to the neighbouring graphs (680s, 7970s etc) - what you see is very strong scaling from 3 to 4 cards - due to sub-system performance but also a very fast-running hexacore CPU that creates the necessary headroom.
I see so many people concentrate on just 'peak' GPU GHz and VRAM...if those values are high and can stay there - great, but I rather give up a few 'peak' MHz but keep the overall average high than get bragging rights...your bench score is usually the equivalent of the 'total area under the curve', not peak values.
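The 'area under the curve' idea can be shown with a tiny sketch. The two clock logs below are invented for illustration (one sample per second, in MHz), and treating the average clock as a stand-in for the bench score is a simplification.

```python
def mean_clock(samples_mhz):
    """Average clock over a run; with evenly spaced samples this is
    proportional to the area under the clock curve."""
    return sum(samples_mhz) / len(samples_mhz)


# HYPOTHETICAL logs: a 'peaky' run that throttles vs a flat-line run.
peaky = [1320, 1320, 1281, 1255, 1255, 1242, 1281, 1255]
flat = [1293] * 8

for name, log in (("peaky", peaky), ("flat", flat)):
    print(f"{name}: peak {max(log)} MHz, average {mean_clock(log):.1f} MHz")
```

The peaky run wins on bragging rights (higher peak), while the flat run has the higher average, and the average is what the score follows.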
Obviously, PrecisionX does not work with AMD and I don't have anything else yet other than 'CCC'...in fact, I found out today that I had inadvertently limited the 7990ies a bit in the first couple of runs by not uninstalling the NVidia drivers and various related NVidia apps when I checked the 'resource manager'
...I am planning to reintroduce the 670ies at some point on that mobo as well. But once I decided to uninstall all the NVidia stuff (along with AB, PrecX etc), all of a sudden I could not run Valley, Heaven and 3d11 anymore - was getting an error message about D3Dxxx etc.
I then reinstalled the Catalyst stuff and it brought the 3D back, though now CCC doesn't show temps, MHz etc anymore for 3 of the 4 GPUs...oh well...I only had these cards for a few days and look forward to a nice, free weekend to really set them up right (and maybe figure out how to get the Asus Ares 2 BIOS on there).
Still, have a quick look at the next pic (GPUz shark), the 98% usage figure for all 4 GPUs does make me happy. The second pic actually shows you what happened in an early Valley and Heaven run...and it still scored well...but the 1st pic is how it works now :-)
On a final note, unrelated to technical set-ups, I leave you with an observation or 'tip' which I cannot even fully explain (caching maybe?) but I have found that there is about a 2 sec window near but not at the start of Valley which can add over a full 'fps' to the score...if I push 'benchmark' right away, the FPS will be lower by about 1 FPS...if I wait too long, same thing...if I hit it just right... +1 FPS...go figure...