What's the GFLOPS reported in the stress test without using HT (only 6 threads)?
When I select 6 threads I get 300+ GFLOPs... what's going on?

This is what I'd expect from the 12-thread result, as it's perfectly in line with the other 8700K results.

The more threads I add beyond 6, the lower the GFLOPs goes, until I hit ~230 with all 12 threads.


So basically... What?
 
LINPACK almost always loses performance with more than one thread per physical core because it's so memory/cache limited.

The faster the memory subsystem, the narrower the gap is.

Are you using the XMP profile on that memory? If so it's probably got some super slack subtimings that LINPACK is sensitive to.
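If you want to pin one worker to each physical core instead of relying on the in-app thread selector, the Intel MKL-based Linpack binaries honor the standard OpenMP environment variables. A rough sketch in Python (the binary path is a placeholder, and I'm assuming your build respects KMP_AFFINITY):

import os
import subprocess

# Placeholder path to the Linpack binary; point this at your own install.
LINPACK = r"C:\LinpackXtreme\linpack_xeon64.exe"

env = os.environ.copy()
env["OMP_NUM_THREADS"] = "6"                      # one worker per physical core on a 6C/12T 8700K
env["KMP_AFFINITY"] = "granularity=fine,scatter"  # spread threads across cores, not HT siblings

subprocess.run([LINPACK], env=env, check=True)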
 
LINPACK almost always loses performance with more than one thread per physical core because it's so memory/cache limited.

The faster the memory subsystem, the narrower the gap is.

Are you using the XMP profile on that memory? If so it's probably got some super slack subtimings that LINPACK is sensitive to.
Yep I'm using the XMP profile. Any chance you can ballpark me some decent subtimings so I can compare?
 
I don't know enough about your memory, your board, or how they interact with the Coffee Lake platform to really give much in the way of specific advice.

Might be a good idea to find out what ICs that DDR4 kit is using and look through, or ask in, some of the memory threads for some settings that will serve as a good template.
 
The compatible binary is used intentionally for benchmarking to deliver more accurate results.
My bad, I was under the impression the AMD binary was optimized for AMD but a closer look shows it's actually a renamed linpack_xeon64 executable (11.1.3) from 2014 that has a very small change to bypass some software checks. In fact it appears to be the exact same binary from the Linx AMD edition.
 
Discussion starter · #26 ·
My bad, I was under the impression the AMD binary was optimized for AMD but a closer look shows it's actually a renamed linpack_xeon64 executable (11.1.3) from 2014 that has a very small change to bypass some software checks. In fact it appears to be the exact same binary from the Linx AMD edition.
The new Linpack binaries perform terribly on modern AMD CPUs. I don't have a Ryzen system nearby to diagnose the problem. AMD hardware isn't cheap like it used to be.
 
I don't know enough about your memory, your board, or how they interact with the Coffee Lake platform to really give much in the way of specific advice.

Might be a good idea to find out what ICs that DDR4 kit is using and look through, or ask in, some of the memory threads for some settings that will serve as a good template.
Messed around with the memory.

If I drop the XMP profile and just use the JEDEC 2133 MHz profile, I lose ~20 GFLOPs when testing 6 threads (from 325 to 308), but I GAIN 30 GFLOPs when testing all 12 threads (from 230 to 260).

So, kinda weird I guess. I'd like my 12-thread testing to reach 300+ GFLOPs, but it doesn't seem possible.

EDIT
Gone back to the XMP profile and messed with the secondary timings.

I'm now getting 340 GFLOPs with 6 threads and 260 GFLOPs with 12 threads.
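For a rough sanity check on where these numbers should land, assuming a stock 4.3 GHz all-core turbo on the 8700K (your clocks may differ):

# Skylake-family cores with AVX2 can retire 2 FMAs/cycle x 4 doubles x 2 FLOPs
# = 16 double-precision FLOPs per cycle.
cores = 6
ghz = 4.3           # assumed all-core clock
flops_per_cycle = 16

peak = cores * ghz * flops_per_cycle
print(f"theoretical peak: {peak:.0f} GFLOPs")          # ~413
print(f"LINPACK efficiency at 340: {340 / peak:.0%}")  # ~82%, a healthy result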
 
The new Linpack binaries perform terribly on modern AMD CPUs. I don't have a Ryzen system nearby to diagnose the problem. AMD hardware isn't cheap like it used to be.
I understand the problem with the lack of hardware, and if I could help I would, but the newest processor I'm sporting is Haswell. While it would be nice to get hold of an upcoming 64-core EPYC 2, finances dictate that Haswell is going to be it for quite some time. It gets the job done, so I can't complain.

As for the poor performance of Linpack on AMD, maybe it's something to do with this. Possibly the answer is to move away from Linpack and write a standalone benchmark that is vendor-neutral. Some ideas could be had from the Firestarter processor stress utility for heavy loading, or if you're looking more for maximum possible GFLOP performance, take a look at some work by Mysticial; precompiled binaries can be found here.
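For example, a quick-and-dirty neutral probe is only a few lines of Python, though note that numpy hands the multiply to whatever BLAS it was built against (often OpenBLAS or MKL), so this really measures that backend:

import time
import numpy as np

# A dense matmul of two N x N matrices costs ~2*N^3 floating-point operations.
N = 4096
a = np.random.rand(N, N)
b = np.random.rand(N, N)

best = float("inf")
for _ in range(5):                     # take the best of a few runs
    t0 = time.perf_counter()
    a @ b
    best = min(best, time.perf_counter() - t0)

print(f"{2 * N**3 / best / 1e9:.1f} GFLOPs")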
 

In case anyone is curious:

The reason I got lower GFLOPs results with all 12 threads... the ASRock Z370 Taichi. I've tried all available BIOS versions with no change.

Installed the Maximus XI Hero board today, and boom, same GFLOPs no matter how many threads are active.
 
Discussion starter · #31 ·
In case anyone is curious:

The reason I got lower GFLOPs results with all 12 threads... the ASRock Z370 Taichi. I've tried all available BIOS versions with no change.

Installed the Maximus XI Hero board today, and boom, same GFLOPs no matter how many threads are active.
Are you running the latest version? 64-bit? with the benchmark feature?

Look in the task manager, make sure nothing is interfering with the CPU usage.
 
Are you running the latest version? 64-bit? with the benchmark feature?

Look in the task manager, make sure nothing is interfering with the CPU usage.
It was a specific issue with the ASRock Z370 Taichi. Confirmed by CPU scores across *certain* tests being lower than expected.

For example - on 3DMark Time Spy:

4.7GHz, 1.2v, Z370 Taichi: CPU Score = 8,000-8,100
4.7GHz, 1.2v, Z370 Maximus X Hero or Z390 Maximus XI Hero: CPU Score = 8,600-8,700

As previously mentioned, on the Z370 Taichi the GFLOPs would get lower and lower on this Linpack Xtreme test the more threads you added. 6 threads results in the correct score; each additional thread after that causes lower and lower GFLOPs.

On the Z390 Maximus XI Hero, GFLOPs is the correct 310+ score no matter how many threads are tested.

ALL other components and settings are identical; the only thing that changed is the motherboard (fresh Windows install on both motherboards). This is a concern I've had for some time with the Z370 Taichi, as it scored lower than my Z370 Maximus X Hero board, but I didn't have both boards on hand at the same time to confirm. Now that I've got the Z390 Maximus XI Hero board side by side with the Taichi, I can confirm there is in fact something peculiar about CPU performance on the ASRock board.
 
I'm using Linpack Xtreme now. It really does put the most stress on my new 9900K.

Why the 9.6 GB limit?
 
Discussion starter · #36 ·
The RAM selection defines the problem size (the size of the system of equations to solve).

Solving a problem size above 35,000 (9.6-12 GB of RAM usage on most systems) reduces the stress efficiency, so it is not recommended.

On systems with "lots" of RAM and cores, it will automatically adjust to use more RAM.
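The arithmetic behind that figure: Linpack solves a dense N x N double-precision system, so the matrix alone costs N^2 x 8 bytes (plus some workspace on top):

N = 35_000
gb = N**2 * 8 / 1e9   # 8 bytes per double-precision element
print(f"problem size {N}: ~{gb:.1f} GB for the matrix")  # ~9.8 GB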
 