More tests need to be done if we want results accurate to within 1 °C. How about 10 different processors, each delidded, cleaned, and repasted 10 times, cooled with very precisely temperature-controlled water, and then averaging the results?
Testing once and comparing the results is why we don't have a consensus on these matters; everyone has a different result or opinion. Does the gap between the die and the IHS matter or not? These results seem to show that the TIM is the only issue, but the old AnandTech thread pointed to the gap as the issue on Ivy Bridge.
I can say this much for Kaby Lake and Skylake: I have seen some where the TIM application just sucked. It looked like not enough TIM had been applied, as there were bare spots on the die and IHS. Some had too much. And on some, the adhesive was way, way too thick, to the point where you could tell the IHS was not pressing down correctly.