
1 - 20 of 28 Posts

·
Tank destroyer and a god
Joined
·
2,511 Posts
Discussion Starter #1 (Edited)
I have been analyzing three different network cards and some network settings within Windows. I will provide some info about the different hardware and about the different OS / IP stack settings.

I will also add some advice on top of this guide:
https://www.speedguide.net/articles/windows-7-vista-2008-tweaks-2574

Killer cards
<img>https://goo.gl/images/rzTJwQ</img>

The main idea behind the original Killer cards is that the card has a dedicated CPU which takes care of all network-related tasks. This works, but with 2 limitations:
a) More modern PCs can handle those tasks much faster.
b) Killer cards did not offer "Receive Side Scaling" (RSS).
Basically, with RSS multiple threads of your CPU can handle incoming packets, which accelerates the networking.
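
To illustrate how incoming packets get spread over CPU threads, here is a minimal Python sketch. It is only a model: real NICs compute a Toeplitz hash with a secret key in hardware, and CRC32 merely stands in for it here. The function name `rss_queue` and the sample addresses are made up for the example.

```python
import zlib

def rss_queue(src_ip, src_port, dst_ip, dst_port, num_queues):
    """Pick a receive queue (and thus a CPU thread) for one flow.

    Assumption: CRC32 is a stand-in for the Toeplitz hash used by real NICs.
    """
    flow = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(flow) % num_queues

# Hypothetical flows: (source IP, source port, destination IP, destination port)
flows = [
    ("203.0.113.10", 27015, "192.168.1.20", 50000),
    ("203.0.113.10", 27015, "192.168.1.20", 50001),
    ("198.51.100.7", 443, "192.168.1.20", 50002),
    ("198.51.100.8", 443, "192.168.1.20", 50003),
]

for flow in flows:
    # Packets of the same flow always land on the same queue, so they stay in order.
    print(flow, "-> queue", rss_queue(*flow, num_queues=4))
```

The point of hashing per flow rather than per packet is that one connection is never split across threads, while different connections can be processed in parallel.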

Killer cards today offer QoS management (quite good imo) and some networking tweaks.

Expensive network Cards - Server grade
<img>https://goo.gl/images/nNQTWm</img> Intel I210-t1
Cards like these offer a wide variety of features and settings. Those features provide higher performance, more efficient operation, or power saving. The particular Intel i210-t1 is quite plain, yet it offers offloading features and Receive Side Scaling up to 4x (4 threads). When the card is properly configured, ping from my system to the home switch is 120 microseconds. You might encounter the i211 as an onboard NIC chip, which is exactly the same in terms of features.

It also offers Direct Cache Access, so packets are put into the CPU cache and thus processed much faster.

Even though I described the card as "plain", most of those features are not available to most consumers for various reasons.

Common network cards - onboard
<img>https://goo.gl/images/7a1PS8</img> Intel I219-v
<img>https://goo.gl/images/GAR96d</img> Realtek 8111

Cards like these support Receive Side Scaling up to 2x (two CPU threads). They have almost the same set of offloading features as the previous ones, but they lack certain power saving features. These two chips (Intel and Realtek) are practically identical in terms of features. Some settings you might find on expensive cards are either not available, or you cannot change them in the driver.

In general their setup is much simpler, and a common user won't see any difference between an expensive and a common NIC at all.

Comparing cheap and expensive NIC chips is more complicated now than it was 10 years ago. Even cheap chips have some accelerating features, yet the expensive cards apparently allow more efficient CPU utilization, as the network-related operations take much less time. Download speed will be roughly the same and ping to the game server roughly the same as well; what you get is faster communication between your CPU, NIC and switch/router on a sub-millisecond timescale. That means it helps only at the level of internal system operation, and saying "you should buy this NIC or that NIC" is selling snake oil unless you know what to configure and how.

For example one of the power saving features is "DMA Coalescing". More info here:
https://www.intel.com/content/www/u...007456/network-and-i-o/ethernet-products.html

This is a power saving feature which allows the CPU to stay idle longer, as the NIC will not send data to it immediately but after a certain pause. The driver for my I210 allows time frames of 250 and 500 microseconds, and 1 millisecond up to 5 milliseconds. Less expensive variants which don't support changing this feature may not have it at all (best case), or may have it set to some "default" value. Such a setting may in the end slow down the game engines of MMOs, BUT to diagnose such behavior you would need a packet analyzer and a deep look into the traffic.
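
To get a feel for what such a pause costs, here is a small Python sketch of a simplified coalescing model. The assumption (mine, not Intel's documented behavior) is that the NIC hands buffered packets to the CPU only at fixed multiples of the configured window. The arrival times are made up; the window values are the ones the I210 driver offers.

```python
def added_delay_us(arrival_us, window_us):
    """Extra wait before the CPU sees a packet, in microseconds.

    Simplified model: packets are flushed at multiples of window_us.
    window_us == 0 means coalescing is disabled (immediate delivery).
    """
    if window_us <= 0:
        return 0
    # Integer ceiling division: next flush boundary at or after the arrival.
    flush_at = -(-arrival_us // window_us) * window_us
    return flush_at - arrival_us

# Hypothetical packet arrival times in microseconds
arrivals = [100, 900, 1300, 2600, 4999]

for window in (0, 250, 500, 1000, 5000):
    delays = [added_delay_us(t, window) for t in arrivals]
    print(f"window {window:>4} us: worst added delay {max(delays)} us")
```

In this model the worst case is always just under one full window, which is why a 5 millisecond setting can matter for a game while a disabled setting adds nothing.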

Bad news on the OS side.

NetDMA not supported post Windows 8
This feature allows the NIC to put packets directly into RAM (or into the CPU cache with DCA enabled) and thus accelerates overall networking. It works great with "Receive Side Scaling".

Apparently Microsoft dropped support for NetDMA (and DCA) in Windows 8. Even when the "netsh int tcp show global" command reports an "enabled" status, you have to use "netsh int tcp show netdmastats" to verify whether any data is actually being copied using this method.

Also, the prerequisite for running this feature is that the NIC, CPU, chipset and OS all support it.

On a brighter note, I heard rumors that it's possible to run this feature on certain versions of Win10. Can someone check it?

Chimney offload might not be supported by your NIC.
a) This feature does not run when NetDMA is enabled, so turn NetDMA off before you run the command below.
b) Even when NetDMA is disabled, this feature has to be supported by the network card.
To verify that, run "netsh int tcp show chimneystats". Out of all the NICs I had, none supported this feature. In that case, you can simply disable the "Base Filtering Engine" service, which is responsible for this feature.

It seems that hardware is compatible with NetDMA but MS is ending its support, while no hardware is compatible with Chimney offload but MS is still offering it.

How to setup system for best game performance?

a) Enable Receive Side Scaling
The card driver might support settings of 2 or 4; it's rarely higher. I don't recommend going beyond the settings allowed by the driver, but the guide I posted contains info on how to increase it beyond this value.

b) Disable Chimney offload if your NIC doesn't support it.

c) Enable NetDMA if possible (Win 7).
Enable DCA as well if supported.

d) Receive Window Autotuning : Disable.
The goal is to process data as fast as possible. Games use small packets, so bigger or dynamic buffers are not needed. Also disable Windows Scaling heuristics. If you experience trouble while downloading (longer latencies and/or a decrease of download speed), set it to "normal".

e) ECN Capability
If you are lucky enough to know that your switch or router supports it, enable it. If you don't know, you might try to enable it and see if it has any impact when you are gaming and downloading data at the same time.

This feature helps in situations when your internet link gets fully utilized.

f) Disable Nagle Algorithm
g) Disable Add-On Congestion Control (it works with Receive Window Autotuning, which we disabled earlier).

h) Disable Jumbo packets.
i) Disable DMA Coalescing / Interrupt moderation.
More info in this post: http://www.overclock.net/forum/18049-network-hardware/1658977-gaming-networking.html#post26758097
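
The checklist above can be compared against a saved "netsh int tcp show global" report with a few lines of Python. This is only a sketch: the sample report text is an assumed Windows 7 style layout (the real layout and setting names vary by Windows version), and `parse_settings`, `mismatches` and `RECOMMENDED` are names invented for the example.

```python
# Values taken from the checklist in this post. ECN, NetDMA and DCA are
# conditional in the text ("if supported"), so treat mismatches as hints only.
RECOMMENDED = {
    "Receive-Side Scaling State": "enabled",
    "Chimney Offload State": "disabled",
    "NetDMA State": "enabled",
    "Direct Cache Access (DCA)": "enabled",
    "Receive Window Auto-Tuning Level": "disabled",
    "Add-On Congestion Control Provider": "none",
    "ECN Capability": "enabled",
}

def parse_settings(text):
    """Turn 'Name : value' lines into a dict, ignoring headers and rulers."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            settings[name.strip()] = value.strip()
    return settings

def mismatches(settings, recommended=RECOMMENDED):
    """List (name, current, wanted) for every setting that differs."""
    return [
        (name, settings.get(name, "not reported"), wanted)
        for name, wanted in recommended.items()
        if settings.get(name) != wanted
    ]

# Assumed sample report; paste your own output here instead.
SAMPLE_REPORT = """\
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : automatic
NetDMA State                        : enabled
Direct Cache Access (DCA)           : disabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : ctcp
ECN Capability                      : disabled
RFC 1323 Timestamps                 : disabled
"""

for name, current, wanted in mismatches(parse_settings(SAMPLE_REPORT)):
    print(f"{name}: {current} -> {wanted}")
```

The script only reads text and changes nothing on the system, so it is safe to run before deciding which settings to touch.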

Notes:
Added info about RFC 1323 Timestamps http://www.overclock.net/forum/26765433-post18.html
 

·
Newb to Overclock.net
Joined
·
4,168 Posts
Were you able to measure any decrease in latency?
 

·
Tank destroyer and a god
Joined
·
2,511 Posts
Discussion Starter #3 (Edited)
Only with Wireshark, which is packet analyzer software.

Ping to the router went from 1 ms to 0.12-0.24 ms when comparing the onboard NIC (before tuning) with the expensive Intel NIC (after tuning).

As I edited in a bit later, it helps much more on the CPU/RAM level than with networking. In the CPU utilization graph, the threads of a 3D engine which are dedicated to networking show taller (higher CPU utilization) and shorter (less time required) spikes. It indicates (just indicates, does not prove) that before tuning the game engine was waiting for the data to process, or the whole processing took more time.

Therefore, the benefits are measurable, but I would not say perceivable.
 

·
Newb to Overclock.net
Joined
·
4,168 Posts
I suppose this could mean the game engine can do less interpolation as more granular network data is available. Overall, this will output more server-accurate rendering.
 

·
Premium Member
Joined
·
6,675 Posts
Jumbo packets help with larger data transfers and utilizing high throughput connections, but does having it enabled actually hurt small data transfer performance? I was under the impression that the network card+switch+whatever actually processed a jumbo sized packet at once and did not require breaking it down to smaller sizes to process. So what does it matter what size the packets are if they are all processed in the same time frame? Sure it is less "efficient" when you are doing smaller transfers, but if it is processed the same then what does it really hurt?
 

·
Tank destroyer and a god
Joined
·
2,511 Posts
Discussion Starter #6
I suppose this could mean the game engine can do less interpolation as more granular network data is available. Overall, this will output more server-accurate rendering.
Saying yes would be misleading.

Actually it means that the CPU spends less time waiting for and processing the packets after they arrive at the local network, counting the modem, router, NIC, CPU, RAM and finally the game engine. However, the time from server to client would still be 39 060 microseconds in total, while the unoptimized time was 40 000 microseconds. What actually changed is that after a packet arrives, it no longer takes 1000 microseconds for the engine and CPU to process it, but just 60.

In the end, it helps mainly FPS and the CPU, because the engine spends less time waiting until the packets are processed. But it's on a level well beyond human perception.
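
The arithmetic behind those numbers, as a tiny Python check. The figures are the ones from this post; the variable names are just for the example.

```python
total_before_us = 40_000       # server-to-client time before tuning
processing_before_us = 1_000   # local processing before tuning
processing_after_us = 60       # local processing after tuning

saved_us = processing_before_us - processing_after_us
total_after_us = total_before_us - saved_us
share = saved_us / total_before_us

print(f"saved {saved_us} us, new total {total_after_us} us")
print(f"that is {share:.2%} of the whole trip")
```

So the local part gets many times faster, but it was only a small slice of the total to begin with.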


Jumbo packets help with larger data transfers and utilizing high throughput connections, but does having it enabled actually hurt small data transfer performance? I was under the impression that the network card+switch+whatever actually processed a jumbo sized packet at once and did not require breaking it down to smaller sizes to process. So what does it matter what size the packets are if they are all processed in the same time frame? Sure it is less "efficient" when you are doing smaller transfers, but if it is processed the same then what does it really hurt?
That is specific to different types of internet connection. I can enable Jumbo packets, and since the maximum transmission unit (MTU) increases from 1500 bytes to 9000 bytes, network devices have to handle only about 1/6 of the packet headers (overhead) for the same amount of data.

But because I have an ADSL 2+ internet connection (quite ancient, but with decent ping), each packet is broken down from 1500 MTU into much smaller ATM cells carrying 48 bytes of payload each, which are transferred over the DSL line and then reassembled into a normal packet that continues on its way. Actually, the connection I use does not allow an MTU of 1500, but just 1492 (officially, but it's actually even less).

Attempts to transfer bigger packets may result in packet fragmentation. That means the original packet has to be broken down into smaller packets which arrive at the destination separately. If one of two packet fragments is lost, the packet as a whole is lost too.

I am not sure whether Jumbo packets can get through the internet connection line as a whole or get fragmented. The fact is that most games use small communication packets exactly to avoid fragmentation or packet loss.
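
A rough Python sketch of both effects: fewer packets with a bigger MTU, and fragmentation when a big packet hits a link with a smaller MTU. It assumes plain IPv4 and TCP headers of 20 bytes each with no options, and independent loss per fragment; the 1% loss rate and the function names are made up for the example.

```python
import math

IP_HEADER = 20   # bytes, IPv4 without options
TCP_HEADER = 20  # bytes, TCP without options

def packets_needed(payload_bytes, mtu):
    """Number of full-size TCP/IP packets needed to carry the payload."""
    per_packet = mtu - IP_HEADER - TCP_HEADER
    return math.ceil(payload_bytes / per_packet)

def ipv4_fragments(packet_bytes, link_mtu):
    """Number of IPv4 fragments a packet is split into on a smaller link."""
    if packet_bytes <= link_mtu:
        return 1
    data = packet_bytes - IP_HEADER
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (link_mtu - IP_HEADER) // 8 * 8
    return math.ceil(data / per_fragment)

one_megabyte = 1_000_000
print("packets at MTU 1500:", packets_needed(one_megabyte, 1500))
print("packets at MTU 9000:", packets_needed(one_megabyte, 9000))

fragments = ipv4_fragments(9000, 1492)
loss = 0.01  # hypothetical chance of losing any single fragment
print("fragments of a 9000-byte packet on a 1492 link:", fragments)
print(f"chance the whole packet survives: {(1 - loss) ** fragments:.3f}")
```

The more fragments a packet is cut into, the more likely it is that at least one goes missing, and then the whole packet has to be sent again.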
 

·
Premium Member
Joined
·
6,675 Posts
Would it be beneficial to enable the jumbo frames for LAN traffic and then have the router's WAN MTU set to 1500 (or whatever for a specific internet connection) so that local traffic can make use of them but internet traffic gets sized correctly when it hits the router?
 

·
Tank destroyer and a god
Joined
·
2,511 Posts
Discussion Starter #8 (Edited)
For local networking, yes.

For anything going to or from the Internet? Depends on your ISP, but if we are speaking strictly about gaming, there will be no benefit.
 

·
Tank destroyer and a god
Joined
·
2,511 Posts
Discussion Starter #9 (Edited)
Interrupt Moderation / DMA Coalescing
https://docs.microsoft.com/sk-sk/windows-hardware/drivers/network/interrupt-moderation
https://www.intel.com/content/www/u...007456/network-and-i-o/ethernet-products.html

These are power saving features in Windows drivers since NDIS 6.0 (Win Vista and later, if I am not mistaken).

Both work in such a manner that the network card driver waits for a pre-defined time before it sends data to the CPU(s). The expensive network cards I mentioned allow you to define this time (250 microseconds, 500 microseconds, 1-5 milliseconds) and to specify exceptions by port. Good news = even the cheap Realtek 8111 allows you to disable it in its driver.

Benefit when disabled: packet data are sent from the NIC to the CPU immediately, and fewer receive buffers are needed.
Disadvantage when disabled: slightly higher power consumption of the whole system. Might cause some trouble for notebooks and their battery life.

For gaming and high-performance scenarios I would recommend disabling it.
 

·
Tank destroyer and a god
Joined
·
2,511 Posts
Discussion Starter #11
Where can I buy expensive Network card pcie
First check what you have onboard. Even the cheap Realtek 8111 is not as bad as I expected in the beginning.

The I210-t1 I am testing is PCI-E 2.0 x1 and costs between 30-50 dollars. That's quite expensive.

But I have to mention that purchasing the card was more about getting better educated about network drivers and their settings. My experience is that correct driver settings/tweaking and TCP stack settings are more important than the hardware itself.
 

·
Registered
Joined
·
538 Posts
Oh... I thought it was cheap? The expensive ones I thought were starting from £100 plus then you got the ones with dual 1GB ports and heatsinks at 8x PCIe from £175 and up. Those were the ones I thought were quite expensive.

I do have the same card as you. Had been thinking of getting one of the Intel Pro with the heatsinks but wasn’t sure yet... I had to purchase the I210-t1 because the onboard Broadcom/Ethernet drivers started playing up with the Windows 10 Fall update. While there were no newer drivers. Newest was 2013.
 

·
Tank destroyer and a god
Joined
·
2,511 Posts
Discussion Starter #13
Well, it's expensive for gaming purposes. NICs with 2-4+ ports are usually entry/server grade; the i210-t1 is sort of a "taste" of how such hardware works.

Unless you plan to have 2 LAN cables connected to different switches and use a network team in case one of the lines fails... And routers allow you to use a secondary or even tertiary WAN connection.
 

·
Premium Member
Joined
·
8,233 Posts
Oh... I thought it was cheap? The expensive ones I thought were starting from £100 plus then you got the ones with dual 1GB ports and heatsinks at 8x PCIe from £175 and up. Those were the ones I thought were quite expensive.

I do have the same card as you. Had been thinking of getting one of the Intel Pro with the heatsinks but wasn’t sure yet... I had to purchase the I210-t1 because the onboard Broadcom/Ethernet drivers started playing up with the Windows 10 Fall update. While there were no newer drivers. Newest was 2013.
You can still get those used on ebay for pretty cheap...


I'm about to make the jump to 10Gb with SFP+ and not Ethernet, as 10Gb Ethernet is still crazy expensive, while you can get a pair of 10Gb SFP+ cards for under $50. You can get a Smart Managed 24 port switch with two SFP+ 10Gb ports for $125. And if you are just going a short distance you can get SFP+ patch cords for about 15-20 each, or you can get SFP+ transceivers with LC connections and use fiber with LC ends at pretty much any length you want (multimode @ 10Gb you can go more than 1500ft, single mode @ 10Gb you can go over 6 miles). But I'm not going for latency reduction, I care more about throughput to the data server.



Still I don't think any of this really has any real noticeable effect in terms of Latency in game. The real Latency problems are on the ISP side. Not to say that a poorly configured home network can't make things worse. Someone hogging bandwidth or a crappy router can easily cause problems.
 

·
Registered
Joined
·
538 Posts
Yeah, I saw some of those 10GB Intel cards for say £200 - 400 ish... I really need to get a new router though.
 

·
Tank destroyer and a god
Joined
·
2,511 Posts
Discussion Starter #16
That's more a solution for external storage: low latency and transfer rates which can go up to 1250 MB/s. With such a setup you can run a SAN device as if it were directly in your PC.
 

·
Premium Member
Joined
·
8,233 Posts
I do my own share of fiber for work. As we pretty much install it for customers for the backbone of their networks in new facilities. Cat6 just doesn't have the range, and for what we do you can be a good 500ft+ away from the main office and different nodes.


You can still use SFP+ 10G network cards just like a standard Ethernet card. There isn't really any difference, it is pretty much plug and play. Just a different style of connector. Honestly I think we should just forget about Ethernet and just use LC connectors for everything. I honestly think 10G Ethernet is an expensive joke. No reason not to switch to fiber at this point in time. Cheaper and more stable. But instead of SFP ports, they should just give you an LC port. No point in needing to use a transceiver. As an SFP port allows you to use more than just fiber, and more than just one style of connector. LC is by far the most standard type though.

112MB/s just isn't enough for me, but it isn't slow either.
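
For reference, the throughput figures quoted in this thread can be sanity-checked with a short Python sketch. It assumes standard Ethernet framing (14-byte header, 4-byte FCS, 8-byte preamble, 12-byte inter-frame gap) and IPv4/TCP headers of 20 bytes each with no options; the function names are made up for the example.

```python
def line_rate_mb_per_s(gbit_per_s):
    """Raw line rate in megabytes per second (1 MB = 10**6 bytes)."""
    return gbit_per_s * 1000 / 8

def tcp_goodput_mb_per_s(gbit_per_s, mtu=1500):
    """Best-case TCP payload rate over Ethernet with full-size frames."""
    payload = mtu - 20 - 20            # minus IPv4 and TCP headers
    on_wire = mtu + 14 + 4 + 8 + 12    # Ethernet header, FCS, preamble, gap
    return line_rate_mb_per_s(gbit_per_s) * payload / on_wire

for speed in (1, 10):
    raw = line_rate_mb_per_s(speed)
    best = tcp_goodput_mb_per_s(speed)
    print(f"{speed:>2} Gbit/s: raw {raw:.0f} MB/s, best-case TCP payload {best:.1f} MB/s")
```

The raw 10 Gbit/s figure matches the 1250 MB/s mentioned earlier, and the best-case gigabit payload rate lands in the same range as the roughly 112MB/s people typically see in file copies (tools often report binary megabytes, which makes the number look a little lower).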
 

·
Tank destroyer and a god
Joined
·
2,511 Posts
Discussion Starter #18 (Edited)
If you use the command "netsh int tcp show global", you will find the "RFC 1323 timestamps" option at the bottom of the report.

Long explanation here https://tools.ietf.org/html/rfc1323#section-4

Simplified explanation
It's a mechanism which should improve TCP reliability by adding timestamps. The existing 20-byte header of a TCP packet grows by an additional 12 bytes. In return, TCP will disregard certain packets which arrive with old timestamps and which might otherwise break the TCP connection.
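
Where the 12 bytes come from can be shown with a short Python sketch that builds the option the way it commonly appears on the wire: two padding bytes followed by the 10-byte timestamp option. The timestamp values and the 100-byte game payload are made up for the example.

```python
import struct

TCP_BASE_HEADER = 20  # bytes, TCP header without options

def timestamp_option(tsval, tsecr):
    """Build the TCP Timestamps option with the usual padding.

    Layout: two NOP bytes (kind 1), then kind 8, length 10,
    a 4-byte TSval and a 4-byte TSecr, in network byte order.
    """
    return struct.pack("!BBBBII", 1, 1, 8, 10, tsval, tsecr)

option = timestamp_option(tsval=123456, tsecr=123400)
header = TCP_BASE_HEADER + len(option)
print("option bytes:", len(option))
print("TCP header with timestamps:", header, "bytes")

# Hypothetical small game packet: relative cost is higher than for bulk data.
payload = 100
print(f"header share of a {payload}-byte payload packet: {header / (header + payload):.0%}")
```

For big download packets the extra 12 bytes barely matter, but for small packets they are a noticeable slice of every segment.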

Many online games use UDP connections, not TCP connections.

In most cases this option will have little to no effect on online gaming. In case the server-client connection suffers from trouble (a high retransmission rate) and it's using TCP packets, you might consider enabling it, but follow these recommendations first.

1. Check whether retransmitted segments are reported high
Use the command "netsh interface ipv4 show tcpstats". Make sure that since the last reboot you have not opened a browser, just the online game you are examining.

2. If "Retransmitted Segments" shows a high count and/or "In Errors" shows high numbers, try enabling it.
(more data, including the counters above, can be obtained with the command "netstat -s")

In theory it might reduce the number of errors/retransmissions when the connection to the game server is troublesome for some reason. However, this is just a mechanism which helps lower the impact of an existing connection issue.

If there are 0 retransmissions and 0 errors, the connection to the game server is working fine and the option can remain disabled.
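
The check above can be turned into a small Python sketch that reads a saved tcpstats report and computes a retransmission rate. Several things here are assumptions: the sample text is an assumed layout with made-up numbers, the 1% threshold for "high" is my own placeholder (no number is given above), and the function names are invented for the example.

```python
def parse_counters(text):
    """Collect 'Name: number' lines into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters

def retransmission_rate(counters):
    """Retransmitted segments as a fraction of all sent segments."""
    sent = counters.get("Out Segments", 0)
    if sent == 0:
        return 0.0
    return counters.get("Retransmitted Segments", 0) / sent

# Assumed sample; paste your own report here instead.
SAMPLE_STATS = """\
In Segments:                            48210
Out Segments:                           40000
Retransmitted Segments:                 600
In Errors:                              0
"""

HIGH_RATE = 0.01  # hypothetical cut-off for "high"

counters = parse_counters(SAMPLE_STATS)
rate = retransmission_rate(counters)
if rate > HIGH_RATE or counters.get("In Errors", 0) > 0:
    print(f"retransmission rate {rate:.1%}: timestamps may be worth trying")
else:
    print(f"retransmission rate {rate:.1%}: leave timestamps disabled")
```

Looking at the rate rather than the raw count makes the result comparable between a short and a long gaming session.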

Also, it's worth noting that TCP timestamps can be used for information gathering:
https://www.scip.ch/en/?labs.20150305
 

·
Premium Member
Joined
·
6,675 Posts
I checked on a couple of my downstairs computers that have some X540-T1 NICs in them, and it looks like the driver supports up to 16 Receive Side Scaling queues. I'd attach a screenshot but this new forum seems to have broken that ability for me. So if the driver supports that many queues in its dropdown list, would it be best to use that amount? I have an 8 thread CPU, so maybe 8 RSS queues would be better?
 

·
Tank destroyer and a god
Joined
·
2,511 Posts
Discussion Starter #20
Depending on CPU threads, or rather cores, I would set RSS to 2 less than the total number of cores.
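
As a quick Python sketch of that rule of thumb: the function name and the list of dropdown values are hypothetical, and picking the largest driver choice that does not exceed "cores minus 2" is my reading of the rule, not a vendor recommendation.

```python
def suggested_rss_queues(cpu_cores, driver_choices):
    """Largest queue count the driver offers that is <= cores - 2."""
    wanted = max(1, cpu_cores - 2)
    fitting = [q for q in driver_choices if q <= wanted]
    # Fall back to the smallest offered value if nothing fits.
    return max(fitting) if fitting else min(driver_choices)

# Hypothetical dropdown values for a driver that supports up to 16 queues
choices = [1, 2, 4, 8, 16]

for cores in (2, 4, 8, 16):
    print(f"{cores} cores -> {suggested_rss_queues(cores, choices)} RSS queues")
```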

Anyway, the X540 has T1 and T2 variants, the latter with 2 ports. These are overkill for normal gaming setups.

These cards obviously have different chips compared to the i210-T1, and considering the passive cooler I would expect they are designed either for high and constant data transfers, or the chip has a lot of processing power of its own.

It would be interesting to see if it even needs RSS, and whether a ping test to the nearest network device gets below 120-240 microseconds.
 