Overclock.net › Forums › Software, Programming and Coding › Networking & Security › Strange connection dropout on a server

Strange connection dropout on a server

post #1 of 20
Thread Starter 
Hey guys,

I'm pretty much out of ideas here, so let's see if we can figure this out.

So, I have a server running Win Server 2003r2, IP 192.168.1.2, mask 255.255.240.0

Then there are a few computers in 192.168.1.0 and a few more in 192.168.2.0 and 192.168.3.0, all with the 255.255.240.0 mask.

Every now and then the server stops responding to any network traffic from certain hosts in 2.0 and 3.0. The weird part is that it stops responding to one machine while still responding to everything else on the network. This lasts anywhere from 30 seconds to 5 minutes and then fixes itself. Then it picks another machine and does the same thing.

This only happens in the 2.0 and 3.0 blocks of the subnet; everything in 1.0 gets a response all the time.

To me this seems more like a software problem on the server than a hardware one, but why is it doing this? I went through every Windows Server setting I know of that could be related, with no luck.
Edited by tomaskir - 1/17/11 at 5:18am
post #2 of 20
Quote:
Originally Posted by tomaskir View Post
Hey guys,

I'm pretty much out of ideas here, so let's see if we can figure this out.

So, I have a server running Win Server 2003r2, IP 192.168.1.2, mask 255.255.240.0

Then there are a few computers in 192.168.1.0 and a few more in 192.168.2.0, all with the 255.255.240.0 mask.

Every now and then the server stops responding to any network traffic from certain hosts in 2.0. The weird part is that it stops responding to one machine while still responding to everything else on the network. This lasts anywhere from 30 seconds to 5 minutes and then fixes itself. Then it picks another machine and does the same thing.

This only happens in the 2.0 subnet; everything in 1.0 gets a response all the time.

To me this seems more like a server software problem than a hardware one, but why is it doing this? I went through every Windows setting I know of that could be related, with no luck.
What device do you have connecting the 192.168.1.0/28 and 192.168.2.0/28 networks...?

Alternatively, if every device only really needs to access the server, you can just add a second IP address to the server NIC, so you'll have an IP address on both subnets.
Edited by ComGuards - 1/16/11 at 7:54pm
post #3 of 20
Thread Starter 
Quote:
Originally Posted by ComGuards View Post
What device do you have connecting the 192.168.1.0/28 and 192.168.2.0/28 networks...?

Alternatively, if every device only really needs to access the server, you can just add a second IP address to the server NIC, so you'll have an IP address on both subnets.
The server is connected straight to a 24-port 3Com switch. The client machines connect either directly to the same switch or through another switch connected to it.

I could switch the server to 192.168.2.0/20, but I don't think that would solve much. On top of that, some client PCs connect to its IP instead of its DNS name, so I would have to make sure the clients still work.

The problem is still persisting. The machine next to me just dropped its connection after working fine for the last hour, and then fixed itself within 5 minutes. It couldn't connect to the server at all: the server didn't respond to ping, and the machine couldn't reach the SMB share or connect to the SQL database running on the server. A complete denial of connection. I couldn't ping the machine from the server either.
Edited by tomaskir - 1/17/11 at 3:21am
post #4 of 20
- Is this at work or at home?
- What model 3Com is that?
- How many hosts?
- You realize this subnet is way too big, right? I assume this is just for testing. A /20 holds 4000+ hosts.
- Are your ports hard-coded for 100/full or auto/auto? Check both sides of each link, switch and hosts.
post #5 of 20
Thread Starter 
Quote:
Originally Posted by Thorn-Blade View Post
- Is this at work or at home?
- What model 3Com is that?
- How many hosts?
- You realize this subnet is way too big, right? I assume this is just for testing. A /20 holds 4000+ hosts.
- Are your ports hard-coded for 100/full or auto/auto? Check both sides of each link, switch and hosts.
It's at work. My company recently took over the management of this network from the people who managed it before. They were BAD at this, which is why the network is in such a mess, and I'm trying to get it back to a normal running state.

I realize OCN is not the best forum for this kind of support, but hey.

About 10 hosts in 192.168.1.0 - server directly on the 3Com, the rest through a 16-port switch.
About 50 hosts in 192.168.2.0 - on the 3Com either directly or through minor switches (about 8x 8-port switches)
About 20 hosts in 192.168.3.0 - on the 3Com through minor switches (about 4x 8-port switches)
About 20 hosts in 192.168.7.0 - all on a separate switch (Zyxel 1524); they don't communicate with the server except for DHCP and DNS, but do communicate with 2.0 and among themselves
About 15 hosts in 192.168.9.0 - all on a separate switch (Zyxel 1524); they don't communicate with the server except for DHCP and DNS, but do communicate with 7.0 and among themselves

I realize that /20 is probably quite a big subnet for this, but it's mostly for organizational purposes, and there isn't that much broadcast traffic on the network. 99% of the communication is either hosts talking to this server (SQL database, SMB share) or hosts talking to each other (SMB share, FTP, etc.).

I could switch it over to /21, but I personally don't think the subnet size is the issue here; feel free to correct me though. It is organized this way because I have some routing rules set on the router that route each block of the subnet to a different WAN IP (e.g. 192.168.1.0 uses a different public IP for accessing the internet than 192.168.2.0).

All ports are auto/auto. The links between switches run at 1Gbit; the rest of the network is either 100Mbit or 1Gbit.

EDIT: Actually, which forums would provide good networking support in a case like this?
Edited by tomaskir - 1/17/11 at 5:14am
post #6 of 20
As far as OCN goes, this is the best forum for it.

They used a very bad design to set that network up, as you know. This makes no sense at all. Even though you have a /20, it appears it is all one subnet, and you should have one router or one layer 3 switch doing all your routing. This would also mean everyone is using the same subnet mask and gateway address.

Can you give us a high level drawing including where you hand off to your ISP?
Model numbers of switches?

I wonder if one of these switches is having issues dealing with the /20. Are all of these higher level managed switches?

Hang on a minute... "I could switch it over to /21, but I personally don't think the subnet size is the issue here; feel free to correct me though. It is organized this way because I have some routing rules set on the router that route each block of the subnet to a different WAN IP (e.g. 192.168.1.0 uses a different public IP for accessing the internet than 192.168.2.0)."

How are you doing this? The reason I ask is this is all one subnet. You don't route just part of a subnet.

192.168.1.0/20 = 192.168.0.0 - 192.168.15.255 <--- this covers all the addresses you have listed.

You are correct that a /21 wouldn't help either. They should have set up much smaller subnets such as /25s, /26s, etc. You would then be able to route each whole subnet to the WAN as you want.
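
If you want to double-check that range yourself, Python's standard ipaddress module will do the arithmetic. This is only a quick sketch; the variable names are mine and the block list is simply the one from your post.

```python
import ipaddress

# strict=False allows a host-style address (192.168.1.0) together with the /20 prefix
net = ipaddress.ip_network("192.168.1.0/20", strict=False)

print(net)                                              # the network this address falls in
print(net.network_address, "-", net.broadcast_address)  # first and last address of the subnet
print(net.num_addresses - 2)                            # usable hosts (minus network and broadcast)

# The blocks listed earlier in the thread
blocks = ["192.168.1.0", "192.168.2.0", "192.168.3.0", "192.168.7.0", "192.168.9.0"]
in_same_subnet = {b: ipaddress.ip_address(b) in net for b in blocks}
print(in_same_subnet)
```

Every block you listed lands inside the same /20, which is why the router cannot treat them as separate routed networks.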
post #7 of 20
I have seen your problem before, now that I think about it. I have seen it a few times, actually. They all came down to duplicate IPs on the network. I have seen it with PCs having the same IP, and with a router that was left turned on when it should have been removed.

Check the following:
1. Make sure none of the MAC tables have the same MAC address duplicated. This could be caused by a switch not flushing a bad MAC entry from its table.
2. Check the router to make sure it does not have any duplicate ARP entries.
3. Make sure you do not have any duplicate IPs on your network. Some routers and layer 3 switches will have an entry in the log telling you this.
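
If you end up comparing ARP tables from several devices by hand, a few lines of Python can do the cross-checking for you. This is a rough sketch, not a finished tool: the find_conflicts helper is my own invention, the sample addresses are made up, and it assumes you have already copied the (IP, MAC) pairs out of each device's ARP table.

```python
from collections import defaultdict

def find_conflicts(entries):
    """entries: iterable of (ip, mac) pairs collected from one or more ARP tables."""
    macs_by_ip = defaultdict(set)
    ips_by_mac = defaultdict(set)
    for ip, mac in entries:
        # Normalise Windows-style aa-bb-cc notation to lower-case aa:bb:cc
        mac = mac.lower().replace("-", ":")
        macs_by_ip[ip].add(mac)
        ips_by_mac[mac].add(ip)
    # One IP answered by more than one MAC: likely a duplicate IP
    dup_ips = {ip: sorted(m) for ip, m in macs_by_ip.items() if len(m) > 1}
    # One MAC claiming several IPs: may be legitimate (router, multi-homed host)
    shared_macs = {mac: sorted(i) for mac, i in ips_by_mac.items() if len(i) > 1}
    return dup_ips, shared_macs

# Made-up sample data, for illustration only
sample = [
    ("192.168.2.15", "00-11-22-33-44-55"),  # as seen in one ARP table
    ("192.168.2.15", "00:11:22:33:44:66"),  # same IP, different MAC in another table
    ("192.168.1.2", "00:aa:bb:cc:dd:ee"),
]
dup_ips, shared_macs = find_conflicts(sample)
print(dup_ips)
print(shared_macs)
```

Anything that shows up in dup_ips is worth chasing first; entries in shared_macs need a judgment call, since a router or a NIC with a second IP will show up there too.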
post #8 of 20
Thread Starter 
Quote:
Originally Posted by Thorn-Blade View Post
As far as OCN goes, this is the best forum for it.

They used a very bad design to set that network up, as you know. This makes no sense at all. Even though you have a /20, it appears it is all one subnet, and you should have one router or one layer 3 switch doing all your routing. This would also mean everyone is using the same subnet mask and gateway address.

Can you give us a high level drawing including where you hand off to your ISP?
Model numbers of switches?

I wonder if one of these switches is having issues dealing with the /20. Are all of these higher level managed switches?
Yes, only the 1.0->15.0 range of the first subnet is used, so each PC in the network is using the same router at 192.168.1.100. Everything in the network has the 255.255.240.0 mask. There is only the one router in the network, which takes the internet connection from the ISP and routes it.

Router is at 192.168.1.100 with 255.255.240.0 mask.

Quote:
Originally Posted by Thorn-Blade View Post
Hang on a minute... "I could switch it over to /21, but I personally don't think the subnet size is the issue here; feel free to correct me though. It is organized this way because I have some routing rules set on the router that route each block of the subnet to a different WAN IP (e.g. 192.168.1.0 uses a different public IP for accessing the internet than 192.168.2.0)."

How are you doing this? The reason I ask is this is all one subnet. You don't route just part of a subnet.

192.168.1.0/20 = 192.168.0.0 - 192.168.15.255 <--- this covers all the addresses you have listed.

You are correct that a /21 wouldn't help either. They should have set up much smaller subnets such as /25s, /26s, etc. You would then be able to route each whole subnet to the WAN as you want.
Actually that is pretty easy. I route certain blocks of this huge subnet to different WAN IPs:

192.168.1.0->192.168.1.255 is routed to go out through WAN xx.xxx.xx.18
192.168.2.0->192.168.2.255 is routed to go out through WAN xx.xxx.xx.19
192.168.3.0->192.168.3.255 is routed to go out through WAN xx.xxx.xx.20
192.168.7.0->192.168.7.255 is routed to go out through WAN xx.xxx.xx.21
192.168.9.0->192.168.9.255 is also routed to go out through WAN xx.xxx.xx.21

Using WAN xx.xxx.xx.22 for incoming connections only.

This is all done in the routing rules on the router, of course; the router takes 5 WAN IPs from the ISP. Mind you, this network was running on a SOHO router before, without a firewall or anything. The first thing I did was throw that out and add a solid router/firewall.
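
To make the idea concrete, here is a rough Python sketch of how that kind of source-based rule lookup behaves. It is not the router's actual configuration: the RULES table and the wan_for helper are made up for illustration, the WAN addresses stay masked as above, and I am assuming every block runs from x.0 through x.255.

```python
import ipaddress

# (first source address, last source address, outgoing WAN IP as masked in the post)
# Assumption: every block covers x.0 through x.255.
RULES = [
    ("192.168.1.0", "192.168.1.255", "xx.xxx.xx.18"),
    ("192.168.2.0", "192.168.2.255", "xx.xxx.xx.19"),
    ("192.168.3.0", "192.168.3.255", "xx.xxx.xx.20"),
    ("192.168.7.0", "192.168.7.255", "xx.xxx.xx.21"),
    ("192.168.9.0", "192.168.9.255", "xx.xxx.xx.21"),
]

def wan_for(source_ip):
    """Return the WAN label for a LAN source address, or None if no rule matches."""
    addr = ipaddress.ip_address(source_ip)
    for first, last, wan in RULES:
        if ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last):
            return wan
    return None

print(wan_for("192.168.2.40"))  # hypothetical host in the 2.0 block
print(wan_for("192.168.5.1"))   # block with no rule
```

The point is that the match is on the source address range, not on the subnet, so all blocks can stay in the one /20 while still leaving through different WAN IPs.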

I was thinking of re-setting the whole network to multiple /24s. The problem is that this would fragment the network in its current state. As I mentioned before, right now everything is in one huge subnet with hosts organized into different blocks of it; to keep that organization I would need 5x /24 subnets, with hosts communicating across subnets.
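
For illustration, this is roughly what that split looks like when worked out with Python's standard ipaddress module. It is only a sketch: the same_subnet helper and the client address 192.168.2.50 are hypothetical, the server address is the real one.

```python
import ipaddress

supernet = ipaddress.ip_network("192.168.0.0/20")
subnets = list(supernet.subnets(new_prefix=24))
print(len(subnets))             # how many /24s fit in the /20
print(subnets[1], subnets[2])   # the /24s that would hold the 1.0 and 2.0 blocks

def same_subnet(a, b, prefix):
    """True if hosts a and b are on the same network for the given prefix length."""
    net_a = ipaddress.ip_interface(f"{a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{b}/{prefix}").network
    return net_a == net_b

server = "192.168.1.2"
client = "192.168.2.50"  # hypothetical host in the 2.0 block

print(same_subnet(server, client, 20))  # today: talk directly over the switches
print(same_subnet(server, client, 24))  # after a /24 split: traffic has to be routed
```

So with /24s, every client outside 192.168.1.0/24 would reach the server through the router instead of directly.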

Hopefully this all makes sense. I will make a network diagram later today and post it.
Edited by tomaskir - 1/17/11 at 6:06am
post #9 of 20
Thread Starter 
Quote:
Originally Posted by Thorn-Blade View Post
I have seen your problem before, now that I think about it. I have seen it a few times, actually. They all came down to duplicate IPs on the network. I have seen it with PCs having the same IP, and with a router that was left turned on when it should have been removed.

Check the following:
1. Make sure none of the MAC tables have the same MAC address duplicated. This could be caused by a switch not flushing a bad MAC entry from its table.
2. Check the router to make sure it does not have any duplicate ARP entries.
3. Make sure you do not have any duplicate IPs on your network. Some routers and layer 3 switches will have an entry in the log telling you this.
100% sure I don't have duplicate IPs. I will check the ARP tables on the router and the server to see if I have any MAC duplication problems.
post #10 of 20
Not to be a stickler, but a 255.255.240.0 mask on a class C IP range? I'm pretty sure a class C is defined as, and limited to, 254 hosts per subnet. If you need a bigger subnet, wouldn't you bump up to a class B IP range?