Okay so I got this somewhat old server from a friend, but it has been acting up oddly. It freezes after 2-10 hours, but only when a "real" OS is loaded. Memtest86 passes without errors, and it won't freeze in the BIOS either.

The original hardware configuration:

Intel S5000PSL mobo
2x Intel Xeon E5410 quad-core CPU
8x 4GB DDR2 ECC 667 MHz RAM
12x various SATA2 HDDs, of which 6 are from my old server build and 6 are new
Areca ARC-1120 SATA controller card
2x PCI-E gigabit ethernet cards (one brand new, one from old server)

What has happened so far in chronological order:
- set up the configuration as listed above
- went through the BIOS and adjusted some settings
- ran memtest86, no errors
- installed the latest Debian (without a graphical environment etc.), no problems
- booted the new installation, did some normal server configuring, installed some software etc., no problems
- after a few hours noticed the system was completely frozen, did a hard reset
- same as above repeated twice (at this point I started running watch -n1 "uptime;sensors" over SSH from another computer to see the exact moment of the freeze and the temperatures; the uptimes were 1 hour 48 minutes and 2 hours 13 minutes, and the temperatures were all normal)
- ran prime95 for a bit over an hour, all OK (had to turn it off after an hour because the fan noise started to annoy the hell out of me), and about an hour later (about 2 hours of uptime) the system froze once again
- booted a Xubuntu live CD (from USB) and left it idling; after a few hours it froze
- left the machine idling in the BIOS and went to sleep; woke up in the morning and it was still sitting normally in the BIOS
- ran memtest again, no errors, left it running for about 6 hours (multiple passes)
- removed the Areca ARC-1120 SATA controller card (and the 6 new HDDs that were attached to it, leaving only the 6 HDDs from my old server attached, all of which are confirmed to be working)
- booted up the system and left it idling; this time it managed to stay up for 9 hours and 55 minutes
- next I removed the two PCI-E network cards I had installed, and started waiting for the system to freeze again
- restarted to a DOS USB stick to read the system logs of the BMC, found entries "OEM Reserved /SMI Timeout (#0x85)" that were logged at the same time the system froze, every time
- rebooted back into Linux, installed bmc-watchdog and configured it to reset a 15-minute timer every 60 seconds, so that the system gets a hard reset if the timer ever runs out
- currently just waiting for strange things to happen again..
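
Watching uptime over SSH only shows the freeze moment while the other computer is looking. As a rough alternative sketch (not what was actually run here), a small script on the server could write a heartbeat line to disk every second and force it out with fsync, so the last line in the file marks the last moment the machine was alive. This assumes Linux (it reads /proc/uptime); the file name heartbeat.log and all function names are made up for illustration.

```python
import os
import time
from datetime import datetime


def read_uptime_seconds(path="/proc/uptime"):
    """Return system uptime in seconds (Linux: first field of /proc/uptime)."""
    with open(path) as f:
        return float(f.read().split()[0])


def format_heartbeat(now, uptime_seconds):
    """Build one log line: ISO timestamp plus uptime as H:MM:SS."""
    hours, rest = divmod(int(uptime_seconds), 3600)
    minutes, seconds = divmod(rest, 60)
    stamp = now.isoformat(timespec="seconds")
    return f"{stamp} up {hours}:{minutes:02d}:{seconds:02d}"


def append_heartbeat(log_path, line):
    """Append a line and force it to disk so it survives a hard freeze."""
    with open(log_path, "a") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())


def last_heartbeat(log_path):
    """Return the last line of the log, or None if the log is empty."""
    with open(log_path) as f:
        lines = f.read().splitlines()
    return lines[-1] if lines else None


def run(log_path="heartbeat.log", interval=1.0):
    """Write a heartbeat every `interval` seconds until the machine stops."""
    while True:
        line = format_heartbeat(datetime.now(), read_uptime_seconds())
        append_heartbeat(log_path, line)
        time.sleep(interval)
```

Calling run() after boot and reading last_heartbeat("heartbeat.log") after the hard reset would give a freeze time to compare against the timestamps in the BMC log.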

So it seems the freeze happened consistently after about 2 hours of uptime while the Areca card was present, and only after almost 10 hours without it, so removing the Areca card seems to have had some effect. But I can't figure out what the problem could be, since only a "normal OS" causes a full system freeze. I have tried two different OSes, booted from two different media (HDD and USB). During all this I have also tried changing some BIOS settings, with no difference, and resetting to BIOS defaults too.
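
To put numbers on that comparison, here is a quick back-of-the-envelope sketch using only the exact uptimes given above (the prime95 run at "about 2 hours" and the Xubuntu run at "a few hours" are left out because they are approximate). The helper names are made up for illustration. Note that there is only a single run without the card, so this is a hint and not proof.

```python
def to_minutes(hours, minutes=0):
    """Convert an uptime given as hours and minutes to total minutes."""
    return hours * 60 + minutes


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Uptimes at the moment of freeze, taken from the post.
with_areca = [to_minutes(1, 48), to_minutes(2, 13)]  # Areca card installed
without_areca = [to_minutes(9, 55)]                  # card removed (one run only)

# How many times longer the system stayed up without the card.
ratio = mean(without_areca) / mean(with_areca)

if __name__ == "__main__":
    print(f"with card:    {with_areca} min, mean {mean(with_areca):.1f}")
    print(f"without card: {without_areca} min, mean {mean(without_areca):.1f}")
    print(f"ratio:        {ratio:.1f}x")
```

With one sample on the "without" side, a couple more runs in each configuration would be needed before blaming the card.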
Edited by Microx256 - 12/30/13 at 3:03am
Minimachine
CPU: i7-5775C
Motherboard: Asus Maximus VII Impact
Graphics: EVGA GTX1080 Founders edition
RAM: 2x8GB Kingston HyperX 1600 MHz
Hard Drive: Samsung 950 PRO 512GB M.2 NVMe
Monitor: 3x Asus PB287Q 4k
Power: Silverstone 600W Strider SFX
Case: NCASE M1 Rev 2