Overclock.net › Forums › Specialty Builds › Servers › Server Wide Corruption and Missing Files *HELP*

Server Wide Corruption and Missing Files *HELP*

post #1 of 8
Thread Starter 
OK, so our IT guys are horrible, we've had a hardware failure, and I'm turning to OCN for some help getting the problem fixed.
I could write a whole story about what happened, but I'll summarize the problems we've had and continue to face as best I can.

At work we have a small business HP server with 2 hard drives in RAID. I'm not 100% sure which part malfunctioned, but it seems like either a hard drive or the RAID controller failed.

The server was acting slow, taking forever to load files. We called our IT guys and they agreed there was a problem. It seemed the hard drive had failed and started writing a corrupt file on our Exchange server, so our mail system went down. They fixed that, and then (a week later) we noticed that any files in our system created on 1/31 and 2/01 were corrupt. We had a good backup and were restoring files. We ended up having a hard drive replaced. When the RAID finished rebuilding (copying the other hard drive's data to the new hard drive), it crashed. This caused files across the whole server to go missing or become damaged and corrupt.

We had every part in our server that could be malfunctioning replaced. It has been a month since our problems started. We are now facing missing, corrupt and damaged files. We have several backups: from before it started, from during the problems, and very recent ones.

However, our IT guys can't figure out a way to get all the missing files back on the server and fix or detect any damaged/corrupt files, while still not overwriting any work we did recently.

They said they ran a program to detect corrupt files and that those should be fixed, but I am still finding them. There are thousands of files on our server spanning 4 years of work, and several people could have had their hands in any one of these files at some point in the last month.
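For the "still finding corrupt files" problem, a crude first pass can at least produce a list instead of waiting for someone to open a broken file in front of a client. The sketch below is a heuristic I'm assuming would catch common rebuild damage (empty files, files whose first block is zeroed out, and files whose header bytes don't match their extension). It is not what the IT guys' tool does, and the `MAGIC` table and the `scan` / `suspicious_reason` names are made up for illustration. It only reads files, never writes.

```python
import os
from pathlib import Path
from typing import Iterator, Optional, Tuple

# Hypothetical signature table: the leading bytes a healthy file of each type
# should start with. Extend it for the file types your office actually uses.
MAGIC = {
    ".pdf": b"%PDF",
    ".docx": b"PK\x03\x04",  # Office 2007+ files are ZIP archives
    ".xlsx": b"PK\x03\x04",
    ".pptx": b"PK\x03\x04",
    ".zip": b"PK\x03\x04",
    ".doc": b"\xd0\xcf\x11\xe0",  # older Office files (OLE2 compound format)
    ".xls": b"\xd0\xcf\x11\xe0",
    ".jpg": b"\xff\xd8\xff",
    ".png": b"\x89PNG",
}

def suspicious_reason(path: Path) -> Optional[str]:
    """Return why a file looks damaged, or None if it passes these checks."""
    if path.stat().st_size == 0:
        return "empty file"
    with path.open("rb") as f:
        head = f.read(4096)
    if head.count(0) == len(head):
        return "first block is all zero bytes"
    expected = MAGIC.get(path.suffix.lower())
    if expected is not None and not head.startswith(expected):
        return f"header does not match {path.suffix}"
    return None

def scan(root: str) -> Iterator[Tuple[Path, str]]:
    """Read-only walk: yield (path, reason) for every file that fails a check."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            p = Path(dirpath, name)
            try:
                reason = suspicious_reason(p)
            except OSError as exc:  # unreadable files are suspicious too
                reason = f"unreadable: {exc}"
            if reason:
                yield p, reason
```

Usage would be something like `for path, reason in scan(r"D:\Shares\Projects"): print(path, reason)`, where the share path is hypothetical. Expect some false positives, and note that a file damaged in the middle will still pass; only a comparison against a known-good backup catches that.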

What's even worse is that I'm the lowest-paid employee at the company, yet I'm the one handling this problem. Go figure.

Any advice other than "get new IT guys" is appreciated! Any advice I could feed them would be appreciated as well. They seem stumped, and having one of their employees comb the server for problems doesn't seem like an option they're willing to explore.

Thanks!
post #2 of 8
Well, I don't know everything there is to know about production servers, but I can almost always ask the right questions...

  • What version of server? (2003? 2008?)
  • You didn't say, so I assume this is not a server in a VM?
  • Do you have a fallback server you can use while exploring options for this corrupted one? You didn't mention it, so I assume not...
  • I assume anti-virus has come back negative?

If you want to narrow the issue down to either the drives or the RAID card, I swear by SpinRite. It's DOS-based, though, so you'd have to connect the drives to non-RAID ports so the boot disc can see them. And the server would obviously be down for an indeterminate time, hence the fallback question.

I think certain patches can conflict with certain pieces of software. Is the server completely up to date? Has any new software been installed on the server recently? How about driver updates?

Sorry, all I can do is ask questions rather than offer answers :)
 
VM Server (17 items)
CPU: Intel Ivy Bridge Core i7-3630QM
Graphics: nVidia GeForce GTX 680M
RAM: 16GB DDR3 1600MHz Dual Channel Memory (2 SODIMMS)
Hard Drive: Serial-ATA II 3GB/s
Hard Drive: Serial-ATA II 3GB/s
OS: Windows 10 Pro x64
Monitor: 17.3" FHD 16:9 (1920x1080)
Power: Smart Li-ion Battery (8-Cell)
Audio: Sound Blaster Compatible 3D Audio
CPU: Intel Core i7 860
Motherboard: Biostar T5 XE
Graphics: Radeon HD 5870
RAM: Corsair 16GB
Hard Drive: Western Digital WD1001FALS-00E8B0
Hard Drive: Maxtor 300GB
Optical Drive: I don't need no stinking optical drive
OS: Microsoft Windows 7 Ultimate x64
Monitor: HP ZR24w 24"
Monitor: Samsung SyncMaster 24"
Keyboard: Logitech Wireless K360
Power: Seventeam ST-850ZAF 850W ATX
Case: Thermaltake V9 Black Edition
Mouse: Logitech G500 Programmable Gaming Mouse
Audio: FiiO E7 USB DAC and Portable Headphone Amplifier
Audio: Sennheiser HD555 Professional Headphones
post #3 of 8
How reliable is the backup? At this point, I would recommend buying/building a new server and restoring the backup there. The RAID still doesn't sound like it's fixed if files are getting overwritten. Unfortunately, it sounds like you are going to have data loss. Who knows how long the data corruption has been going on; corrupt files may have been backed up, possibly overwriting good files. There are a number of data recovery programs out there, but recovery from RAID is not always straightforward.
Chronos (14 items)
Home Server (8 items)
CPU: i7 2600k
Motherboard: Intel DZ68BC
Graphics: Nvidia GTX 970
RAM: Samsung MV-3V4G3D/US
Hard Drive: Crucial M4
Hard Drive: Seagate ST3320620AS
Optical Drive: LG DVD burner
Cooling: Corsair H105
OS: Microsoft Windows 7 Ultimate
Monitor: Samsung SyncMaster 226BW
Keyboard: Logitech Backlit
Power: Antec TruePower 750W Blue
Case: Corsair 550D
Mouse: Logitech
CPU: Xeon E3 1230
Motherboard: Supermicro MBD-X9SCI-LN4-O
Graphics: Onboard Matrox G200eW
RAM: Kingston KVR1333D3E9S/4G
Hard Drive: Seagate 500GB
OS: ESXi 5
Power: Corsair CX430
Case: Lian Li PC-A05FNB
post #4 of 8
Thread Starter 
Quote:
Originally Posted by subassy

Well, I don't know everything there is to know about production servers, but I can almost always ask the right questions...
  • What version of server? (2003? 2008?)
  • You didn't say, so I assume this is not a server in a VM?
  • Do you have a fallback server you can use while exploring options for this corrupted one? You didn't mention it, so I assume not...
  • I assume anti-virus has come back negative?
If you want to narrow the issue down to either the drives or the RAID card, I swear by SpinRite. It's DOS-based, though, so you'd have to connect the drives to non-RAID ports so the boot disc can see them. And the server would obviously be down for an indeterminate time, hence the fallback question.
I think certain patches can conflict with certain pieces of software. Is the server completely up to date? Has any new software been installed on the server recently? How about driver updates?
Sorry, all I can do is ask questions rather than offer answers :)

Server 2008
Windows, no VM
We have several backups: from before the corruption happened, during, and after.
We run anti-virus all the time; each computer has it, and so does the server.

Just to clarify: the file problems started, happened, and stopped all at the same time, when hard drive A was writing all its information to hard drive B (the new one), since they're in RAID. It either crashed as it was finishing or crashed after it finished. Files aren't continuing to corrupt or go missing; we are just now discovering how extensive the problem is.

I would love to say let's take the backup from 2 months ago and just work from there. But we are a company of a dozen employees, all working 40+ hours a week. The money lost by losing the last 2 months would probably bankrupt the company; we're talking about work done on million-dollar projects. So what I'm figuring is: find a way to restore only the lost files and detect the corrupt ones, or isolate any new work that was done, restore an older backup to replace what was on the server from before 2 months ago, and then just go through the new stuff.
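The plan above (restore only what is missing or silently damaged, keep anything saved since the crash) boils down to comparing the live share against the last known-good backup. Here is a minimal sketch in Python, assuming the backup can be restored into an ordinary folder, that the backup preserved file modification times, and that you know roughly when the rebuild crashed (`incident_time`, as a Unix timestamp). The function and bucket names are hypothetical. The key heuristic: a file whose content differs from the backup but whose modification time is unchanged was probably not edited by a person, so it is a likely corruption candidate.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

def sha256(path: Path) -> str:
    """Hash a file in 1 MB chunks so large files don't have to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def relative_files(root: Path) -> Dict[str, Path]:
    """Map 'sub/dir/file.ext' -> full path for every file under root."""
    return {p.relative_to(root).as_posix(): p for p in root.rglob("*") if p.is_file()}

def classify(live_root: str, backup_root: str, incident_time: float,
             mtime_slack: float = 2.0) -> Dict[str, List[str]]:
    """Sort every relative path into a (hypothetical) action bucket.

    Assumes backup_root is a pre-incident backup restored to a plain folder
    with modification times preserved. Nothing is copied or modified.
    """
    live = relative_files(Path(live_root))
    backup = relative_files(Path(backup_root))
    buckets: Dict[str, List[str]] = {
        "restore_missing": [],  # in the backup, gone from the server
        "restore_corrupt": [],  # content changed, but the timestamp did not
        "keep_new_work": [],    # saved after the crash: never overwrite
        "review": [],           # ambiguous: a person should look
    }
    for rel in sorted(set(live) | set(backup)):
        lp, bp = live.get(rel), backup.get(rel)
        if lp is None:
            buckets["restore_missing"].append(rel)
            continue
        live_mtime = lp.stat().st_mtime
        if live_mtime > incident_time:
            buckets["keep_new_work"].append(rel)
        elif bp is None:
            buckets["review"].append(rel)  # pre-crash file the backup lacks
        elif sha256(lp) == sha256(bp):
            continue  # identical to the backup, nothing to do
        elif abs(live_mtime - bp.stat().st_mtime) <= mtime_slack:
            buckets["restore_corrupt"].append(rel)
        else:
            buckets["review"].append(rel)  # edited before the crash, or damaged
    return buckets
```

The output is only a list to act on. "restore_missing" will also contain files someone deleted on purpose, and if the backup tool did not keep modification times, the "restore_corrupt" test is meaningless and everything lands in "review".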

Sadly, there are many directories on the server, which branch off into hundreds of folders and thousands of files. Since this problem started, our IT guys have been taking backups daily or every other day. So we should have all the files somewhere or another; it's just a matter of getting them all into one place.
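To get all the files "into one place" without touching the live server, one option is to walk the backup snapshots from oldest to newest and copy, for every relative path, the newest copy that passes a basic sanity check into a separate staging folder. This is a sketch under stated assumptions: each snapshot can be restored as a plain folder, `build_staging` and `looks_intact` are hypothetical names, and the intact check is deliberately crude (non-empty, first 4 KB not all zeros).

```python
import shutil
from pathlib import Path
from typing import Dict, List

def looks_intact(path: Path) -> bool:
    """Crude check: non-empty and the first 4 KB are not all zero bytes."""
    with path.open("rb") as f:
        head = f.read(4096)
    return bool(head) and head.count(0) != len(head)

def build_staging(snapshots: List[str], staging: str) -> Dict[str, str]:
    """Copy the newest intact copy of every file into a staging folder.

    `snapshots` must be ordered oldest -> newest. Returns a map of
    relative path -> snapshot folder the copy came from, as an audit trail.
    """
    chosen: Dict[str, Path] = {}
    source: Dict[str, str] = {}
    for snap in snapshots:  # newer snapshots replace earlier picks...
        root = Path(snap)
        for p in root.rglob("*"):
            if p.is_file() and looks_intact(p):  # ...but only with intact copies
                rel = p.relative_to(root).as_posix()
                chosen[rel] = p
                source[rel] = snap
    for rel, src in chosen.items():
        dest = Path(staging, rel)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also carries over the modification time
    return source
```

Two caveats: a file someone deliberately deleted will reappear in staging, and a corrupted copy that still passes the crude check will win over an older good one. The staging folder is something to review and compare against the live share, not something to copy straight over it.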

As for building a new server, I don't see the point: all the parts were just replaced by HP techs and our IT guys, and the IT guys, as bad as they are, say the parts are working and files aren't continuing to corrupt. It's just what has already happened that is the problem.

Is there a simple way of fixing this sort of problem? Our IT guys say it's the first of its kind they've had to deal with, and they are slow to react. It is just killing the company to keep finding problems across the whole server. You never know which file you'll need, and when a client calls and you need to get into a file and can't, it's just not good.

Thanks guys!
post #5 of 8
"Sadly, there are many directories on the server, which branch off into hundreds of folders and thousands of files. Since this problem started, our IT guys have been taking backups daily or every other day. So we should have all the files somewhere or another; it's just a matter of getting them all into one place." To be honest, you should have been getting daily backups anyway if the IT guys were good.

This is the first time I've seen a problem like this without a recent backup, as that's how most people would get the data back and only lose what was done in the last day or so.
Ragnarok (15 items)
Yggdrassil (8 items)
CPU: Opteron 6272
CPU: Opteron 6272
Motherboard: Asus KGPE-D16
Graphics: Sapphire R9 290X
Graphics: Sapphire R9 290X
RAM: 8x Samsung 8GB DDR3 1333 ECC
Hard Drive: 3TBx2 in RAID 1
Cooling: Noctua NH-U12DO
Cooling: 120mm Fractal Design 1200rpm Case Fan Silent Se...
OS: Windows 7 Pro x64
Monitor: 1x24" 2x22"
Power: Corsair Professional HX1050
Case: Fractal Design XL R2 Titanium Grey
Audio: Asus MIO 892
Other: Purple and white braided/sleeved cables
post #6 of 8
Thread Starter 
They weren't quick, and they didn't address the problem when it happened. They just let it grow into a bigger one.

As I said, they're not good IT guys.
post #7 of 8
tl;dr
Have you tried using the SmartStart software from HP to run a maintenance/diagnostics scan on the failing machine?

NVM, I see you aren't having hardware problems anymore. Unfortunately, they are going to have to sift through each file/folder piece by piece and retrieve them from good copies as needed. Part of being in IT and being crappy at your job!
Edited by Deeeebs - 3/15/12 at 8:01am
Red Anarchy (15 items)
Commander Herbie (15 items)
BlackBox (18 items)
CPU: Intel Xeon X5675 @ 4.2/1.352v
Motherboard: Asus x58 Sabertooth
Graphics: NVIDIA Quadro FX4800
RAM: 12GB Mushkin Radioactive 9-9-9-24
Hard Drive: Samsung 32GB SSD (ESXi + ISO Images)
Hard Drive: Western Digital 250GB 7200 SATA (VMs)
Hard Drive: 3 x 750GB Seagate Con ES RAID 5
Optical Drive: none
OS: VMware ESXi 5.0
Monitor: Dual Dell 17" Flat panel
Keyboard: Cheap Logitech
Power: Seasonic X750 Gold
Case: Antec 900
Mouse: Cheap Logitech
Mouse Pad: HP Blackbird 002 Gaming Pad
post #8 of 8
Thread Starter 
Good luck telling them that. What's even worse is that my boss and the rest of the staff don't even bother to call them to tell them there's an issue. I just have to hear them complain about it day after day until I call the IT guys and try to have them fix it. (Totally not my job. I asked my boss for a raise yesterday and he said we'd talk about it later. I think "later" is after I quit.) <- Sorry to rant.

Well guys, thank you all very much for the advice. I think it's beyond repair at this point unless someone sinks hours of time into trying to fix it.