Overclock.net › Forums › Software, Programming and Coding › Operating Systems › Linux, Unix › [Debian - Wheezy] MDADM Raid 5 - Reading specific file will break array

[Debian - Wheezy] MDADM Raid 5 - Reading specific file will break array

post #1 of 7
Thread Starter 
I have a RAID 5 array built from three 2 TB WD Green drives. After filling it up, I'm having trouble with one file: reading it causes the array to fall out of sync and break. I'm not sure how to fix the file so that it stops breaking the array.

Are there any suggestions for finding and fixing the problem? (Commands I should run?)
post #2 of 7
Well, I'm no expert in mdadm, but it sounds to me like at least one of the drives has a bad sector. That's my first reaction to a read error that affects one particular file. You probably need to run a disk check from a live distro, or use my utility of choice, SpinRite.
 
post #3 of 7
Are you getting any SMART errors?
post #4 of 7
Firstly, if you have the available storage, make a full backup image of the drives. If not, make a backup of the specific sectors involved (if you know how; I'd have to look it up and test it before advising how to do so).

Secondly, run a full SMART diagnostic test (don't just read the SMART parameters; use a utility to force a full test).

Thirdly, run fsck on the affected partition.

If this does not identify or rectify the problem, use dd to read data directly from the affected sectors. If that still triggers the issue, the drive has a bad sector that is not being marked as failed. If it does not, then the filesystem has a corrupted node and needs to be rebuilt. fsck should have been able to repair this sort of damage, but it's not perfect.
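The SMART self-test and the direct sector read described above could look roughly like the sketch below. The device name /dev/sdX, the sector numbers, and the helper names (run, smart_long_test, read_sectors) are placeholders for illustration, not values from this thread, and iflag=direct assumes GNU dd. By default the helpers only print the commands; set RUN=1 (as root) to actually execute them.

```shell
#!/bin/sh
# Sketch only: device names and sector ranges below are placeholders.

# Print the command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Start an extended (long) SMART self-test on one member drive.
# Results appear later in: smartctl -l selftest <device>
smart_long_test() {
    run smartctl -t long "$1"
}

# Read COUNT 512-byte sectors starting at sector START, discarding the data.
# The read is non-destructive; iflag=direct (GNU dd) bypasses the page cache.
read_sectors() {
    dev=$1
    start=$2
    count=$3
    run dd if="$dev" of=/dev/null bs=512 skip="$start" count="$count" iflag=direct
}

# Example (dry run):
#   smart_long_test /dev/sdX
#   read_sectors /dev/sdX 123456 2048
```

If the read hits a bad sector, dd should stop with an I/O error, and the kernel log (dmesg) will usually show a matching entry for the same drive.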
    
post #5 of 7
Thread Starter 
Code:
anthony@debian:~$ sudo smartctl -a /dev/sdd
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EARX-008FB0
Serial Number:    WD-WCAZAJ658373
LU WWN Device Id: 5 0014ee 2b24dff30
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Aug  6 20:06:34 2013 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (30600) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x30b5) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   184   182   021    Pre-fail  Always       -       5800
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       69
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4210
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       69
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       68
193 Load_Cycle_Count        0x0032   189   189   000    Old_age   Always       -       34914
194 Temperature_Celsius     0x0022   120   105   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 6 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 6 occurred at disk power-on lifetime: 3668 hours (152 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 46 00 00 00 a0 08  10d+00:46:01.205  SET FEATURES [Set transfer mode]
  ef 02 00 00 00 00 a0 08  10d+00:46:01.205  SET FEATURES [Enable write cache]
  ef 03 0c 00 00 00 a0 08  10d+00:46:01.205  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 08  10d+00:46:01.204  IDENTIFY DEVICE

Error 5 occurred at disk power-on lifetime: 3668 hours (152 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 00 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 02 00 00 00 00 a0 08  10d+00:46:01.205  SET FEATURES [Enable write cache]
  ef 03 0c 00 00 00 a0 08  10d+00:46:01.205  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 08  10d+00:46:01.204  IDENTIFY DEVICE

Error 4 occurred at disk power-on lifetime: 3668 hours (152 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 0c 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 0c 00 00 00 a0 08  10d+00:46:01.205  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 08  10d+00:46:01.204  IDENTIFY DEVICE

Error 3 occurred at disk power-on lifetime: 2854 hours (118 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 46 00 00 00 a0 08  24d+08:47:23.194  SET FEATURES [Set transfer mode]
  ef 02 00 00 00 00 a0 08  24d+08:47:23.194  SET FEATURES [Enable write cache]
  ef 03 0c 00 00 00 a0 08  24d+08:47:23.194  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 08  24d+08:47:23.193  IDENTIFY DEVICE

Error 2 occurred at disk power-on lifetime: 2854 hours (118 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 00 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 02 00 00 00 00 a0 08  24d+08:47:23.194  SET FEATURES [Enable write cache]
  ef 03 0c 00 00 00 a0 08  24d+08:47:23.194  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 08  24d+08:47:23.193  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

These are the only errors reported by any of the drives in the array.

Also, I'm confused about running fsck /dev/md0: it reports the device as being in use even when it is unmounted.

Is there a proper way to monitor its collapse? When it breaks on that file, it is easily rebuilt to its previous state, with no data loss, using mdadm --assemble --run --force.
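For watching the array, the kernel exposes its state in /proc/mdstat, and mdadm --detail /dev/md0 gives a per-member view; mdadm also has a --monitor mode that can report events such as a member failing. The helper below is a minimal sketch (the function name degraded_arrays is made up for illustration) that reads mdstat-format text on stdin and prints any array whose status map, for example [UU_], shows a missing member. It only handles arrays named like md0.

```shell
#!/bin/sh
# Sketch: list md arrays whose status map contains "_" (a missing member).
# Reads /proc/mdstat-style text from stdin.

degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md0 : active raid5 ...".
        /^md[0-9]+ :/ { name = $1 }
        # A status map such as [UU_] marks each missing member with "_".
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    '
}

# Example:
#   degraded_arrays < /proc/mdstat
```

Running this in a loop in a second terminal while reading the problem file should show the moment the array drops a member; dmesg at that point should name the drive that was kicked.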
Edited by anthony92 - 8/6/13 at 3:37am
post #6 of 7
I'm not going to try to interpret that SMART data, but for fsck I'm still going with my earlier suggestion: boot off a live CD and run it against the hard drives individually from there.
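A caveat on where to point fsck: on an mdadm RAID 5 the filesystem lives on the assembled array device (/dev/md0 in this thread), while the member disks hold only striped data and parity, so a filesystem check would normally target the array rather than each disk. A minimal sketch of a pre-check follows. The function name safe_to_fsck and the MOUNTS_FILE override are made up for illustration, and e2fsck assumes an ext2/3/4 filesystem, which the thread does not state.

```shell
#!/bin/sh
# Sketch: refuse to check a device that is still listed as mounted.
# MOUNTS_FILE is a hypothetical override for testing; it defaults to /proc/mounts.

safe_to_fsck() {
    dev=$1
    if awk -v dev="$dev" '$1 == dev { found = 1 } END { exit (found ? 0 : 1) }' \
        "${MOUNTS_FILE:-/proc/mounts}"; then
        echo "$dev is still mounted; unmount it first" >&2
        return 1
    fi
    return 0
}

# Example: read-only check (-n answers "no" to every repair prompt).
#   safe_to_fsck /dev/md0 && e2fsck -n /dev/md0
```

The check only looks at the mount table. If the device is still reported as busy when nothing is mounted, something else may be holding it open, which this sketch does not detect.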
 
post #7 of 7
That SMART output doesn't look healthy. My guess is that you have a faulty drive, though I'd recommend doing more testing before buying a replacement.