
DATA: A discussion of redundancy with questions

post #1 of 16
Thread Starter 
I'm sticking this here as most of my questions seem somewhat server related.

I’ve been doing a great deal of soul searching regarding my needs for data storage, organization, backup, etc. Please allow me to preface this by stating that the following is the result of my assessment of my own personal situation and needs. I suspect, though am not sure, that this would be similar for any normal person or household, but not necessarily for large businesses, corporations, etc. I will follow my thoughts and observations here with some questions. Most of this stems from the types of data I have and the separate issues each possesses in terms of having safe, protected, uncorrupted backups. I’d like to say at this time that I’d prefer to avoid RAID if possible (for a variety of reasons, including lack of experience, money, and personal needs), as I think most of my needs can be met relatively easily using robocopy, xcopy, or similar. I’d like to stick with Windows here, as a variety of needs (Lightroom, Photoshop, Netflix, other programs) prevent my use of Linux.

While I realize this may be long and result in some "tl&dr", I hope you can take the time to read it and share your insights, thoughts, etc. If you want to skip to the tl&dr summary, see <here> at the end.

Data: Categories

After a good deal of head scratching I have come to the conclusion that there are only a few types of data. This, it seems to me, applies especially to the personal or household level.

Archive-able
First, there is what I am calling "archive-able" data. This data is no longer current and is not edited; it does not change. It is saved for reference, posterity, or out of some hoarding complex that is beyond the scope of this discussion. Ignoring the hoarded stuff, some data may be needed for reference but is not routinely used. Your resume, past letters and documents, and the last five years' tax statements are what I am considering "archive-able". You may need immediate access to these at some point, but you're unlikely to edit them. If you do feel the need, say to update your resume, you edit and save it as a new copy with a current name.

Back up:
I think backing up this data is perhaps the easiest. Have three copies (one of which, or a fourth, is off-site) and you're good. Since this data never changes unless it is corrupted (you never edit it), don't mirror it and you'll always have at least a couple of copies that are safe and uncorrupted. {*always is obviously a relative term}

Active
This leads us to the second category: "active" data. This is data you are likely to need/use in your present life. It is current. You will potentially edit it today or in the near future; this data may change. The current year's tax documents, the resume while you're revising it, this year's email, and this semester's homework are all examples of "active" data, in my opinion.

Back up:
Backing up this data safely presents me with the greatest conundrum. Because the data is edited, mirroring changes, purging unwanted files, and adding new files are all critical. At the same time, this creates the opportunity for accidental deletion or silent corruption to compromise every instance of the data in existence. Using command-line tools (like robocopy, or wrapped versions like SyncToy) or RAID protects against drive/media failure but not against corruption, at least not in an easy way. Apparently some file systems like ZFS, or solutions like unRAID or FlexRAID, may offer corruption protection, but I do not understand those systems. Additionally, some of them are unavailable to Windows users; others, like FlexRAID, are available but require learning a new language/syntax/etc.
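To make the difference concrete, here is roughly what I mean; the paths are made up and the switches are just my reading of the robocopy documentation, so treat this as a sketch:

rem full mirror: deletions and corrupted saves on the source propagate to the backup
robocopy d:\active e:\backup\active /mir /log+:c:\logs\active.txt

rem plain one-way copy: adds new files and newer versions, but never deletes anything on the backup
robocopy d:\active e:\backup\active /e /log+:c:\logs\active.txt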

Media
This third category could probably fall under "archive-able", but for what I intuitively think are obvious reasons, it gets its own category. It includes music and video. For my needs, it does not include personal photos, our own recordings, or "home video". You never, ever edit media. You may add new files, and maybe delete unwanted files (rarely), but you never edit them.

Back up:
Backing up "media" seems identical to backing up "archive-able" data. You make the first copy and validate it, then copy this to two or more separate destinations. You don't mirror it, you provide sufficient redundancy, and you're protected against almost everything.

Personal Media

This is a category which may be unique to some individuals. "Personal media" includes your personal photos, home videos, recordings, etc. This category is for people like me who capture and edit/manipulate large quantities of this media. It is not for people who take casual snapshots and maybe look them over later. That media is "archive-able".

My personal media contains several hundred gigabytes of data, and 100-200GB will likely be added each year. This data can or will be edited, manipulated, sorted, deleted, or otherwise changed. If it is made into a slide show or album, that output is archived, but the RAW data is always pseudo-current.

Back up:
Issues here are similar to the "active" backup problems. To efficiently back up, protect, and sort/organize the data, it must be mirrored in some way. Unwanted files must be purged, etc. Mirroring leaves room for silent corruption. This seems like the hardest issue to deal with reasonably.
Edited by ddietz - 12/27/10 at 8:23pm
post #2 of 16
Thread Starter 
My Situation

I suspect that any data management solution requires some tweaking at the personal level. As such, allow me to provide my situation.

In my household there are (or will be) three computers: the wife’s laptop, my desktop, and an HTPC (with extra backup storage). These are all connected by a wired 10/100 LAN. Adding other personal computers to this simply increases the amount of needed and available storage.

My wife’s laptop contains archive-able and current personal documents, her music collection (50GB and growing; she is a professional musician), and "archive-able" personal photos. The personal documents are largely text based. Total volume today is ~80GB, and this will grow slowly.

My desktop contains archive-able and active personal text/PDF documents and music, around 40GB of which 15GB is music. I also have program downloads, Steam game backups, extractions, etc. (my crappy internet connection prevents me from easily re-downloading these, plus some older programs are hard to find). This data is archive-able. I have ~200GB of photos, adding an additional ~50-100GB a year, as already described. My computer also handles the conversion of video media (DVD/Blu-ray) to HDD from our disc collection. These are copies of what we own (we have no cable/dish TV, so we treat ourselves to movies and receive them as gifts at holidays).

The HTPC (currently under construction) will hold the working copies of all video media. It will either be ripped on the HTPC or on my desktop and copied to it via LAN. Video backup will result in copies on my desktop HDDs, the HTPC, and an external drive. We have the discs, but the sheer hours to rebuild the collection... not if I can help it.

Our individual music collections are managed through iTunes, which syncs to various iDevices. It seems the easiest solution is to maintain our separate working "libraries" on our individual computers and use non-mirrored, redundant backup. This backup will happen via LAN to the HTPC and to an external backup. Again, discs are usually available (unless lost, or the music was digital from iTunes, Amazon, etc.), but... not if I can help it.


Backup:
All music is on each personal computer and in an identical folder on the HTPC, plus in an identical folder on an external HDD (eSATA or USB to the HTPC, connected only when a backup is happening, maybe weekly). New music can be manually added to the personal library, then copied over the LAN to the HTPC. Other options may be possible, such as robocopy on the HTPC running in monitor mode or as a scheduled task, but those will be discussed later.
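For example (untested, and the laptop share name is a placeholder), robocopy's monitoring switches might let the HTPC pull new music on its own:

rem run on the HTPC: re-copy whenever at least 1 change is seen, checking again every 60 minutes
robocopy \\wifelaptop\wifemusic q:\wifemusic /e /mon:1 /mot:60 /log+:q:\logs\music.txt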

Each personal computer will have both "archive-able" and "active" documents. The first copy resides on each respective computer. I think it would be possible to use logical folder hierarchies to make this relatively manageable and to separate each data type. For example, the laptop has a data partition on its hard drive. This partition contains a variety of folders, including "wifemusic", "wifearchive", "wifeactive", and "wifebusiness". New additions to music and archive (and also business) are handled by manual or scheduled robocopy commands that copy new files, do not purge, and do not update existing files on the backup.

I think: robocopy d:\wifemusic q:\wifemusic /e /copyall /xc /log+:name.txt. What is important here is that I am copying only files that are new in the source, and not changing any destination files to mirror changes in the source. I am not sure I have the command right, however, as I don't fully understand how the exclusion switches work. ----- HELP.
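Based on robocopy's documented file classes (Lonely = exists only in the source; Changed/Newer/Older = exists in both but differs), my best guess at a copy-only-new-files command is the following, though I have not tested it and the log path is made up:

rem copy files that exist only in the source; skip anything already present on the backup, even if the copies differ
rem (/xc = exclude changed, /xn = exclude newer, /xo = exclude older)
robocopy d:\wifemusic q:\wifemusic /e /copyall /xc /xn /xo /log+:q:\logs\wifemusic.txt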


Questions
Now I run into a variety of questions. I’ll try not to ramble. In short, my biggest questions involve robocopy syntax for controlling exactly which files are copied, and how to deal with active data redundancy where some mirroring is necessary.

1. How do I handle the fact that different computers will likely have disks/partitions with similar names across the network? My wife’s laptop will have C: and D:, though C: is never touched (no need to back up the OS, in my opinion). My computer can have about 10 drive letters, as I have some virtual optical drives and a card reader that has about 5 letters of its own. The HTPC will have 3-4 of its own plus 1-2 external drives. Should I carefully assign drive letters on each computer so as to avoid this issue, or is this a place to map network drives? To be honest, I don’t really understand what "mapping network drives" is for.

2. Archive robocopy syntax. What is correct for copying files in the source that are not already in the destination? What is correct for not copying changes to source files that already exist in the destination? /XC, /XN, /what?

3. The hard part: how do I best deal with "active" files that need to be mirrored while minimizing the chance of accidental deletion or data corruption? This is especially difficult, to my thinking, when considering my large image collection. Ignoring the image collection for the moment, how do I deal with the other active data? It does not seem possible to protect against data corruption or deletion without either starting with a separate raw data archive and working on copies (later removing the archive if no issues are found in the active data) or using some form of checksum protection, which requires RAID or file systems not supported by Windows (if I understand correctly). Thoughts?

Is FlexRAID the solution? I’ve no idea where to begin with FlexRAID, and rebuilding sounds scary. Should I run *nix in a virtual machine for ZFS, or...?

4. The HTPC will not always be turned on. I’ve never dealt with remote access before. Is it possible to wake/sleep or turn on a PC remotely? This question only just occurred to me, so I have not researched it yet. Will going into sleep mode or hibernation cause issues with backups? I’m sure I can arrange a schedule for backups when all computers are on. Still, I’m curious.


The HTPC will have space for two SATA drives for backup. I have a 2TB (Samsung F3) and 2x1TB (WD) green drives available for the HTPC. If you add all our current data together, we have ~200GB of data (with music) and an additional 200GB of images. If my image collection doubles in 2 years, then 1TB should hold us for the next 2-3 years. This is acceptable to me. That leaves an additional 1-2TB for video storage. Perhaps video on the 2TB, data on a 1TB, and hot-swap via eSATA as a third backup of all our data (with separate 2TB and 1TB drives). This gives 3x redundancy for all data. Safe enough for me at this time.
Edited by ddietz - 12/27/10 at 8:21pm
post #3 of 16
Thread Starter 
The TL&DR Summary

Two personal computers and an HTPC are connected by wired LAN.
I’d like the HTPC to also serve as a backup center for the other two computers.
Our data includes archives which may have new files added, but no changes to the file data and no deletions (these can be handled manually as necessary).
Data also includes "active" data (current documents and lots of images) that requires some form of mirroring.
This presents a place for accidental deletion or silent corruption to ruin all active data.
I need Windows and don’t really want RAID. I’d like to use command-line tools like robocopy to handle this.
Biggest question: how do I deal with mirroring data while minimizing the chance of data corruption?
Second: how do I use robocopy to copy new source files to the destination but not propagate changes or deletions of source files to the destination? In short, copy only novel files.
Last: should I consider a virtual machine on the HTPC to handle this through FlexRAID, Linux mdadm, or something with ZFS (all new to me)?


Thanks!!!!!
post #4 of 16
Since we all think differently, we all think differently about storage...
And to sum it up with a good quote I heard once:
"Backup is something you never think about until you need it"...

My fileserver setup consists of 8 drives.

1. 2x250GB in RAID1 for the operating system (easy recovery if a drive fails)
2. 1x500GB for active stuff like torrents & FTP
3. 3x750GB in RAID5 as a network drive which contains media files, backups, pictures, music, movies, series & software I actively use.
4. 2x250GB in RAID1 for backing up the important stuff on the RAID5 drives that I would have no way of getting back, or great difficulty doing so (music & pictures).

This IMO is good security for me... but like I said, personal taste is key here.
RAID5 is a good tradeoff between performance/security, but for optimal security, I don't think it's a good enough standalone solution.
That's why I mirror my cannot-lose files to the 2x250GB drives in RAID1.
RAID1 is also very easy to recover after a failure, but the tradeoff is less space.

I'm currently thinking of making a small RAID0 array for storing the virtual machine drives, for added speed...(and just backing it up)
    
post #5 of 16
Quote:
Originally Posted by ddietz View Post
The TL&DR Summary

Two personal computers and an HTPC are connected by wired LAN.
I’d like the HTPC to also serve as a backup center for the other two computers.
Our data includes archives which may have new files added, but no changes to the file data and no deletions (these can be handled manually as necessary).
Data also includes "active" data (current documents and lots of images) that requires some form of mirroring.
This presents a place for accidental deletion or silent corruption to ruin all active data.
I need Windows and don’t really want RAID. I’d like to use command-line tools like robocopy to handle this.
Biggest question: how do I deal with mirroring data while minimizing the chance of data corruption?
Second: how do I use robocopy to copy new source files to the destination but not propagate changes or deletions of source files to the destination? In short, copy only novel files.
Last: should I consider a virtual machine on the HTPC to handle this through FlexRAID, Linux mdadm, or something with ZFS (all new to me)?


Thanks!!!!!
Personally, I wouldn't consider a virtual computer for doing RAID for security/duplication.
One reason is that you actually have to maintain "another" computer,
which in fact is using the same drives you would be using anyway.
The second reason is that if your files are on a virtual disk inside a virtual PC, you have difficulty accessing that data...
    
post #6 of 16
Thread Starter 
Thanks for your input, sharing your situation, and for the final suggestion.

Quote:
Personally, I wouldn't consider a virtual computer for doing RAID for security/duplication.
One reason is that you actually have to maintain "another" computer,
which in fact is using the same drives you would be using anyway.
The second reason is that if your files are on a virtual disk inside a virtual PC, you have difficulty accessing that data...
I mentioned virtualization because, after reading around here a bit, it seemed like many large servers use it. It also seemed like a way to use a different file system (ZFS) or better software RAID (Linux) while still being able to use the Windows front end. I would not use virtual drives.


As I said, backing up my active data with redundancy, especially all the images, while protecting against corruption is the hard part for me.
post #7 of 16
Quote:
Originally Posted by ddietz View Post
Thanks for your input, sharing your situation, and for the final suggestion.



I mentioned virtualization because, after reading around here a bit, it seemed like many large servers use it. It also seemed like a way to use a different file system (ZFS) or better software RAID (Linux) while still being able to use the Windows front end. I would not use virtual drives.


As I said, backing up my active data with redundancy, especially all the images, while protecting against corruption is the hard part for me.
I don't know about the large virtualization programs, but I guessed that to install a virtual OS, you have to use a virtual hard drive (a file on the host's hard drive), so your files are "encrypted" inside this virtual drive file...
I just don't know if ESXi and such support using hard drives directly... if so, I see no problem in trying VMs for different RAID support...
Personally I only use VirtualBox & 1 machine with XP on it...
This is for testing out different stuff...

But there is no way around your problems... To protect against corruption while keeping fault tolerance/redundancy... you need more drives...

An alternative could be to have a separate RAID5 or RAID1 array and do monthly (or so) backups to it (either a local array or another NAS).
Then you have some corruption protection without losing redundancy.
Did this make sense to you? I just had a couple of beers, so I'm not perfectly coherent...
    
post #8 of 16
Thread Starter 
Quote:
I don't know about the large virtualization programs, but I guessed that to install a virtual OS, you have to use a virtual hard drive (a file on the host's hard drive), so your files are "encrypted" inside this virtual drive file...
I just don't know if ESXi and such support using hard drives directly... if so, I see no problem in trying VMs for different RAID support...
Personally I only use VirtualBox & 1 machine with XP on it...
I'm not very experienced with virtual machines, but I have used VirtualBox a bit for some testing, not much though. You can do network sharing, so I assumed one could run a light OS (which lives in a virtual drive) with RAID, NAS, etc., and I assume one could control physical disks with it, since it is possible to connect to existing physical disks for data sharing. I don't know, though...

Regarding RAID:
I still don't really understand how much RAID protects your data. RAID 1 mirrors without striping, which I see as essentially the same as command-line mirroring. You must mirror whole disks or partitions in RAID, but it is real-time and constant (more or less). On the other hand, RAID controllers could cause problems with a later upgrade or rebuild, you can't focus on specific directories, and you can't implement it after you've already added data.

As I understand it, RAID 5 offers parity (which I don't fully understand even after reading about it). Parity offers protection against corruption from physical disk issues, but not against accidental deletion, viruses, program errors, etc. It uses striping, which allows one to "protect" more data than a 1:1 mirror, but one seems heavily dependent on the hardware or software controller for later rebuilds or desired changes to the system.

FlexRAID sounds nice, as it allows unmatched disks and is more flexible in terms of parity.
At the same time, I believe the learning curve for me is rather high, and I'm scared of being unable to handle a rebuild in the future. It seems like validation, which uses checksums to protect against corruption, must be manually run or scheduled, and can lead to downtime.


All this is why I keep leaning toward using robocopy or xcopy, etc.

In reality, my household has very little of what I call "active" data, meaning data that is actively and currently edited. We mostly add new data. Our active data is limited to a few spreadsheets, Word documents, email and browser personal data, and my images.

Aside from images, this active data is likely easily handled by thoughtful copies of known working files to multiple locations. Bad experiences over the years have already trained us to occasionally email ourselves a copy of a Word document, or to occasionally save it under a new name (say, the first is "roughdraft", the next is "-edited", etc.). These earlier versions are suddenly archives. The trick then seems to be how to efficiently manage the change from "active" to "archive". I suspect this is easy with logical project management. For example, if I were starting a new project (I'm a biologist, so that would include research, data in Excel, and Word documents), I'd create an appropriately named folder ("JumpingFrogs", for example) in my "active" folder on my internal data drive. This folder would be backed up to other physical drives by copying new files and updating changed ones. I could purge or not.

I suspect I am rambling. Let me try to explain how I'd set up my file structure.

C: is the OS; my data (the first copy) is on a separate 1TB drive, D:, which contains subfolders:
ddietzarchive
ddietzactive
ddietzimages

I have no other personal data categories.
The archive folder is backed up with: robocopy source destination /e /copyall /xc /log+:logname.txt
If I have that syntax right, it will copy all attributes and new data, but will not push changes made to source files or delete anything missing from the source. This would be run for each backup location. I could from time to time run a "test" to check for differences between source and destination. As these files should never be edited, any discrepancy might suggest corruption and could be examined in more detail.
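That "test" could perhaps be a listing-only run; the /l switch is supposed to report what would be copied without copying anything, so on an archive that should never change, anything it lists would be suspicious (paths are placeholders, untested):

rem dry run: list differences between the archive and its backup without copying anything
robocopy d:\ddietzarchive q:\ddietzarchive /e /l /log:d:\archivecheck.txt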

Music and movies would also be maintained in a similar fashion.

Active data is harder. Since the number of files is generally small for me, I could mirror and accept some risk, possibly doing the occasional save-as under a new name for additional safety. As these are generally text or Excel files, we are unlikely to get overloaded with orphaned files if we maintain a project folder naming system. I tend to work under project folders anyway.

An alternative might be to treat changed files as new files, renamed with -1, -2, etc. in the destination folder. I suspect this is possible with some scripting but would need help with that; a rough sketch of a variation on the idea follows. At the end of a project, or from time to time, I could manually purge or sync folders to clean up the extra stuff.
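The variation, which avoids renaming: copy anything touched in the last day into a dated snapshot folder on the backup drive, so existing backup copies are never overwritten. This is untested, the paths are placeholders, and the date parsing assumes the US short-date format:

rem build a yyyy-mm-dd stamp from %date% (assumes "Day mm/dd/yyyy" format)
set stamp=%date:~-4%-%date:~4,2%-%date:~7,2%
rem copy only files modified within the last day into a dated snapshot folder
robocopy d:\ddietzactive q:\ddietzactive\%stamp% /e /maxage:1 /log+:q:\logs\active.txt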

Images are the hardest, but I use Lightroom, which may help. For those who do not know, Lightroom is a photo organizer, catalog system, and non-destructive editor. It looks at the image but does not edit it. Instead, small metadata files are created for every image; the data within tells LR how to display the photo. At the end, you can export a copy of the original image with your edits. The original remains untouched.

The drawback to this is that the metadata files are constantly edited. It's easy enough to create two or more copies of the original images to protect against loss; as these are never edited, mirroring the image files themselves is unnecessary. It might be possible to mirror the metadata by limiting a mirror command to only *.xmp, the little instruction files that sit next to each image with the same filename. I know .xmp files exist in other places, but within my image folders they are entirely for image data.

Describing all this makes me wonder if I can use it to "cheat". If I have some commands set to run when LR runs, and they are scripted to mirror changes that occur after the program opens, I am protected from any possible silent corruption at other times but still get to mirror my current edits.

Again, I'd need some help writing that, assuming it could be done. LR itself also has a catalog backup to back up all those files, which I might also be able to use, making all this a moot point.
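In case it isn't moot, here is roughly the wrapper I picture; the Lightroom install path is a guess, the backup paths are placeholders, and the whole thing is untested:

@echo off
rem launch Lightroom and wait for it to close
start /wait "" "C:\Program Files\Adobe\Adobe Photoshop Lightroom\lightroom.exe"
rem afterwards, push only the .xmp sidecar files that are now newer than the backup copies
robocopy d:\ddietzimages q:\ddietzimages *.xmp /s /xo /log+:q:\logs\xmpsync.txt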


Thanks again to anyone who reads my rambling!
post #9 of 16
OP:

I'm not sure I can answer all of your questions, but I can give you a few pointers. You can take what I say and apply it to your own situation. Here goes...

If it were up to me, I'd build a small, low-powered machine specifically to handle backups and data storage. I would install Windows Home Server or Amahi Home Server, and have that OS handle my data using its data replication features.

The server would hold all of the archived data, media files and active data backups. The replication features allow a user to eliminate the use of RAID, although I would configure a RAID 1 array for the OS to minimise downtime in the event of OS hard drive failure. I would also use a UPS.

Either OS can create and export Windows shares. For off-site backups, I would look for a desktop backup tool that supports Amazon S3, Carbonite or Dropbox, and use that to create scheduled off-site backups as well as backups to the server.
post #10 of 16
Thread Starter 
Parityboy,

Thanks for the advice. Those off-site options won't work for me, but that's OK! I appreciate the info.

How does WHS differ from Windows Server 08 R2? I have access to Server 08 R2 for free, but not WHS. I'll certainly do some research on my own, but would Server 08 work the same as far as software RAID setups go? Do these include parity? As I've said, I feel redundancy is easy for me, but my concerns lie in file corruption at the bit level.

How does replication on WHS (and maybe Server 08) compare with RAID as far as protection, rebuilding after a failure, and striping (the real reason I am scared of RAID)?

For my purposes, combining the "server" and HTPC is my best option, as it validates each separate task and justifies it to me. It also makes better use of my money, space, and available disks.

It seems like there should be a way to use checksums in combination with file modification datestamps to protect against almost all data corruption issues. If a file's checksum has changed but the file has not been touched by any program, there's a problem.
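As a crude example of what I mean (certutil ships with Windows, MD5 should be enough for spotting random corruption, and the paths are placeholders; untested):

@echo off
rem record a hash for every file in the archive; a later run can be compared against this list
for /r d:\ddietzarchive %%f in (*) do (
    echo %%f>> d:\archive-hashes.txt
    certutil -hashfile "%%f" MD5 | find /v "CertUtil" >> d:\archive-hashes.txt
)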


Thanks again!