Warning: this is not a simple problem. There is a lot of information posted here and I ask that you please do not offer advice or speculate as to what the problem might be unless you have read all of my posts.
I have been troubleshooting for the past week and what I am seeing continues to make less and less sense. I normally do not post threads asking for help; I am doing this as a last resort. My hope is that posting this thread will help me organize all my observations in one place, and that perhaps one of you has seen something like this in the past.
First, some history:
The machine in question is BlackBox, my sig rig. It is running a Q6600 in a Gigabyte X38-DS4 motherboard with 2x2GB of DDR2-1066 Corsair Dominator RAM. I built this computer back in February '08, at which point it had 2x1GB of Corsair DDR2-800 XMS2 RAM. I overclocked my Q6600 to 3.6GHz (9x400), and the overclock was 48 hours Prime95 stable. Everything was great until the spring of '09, when I started Project BlackBox.
Project BlackBox:
During this project I moved my sig rig from an NZXT Nemesis Elite case to my scratch-built acrylic case. I also made some hardware upgrades: I swapped in the 4GB of Dominator RAM and put the old XMS2 RAM in another computer, added a second ATI 3870, lapped my Q6600, replaced an old Audigy SE with a Creative X-Fi Titanium, and replaced an old IDE Sony drive with an Asus SATA DVD drive. I also upgraded my water loop, added more fans, and replaced all the lighting.
After finishing Project BlackBox I set everything back to BIOS defaults, except that I set the RAM voltage to 2.1V for the Dominators. Everything was fine for a few weeks. When I finally had time, I overclocked the machine back to 3.6GHz (verified stable by another 48 hours of Prime95). I used BlackBox like this for about a week, during which I rebuilt my entire Gentoo installation (after 6 months just about every package was out of date, so I ended up recompiling everything) and ran the F@H SMP client 24/7 under Gentoo.
Issues Become Present:
After a week of folding under Gentoo, I booted into Vista to play some games. Everything was fine until I ended my gaming session: after closing the game, I tried to open a web browser and an email client, and Vista blue-screened. I decided to investigate later and booted Gentoo. That night and the next day I started getting segfaults, and my F@H client crashed. Neither had ever happened to me in nearly a full year of folding 24/7 under Gentoo. At this point I knew something was very wrong.
OK, enough story time. Here are the things I have tried and my observations in chronological order as best I can remember:
Booted Vista, ran Prime95 -> in-place FFTs ran fine, but blend failed instantly
Dropped the multiplier to 6 (6x400 = 2.4GHz), ran Prime95 again -> blend failed after 10-15 minutes
(From here on I ran a custom test in Prime95, which is the same as blend but uses more RAM)
Ran memtest86 -> lots of errors detected
Removed one RAM stick, ran memtest86 again -> no errors after two passes (~1 hour)
Swapped in the second stick, ran memtest86 again -> no errors after two passes (~1 hour)
Put both sticks in, ran memtest86 again -> no errors after about 4 hours
Booted Vista, ran Prime95 again -> failed after an hour
Moved the RAM sticks from the yellow slots to the red slots, ran Prime95 overnight -> no errors after 10 hours
Booted into Gentoo -> started seeing segfaults again after a few hours
Ran memtest86 -> errors detected within minutes
Booted Vista -> Prime95 detected errors instantly
After this I flashed my BIOS to the latest version, returned to BIOS defaults, and raised my RAM voltage to 2.1V. memtest86 started picking up errors after about an hour. I then set all my RAM timings manually according to the EPP profile (still running at 1066MHz). No matter how I played with the settings, I could not get memtest86 to run for more than an hour without generating errors.
Next I changed the RAM divider and ran the Dominators underclocked at 800MHz. memtest86 found no errors after ~4 hours. Then I booted Vista and ran Prime95 blend for 20 hours with no errors detected.
After I ended the blend test I booted Gentoo. A few hours later I started getting more segfaults. I ran memtest86 and it found errors within seconds.
I believe that covers just about everything I tried over the past week. Now for some general observations:
I have experienced crashing programs, blue screens, and lots of errors in the Vista event logs; my point is that I have seen weird behavior in Windows too, not just in Linux. During the long Prime95 tests everything is fine, though; I have only seen strange behavior in Windows just before or just after failing Prime95 and/or memtest86.
I have gotten these memory errors at 2.4GHz and 3.6GHz, at 400MHz and 266MHz FSB, at 1066MHz and 800MHz RAM (in all four combinations of FSB and RAM speed), and with my lights on as well as off.
Something I noticed over the past couple of days is that everything seems fine when I first turn the computer on, and the errors only begin after 24-48 hours. Once I start getting memory errors, I keep getting them until I shut the computer down. Rebooting has no effect, yet powering the entire machine down, even for just a second, seems to make all the errors go away for ~24 hours.
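For anyone trying to follow the clock combinations I tested, here is a rough sketch of the arithmetic. It assumes the stock "266" FSB is really 266.67MHz, that "DDR2-1066" is nominally 1066.67 MT/s, and that DDR2 transfers twice per clock; the way the X38 BIOS actually labels its dividers may differ, so treat the ratios as approximate.

```python
from fractions import Fraction

# Assumptions: stock "266" FSB is really 800/3 (~266.67) MHz, and
# "DDR2-1066" is nominally 3200/3 (~1066.67) MT/s. Exact fractions avoid rounding.
FSB_STOCK = Fraction(800, 3)   # ~266.67 MHz base clock
FSB_OC = Fraction(400)         # 400 MHz base clock
DDR2_800 = Fraction(800)       # MT/s
DDR2_1066 = Fraction(3200, 3)  # ~1066.67 MT/s


def cpu_mhz(multiplier, fsb):
    """Core clock = CPU multiplier x FSB base clock."""
    return multiplier * fsb


def fsb_dram_ratio(fsb, ddr2_rating):
    """FSB:DRAM clock ratio; DDR2 does two transfers per DRAM clock."""
    dram_clock = ddr2_rating / 2
    return fsb / dram_clock


if __name__ == "__main__":
    for mult, fsb in ((9, FSB_OC), (6, FSB_OC), (9, FSB_STOCK)):
        print(f"{mult} x {float(fsb):.2f} MHz = {float(cpu_mhz(mult, fsb)):.0f} MHz")
    for fsb in (FSB_OC, FSB_STOCK):
        for name, rating in (("DDR2-1066", DDR2_1066), ("DDR2-800", DDR2_800)):
            r = fsb_dram_ratio(fsb, rating)
            print(f"FSB {float(fsb):.2f} MHz, {name}: FSB:DRAM = {r.numerator}:{r.denominator}")
```

In other words, the four RAM/FSB combinations I tried each use a different divider (3:4, 1:1, 1:2 and 2:3), and the 2.4GHz runs came from two different multiplier/FSB pairs.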
The next thing I am going to try is more extensive testing with individual RAM sticks.