
Need Windows 8/SSD volunteers for a small experiment ;-)

#1 ·
If you are reading this, are running Windows 8, and have an SSD with no power reserve (meaning it goes down the moment the computer shuts down - I know drives with capacitors or backup power exist), you can help me with a little experiment so that I can verify a theory of mine.

I am troubleshooting an issue with my OCZ Vertex 3 drive which has been driving me nuts. Ever since switching to Windows 8 I started seeing

The IO operation at logical block address for Disk X was retried.

Warnings in my event log. I did a lot of research and tried all the workarounds I could find (switching to the Microsoft AHCI driver, an arcane bctick or something command, turning hot swapping on/off, PCI Express link state power saving, etc.), but nothing helped. Eventually I also started seeing:

The device, [changes], did not respond within the timeout period.

Which would ultimately result in complete OS failure (it couldn't read the drive and would hang completely). Now, one RMA later, the problem is back, but I have a theory about why it happens. I believe it is caused by power loss events during read/write I/O operations, and yes, I've had them. This is also substantiated by the number of uncorrectable (beyond ECC) errors that show up in SMART around the time of the power loss events. If I re-initialize the drive (what OCZ calls Secure Erase) these go away, but otherwise they stay for good.

Now, it's common knowledge that power loss for an SSD is very, very bad - there's published research on how it could even brick an SSD and lead to serious data loss:
https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf
In the old days of HDDs and FAT, the worst that could happen was cross-linked files, lost clusters, etc. In the case of SSDs, however, I theorize that power loss could leave the NAND in an unstable state permanently (well, until it is reinitialized), hence the warnings/errors/behavior I am observing.

So my request is: if you have had random power loss events (which you can see in the SMART data) since your last SSD erasure, could you please also check your Windows 8 Event Log for either of the following and let me know? EventID: 9, Level: Error, "The device, [changes], did not respond within the timeout period." or EventID: 153, "The IO operation at logical block address [changes] for Disk X was retried."
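
For anyone who would rather script the check than click through Event Viewer, here is a minimal sketch in Python. It assumes both events land in the System log and shells out to the stock wevtutil tool. The function names (build_query, build_command, fetch_events) and the 50-event cap are only illustrative choices, not anything Windows prescribes.

```python
import subprocess

# Event IDs from the request above: 9 (device timeout) and 153 (retried I/O).
EVENT_IDS = (9, 153)


def build_query(event_ids=EVENT_IDS):
    """Build an XPath filter that matches any of the given event IDs."""
    clause = " or ".join(f"EventID={event_id}" for event_id in event_ids)
    return f"*[System[({clause})]]"


def build_command(event_ids=EVENT_IDS, max_events=50):
    """Return the wevtutil argument list (assumes the System log)."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_query(event_ids)}",
        f"/c:{max_events}",   # cap on the number of events returned
        "/rd:true",           # newest events first
        "/f:text",            # plain-text output
    ]


def fetch_events(event_ids=EVENT_IDS, max_events=50):
    """Run the query on a Windows machine and return the raw text output."""
    result = subprocess.run(
        build_command(event_ids, max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(fetch_events())
```

Run it from a Command Prompt on the Windows 8 machine. If it prints nothing, the log has no matching events.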

Many thanks.
 
#2 ·
Do you need Win7 results too, to compare vs Win8?
 
#6 ·
When you said you can see random power loss events in the SMART data, are you referring to the Unsafe Shutdown Count attribute?

If so, that attribute is not available in all SSDs: the Samsung 840 and 830 and the OCZ Vertex 4 do not have it. Intel 520 SSDs do save that attribute, but it seems to be buggy, since it tallies entering Windows Sleep as an unsafe shutdown. I live in a suburb of a major US city, and power outages are rare. I use Windows 8 on PCs with SSDs.

It sounds like you have been trying some of the voodoo cures for the Vertex 3's ills (disabling hot plugging), sorry to say. I hate to mention the F word here (Firmware), but I imagine you've been there, done that.

A comment about power loss in SSDs: the secure erase features in Samsung's and Intel's SSD utilities require the user to power cycle the SSD before the SE will be performed, to clear a security freeze lock. That is done by literally removing the SATA power cable from the back of the SSD for a few seconds, and then reconnecting it while the PC is on in Windows. If any SSD could be bricked by a power loss, this procedure clearly defies that statement. I've done this myself with both manufacturers' SSDs, multiple times, with zero ill effects.

It is also common, when changing settings to OC the CPU and memory in the BIOS, for the PC to shut off for ~ five seconds after saving and exiting the BIOS, without booting to Windows. Two motherboards I use have a CMOS/BIOS clear button on the I/O panel that can be pressed anytime you wish to do so. When that button is pressed, the PC shuts off immediately, and starts only when the button is released. Those PCs have multiple SSDs in them, including the OS drives (one in RAID 0), and I've never had any problems with the SSDs or the OS at all.

The following is not meant to trivialize your SSD issues, but Windows event log Warnings can have little to zero value. I have Warnings that my boot time has not degraded, for example. Your Warnings are classified as Errors, rather than informational, correct?

May I ask what system/PC you are using with your Vertex 3?
 
#7 ·
Hi,

Yes, I am referring to SMART attribute ID 174 - Unexpected Power Loss Count. At least on my Vertex 3 drive this counter increases only on hard power off and (I'm pretty sure) reset button operations - not on S3/S5 sleep.
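
To track that counter over time, something like the following rough Python sketch could pull attribute 174 out of a SMART attribute table. It assumes the ten-column text layout that smartmontools prints for `smartctl -A`, with the raw value in the last column. The sample table and every number in it are made up for illustration, and parse_smart_attribute is a hypothetical helper name.

```python
# Invented sample in the smartctl -A table layout (values are NOT real data).
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1234
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       17
"""


def parse_smart_attribute(smartctl_output, attribute_id):
    """Return the raw value of one SMART attribute, or None if it is absent.

    Assumes whitespace-separated columns with the ID first and the raw
    value as the tenth field; drives that report a non-integer raw value
    would need extra handling.
    """
    wanted = str(attribute_id)
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == wanted:
            return int(fields[9])
    return None


if __name__ == "__main__":
    print(parse_smart_attribute(SAMPLE_OUTPUT, 174))
```

Logging the value before and after a hard power-off would show whether the counter moves only on real power loss and not on sleep.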

Regarding hot swapping - I agree, I don't think it is related to this particular case. I used to play with it while the whole community was troubleshooting the issues plaguing the SandForce firmware, but those were addressed a while ago (don't remind me ;-) and I have had zero BSOD/stability issues with my Vertex 3 drives since then.

Regarding secure erase operations - well, I don't know what the procedure is within Windows, but the one I normally follow uses OCZ's bootable Linux tools: secure erase -> (clean) shut down.

The study I linked to makes for some very good reading. The authors investigated how write I/O interrupted by a power loss makes NAND blocks more susceptible to read disturbs and other unfortunate issues. As my drive, among others, does not have a supercapacitor to allow the controller to flush pending operations, it is more susceptible than drives that do. Note that unclean power off shouldn't be an issue when the SSD is NOT performing I/O - i.e. when you're in the BIOS, outside of any OS.

Regarding the retried I/O warnings - indeed these are just warnings about retried read operations, so I ignored them at first. However, what I observed on my first drive was that they would keep growing in number and eventually I would get complete read timeouts - controller reset events which would hang Windows 8 completely. After a secure erase -> restore from backup, things would work perfectly well again (with no I/O warnings for at least a month or so). I RMA'ed the drive, but I started seeing retried I/Os again on the replacement after about a month. Hence my theory. When I speak about degradation I refer to unstable NAND power states.

My configuration is:

i7-3770K 4.6 GHz at 1.3V (100% stable)
ASUS Sabertooth Z77 (BIOS 1805, latest)
8GB DDR3 CL7
OCZ Vertex 3 120GB (FW 2.25, latest)
Bunch of HDDs in RAID
GTX 295
PSU Corsair AX1200

Cheers
 