If you are reading this, are running Windows 8, and have an SSD with no power reserve (meaning it loses power the moment the computer does; I know some drives have capacitors or backup power), you can help me with a little experiment so that I can verify a theory of mine.
I am troubleshooting an issue with my OCZ Vertex 3 drive which has been driving me nuts. Ever since switching to Windows 8 I started seeing
The IO operation at logical block address [changes] for Disk X was retried.
warnings in my event log. I did a lot of research and tried all the workarounds I could find (switching to the Microsoft AHCI driver, an arcane bctick or something command, turning hot swapping on/off, PCI Express link state power saving, etc.), but nothing helped. Eventually I also started seeing:
The device, [changes], did not respond within the timeout period.
This would ultimately result in complete OS failure (Windows couldn't read the drive and would hang completely). Now, one RMA later, the problem is back, but I have a theory about why it happens. I believe it is due to power loss events during read/write I/O operations, and yes, I've had them. This is also supported by the number of non-ECC-correctable SMART errors logged around the time of the power loss events. If I initialize the drive (OCZ's Secure Erase) these go away, but otherwise they stay for good.
Now, it's common knowledge that power loss for an SSD is very, very bad - there's published research on how it could even brick an SSD and lead to serious data loss:
https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf
In the old days of HDDs and FAT, the worst that could happen would be cross-linked files, lost clusters, etc. In the case of SSDs, however, I theorize that power loss could actually leave the NAND in a permanently unstable state (well, until it is reinitialized), hence the warnings/errors/behavior I am observing.
So my request is: if you have had random power loss events (which you can see in the SMART data) since your last SSD erasure, could you please also check your Windows 8 Event Log for either of these events and let me know?
EventID: 9, Level: Error. The device, [changes], did not respond within the timeout period.
EventID: 153. The IO operation at logical block address [changes] for Disk X was retried.
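If you'd rather not click through Event Viewer, here is a rough Python sketch of the same check. It assumes the built-in wevtutil tool is available, that both events land in the System log, and that wevtutil's text output has an "Event ID: N" line per event; the function names are my own, so treat it as a starting point rather than a tested tool.

```python
import re
import subprocess
import sys
from collections import Counter

# Event IDs from this post: 9 (device timeout) and 153 (IO retried).
EVENT_IDS = (9, 153)


def build_query(event_ids=EVENT_IDS):
    """Build an XPath filter matching any of the given event IDs."""
    clause = " or ".join(f"EventID={i}" for i in event_ids)
    return f"*[System[({clause})]]"


def build_command(max_events=200):
    """Argument list for wevtutil: query the System log, newest first."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_query()}",
        "/f:text",
        f"/c:{max_events}",
        "/rd:true",
    ]


def count_events(text, event_ids=EVENT_IDS):
    """Count occurrences of each event ID in wevtutil text output.

    Assumption: each event carries a line of the form "Event ID: 153".
    """
    found = Counter(
        int(m)
        for m in re.findall(r"^[ \t]*Event ID:[ \t]*(\d+)", text, flags=re.MULTILINE)
    )
    return {i: found.get(i, 0) for i in event_ids}


if __name__ == "__main__" and sys.platform == "win32":
    # Only attempt the real query on Windows; elsewhere the helpers
    # above can still be used on saved output.
    result = subprocess.run(build_command(), capture_output=True, text=True)
    print(count_events(result.stdout))
```

Run it from a command prompt and post the two counts along with your drive model. Keep in mind this counts every source that logs these IDs in the System log, so it is worth glancing at the actual entries too.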
Many thanks.