Any short test will only tell you when an overclock attempt has become grossly unstable. That's a very useful thing to know, of course. When I'm feeling out a new processor, I ratchet up the multiplier until it starts to crash or produce errors during quick tests such as 10 minutes of P95. Once this starts to happen, I raise the voltage one or two steps (25-50 mV) and try that same multiplier again. If it passes, I keep increasing the multiplier as before. I repeat that process until I reach the limit of what I'm willing to do for voltage, and I plot the results of each test along the way. If raising the voltage gives me a lower frequency ceiling than I had at the lower voltage because of the extra heat, I treat the previous voltage as my maximum target, even if I had a higher voltage in mind at the start of the process. As an aside, the record of these tests becomes important later, when you want to re-examine your chip and have long forgotten how it responded. It also comes in handy as a baseline when you switch to a different motherboard. Your results will be different in another board, so the comparison is useful.
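For anyone who thinks better in code, here is a rough Python sketch of that ratcheting loop. It is only an illustration, not a tool: `quick_test` stands in for a manual short run (such as 10 minutes of P95), `fake_chip` is an invented response curve, and every number in it is made up. One assumption of mine: at each new voltage the sketch re-checks the previous ceiling first, which is how it notices the case where extra voltage lowers the ceiling.

```python
from typing import Callable, List, Optional, Tuple

QuickTest = Callable[[int, int], bool]  # (multiplier, millivolts) -> passed?


def highest_passing(test: QuickTest, mv: int, start: int, limit: int) -> Optional[int]:
    """Raise the multiplier from `start` until the quick test fails or `limit` is hit."""
    passed = None
    mult = start
    while mult <= limit and test(mult, mv):
        passed = mult
        mult += 1
    return passed  # None means even `start` failed at this voltage


def sweep(test: QuickTest, start_mult: int, start_mv: int, max_mv: int,
          step_mv: int = 25, limit: int = 60):
    """Return (target_mv, target_mult, log); log holds one (mv, ceiling) pair per voltage."""
    log: List[Tuple[int, Optional[int]]] = []
    target_mv, target_mult = None, None
    resume = start_mult
    mv = start_mv
    while mv <= max_mv:
        ceiling = highest_passing(test, mv, resume, limit)
        log.append((mv, ceiling))  # keep the record for later comparison
        if ceiling is None:
            # Nothing passed here. After the first step this means the old
            # ceiling no longer holds, so more voltage made things worse: stop.
            break
        if target_mult is None or ceiling > target_mult:
            target_mv, target_mult = mv, ceiling
        resume = ceiling  # re-check the old ceiling first at the next voltage
        mv += step_mv
    return target_mv, target_mult, log


def fake_chip(mult: int, mv: int) -> bool:
    """Invented chip: gains with voltage up to a point, then heat costs it a bin."""
    ceilings = {1200: 42, 1225: 43, 1250: 44, 1275: 44, 1300: 43, 1325: 42}
    return mult <= ceilings[mv]


target_mv, target_mult, log = sweep(fake_chip, start_mult=40, start_mv=1200, max_mv=1325)
```

With this made-up chip the sweep settles on 1250 mV at a 44x multiplier and stops at 1300 mV, where the old ceiling no longer passes. The `log` list plays the role of the plotted record described above.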
Anyhow, once I've reached a point where higher voltage is either unavailable or unhelpful and the chip is producing errors or crashing, it's time to back down one multiplier at a time and do extended tests. Instead of 10 minutes of P95, think of something more like 24-48 hours of blend without an error. Once you've reached settings that pass that test, it's time to let the thing idle and do regular work for you. If you like to do distributed computing like WCG or FAH, it's not yet time to start that. You need to run the system for a few days to a week under variable load in order to know whether it's really going to be stable. If you crash, drop your multiplier and do the idle/regular load period again. You don't need to do another P95 run, since you already know the chip can pass it at the higher frequency. The reason for the idle test is that when you play a game or just tool around in Windows, you'll often have less voltage available during brief periods of full load on some (but not all) cores than you would have with the whole chip at 100% load. Sometimes this results in a crash you wouldn't otherwise see during full load testing.
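The two back-down phases can be sketched the same way. Again this is just an illustration under assumptions: `blend_passes` and `soak_passes` are hypothetical callbacks standing in for a long blend run and a week of everyday use, and `floor` is an arbitrary lowest multiplier you are willing to accept.

```python
from typing import Callable, Optional


def settle(start_mult: int, floor: int,
           blend_passes: Callable[[int], bool],
           soak_passes: Callable[[int], bool]) -> Optional[int]:
    """Back down one multiplier at a time until both long tests pass."""
    mult = start_mult
    # Phase 1: extended full-load test (think 24-48 hours of blend).
    while mult >= floor and not blend_passes(mult):
        mult -= 1
    # Phase 2: days of idle and variable load. The blend run is not repeated
    # on the way down, because it already passed at a higher multiplier.
    while mult >= floor and not soak_passes(mult):
        mult -= 1
    return mult if mult >= floor else None  # None: nothing stable above the floor


# Invented example: blend is clean at 43x and below, everyday use only at 42x and below.
stable = settle(45, 35, lambda m: m <= 43, lambda m: m <= 42)
```

Here `stable` comes out as 42: the chip passes blend at 43x but still needs one more step down before it survives variable load.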
That will get you to a point where I feel you can call it stable, but most people aren't nearly so patient. You'd be surprised how many people do no testing at all, or 30 minutes of P95, and then tell everyone their system is "stable". It's misleading, and it usually makes the community-sourced average overclock for any particular CPU (Intel or AMD alike) look 200-400 MHz higher than what an average sample will truly yield. The good news, though, is that these days 200 MHz barely makes any difference.
As a final note: if you like to use Prime95, make sure you're using the 27.x branch and enable both kinds of error checking. By default, round-off checking and sum checking are disabled. Both can be toggled from the Advanced menu across the top of the application before you begin your test for the first time. The settings are saved between runs, so they do not need to be set each time.