I'm Brazilian and I'm moving back to Brazil next week. Right now I'm living in Buenos Aires, Argentina, where I have just spent a year studying at the University of Buenos Aires. There I mostly studied low-level programming and computer architecture.
One of the teaching assistants in the Computer Architecture 2 class is currently working on his PhD in computer science, and to be precise he is working on something related to what you said.
Once his car broke down, and while it was at the mechanic we happened to take the bus together a few times, so we talked a little about his work.
Basically he is trying to figure out a way for the processor to take advantage of multiple cores by itself. Currently software has to be written specifically to take advantage of threading, and that is not ordinary software. Multi-threaded programs are much harder to write, debug, troubleshoot, maintain, update, etc. Optimizing a program for multi-threading can take up most of its development time.
If he, or anybody else working on such a thesis, succeeds, developers will be able to write their code for a single core, as most people know how to do, and the processor will then be responsible for dividing the work among its available cores and threads.
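To give an idea of the extra work involved today, here is a minimal Python sketch of the same task written twice. The function names and the way the range is chunked are my own assumptions for illustration, not anything from his research. The first version is what most people write naturally; the second is the plumbing the programmer currently has to add by hand.

```python
from concurrent.futures import ThreadPoolExecutor

def is_prime(n):
    """Trial division; deliberately simple."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def count_primes_serial(lo, hi):
    """Single-core style: one plain loop over [lo, hi)."""
    return sum(1 for n in range(lo, hi) if is_prime(n))

def count_primes_parallel(lo, hi, workers=4):
    """Hand-parallelised style (hypothetical helper): the programmer
    splits the range, dispatches the chunks and merges the results."""
    step = max(1, (hi - lo + workers - 1) // workers)
    chunks = [(start, min(start + step, hi)) for start in range(lo, hi, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda c: count_primes_serial(*c), chunks)
        return sum(partial)
```

Both functions return the same count. Note that in standard CPython the threaded version will not actually run faster for CPU-bound work like this because of the global interpreter lock; the sketch is only meant to show the structure the developer has to write and keep correct, which is exactly what a self-parallelising processor would make unnecessary.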
Your idea might not show very good results, because each core on a CPU already has some kind of threading (if I understand what you are thinking). It's called pipelining, and it greatly improves performance on programs with many loops or independent instructions by overlapping the execution of several instructions. However, a deeper or wider pipeline isn't necessarily better and can degrade performance in many scenarios, for example a program with many jumps, where a mispredicted jump forces the pipeline to be flushed and refilled.
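A rough way to see why jumps hurt is a toy cost model. Everything here is an assumption for illustration: I model pipeline depth (number of stages), I assume each mispredicted jump costs depth - 1 cycles to refill, and the delay numbers are made up rather than taken from any real CPU.

```python
def cycles_per_instruction(depth, branch_fraction, mispredict_rate):
    """Toy model: an ideal pipeline finishes 1 instruction per cycle,
    and each mispredicted branch wastes (depth - 1) cycles."""
    penalty = depth - 1
    return 1.0 + branch_fraction * mispredict_rate * penalty

def time_per_instruction(depth, branch_fraction, mispredict_rate,
                         logic_delay=10.0, latch_overhead=1.0):
    """Average time per instruction in arbitrary units.
    Splitting the same logic over more stages shortens the cycle,
    but every stage adds a fixed overhead (hypothetical values)."""
    cycle_time = logic_delay / depth + latch_overhead
    cpi = cycles_per_instruction(depth, branch_fraction, mispredict_rate)
    return cpi * cycle_time

# Straight-line code: no mispredicted jumps at all.
smooth_short = time_per_instruction(5, 0.0, 0.0)
smooth_long = time_per_instruction(20, 0.0, 0.0)

# Jumpy code: 30% branches, half of them mispredicted.
jumpy_short = time_per_instruction(5, 0.3, 0.5)
jumpy_long = time_per_instruction(20, 0.3, 0.5)

print("smooth code:", smooth_short, "vs", smooth_long)
print("jumpy code: ", jumpy_short, "vs", jumpy_long)
```

With these made-up numbers the 20-stage pipeline wins on the straight-line code and loses on the jumpy code, which is the trade-off I mean. Real CPUs have branch predictors that make the mispredict rate much lower than the one I picked for the jumpy case.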
Following up on threads: CPUs are already very powerful. They are so powerful that they spend most of their time idling. Even under load, they are usually doing more nothing than something. This happens because the CPU has to wait for other, slower components, such as HDDs or even RAM, not to mention the physical distances between them. Since each CPU core already has some degree of threading, that is how technologies such as Intel's HT (Hyper-Threading) came into existence: if a core leaves most of its capacity idle even under load, that idle capacity can be turned into a non-physical core. The only thing they needed to do was duplicate some physical components (to be more precise, some registers on each core; even more precisely, a few extra transistors per core) while sharing most of the resources (basically buses and caches).
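The idea of filling the idle time can be sketched with a tiny simulation. This is a toy model under my own assumptions, not how Intel actually implements it: the core has one execution unit, every instruction is followed by a fixed number of stall cycles (waiting on memory), and a second hardware thread may use the execution unit while the first one waits.

```python
def run_core(num_threads, instructions_per_thread, stall_cycles):
    """Simulate one core with a single execution unit.
    Returns (busy_cycles, total_cycles)."""
    remaining = [instructions_per_thread] * num_threads
    waiting = [0] * num_threads   # stall cycles left per hardware thread
    busy = 0
    total = 0
    while any(r > 0 for r in remaining):
        total += 1
        issued = False
        for t in range(num_threads):
            if waiting[t] > 0:
                waiting[t] -= 1            # still waiting on memory
            elif remaining[t] > 0 and not issued:
                remaining[t] -= 1          # execute one instruction
                waiting[t] = stall_cycles  # then stall again
                issued = True
        if issued:
            busy += 1
    return busy, total

one = run_core(1, 100, 3)   # one hardware thread
two = run_core(2, 100, 3)   # two hardware threads sharing the core
print("1 thread :", one)
print("2 threads:", two)
```

In this model the two-thread run finishes twice the work in almost the same number of cycles, because the second thread only uses cycles the first one would have wasted. Real gains from HT are much smaller, since real threads also fight over the shared caches and buses.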
So, without any calculations, and speculating only from what I know, I would say that your idea would actually result in lower performance.
For us overclockers, if you wanted 4 faster cores, you could always disable some of the cores on your CPU to make it run cooler, bump the voltage a little higher on the remaining ones, and go above those 4 GHz.