Unfortunately it's a chicken-and-egg problem. This algorithm (although implemented in hardware logic) is simply branch prediction, and if you can characterize the hardware algorithm, you can manipulate the outcome. So you write code that produces the outcome you want: training the branch predictor so that speculative execution pulls protected memory into the cache, where your program can infer its contents through timing.
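To make the "characterize it, then manipulate it" point concrete, here is a toy sketch in Python. It assumes a textbook 2-bit saturating counter, which is far simpler than any real (and mostly undocumented) predictor; `TwoBitPredictor` and `train_then_attack` are made-up names for illustration only. It just shows that feeding a predictor a few in-bounds accesses is enough to make it guess "in bounds" for an index that is not.

```python
class TwoBitPredictor:
    """Toy 2-bit saturating counter (an assumption, not a real CPU's design).

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def train_then_attack(array_len=16, training_rounds=5, malicious_index=1000):
    """Train the toy predictor with in-bounds indices, then probe out of bounds.

    Returns (predicted_in_bounds, actually_in_bounds) for the malicious index.
    """
    predictor = TwoBitPredictor()

    # Training phase: every index is in bounds, so the bounds check passes
    # and the predictor learns to expect "in bounds".
    for i in range(training_rounds):
        index = i % array_len
        predictor.update(index < array_len)

    # Attack phase: the predictor's guess is made before the real check resolves.
    predicted_in_bounds = predictor.predict()
    actually_in_bounds = malicious_index < array_len
    return predicted_in_bounds, actually_in_bounds


if __name__ == "__main__":
    print(train_then_attack())
```

On real hardware, the window between the wrong guess and the corrected check is where the speculative load happens; this sketch models only the guess, not the cache side effect.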
If you want to patch this in software, it costs CPU cycles to ensure protected regions cannot be loaded.
Or you can disable the speculation feature outright.
Both reduce performance by design, because you are restricting or disabling performance-enhancing hardware features. That is all.
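As a rough idea of what the software-patch route looks like, one family of fixes clamps the index with arithmetic instead of relying on the branch, so a wrong prediction cannot produce an out-of-bounds access. The sketch below only shows that arithmetic in Python (which does not speculate the way a CPU does); `clamp_index_branchless` is a made-up name, and it assumes a non-negative index and values below 2^63. The extra subtract, shift and AND on every access is exactly the kind of cost mentioned above.

```python
def clamp_index_branchless(index, size):
    """Return index if index < size, else 0, using no data-dependent branch.

    Assumes 0 <= index and that both values fit in 63 bits (illustrative only).
    """
    # (index - size) is negative exactly when index < size. An arithmetic
    # right shift by 63 turns that into -1 (all ones) or 0.
    mask = (index - size) >> 63
    # AND with all ones keeps the index; AND with 0 forces it to 0.
    return index & mask


if __name__ == "__main__":
    for i in (5, 15, 16, 1000):
        print(i, "->", clamp_index_branchless(i, 16))
```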