If you're bothering to run with HT on, then use 8 threads. Otherwise it's a pointless test, because it isn't loading the CPU to the max, which is the whole point.
The only time you'd run Prime95 with fewer than the maximum available workers is if you were testing an OC with Turbo Boost and C-states on, since Turbo won't activate with all 4 cores loaded. Not sure whether you'd use 1 or 2 workers for a chip with HT, but I think that would be a rare OC anyhow (HT + Turbo + C-states).
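
To put that rule of thumb in one place, here's a rough Python sketch. The function name `prime95_workers` and its parameters are made up for illustration and aren't part of Prime95. It assumes HT gives 2 threads per core, and it leaves the Turbo-test worker count up to you, since whether 1 or 2 is right on an HT chip is an open question above.

```python
def prime95_workers(physical_cores, ht_enabled, turbo_test_workers=None):
    """Suggest a worker count for a stress test (hypothetical helper)."""
    # Assumption: Hyper-Threading exposes 2 logical threads per physical core.
    logical_threads = physical_cores * 2 if ht_enabled else physical_cores

    if turbo_test_workers is not None:
        # Turbo/C-state testing: deliberately load fewer threads than the max
        # so the chip still has room to boost. The caller picks the number.
        if not 1 <= turbo_test_workers < logical_threads:
            raise ValueError("turbo test needs fewer workers than logical threads")
        return turbo_test_workers

    # Normal stability test: load every available thread.
    return logical_threads


print(prime95_workers(4, ht_enabled=True))    # quad core, HT on
print(prime95_workers(4, ht_enabled=False))   # quad core, HT off
print(prime95_workers(4, ht_enabled=False, turbo_test_workers=1))
```

So a quad core with HT on gets 8 workers, with HT off it gets 4, and a reduced count only comes into play when you pass one in for a Turbo test.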