Having a "cluster" of cores by putting a bunch of the same chip in a room and having a cluster of cores on the actual die are two completely different things. As the guy above me said, how would those Pentiums be connected? The point is that the architecture itself (how efficient it is, how it's synthesized) directly affects how many cores they can fit onto the die, and that in turn directly affects performance in the end. The original Bulldozer design was scrapped because, despite "working", it was way slower than what they already had, which is why it got delayed for so long. I also heard it was a power guzzler, more so than the Zambezi we ended up getting. If the uarch isn't efficient enough, they can't fit as many cores because the chip would draw too much power, so they'd have to drop the core count to get a decent balance. Then there are yields to take into consideration, etc.
So yes, core count, and therefore parallelism, can be directly affected by the architecture's power/space efficiency. It's never as simple as just throwing more cores onto a die and calling it a day. L3 cache alone on these chips takes up almost half the die space, and adding another module means even more L3, which means an even more massive die. That would mean fewer dies per wafer, so a higher cost per chip, and a higher chance of yield issues, none of which would be worth it.
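To put rough numbers on the die size point, here's a quick back-of-the-envelope sketch in Python. It uses the common dies-per-wafer approximation and a simple Poisson yield model. Every number in it (300 mm wafer, 300 mm² vs. 375 mm² die, defect density, wafer cost) is a made-up assumption for illustration, not real AMD or foundry data.

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

def poisson_yield(die_area_mm2, defects_per_mm2):
    """Simple Poisson model: chance that a die has zero killer defects."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

def cost_per_good_die(wafer_cost, wafer_diameter_mm, die_area_mm2, defects_per_mm2):
    """Wafer cost spread over the dies expected to work."""
    good_dies = (dies_per_wafer(wafer_diameter_mm, die_area_mm2)
                 * poisson_yield(die_area_mm2, defects_per_mm2))
    return wafer_cost / good_dies

if __name__ == "__main__":
    # All values below are hypothetical, chosen only to show the trend.
    WAFER_DIAMETER_MM = 300
    WAFER_COST = 5000.0        # arbitrary currency units
    DEFECTS_PER_MM2 = 0.002    # assumed defect density

    # Assumed base die vs. the same die with one more module plus its L3.
    for label, area in [("base die", 300.0), ("one more module", 375.0)]:
        n = dies_per_wafer(WAFER_DIAMETER_MM, area)
        y = poisson_yield(area, DEFECTS_PER_MM2)
        c = cost_per_good_die(WAFER_COST, WAFER_DIAMETER_MM, area, DEFECTS_PER_MM2)
        print(f"{label}: {n} dies/wafer, yield {y:.1%}, cost per good die {c:.2f}")
```

The thing to notice is that the bigger die loses twice: fewer candidates fit on the wafer, and a smaller fraction of them survive. So the cost per working chip goes up faster than the die area does. Real fabs use more refined yield models and can salvage partly defective dies by disabling cores or cache, so treat this as the shape of the tradeoff, not the actual numbers.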