Historically, GPUs have most easily gotten faster by jamming more transistors into them. As you may already know, GPUs are designed for massively parallel workloads and, as such, have grown into powerful masses of thousands of shader cores with hundreds of texture-mapping units (TMUs) and dozens of render output units (ROPs) at the top end. More advanced process nodes (lower nm) are crucial to fitting more transistors, and therefore more shaders/TMUs/ROPs/etc., into a given die size and/or power consumption target. The "nm/nanometer" spec of a process node nominally represents the size of an individual transistor's features (historically the gate length; these days it's as much a marketing label as a literal physical measurement).
Getting to smaller process nodes shrinks transistors, lowering the power and die space each one consumes and letting you fit more in at a given die size and power usage target. The upper limit on the potential die size of a GPU is determined by a foundry's (TSMC, GlobalFoundries, Intel, Samsung, etc.) reticle limit: the maximum area the photomask can pattern onto silicon when printing the chip design. Basically, it's the equivalent of a cookie-cutter for processors, and the reticle limit can vary depending on the foundry and its specific process node.
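To make the "smaller node → more transistors" point concrete, here's a back-of-the-envelope sketch: if feature sizes shrink linearly, transistor density ideally grows with the square of the shrink ratio. This is an idealized model of my own, not foundry data; real node names no longer map cleanly onto physical dimensions, so treat the numbers as illustrative only.

```python
# Idealized density scaling between process nodes: a linear shrink in
# feature size yields a quadratic gain in transistor density.
# (Rough model; real-world scaling is messier than this.)

def density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal transistor-density multiplier from moving old_nm -> new_nm."""
    return (old_nm / new_nm) ** 2

# A full shrink from 28nm to 14nm would ideally quadruple density:
print(density_gain(28, 14))            # 4.0
# A move from 28nm to a "16nm" node would ideally give ~3x:
print(round(density_gain(28, 16), 2))  # 3.06
```

In practice the realized gain is lower, since not all structures on a die shrink equally well, but this is why node transitions matter so much more than architectural tweaks for raw transistor budget.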
Nvidia in particular use TSMC to create their GPUs: Nvidia are responsible for coming up with the designs, and TSMC are responsible for actually putting them on silicon and producing working processors. The current reticle limit for TSMC is, if I recall correctly, ~600 mm^2 per GPU on 28nm. GM200 (980 Ti and Titan X) is 601 mm^2, putting it right at the die size limit of what TSMC can currently manufacture (and making GM200 the largest consumer GPU ever manufactured). So yes, there is a certain die size/transistor limit on what Nvidia and TSMC can currently do without moving to a new process node. Judging by your OP, there are two primary questions to address: first, is there anything Nvidia can do while staying on the same process node/die size to improve performance? Second, is it possible to dramatically increase the reticle limit for a given process node and allow monstrously large professional chips to exist?
1. Can Nvidia do more while staying on the same process node/die size to improve performance?
Comparing GM200 (980 Ti/Titan X) to GK110 (780/780 Ti/Titan/Titan Black): both are on 28nm. GM200 has a die size of 601 mm^2 with 8.0 billion transistors, which is only about 7% larger and ~13% more transistors than GK110's 561 mm^2 and 7.1 billion. However, GM200 performs something like ~50-60% (estimation off the top of my head) better than GK110 in games and many other consumer applications. This shows that Nvidia found ways to improve performance purely through architecture: the way they organize and use their limited number of transistors to increase efficiency. This did, however, come at a cost. Nvidia had to effectively cut out double-precision compute performance (which is useless to consumers, but useful in certain professional/scientific fields) and give Maxwell a more focused target market than Kepler. They also made some very real advancements in efficient transistor usage, such as restructuring the Kepler SMX into the Maxwell SMM and further developing lossless color compression.
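To put rough numbers on that efficiency gain, here's the arithmetic using the transistor counts above and the low end (~1.5x) of my off-the-cuff performance estimate; the performance factor is my guess, not a benchmark result:

```python
# Rough perf-per-transistor comparison, GM200 vs GK110.
# Transistor counts are the published figures; perf_ratio is the post's
# own ballpark estimate (~50% faster), not measured data.

gk110_transistors = 7.1e9   # GK110 (780 Ti / Titan Black)
gm200_transistors = 8.0e9   # GM200 (980 Ti / Titan X)
perf_ratio = 1.5            # assumed GM200 gaming perf relative to GK110

transistor_ratio = gm200_transistors / gk110_transistors
per_transistor_gain = perf_ratio / transistor_ratio

print(round(transistor_ratio, 3))     # ~1.127: only ~13% more transistors
print(round(per_transistor_gain, 2))  # ~1.33: ~33% more perf per transistor
```

So even at the conservative end, Maxwell does roughly a third more gaming work per transistor than Kepler on the same node, which is the whole point about architectural efficiency.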
This suggests that, when push comes to shove and Nvidia have no other option, they can do things the harder way: squeeze out performance through more efficient use of transistors and tailor new architectures to very specific areas. For example, they can look at the current demands of one particular set of consumers (like gamers) and of current gaming software, then create an architecture specifically designed to be more efficient at those things. They could do the same for professional markets and create a design focused on double-precision floating-point performance that would be terrible for gaming. This method clearly has its limits, however; things can only get so efficient before you need more die space and lower-power transistors to add more features or processing units.
2. Can the reticle size, and therefore die size, at a certain process node be increased and create monster chips?
The short answer is yes. Theoretically, TSMC or whoever else could do some redesigning and increase their reticle limit significantly, which could allow Nvidia to scale even their current Maxwell architecture (or an evolution of it) up into a monster-sized GPU (well in excess of 601 mm^2), even on 28nm. There are significant drawbacks, however. As die size increases, the probability that any given die contains a defect increases as well. Additionally, power consumption would be through the roof on this theoretical chip, and physical limits may mean the design can't clock particularly high. The yield issues (more defective chips, therefore more scrapped chips) plus power/cooling requirements would make this option very expensive, but it is theoretically possible. I can't say what the hard limits to doing this would be, however.
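The yield point can be illustrated with the simplest textbook model, Poisson yield: the fraction of good dies is Y = exp(-D*A), where D is defect density and A is die area. The defect density below is an assumed illustrative value, not a real TSMC figure, and real foundries use more elaborate models (e.g. Murphy's), but the trend is the same either way:

```python
import math

# Simple Poisson yield model: Y = exp(-D * A).
# D = defect density in defects/cm^2 (assumed value for illustration only),
# A = die area in cm^2. Yield falls off exponentially as dies grow.

def poisson_yield(area_mm2: float, defects_per_cm2: float = 0.1) -> float:
    """Fraction of defect-free dies for a given die area."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)

print(round(poisson_yield(601), 2))   # ~0.55 for a GM200-sized die
print(round(poisson_yield(1200), 2))  # ~0.30 for a hypothetical ~2x die
```

With these assumed numbers, doubling the die roughly halves the share of good chips, on top of each wafer holding half as many candidates in the first place, which is why giant dies get disproportionately expensive.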
I can't tell you whether any of this would be enough for smooth, real-time ray tracing, but I'd guess even 5x a Titan X wouldn't be, depending on how much ray tracing you want to do. Even if it were possible on a theoretical professional-only monster GPU (and only at relatively low FPS and resolution), that wouldn't be nearly enough of a target market for ray tracing to become commonplace. We're looking at many years before games do real-time ray tracing, imo.
Edited by Serandur - 10/18/15 at 8:00pm