4x SSAO?
Do you mean "4x SSAA"? "SSAO" is Screen-Space Ambient Occlusion, which when implemented properly improves "realism" by approximating how much ambient light reaches each point, darkening the creases, corners, and contact areas that nearby geometry blocks (although most games just say "Look! Grass make shadow! We make dark edge! You pay me now!"). "SSAA" is Super-Sample Anti-Aliasing, which is essentially what you're doing with down-sampling, except it happens without the need for custom resolutions.
One BIG advantage I've found with down-sampling over SSAA, whether 2x1/1x2, 2x2, 2x3/3x2, or 3x3, is that the performance hit is generally almost the same, and often in favor of down-sampling.
However, SSAA is specifically there to smooth out jaggies, and it does a truly phenomenal job of it. Unlike the common forms of AA, SSAA can reduce "jaggies" in transparent textures (e.g. a chain-link fence). It can also INCREASE the visibility of something like an in-game overhead electrical wire that's far enough away to take up only a few (3-5?) pixels of screen space, while stopping the constant "shifting stairs" effect you get from turning while keeping that wire in your FOV. And it has a very beneficial effect on things like plants (grass/trees) that look like crap otherwise (unless you use AA methods that are equally, if not more, taxing than SSAA)...
Down-sampling, on the other hand, basically does everything SSAA does, but instead of focusing only on aliasing it improves the clarity/sharpness of EVERYTHING on screen: Alpha-to-Coverage, texture LOD, clarity of characters/objects at distance, etc, etc, etc... It simply all looks better.
I have found that the best image quality when down-sampling does NOT necessarily come from the highest possible pixel count, but rather from the highest possible pixel count that maps the rendered image's pixels EVENLY onto the display's physical pixels (a whole-number ratio). Down-sampling from 3840x2160 to 1920x1080 is the obvious example: there are exactly 4 "virtual" pixels for every 1 physical pixel in your monitor. The effect isn't HUGE, but once you notice it, it's certainly not subtle either.
I don't have a full techno-babble explanation for you, but my ASSUMPTION is that whole-number pixel (and sub-pixel) mapping allows for significantly better color blending and more accurate chroma/luminance for each output pixel. For example, say the 2160p image has a square of four pixels and, assuming 8-bit color, their values on the 0-255 scale are (clockwise from top-left) 214, 208, 209, 213. When the image is down-sampled, an average of those four values is taken (I know, it's more complicated than this, but for the sake of my fingers, and of explanation), which here gives 211. That is very likely more "accurate" than the color native 1080p would assign to that pixel, since native 1080p may very well decide that 214 or 208 is more appropriate. So the former gives a smoother, less jarring transition, while the latter may not. Chroma gets blended the same way, which further improves color accuracy (a lower Delta-E, essentially).
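To put numbers on the four-pixel example above, here's a rough Python sketch. It assumes a plain box filter (a straight average of each block), which is the same simplification used above and not necessarily what any driver or scaler actually does. The function names `box_downsample` and `point_sample` are made up for illustration, and `point_sample` is only a stand-in for "pick a single value per output pixel", not a model of how native rendering really works.

```python
def box_downsample(image, factor):
    """Average each factor x factor block of a grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    # Only whole-number ratios map cleanly onto the output grid.
    if height % factor or width % factor:
        raise ValueError("render size must be a whole multiple of the output size")
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            samples = [image[y + dy][x + dx]
                       for dy in range(factor) for dx in range(factor)]
            row.append(round(sum(samples) / len(samples)))
        out.append(row)
    return out


def point_sample(image, factor):
    """Keep only the top-left value of each block (one sample per output pixel)."""
    return [row[::factor] for row in image[::factor]]


# Clockwise from top-left: 214, 208, 209, 213
block = [[214, 208],
         [213, 209]]
print(box_downsample(block, 2))  # [[211]]
print(point_sample(block, 2))    # [[214]]
```

The `ValueError` check is there to mirror the whole-number-ratio point: 3840x2160 to 1920x1080 passes with `factor=2`, while a render size that isn't an exact multiple has no clean block per output pixel to average.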
The greater the pixel density of your monitor, or the greater the resolution at a common size (e.g. 1440p@27"; for 1080p, though, I really dislike anything larger than 21.5"), the less effect down-sampling will have on anti-aliasing. The effects on textures, A-to-C, color mapping, and so forth can be significantly more profound, though, because, quite simply, there are more pixels in the same/similar area with which to show the down-sampled image. Having ~110 pixels per inch versus ~80 pixels per inch means you have 37.5% more REAL pixels (and therefore sub-pixels) along each axis, or roughly 89% more per square inch, with which to display the same imagery. That allows for more natural color gradations/transitions, improved texture detail, finer levels of bump-mapping/tessellation, and greater visual clarity at a given ("virtual") distance.
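If you want to check the density figures yourself, the arithmetic is short. This sketch assumes "1440p" means 2560x1440 and compares it against 1920x1080 on the same 27" diagonal; that pairing is my assumption for where the ~110 and ~80 PPI figures come from, and `pixels_per_inch` is just a helper name I made up.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density from resolution and diagonal size in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


ppi_1440p = pixels_per_inch(2560, 1440, 27.0)  # assumed 16:9, 27" panel
ppi_1080p = pixels_per_inch(1920, 1080, 27.0)  # assumed same diagonal
print(round(ppi_1440p), round(ppi_1080p))  # 109 82

# Using the rounded figures from the post: ~110 PPI vs ~80 PPI
linear_ratio = 110 / 80
area_ratio = linear_ratio ** 2
print(f"{(linear_ratio - 1) * 100:.1f}% more pixels per linear inch")  # 37.5%
print(f"{(area_ratio - 1) * 100:.1f}% more pixels per square inch")    # 89.1%
```

PPI is a per-axis measure, so the per-area gain is the square of the linear ratio.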
Other techniques have been developed that work exceptionally well for reducing aliasing, some for "traditional" aliasing, others focused on A-to-C, while still others are effective (to varying degrees) at both.
These include SGSSAA (Sparse-Grid Super-Sampling), Rotated-Grid Super-Sampling, and CSAA, as well as combinations of different techniques such as 2x2 SSAA + 4x MSAA, or 8x SGSSAA + 2x2 SSAA + 8x Transparency SSAA, so it's become much easier to find the best solution for YOUR needs.
*Note: Alpha-to-Coverage (A-to-C), as used above, also encompasses Transparency Anti-Aliasing.
Multi-Sample Anti-Aliasing, however, has VERY FEW of the benefits of true AA (SSAA). It only affects some "jaggies", and only to a certain degree (to the point that I have yet to see it implemented well enough that, at its highest available level, adding SMAA/FXAA didn't offer still another dramatic decrease in aliasing...). It also has a fundamental flaw. Take the same "power line a few pixels wide" example from before, which I chose for this reason: SSAA/SGSSAA/etc AND post-processing AA like SMAA/FXAA will NOT reduce the size of said power line, and in many cases improve it through the blended pixels, whereas MSAA often causes sections of it to straight up disappear!!!
MSAA has a TREMENDOUS performance hit, and the loss of performance is out of proportion to the gain in quality, making it a rather useless form of anti-aliasing in my opinion. In fact, the only times I DO use MSAA (2-4x, typically), it is ALWAYS in concert with SMAA or FXAA.
Here's a bunch'o'pictures of Black Mesa, taken while running the following: 3x3 Super-Sample AA + 8x MSAA + 16x SGSSAA + 16x CSAA
*(NOTE: Middle-click or "right-click then 'open in new tab'", in order to see full sized, which I highly advise.... ALSO, I tried to grab screens as varied as possible, in terms of lighting, location, and so on)