1 - 16 of 16 Posts

·
Tank destroyer and a god
Joined
·
2,769 Posts
Discussion Starter · #1 ·
Recently I purchased a display with 125% sRGB color coverage for photo editing. Image quality also improved for watching movies, so I dug a bit into the settings of LAV Filters and MPC-HC.

1. For this reason my preferred color output is sRGB
GPU and Display port are both set to Full RGB (RGB 4:4:4)

2. Blu-ray's default color format is (up to) YUV 4:2:0 (YCbCr 4:2:0)
This format is much easier to compress.

Decoding it to RGB then should look like this:
And there should not be any losses on the way.

3. Unfortunately, DXVA hardware decoding uses NV12 as its default output.
This adds one additional (unnecessary) conversion and probably some loss of quality.

After trying different settings in LAV Filters, the only way I found to avoid NV12 is to switch decoding to software mode.

Are there any other ways to use GPU decoding but avoid DXVA in LAV Filters? Or to use RGB output instead of NV12?

So far I am running software decoding in LAV Filters.
 

·
Robotic Chemist
Joined
·
4,302 Posts
2. Blu Rays default color format is (up to) YUV 4:2:0 (YCrCb 4:2:0)
This format is much easier to compress.

3. Unfortunately DXVA Hardware decoding uses as a default output NV12
There is added one additional (unnecessary) conversion and probably some loss of quality.
Converting YCbCr 4:2:0, as found on bluray, to NV12 is 100% mathematically lossless. It is just a different way of organizing the data in memory (semi-planar). NV12 is a biplanar format with a full-sized Y plane followed by a single chroma plane with interleaved U and V values. NV12 has a half-width and half-height chroma channel, and is therefore a 4:2:0 subsampling.
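To illustrate the "different way of organizing the data in memory" point, here is a minimal Python sketch. The function names `i420_to_nv12` and `nv12_to_i420` are made up for this example, and it assumes 8-bit samples, even width and height, and no row padding. It repacks a planar 4:2:0 frame (Y plane, U plane, V plane) into NV12 and back; the round trip returns exactly the same bytes.

```python
def i420_to_nv12(frame: bytes, width: int, height: int) -> bytes:
    """Repack planar Y, U, V (I420) into NV12: Y plane + interleaved UV plane."""
    y_size = width * height
    c_size = (width // 2) * (height // 2)   # each chroma plane is half width, half height
    y = frame[:y_size]
    u = frame[y_size:y_size + c_size]
    v = frame[y_size + c_size:y_size + 2 * c_size]
    uv = bytearray(2 * c_size)
    uv[0::2] = u                            # U samples go to even byte positions
    uv[1::2] = v                            # V samples go to odd byte positions
    return y + bytes(uv)


def nv12_to_i420(frame: bytes, width: int, height: int) -> bytes:
    """Inverse repacking: split the interleaved UV plane back into U and V planes."""
    y_size = width * height
    y = frame[:y_size]
    uv = frame[y_size:]
    return y + uv[0::2] + uv[1::2]


# A tiny 4x2 test frame: 8 Y samples, 2 U samples, 2 V samples.
i420 = bytes([1, 2, 3, 4, 5, 6, 7, 8,   # Y
              100, 101,                 # U
              200, 201])                # V
nv12 = i420_to_nv12(i420, 4, 2)
print(list(nv12[8:]))                   # chroma is now stored as U V U V
print(nv12_to_i420(nv12, 4, 2) == i420)
```

No sample value is changed or rounded anywhere; only the byte positions differ, which is why the conversion cannot lose quality.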

Also, a display with 125% sRGB sent sRGB color data will look oversaturated. The display does not convert the color data, so instead of 100% red you are getting 125% red. You are not getting correct colors, but you may think it looks better because people often like oversaturated colors. If you do want correct color, look into measuring your display's native color space with a colorimeter (e.g. X-Rite i1 Display Pro Plus, EODIS3PL) and creating a 3DLUT for madVR to convert the color data to the native color space of your display. DisplayCal is great free software to do the measurements and create the 3DLUT, but you still need a meter.

Nit-pick: YCbCr 4:2:0 is NOT easier to compress, it is just naturally half the bandwidth when uncompressed due to using 1/4 the number of pixels for both the color planes. It is an ancient compression technique to allow higher resolution black and white data over limited bandwidth, before technologies like DSC or H.264/H.265 existed. It has NO compression advantage with modern codecs. A modern codec would compress the color data to the same size with higher quality if we did not use 4:2:0 sampling. That UHD bluray and streaming video is still using 4:2:0 is a travesty. :mad:

Edit:
YCbCr is easier to compress than RGB, but the subsampling does not help. It just causes a loss of chroma resolution for zero gain.
 

·
Tank destroyer and a god
Joined
·
2,769 Posts
Discussion Starter · #3 ·
Converting YCbCr 4:2:0, as found on bluray, to NV12 is 100% mathematically lossless. It is just a different way of organizing the data in memory (semi-planar). NV12 is a biplanar format with a full sized Y plane followed by a single chroma plane with weaved U and V values. NV12 has a half width and half height chroma channel, and therefore is a 420 subsampling.

Also, a display with 125% sRGB sent sRGB color data will look oversaturated. The display does not convert the color data, so instead of 100% red you are getting 125% red. You are not getting correct colors, but you may think it looks better because people often like oversaturated colors. If you do want correct color, look into measuring your display's native color space with a colorimeter (e.g. X-Rite i1 Display Pro Plus, EODIS3PL) and creating a 3DLUT for madVR to convert the color data to the native color space of your display. DisplayCal is great free software to do the measurements and create the 3DLUT, but you still need a meter.

Nit-pick: YCbCr 4:2:0 is NOT easier to compress, it is just naturally half the bandwidth when uncompressed due to using 1/4 the number of pixels for both the color planes. It is an ancient compression technique to allow higher resolution black and white data over limited bandwidth, before technologies like DSC or H.264/H.265 existed. It has NO compression advantage with modern codecs. A modern codec would compress the color data to the same size with higher quality if we did not use 4:2:0 sampling. That UHD bluray and streaming video is still using 4:2:0 is a travesty. :mad:

Edit:
YCbCr is easier to compress than RGB, but the subsampling does not help. It just causes a loss of chroma resolution for zero gain.
The display is already calibrated for BT.701 2.20 using an i1 Display Pro sensor, and the profile is stored in the display in GLUT format. Mostly it's applied at the hardware level, but the calibration software delivered with the monitor also creates an ICC profile. It's best to use both of them.

I had two concerns regarding y420>NV12>RGB conversion.
1. One step is redundant and can be skipped.
This was more of a concern in the past, since this conversion crippled the performance of TV tuner capture.


2. y420 is listed as a 16-bit format, NV12 as 12-bit, at least here:
But yes, it's also 4:2:0 subsampling, so in theory there should not be any loss.
 

·
Robotic Chemist
Joined
·
4,302 Posts
Display is already calibrated for BT.701 2.20 using sensor i1 Display Pro and the profile is stored in display in GLUT format. Mostly its applied on hardware level, but the calibration software delivered with the monitor creates also ICC profile. Its best to use both of them.
Great! :)

(BT.709, I assume) ;)

1. One step is redundant and can be skipped.
This was more of a concern in a past since this conversion crippled performance of a TV Tuner capture.
It cannot be skipped, that is the only format Nvidia GPUs use for YCbCr 4:2:0 data in Windows. The conversion should be really fast, I am not sure why it would cripple a TV Tuner, but they have notoriously wonky drivers, so anything makes sense.

2. y420 is listed as a 16bit format, NV12 as a 12bit at least here:
But yes, its also 420 subsampling so in theory there should not be any loss.
I am not sure what format you looked up for "y420", but the data on a bluray is usually decoded to i420p8 or YV12, which is 12 bits per pixel. That is 8 bits for Y, 8 bits for U, and 8 bits for V, but there is only one U and one V sample for every 4 Y pixels, so we divide the 8+8 by 4 and add that to the bits for Y, getting 12 bits/pixel.

YUY2 or 8 bit YUV 4:2:2 is 16 bits per pixel, using the same math but dividing by two instead of four.
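The same arithmetic as a small Python sketch (the helper name `bits_per_pixel` and its parameters are invented here for illustration). It adds the luma bits to the two chroma samples' bits, spread over the number of luma pixels that share them:

```python
def bits_per_pixel(bit_depth: int, h_share: int, v_share: int) -> float:
    """Average bits per pixel for a YCbCr layout.

    h_share / v_share: how many luma pixels share one chroma sample
    horizontally / vertically (2 and 2 for 4:2:0, 2 and 1 for 4:2:2).
    """
    luma_bits = bit_depth
    chroma_bits = 2 * bit_depth / (h_share * v_share)  # one U and one V sample
    return luma_bits + chroma_bits


print(bits_per_pixel(8, 2, 2))  # 4:2:0 -> 12.0
print(bits_per_pixel(8, 2, 1))  # 4:2:2 -> 16.0
print(bits_per_pixel(8, 1, 1))  # 4:4:4 -> 24.0
```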

It is not "in theory", they really are mathematically identical. The "conversion" is simply copying values to different positions in memory. :p

Edit: Notice how the "4:2:0 Formats, 16 bits per pixel" (IMC1 and IMC3) have "Padding" or "Space" in their defined memory layout? They are really 12 bits per pixel, with 4 bits of padding per pixel to change memory alignment.
 

·
Tank destroyer and a god
Joined
·
2,769 Posts
Discussion Starter · #5 · (Edited)
Great! :)

(BT.709, I assume) ;)
Yes

It cannot be skipped, that is the only format Nvidia GPUs understand. The conversion should be really fast, I am not sure why it would cripple a TV Tuner, but they have notoriously wonky drivers, so anything makes sense.
I have AMD, and if I set LAV Filters to software mode, it simply converts the stream to whatever I specify in its config; NV12 does not get forced. With DXVA hardware decoding, it always prepares multiple stream outputs (NV12 in every case, plus the ones I specified), but NV12 gets selected every time.

Multiple conversions at a time can be a problem on HTPC systems built on old hardware (a convenient way to put very old hardware to meaningful use). For example, you can run Windows 7 on a Pentium III, and attempts to play a bluray might cause trouble. The solution is sometimes to use RGB24 output instead of NV12 (or any other format), simply to get the image in the native desktop color space and to use exactly ONE conversion.


I am not sure what format you looked up for "y420" but the data on a bluray is usually decoded to i420p8 or YV12, which is 12 bits per pixel. This is 8 bits for Y and 8 bits for U and 8 bits for V, but there are only 1 U and V sample for 4 Y pixels, so we divide the 8+8 by 4 and add that to the bits for Y, getting 12 bits/pixel.

YUY2 or 8 bit YUV 4:2:2 is 16 bits per pixel, using the same math but dividing by two instead of four.

It is not "in theory", they really are mathematically identical. The "conversion" is simply copying values to different positions in memory. :p

Edit: Notice how the "4:2:0 Formats, 16 bits per pixel" (IMC1 and IMC3) have "Padding" or "Space" in their defined memory layout? They are really 12 bits per pixel, with 4 bits of padding per pixel to change memory alignment.
I went through multiple discs. Sometimes the stream is reported as yuv420p, sometimes just y420, but to get a better idea, MPC-HC can report pin outputs and inputs. The input for the video decoder says:


BITMAPINFOHEADER:
biSize: 40
biWidth: 1920
biHeight: 1080
biPlanes: 1
biBitCount: 12
biCompression: H264
biSizeImage: 3110400
biXPelsPerMeter: 0
biYPelsPerMeter: 0
biClrUsed: 0
biClrImportant: 0

The bit count confirms what you say. Anyway, I have now enabled the newer 0.74 version of LAV Filters. It allows me to bypass DXVA and use D3D11 for hardware decoding, without forced NV12.
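As a quick cross-check of the dump above, the reported biSizeImage matches a 1920x1080 frame at 12 bits per pixel. A minimal Python sketch, using only the values from the header:

```python
# Values copied from the BITMAPINFOHEADER dump.
width, height, bit_count = 1920, 1080, 12

size_bytes = width * height * bit_count // 8   # 12 bits per pixel = 1.5 bytes per pixel
print(size_bytes)  # 3110400, the same as biSizeImage
```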
 

·
Robotic Chemist
Joined
·
4,302 Posts
Multiple conversions at a time can be problem on a HTPC systems, built on an old hardware (a convenient way how to use very old hardware in a meaningful way).
If you use DXVA2 or D3D11 hardware decoding, NV12 is the native format. The video is decoded directly to NV12 in memory on the GPU. No conversion is done to get NV12.

If you want to display the video data using GPU hardware decoding, sending the renderer NV12 is the fastest method available, with the fewest conversions and memory copies. This holds for AMD, Nvidia, or Intel, and for Windows 7/8/10.

I went through multiple disks. Sometimes stream is reported to be yuv420p, sometimes just y420, but to get better idea, MPC HC can report pin outputs and inputs. Input
All video on consumer optical media is YCbCr 4:2:0 video data after decoding, which can be natively stored (without conversion) as IMC1, IMC2, IMC3, IMC4, YV12, or NV12 depending on how exactly the decoder organizes the output data in memory as it decodes. They are all identical mathematically, with different advantages mostly depending on the optimal memory access patterns of the hardware and software. The codes yuv420p (or y420) are describing a different thing than NV12 or YV12 are. Those yuv420 codes are describing the kind of pixel data it is, meaning 8 bit YCbCr with 4:2:0 chroma sampling, while NV12 and YV12, etc. are describing how the values for that 8 bit YCbCr with 4:2:0 chroma sampling pixel data are stored in memory.

The "native" format of the video data on the disc, as encoded as MPEG2, H.264, or HEVC (DVD, bluray, UHD bluray) is crazy to think about. A block based transform to frequency space, followed by quantization, motion based predictions based on reference blocks, and residual blocks. Then all that is compressed again. It is decoded to YCbCr 4:2:0 pixels, but the decoder can organize it any way it wants in memory without needing any additional conversions or memory copies. Converting NV12 to YV12 is not like converting YCbCr to RGB, it is literally just copying data to different memory addresses, and it has to be copied into memory when decoded in the first place.

Anyway, Now I enabled newer 0.74 version of LAV filter. It allows me to bypass DXVA and use D3D11 for hardware decoding, without forced NV12.
D3D11 hardware decoding DOES decode to NV12, it is just that MPC-HC reports it differently when using D3D11.
[Screenshot]


What are you using as the renderer? Have you used madVR? madVR is ideal if you want a high quality YUV -> RGB conversion. LAV Video does not use the highest quality method to convert YCbCr 4:2:0 to RGB, it was implemented long ago as a fail-safe in case the renderer does not accept the video's native format, but normally it is not used so quality wasn't considered too important (all renderers used today accept the native unconverted YCbCr 4:2:0 and do their own, usually higher quality, conversion to RGB).

LAV's YUV -> RGB isn't terrible or anything, but you are only making the process slower and lower quality by forcing LAV to output RGB24. :p

From Recommended 8-Bit YUV Formats for Video Rendering for NV12:
"All of the Y samples appear first in memory as an array of unsigned char values with an even number of lines. The Y plane is followed immediately by an array of unsigned char values that contains packed U (Cb) and V (Cr) samples. When the combined U-V array is addressed as an array of little-endian WORD values, the LSBs contain the U values, and the MSBs contain the V values. NV12 is the preferred 4:2:0 pixel format for DirectX VA. It is expected to be an intermediate-term requirement for DirectX VA accelerators supporting 4:2:0 video."
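The WORD addressing described in that quote can be checked with a few lines of Python. This is only a sketch with made-up sample values; it reads two interleaved U/V byte pairs as little-endian 16-bit words and splits each word into its low and high byte:

```python
import struct

# Two chroma samples as they would sit in an NV12 UV plane: U0 V0 U1 V1.
uv_plane = bytes([0x10, 0x80, 0x20, 0x90])

# Read the same four bytes as two little-endian 16-bit WORD values.
words = struct.unpack("<2H", uv_plane)

# The low byte (LSBs) of each word is U, the high byte (MSBs) is V.
pairs = [(w & 0xFF, w >> 8) for w in words]
for u, v in pairs:
    print(f"U=0x{u:02X} V=0x{v:02X}")
```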
 

·
Tank destroyer and a god
Joined
·
2,769 Posts
Discussion Starter · #7 ·
I guess you either let madVR handle chroma or trust DXVA at some point. My output looks like this. I use madVR because MPC indicates the lowest sync offset and no jitter at all, while frame drops are reported only when resizing the window.

The renderer apparently isn't even aware of the original format, but the pin in LAV Filters has the data about H264. Also, I went for manual copy-back selection, since I might add devices to the PC that are hardware video decoders, and I don't want them to be used.

Now I just wonder why it uses D3D9 instead of D3D11.

[Screenshot]


I will be looking for the lowest clock deviation and the lowest frame repeat rate.
 

·
Robotic Chemist
Joined
·
4,302 Posts
I guess you let madVR set to handle chroma or trust DXVA at some point. My output looks like this. I used madVR because MPC indicates lowest sync offset and no jitter at all, while framedrops are reported only when resizing the window.
madVR for YCbCr 4:2:0 to RGB is much better quality than LAV. High quality YCbCr 4:2:0 to RGB is what madVR was originally written for. Your settings make me sad. :(

Check all the color spaces for LAV (except AYUV) so you don't have any slow, low quality, color space conversions before sending the video data to madVR. :p

Also, I suggest disabling most of the trade quality for performance options in madVR, especially "use DXVA chroma upscaling when doing native DXVA decoding" if you do use native DXVA2 or D3D11 hardware decoding in LAV.
[Screenshot]


Renderer isnt apparently even aware of the original format, but the pin in LAV filter has the data about H264. Also i went for copyback manual selection, since there is a possibility i might add some devices to PC which are a Hardware video decoder, but i dont want them to be used.
I do prefer copyback. A few of madVR's features require copyback too (IVTC, black bar detection) because they are done on the CPU. Copyback is slower because the decoded video data is copied from the GPU's memory to the CPU's memory and then back, but that way CPU algorithms can process the video data.

Automatic will pick the GPU the display is plugged into if there are multiple options and the video data is only processed on the GPU.

Now i just wonder why it uses D3D9 instead of 11.
That is a madVR setting:
[Screenshot]

[Screenshot]


Present several frames in advance is a newer (not very new at this point) way to present frames, if you uncheck that box you get the "old path" D3D9 using backbuffers instead of presenting frames in advance. This is not better, at least I have never seen it be better on Windows 10. It is ignored if you are using D3D11.

I will be looking for lowest clock deviation, lowest frame repeat
Have you tried madVR's custom modes feature to optimize your display mode for minimal frame drop/repeats? It can be temperamental, and you might need to use CRU to actually define the mode, but the optimization data can suggest new, more accurate, timing options. Watch a video undisturbed (no seeking, pausing, etc.) for at least 10 minutes before using the optimization data, the longer the more accurate, to a point. This has worked really well for me (my screenshot above was not using a tuned display mode).
[Screenshot]
 

·
Tank destroyer and a god
Joined
·
2,769 Posts
Discussion Starter · #9 ·
madVR for YCbCr 4:2:0 to RGB is much better quality than LAV. High quality YCbCr 4:2:0 to RGB is what madVR was originally written for. Your settings make me sad. :(

Check all the color spaces for LAV (except AYUV) so you don't have any slow, low quality, color space conversions before sending the video data to madVR. :p

Also, I suggest disabling most of the trade quality for performance options in madVR, especially "use DXVA chroma upscaling when doing native DXVA decoding" if you do use native DXVA2 or D3D11 hardware decoding in LAV.
I use MPC-HC only for playing blurays. There are many reasons, but this player lets me play sound at higher quality than other players. So, in my situation, I can afford to check only 4:2:0. Benchmarking and a few articles say that YV12 is more similar to the source stream, yuv420p.

(a side note, rendering in various color outputs takes slightly different time per frame:
YV12= 0.94ms
NV12= 1.10ms
RGB24= 1.20ms)

Therefore I can choose between two different approaches.

a) LAV will be used only for decoding.
I will set the output to closest format to yuv420p, and it will be one of the 4:2:0.

Afterwards I can access a lot of settings related to YUV chroma conversion. Using these techniques, however, increases render time to as much as 20-40 milliseconds.

Then the image will be converted to RGB.

b) LAV will do all the work, and the output will be RGB24 with color range as the source (mostly limited range)
YUV>RGB will happen here and just once in the chain, no upscaling will be used.

In both cases I can use madVR to get better sync between movies and the display. The only upscaling I use is 1920x1080 to 3840x2160, which is a relatively simple one, and I prefer to play blurays as close to the source content as possible.
 

·
Robotic Chemist
Joined
·
4,302 Posts
b) LAV will do all the work, and the output will be RGB24 with color range as the source (mostly limited range)
YUV>RGB will happen here and just once in the chain, no upscaling will be used.
LAV does scale the chroma data, you have to scale the chroma data somehow to convert YCbCr 4:2:0 to RGB. LAV doesn't care too much about how the scaling is done either, quick and dirty is fine. This is from the developer of LAV filters, they say to not use LAV's conversion to RGB unless your renderer cannot accept the video's native format.

Even if you do want to use LAV to convert to RGB, do not send RGB24 as limited to madVR or madVR will need to do another conversion to full range. The difference between the conversion from limited YCbCr to full RGB or limited YCbCr to limited RGB is only the matrix used, both are done as a single operation. It is better to do one conversion using the limited YCbCr -> full RGB matrix than converting to limited RGB and then scaling that to full RGB.
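The one-step versus two-step point can be sketched in Python. This is a rough illustration only, not how LAV or madVR implement it: it assumes BT.709 coefficients, 8-bit values, and an 8-bit intermediate for the limited-range RGB, and all function names are made up. Chroma upscaling is left out; each call converts a single already-aligned Y/Cb/Cr triple.

```python
KR, KB = 0.2126, 0.0722            # BT.709 luma coefficients
KG = 1.0 - KR - KB


def _clip8(x: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, round(x)))


def _normalized_rgb(y: int, cb: int, cr: int):
    """Limited-range 8-bit YCbCr -> RGB as floats where 0.0 is black and 1.0 is white."""
    yn = (y - 16) / 219
    pb = (cb - 128) / 224
    pr = (cr - 128) / 224
    r = yn + 2 * (1 - KR) * pr
    b = yn + 2 * (1 - KB) * pb
    g = (yn - KR * r - KB * b) / KG
    return r, g, b


def direct_full(y: int, cb: int, cr: int):
    """One operation: limited YCbCr -> full-range RGB, rounded once."""
    return tuple(_clip8(255 * c) for c in _normalized_rgb(y, cb, cr))


def via_limited(y: int, cb: int, cr: int):
    """Two operations: limited YCbCr -> 8-bit limited RGB, then expansion to full range."""
    limited = [_clip8(16 + 219 * c) for c in _normalized_rgb(y, cb, cr)]
    return tuple(_clip8((v - 16) * 255 / 219) for v in limited)


# Count how often the extra rounding step changes the result on a coarse grid.
samples = [(y, cb, cr) for y in range(16, 236, 5)
           for cb in range(16, 241, 16) for cr in range(16, 241, 16)]
changed = sum(direct_full(*s) != via_limited(*s) for s in samples)
print(f"{changed} of {len(samples)} sample colors differ between the two paths")
```

Any differences come from rounding twice instead of once, so they are at most one code value per channel in this sketch, but they are avoidable by doing the single conversion.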

LAV Filters and madVR are designed to work together, the two developers have worked together over the years to develop an ideal video rendering pathway, with the fewest memory copies or format conversions possible. Doing what should be madVR's job in LAV Video is misunderstanding how all this is designed to work.

In both cases I can use madVR to get better sync between movies and display. The only upscaling I use is 1920x1080 to 3840x2160, which is relatively simple one, and i prefer to play blurays as close to the source content as possible.
Sounds reasonable. I try to play blurays as close to the source as I can too. I don't use any pre/post processing and I use as natural scaling as possible for both chroma and image scaling.

Edit:
I will set the output to closest format to yuv420p, and it will be one of the 4:2:0.
"closest format to yuv420p" :rolleyes:

No, it is exactly the same format with absolutely ZERO conversions going on. madVR gets the pure, exactly as decoded, video data from LAV Video if you let LAV output all the color formats. NV12 is a more specific name for yuv420p, it is yuv420p. :mad:

If you think NV12 is not identical to yuv420p then please describe in what way they differ.

You see "dog" in one place and "labrador" in another and are thinking the animal must have changed. :p
 

·
Tank destroyer and a god
Joined
·
2,769 Posts
Discussion Starter · #11 · (Edited)
I went through more of the default settings in LAV and used DXVA2 in native mode. In madVR I used D3D9 with the new path and a separate device for presentation. For me this produces no glitches and the fewest dropped frames. Rendering is also the fastest. In this setup I switched to DXVA for chroma and image upscaling as well.

In this configuration the render time was the lowest of all, 0.6-0.7 ms. Image quality viewed on a 4K display was definitely better than with RGB24 and bilinear upscaling. There was also a little less banding.

All this at about 15% usage on Video Decode and 7% on Shader utilization.

Customized settings: in copy-back mode, with NV12 or YV12 and bilinear upscaling for chroma, rendering time was 1.10 ms at exactly the same GPU resource consumption.
Switching to NGU Standard (high) for both image and chroma results in 25 percent GPU shader utilization in windowed mode (1920x1080) and up to 65 percent in fullscreen. It also introduced new artifacts of its own.

The same setup with Jinc AR (anti-ringing filter enabled) for chroma and image upsampling gave better results than DXVA and NGU, with much less GPU utilization (up to 10%) and no extra artifacts, but rendering time went up.

RGB24 with Jinc AR for image resizing produced an equally sharp image, but the banding artifacts looked different: sometimes completely invisible, sometimes moving blobs of pixels. Yeah, that's what a fast conversion from YUV to RGB might look like.

So using YUV post-processing just for chroma makes sense, although it takes more resources. At least for my display, such an approach makes sense.


Some sidenotes:
  • No support for FreeSync/G-Sync. Having it would be nice, but that technology came later.
  • It was not thrown off by ancient DVDs with PAL-compatible content. I mean, it recognized 720x576, BT.601, limited range, and interlacing, and it was not fooled by a DVD whose stream was already de-interlaced: it did not turn deinterlacing on "just because". Actually, it was the more recent DVDs that were so well copy-protected that they were impossible to play.
  • It did not go into oversaturation or excessively high contrast, both of which tend to make banding worse.

I will re-test D3D11, both VA and renderer.

One thing regarding the audio filter: it would be nice to be able to prioritize bluray_PCM over DTS-HD MA/TrueHD, or stereo over 5.1 and vice versa, in the same way languages can be prioritized. I was quite happy to find DTS-HD MA and TrueHD and not just AC3 on BDs, but finding complete LPCM in 48 kHz/24-bit stereo is a really nice surprise.

Edit: D3D11 in native mode crashes when it goes to fullscreen. No idea why.
 

·
Tank destroyer and a god
Joined
·
2,769 Posts
Discussion Starter · #12 ·
"closest format to yuv420p" :rolleyes:

No, it is exactly the same format with absolutely ZERO conversions going on. madVR gets the pure, exactly as decoded, video data from LAV Video if you let LAV output all the color formats. NV12 is a more specific name for yuv420p, it is yuv420p. :mad:

If you think NV12 is not identical to yuv420p then please describe in what way they differ.

You see "dog" in one place and "labrador" in another and are thinking the animal must have changed. :p
Basically just the data alignment. YYYY UU VV in the stream, NV12 is YYYY UVUV and YV12 is YYYY VV UU.

Anyway, I found out that using madVR in a way you suggested (keeping YUV in LAV) works better in terms of quality, while the cost of performance or power consumption is negligible.

A somewhat important thing for me was a setup where the audio/image clock deviation between madVR and the internal audio renderer in MPC-HC is -0.0009% of fps (some 0.3753 microseconds), and this rate is constant. I was expecting jitter or delay in milliseconds, and the results are beyond my wildest expectations...
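For reference, the microsecond figure is consistent with applying that percentage to the duration of one frame. A small Python sketch of the arithmetic, assuming the movie runs at 24000/1001 fps (about 23.976) and that the deviation is meant per frame:

```python
fps = 24000 / 1001                  # assumed film rate, about 23.976 fps
frame_us = 1_000_000 / fps          # duration of one frame in microseconds
deviation_percent = 0.0009          # magnitude of the reported clock deviation

drift_us = frame_us * deviation_percent / 100
print(f"{frame_us:.1f} us per frame, about {drift_us:.3f} us of drift per frame")
```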
 

·
Robotic Chemist
Joined
·
4,302 Posts
Basically just the data alignment. YYYY UU VV in in the stream, NV12 is YYYY UVUV and YV12 is YYYY VV UU.
Is it YYYY UU VV in the stream? I have no idea what it is decoded into natively, if that is really a thing. Is there really an "in the stream" memory layout? On the disc or as streamed from Netflix it is not YYYY UU VV, discrete spatial pixels only exist after decoding. I think the hardware decoder on GPUs natively outputs NV12, at least as native as we have access to.

They all turn into exactly the same RGB anyway. ;)

Anyway, I found out that using madVR in a way you suggested (keeping YUV in LAV) works better in terms of quality, while the cost of performance or power consumption is negligible.
It sounds like you have found good settings for your system.

Somewhat important thing for me was a setup where audio/image clock deviation between madVR and Internal audio renderer in MPC-HC is -0.0009% of fps (some 0.3753 microseconds), while this rate is constant. I was expecting jitter or delay in miliseconds, and the results are beyond wildest expectations...
Now your setup is about as ideal as we can get on Windows. Congratulations. :D
 

·
Tank destroyer and a god
Joined
·
2,769 Posts
Discussion Starter · #14 ·
The LG 27GN950 display appears to unofficially support 23.976 Hz, 24 Hz and 25 Hz. I would find FreeSync/G-Sync support in Direct3D exclusive mode easier to configure and more precise, but it's not surprising that a G-Sync/FreeSync display is capable of these modes...

Just one question... Composition rate 23Hz, Display 23,97603Hz, movie 23,976fps... I guess the composition rate just refers to the "name" of the display mode which is listed as 23Hz.
 

·
Tank destroyer and a god
Joined
·
2,769 Posts
Discussion Starter · #15 ·
Replaced the USB cable on the BD drive and a miracle happened - check the clock deviation.

Later I found out that using the LAN card gave a different clock deviation than Wi-Fi, so I checked the settings and found out that the LAN was misconfigured :D.
[Screenshot]
 

·
Robotic Chemist
Joined
·
4,302 Posts
Just one question... Composition rate 23Hz, Display 23,97603Hz, movie 23,976fps... I guess the composition rate just refers to the "name" of the display mode which is listed as 23Hz.
I normally see the correct composition rate:
[Screenshot]


I don't like seeing a composition rate of 23.000Hz, that can cause judder or other weirdness. Depending on what else, I am not sure. :p

Replaced USB cable on BD drive and miracle happened - check clock deviation.
Nice. Once the interval between frame drops/repeats is longer than any movie, improving it further won't actually change anything, but near perfection is still satisfying. :)
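A rough way to see where such an interval comes from: if the display refreshes slightly faster or slower than the video's frame rate, one whole frame of error accumulates every 1 / |difference| seconds. The Python sketch below uses a made-up helper name and ignores audio clock deviation, so it is a simplification and madVR's own estimate may be computed differently.

```python
def seconds_between_repeats(display_hz: float, video_fps: float) -> float:
    """Time until the rate mismatch adds up to one whole frame (drop or repeat)."""
    diff = abs(display_hz - video_fps)
    if diff == 0:
        return float("inf")          # perfectly matched rates never drift
    return 1.0 / diff


video_fps = 24000 / 1001             # about 23.976 fps

# Playing 23.976 fps video on a plain 24 Hz mode: a repeat roughly every 42 seconds.
print(f"{seconds_between_repeats(24.0, video_fps):.1f} s")

# A well-tuned mode that is off by only 0.00001 Hz (hypothetical value): many hours.
hours = seconds_between_repeats(video_fps + 0.00001, video_fps) / 3600
print(f"{hours:.0f} h")
```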
 