
Windows XP Ram Limit - Page 5

post #41 of 100
Pretty good stuff guys!

@friendship7: Thanks for the patched hal! Much appreciated.

I tested it and the HAL works as expected on XP SP3. The USB problems are still present, of course; e.g. if I do a quick format of a USB stick in Windows Explorer, it hangs.
post #42 of 100
You're welcome, kondra.
The format window hangs, right? I've seen it now for the first time.
Perhaps there are other locations that need patching...
post #43 of 100
It's noteworthy that I get the USB issue mentioned above only when connecting to USB 2.0 ports.
When connecting to USB 3.0 ports, I can quick-format without any issue.
The AMD USB 3.0 ports have separate drivers, which makes me suspect that the issue is with the in-box XP SP3 USB 2.0 drivers
(in addition to the fact that no other storage medium shows this issue, AFAIK).
post #44 of 100
I was able to narrow down the USB 2.0 issues with XP SP3 to usbport.sys:
Taking usbport.sys from Server 2003 SP2 solves my USB 2.0 format issues with XP SP3.

(taking usbehci.sys+usbohci.sys+usbport.sys from XP SP3 will create the same USB 2.0 issue in Server 2003 SP2)
post #45 of 100
Absolutely correct! The noob (me) had not noticed that the USB problem only shows up on USB 2.0 ports and not on 3.0 ones. Good work!
I have tested usbport.sys from Windows 2003 Server with SP2 and it works great on my XP SP3.

For the sake of completeness we have to do the following on XP SP3:
- patch ntkrpamp.exe at offset 0x1B2A51 from 75 1B to 90 90
- patch ntkrpamp.exe at offset 0x15DF1A from 75 1B to 90 90
- correct the checksum of ntkrpamp.exe with LordPE
- patch halmacpi.dll at offset 0x17813 from 74 17 to EB 17
- correct the checksum of halmacpi.dll with LordPE
- replace the original files with the patched ones as follows:
  - rename the file "C:\Windows\Driver Cache\i386\driver.cab" to "driver.cab_"
  - rename the file "C:\Windows\Driver Cache\i386\sp3.cab" to "sp3.cab_"
  - rename the file "C:\Windows\system32\ntkrnlpa.exe" to "ntkrnlpa.exe_"
  - cancel the "Windows File Protection" message box, then choose "Yes"
  - copy the patched file ntkrpamp.exe to "C:\Windows\system32\ntkrnlpa.exe"
  - cancel the "Windows File Protection" message box, then choose "Yes"
  - rename the file "C:\Windows\system32\hal.dll" to "hal.dll_"
  - copy the patched file halmacpi.dll to "C:\Windows\system32\hal.dll"
  - rename the file "C:\Windows\system32\drivers\usbport.sys" to "usbport.sys_"
  - copy usbport.sys from Windows Server 2003 SP2 to "C:\Windows\system32\drivers\usbport.sys"
  - rename the file "C:\Windows\Driver Cache\i386\driver.cab_" to "driver.cab"
  - rename the file "C:\Windows\Driver Cache\i386\sp3.cab_" to "sp3.cab"
- reboot
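
The byte patches in the list above can be sketched in Python. This is only an illustration: the offsets and byte values are the ones quoted in the post for XP SP3, while the function and constant names (`apply_patch`, `apply_all`, `KERNEL_PATCHES`, `HAL_PATCHES`) are made up for this example. It does not correct the PE checksum, so the LordPE step is still needed.

```python
def apply_patch(image: bytes, offset: int, expected: bytes, replacement: bytes) -> bytes:
    """Return a copy of image with replacement written at offset.

    Refuses to patch unless the bytes currently at offset equal expected,
    which guards against patching a different build of the file.
    """
    if len(expected) != len(replacement):
        raise ValueError("expected and replacement must have the same length")
    found = image[offset:offset + len(expected)]
    if found != expected:
        raise ValueError(
            f"offset {offset:#x}: found {found.hex()}, expected {expected.hex()}"
        )
    return image[:offset] + replacement + image[offset + len(replacement):]


# Offsets and byte values exactly as listed in the post for XP SP3.
KERNEL_PATCHES = [  # ntkrpamp.exe
    (0x1B2A51, bytes.fromhex("751B"), bytes.fromhex("9090")),
    (0x15DF1A, bytes.fromhex("751B"), bytes.fromhex("9090")),
]
HAL_PATCHES = [  # halmacpi.dll
    (0x17813, bytes.fromhex("7417"), bytes.fromhex("EB17")),
]


def apply_all(image: bytes, patches) -> bytes:
    """Apply every (offset, expected, replacement) patch in order."""
    for offset, expected, replacement in patches:
        image = apply_patch(image, offset, expected, replacement)
    return image
```

To use it, read the file with `open(path, "rb").read()`, pass the bytes through `apply_all`, and write the result to a new file. The check on the expected bytes matters because, as noted in the hint below, ntkrpamp.exe differs slightly between language versions, so the offsets may not match every build.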

Hint: The German and English versions of halmacpi.dll and usbport.sys are the same. Only ntkrpamp.exe has minimal differences.

@friendship7: Much respect for very good reversing skills!

Thanks to all involved!
post #46 of 100
The amount of misinformation in the first couple of pages has led me to believe a lot of what remains is going to follow suit.

Here's the skinny on memory address range limitations regardless of operating system (with some Windows specific notes, since that's the topic at hand):

1.) 32-bit memory address ranges are limited by what is referred to as the "3GB Barrier". This is a pseudo-barrier that follows from 32-bit address range limitations, and it generally varies from 2.75GB to 3.5GB depending on the total size of the other memory ranges (most commonly it is reduced from 3.75GB to much lower because of GPU memory). In Windows XP this issue is compounded by the use of a "paged pool" of memory which is never visible to applications: it is not available for allocation, so it is never listed as such. It is reserved for the kernel to work with paging, and it is exactly 384MB in size.

2.) In addition to the above kernel limitations, there is another limitation that is directly imposed upon applications: each application is allowed to allocate up to 2GB of user-space memory unless the IMAGE_FILE_LARGE_ADDRESS_AWARE ("LAA") flag is set. This flag is commonly hacked onto games to prevent them from crashing with extensive modifications (Skyrim, for example). Note that this limitation is per-thread and also applies to graphics card memory. Also note that while it is a per-thread limitation, each "image" can only hold 3GB of memory using LAA, or 2GB without. This gets confusing to explain, because I don't really understand where you are able to use additional memory on the GPU and where you run into the wall again.
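
As a rough illustration of where the LAA flag lives: in the PE/COFF file format it is bit 0x0020 of the Characteristics field in the COFF file header. The sketch below only reads that bit from raw file bytes. The function name `is_large_address_aware` is made up for this example, and it does nothing more than basic signature checks.

```python
import struct

# Bit in the COFF file header's Characteristics field (PE/COFF format).
IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020


def is_large_address_aware(image: bytes) -> bool:
    """Report whether a PE image has the LAA bit set in its file header."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The DOS header stores the file offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF header follows the 4-byte signature;
    # Characteristics is its last 2-byte field (offset 18 within it).
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
```

Calling it on the bytes of a game executable, read with `open(path, "rb").read()`, shows whether the flag has already been set.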

3.) The above limitations presume that something called Physical Address Extension (PAE for short) is not present. In both Windows XP land and Linux, Unix, BSD land, Physical Address Extension IS technically supported. With a PAE-enabled kernel (Windows XP had a PAE kernel originally on the Professional edition, but this was scrapped and saved for Windows Server 2003, which is just Windows XP at its core) the memory address range limitation is lifted to 64GB. This does NOT directly affect the image limitations imposed by a 32-bit operating space. However, using LAA and PAE in conjunction with multiple threads attached to a single memory management thread, applications were able to allocate many times the original limitations. PAE's extended memory ranges are achieved effectively by the kernel managing multiple separate memory ranges in parallel, each one being 4GB in size. That's an overly simplified way of looking at it, but it's good for explanation purposes.

So on to the question at hand:
Why?
Simple;
2 ^ 32 = 4,294,967,296 = exactly 4GB.

So in Windows XP:

4,294,967,296 - 402,653,184 (384MB) = 3,892,314,112 (3.625GB of allocatable memory for applications)

3,892,314,112 - 536,870,912 (512MB, typical VRAM at the time) = 3,355,443,200 (3200MB, roughly 3.1GB of RAM) <-- That magical "3GB limit" rears its ugly head.

Things like sound controller memory and such also detract from the value, even chipset memory and other small memory patches bring that number lower.
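
The arithmetic above can be re-run as a short Python sketch. The 384MB and 512MB figures are the ones quoted in the post, and the variable names are only illustrative.

```python
MIB = 2**20
GIB = 2**30

address_space = 2**32      # size of a 32-bit address range
paged_pool = 384 * MIB     # reserved pool size quoted in the post
vram = 512 * MIB           # "typical vram at the time", per the post

after_pool = address_space - paged_pool
after_vram = after_pool - vram

print(f"{address_space:,} bytes = {address_space / GIB} GiB")
print(f"{after_pool:,} bytes = {after_pool / GIB} GiB")
print(f"{after_vram:,} bytes = {after_vram // MIB} MiB = {after_vram / GIB} GiB")
```

The last line works out to 3200 MiB, which is 3.125 GiB when counted in the same binary units as the 4GB starting point.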
    
CPU: Core i7 920 D0 4.2ghz HT (1.3625v)
Motherboard: Asus R3E
Graphics: 2xGTX 460 (non SLi, no overclock)
RAM: 6x2gb G.skill @ 6-8-6-24-1T
Hard Drive: WD-VR 300GBx1, 2xWD 1tb, 2x60gb Agility
Optical Drive: Some crappy combo burner...
OS: Arch x64
Monitor: 3xDell U2410f rev A02
Keyboard: X-Armor U9BL
Power: TT Toughpower 1200w (NTB more efficient)
Case: Mountain Mods Pinnacle 24 CYO
Mouse: Roccat Kone (R.I.P. A4Tech x7)
Mouse Pad: Steelpad Experience I-1
post #47 of 100
FYI, I ran a benchmark with FillrateTest 1.13 to check whether the modified HAL causes a measurable performance drop for DMA transfers due to unnecessary double buffering. I compared these 3 configurations on the same hardware (8GB RAM, NVIDIA Quadro FX 1400, GeForce driver 307.83):
1.) Win XP SP3 32-bit without any modification
2.) kondra's patched kernel + outdated SP1 halmacpi
3.) friendship7's patched kernel and patched SP3 halmacpi

Result: There are no performance differences between the 3 setups (with this kind of benchmark and this driver).

I ran each test 2 times to get an idea of the deviation between consecutive runs (it is very small, but I post the results of both runs nevertheless).
1.) run1
Memory bandwidth (0 + 32/32): 18362 MB/sec
Memory bandwidth (16 + 32/32): 16871 MB/sec
Memory bandwidth (32 + 32/32): 16870 MB/sec
GPU fill rate, single-texture (16/0): 2304 Mtexels/sec
GPU fill rate, multi-texture (16/0): 2446 Mtexels/sec
1.) run2
Memory bandwidth (0 + 32/32): 18383 MB/sec
Memory bandwidth (16 + 32/32): 16878 MB/sec
Memory bandwidth (32 + 32/32): 16882 MB/sec
GPU fill rate, single-texture (16/0): 2304 Mtexels/sec
GPU fill rate, multi-texture (16/0): 2446 Mtexels/sec

2.) run1
Memory bandwidth (0 + 32/32): 18382 MB/sec
Memory bandwidth (16 + 32/32): 16871 MB/sec
Memory bandwidth (32 + 32/32): 16877 MB/sec
GPU fill rate, single-texture (16/0): 2304 Mtexels/sec
GPU fill rate, multi-texture (16/0): 2445 Mtexels/sec
2.) run2
Memory bandwidth (0 + 32/32): 18400 MB/sec
Memory bandwidth (16 + 32/32): 16897 MB/sec
Memory bandwidth (32 + 32/32): 16869 MB/sec
GPU fill rate, single-texture (16/0): 2304 Mtexels/sec
GPU fill rate, multi-texture (16/0): 2446 Mtexels/sec

3.) run1
Memory bandwidth (0 + 32/32): 18376 MB/sec
Memory bandwidth (16 + 32/32): 16877 MB/sec
Memory bandwidth (32 + 32/32): 16887 MB/sec
GPU fill rate, single-texture (16/0): 2304 Mtexels/sec
GPU fill rate, multi-texture (16/0): 2445 Mtexels/sec
3.) run2
Memory bandwidth (0 + 32/32): 18404 MB/sec
Memory bandwidth (16 + 32/32): 16880 MB/sec
Memory bandwidth (32 + 32/32): 16874 MB/sec
GPU fill rate, single-texture (16/0): 2304 Mtexels/sec
GPU fill rate, multi-texture (16/0): 2446 Mtexels/sec
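
To put a number on "no performance differences", here is a small Python sketch that computes the spread between the lowest and highest value of each metric across all six runs listed above. The values are copied from the results. The `spread_percent` helper and the choice of (max - min) / min as the measure are assumptions made for this illustration, not part of the benchmark tool.

```python
# Values in order: config 1 run1, run2, config 2 run1, run2, config 3 run1, run2.
results = {
    "Memory bandwidth (0 + 32/32), MB/sec": [18362, 18383, 18382, 18400, 18376, 18404],
    "Memory bandwidth (16 + 32/32), MB/sec": [16871, 16878, 16871, 16897, 16877, 16880],
    "Memory bandwidth (32 + 32/32), MB/sec": [16870, 16882, 16877, 16869, 16887, 16874],
    "GPU fill rate, single-texture, Mtexels/sec": [2304, 2304, 2304, 2304, 2304, 2304],
    "GPU fill rate, multi-texture, Mtexels/sec": [2446, 2446, 2445, 2446, 2445, 2446],
}


def spread_percent(values):
    """Gap between the best and worst run, as a percentage of the worst run."""
    return (max(values) - min(values)) / min(values) * 100


spreads = {name: spread_percent(values) for name, values in results.items()}
for name, spread in spreads.items():
    print(f"{name}: {spread:.3f}% spread across all runs")
```

Every metric stays within a fraction of a percent across the three configurations, which is about the same size as the run-to-run deviation within a single configuration.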
post #48 of 100
achm3t, where are the dma transfers in your test?
post #49 of 100
I think they are covered by the "Memory Bandwidth Test", because it's very unlikely that those are programmed IO. However, I am not totally sure whether these tests are suitable for measuring the impact at all. I was just curious and followed friendship7's recommendation of valuable benchmarks.

According to friendship7, the performance impact is of a theoretical nature for "system drivers that use DMA and are capable of handling 64bit addresses".
I do not know whether that (especially the second part) is the case for the NVIDIA GPU driver or any XP drivers at all.

However, since the GPU is where speed really matters (high throughput to the GPU compared to other devices), I tend to say that if this is working fine, I do not expect any problems with other drivers from a performance point of view.

This was just a quick-test without a deep understanding - interpret the results with care and feel free to propose some other benchmark if you think they are not suitable.
Edited by achm3t - 3/19/14 at 12:14pm
post #50 of 100
Addressing the gpu is actually about memory mapped IO. That is not dma. And a memory bandwidth test is just the cpu accessing the ram directly.

A dma transfer is when some device accesses the ram on its own. And just because you have more than 4GB doesn't necessarily mean your drivers (windows) are doing buffered dma transfers - they only occur when you have an allocated buffer above 4G and the driver/device cannot handle it.
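
The condition described above can be written down as a toy Python check. This is a conceptual model only, not how the XP HAL or any driver actually implements it. The function name `needs_bounce_buffer` and its parameters are made up for this illustration.

```python
def needs_bounce_buffer(buffer_phys_addr: int, length: int, device_addr_bits: int) -> bool:
    """True if any byte of the buffer lies beyond what the device can address.

    A device limited to 32-bit addresses cannot reach memory at or above 4GB,
    so a transfer to such a buffer has to go through an intermediate copy
    located below 4GB.
    """
    last_byte = buffer_phys_addr + length - 1
    return last_byte >= 2**device_addr_bits


# Buffer above 4GB, 32-bit-only device: double buffering is needed.
print(needs_bounce_buffer(0x1_0000_0000, 4096, 32))
# Same buffer, 64-bit capable device: direct DMA is fine.
print(needs_bounce_buffer(0x1_0000_0000, 4096, 64))
# Buffer below 4GB: no double buffering, regardless of installed RAM.
print(needs_bounce_buffer(0x8000_0000, 4096, 32))
```

The third case is the point being made here: having more than 4GB installed does not by itself force buffered transfers.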
Edited by larsch - 3/20/14 at 10:21am