Piledriver & Late Piledriver Changes:
• Piledriver has no L3 cache
• Support for the FMA, F16C, BMI, and TBM instruction sets for Piledriver and Late Piledriver (see the CPUID sketch after this list)
• L1 DTLB size increased to 64 entries for Piledriver and Late Piledriver
• Support for 10 cores per node in some products for Late Piledriver
• Four DDR3 channels for Late Piledriver
• AGLUs can now execute ALU functions
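For anyone who wants to verify the new instruction sets at runtime, here is a minimal CPUID sketch in C. The bit positions are the documented ones (FMA and F16C in leaf 1 ECX, BMI1 in leaf 7 EBX, TBM in extended leaf 80000001h ECX); the GCC/Clang <cpuid.h> helpers are assumed, and a production check would validate the maximum supported leaf first.

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 1: FMA is ECX bit 12, F16C is ECX bit 29 */
    __cpuid(1, eax, ebx, ecx, edx);
    printf("FMA:  %s\n", (ecx & (1u << 12)) ? "yes" : "no");
    printf("F16C: %s\n", (ecx & (1u << 29)) ? "yes" : "no");

    /* Leaf 7, subleaf 0: BMI1 is EBX bit 3 */
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    printf("BMI1: %s\n", (ebx & (1u << 3)) ? "yes" : "no");

    /* Extended leaf 80000001h: TBM (AMD-only) is ECX bit 21 */
    __cpuid(0x80000001, eax, ebx, ecx, edx);
    printf("TBM:  %s\n", (ecx & (1u << 21)) ? "yes" : "no");

    return 0;
}

Build it with plain gcc and run it on the box in question.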
This is new information, straight from AMD:
Using IOMMUv2 features
Some AMD Family 15h processors (for example, models 10h–1Fh) include an enhanced IOMMU that
controls I/O device access to system memory. The IOMMU provides support for address translation
and access protection on DMA transfers by peripheral devices.
• Remaps addresses above 4GB for devices that do not support 64-bit addressing
• Allows a guest OS running under a VMM to have direct control of a device
• Provides page granularity control of device access to system memory
• Allows a device direct access to user space I/O
• Filters and remaps interrupts
Refer to the IOMMU Architectural Specification, issue #34434, revision 2, for more information
on detecting IOMMU features and other programming details.
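To make the page-granularity mapping concrete, the sketch below uses VFIO, the Linux interface that later exposed IOMMU mappings to user space (not AMD-specific, and newer than this guide). The group number 26 and the one-page buffer are assumptions; the flow of container, group, IOMMU model, then DMA map is the documented one.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

int main(void)
{
    /* The group number (26 here) is hypothetical; it is whatever IOMMU
       group the device was bound into under /dev/vfio/. */
    int container = open("/dev/vfio/vfio", O_RDWR);
    int group = open("/dev/vfio/26", O_RDWR);
    if (container < 0 || group < 0) { perror("open vfio"); return 1; }

    /* Attach the group to the container, then pick the type1 IOMMU model */
    ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

    /* One page of ordinary user memory, mapped into the device's address
       space at IOVA 0 with read/write permission: this is the
       page-granularity access control described above. */
    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (unsigned long)buf,
        .iova  = 0,          /* address the device will see */
        .size  = 4096,
    };
    if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map) < 0)
        perror("VFIO_IOMMU_MAP_DMA");
    return 0;
}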
IOMMUv2 also enables a key I/O device memory optimization. With IOMMUv2, I/O devices have
direct access to driver-pinned memory. Prior to this, I/O devices could only see restricted portions of
memory on the local CPU, and the assignment of this memory was traditionally done through GART
mappings. IOMMUv2 removes this restriction, enabling any application to share data directly
with I/O devices. This removes the need for extra memory copy operations to move data into pinned
buffers.
IOMMUv2 also provides a new capability to access guest virtual (user) address space. This requires a
new generation of compatible I/O devices that support the PASID TLP prefix. These attributes,
combined with support for user-space address translation of unpinned memory, eliminate the
extra copies and make IOMMUv2 more efficient.
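To see what "saving extra copies" means in practice, here is a minimal sketch of the old staging flow, with mlock() standing in for driver-side pinning (that substitution is mine, not AMD's). Under IOMMUv2 with a PASID-capable device, the device walks the user's page tables directly and the memcpy below disappears.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define BUF_SZ 4096

int main(void)
{
    /* Application data living in ordinary, unpinned user memory */
    char *user_data = malloc(BUF_SZ);
    memset(user_data, 0xAB, BUF_SZ);

    /* Pre-IOMMUv2 model: stage the data in a pinned buffer the device
       can reach (historically assigned via GART mappings). The memcpy
       is the extra copy the text is talking about. */
    char *pinned = malloc(BUF_SZ);
    if (mlock(pinned, BUF_SZ) != 0) { perror("mlock"); return 1; }
    memcpy(pinned, user_data, BUF_SZ);

    /* IOMMUv2 + PASID model: the device translates through the user's
       page tables, so it could DMA from user_data directly and both the
       pinned buffer and the memcpy go away. */

    munlock(pinned, BUF_SZ);
    free(pinned);
    free(user_data);
    return 0;
}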
These new capabilities do not directly change anything in the NUMA architecture. The NUMA I/O
optimizations mentioned earlier in this section still apply in the case of device buffer memory
mapping using the IOMMU. For instance, although the device can directly access memory on other
NUMA nodes, best performance will result when memory is accessed on the local node. Also, any
interrupt service routines should still be locked to the local node.
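Here is a hedged sketch of the node-local allocation the guide recommends, using Linux's libnuma (link with -lnuma). Node 0 is only an example; a real driver would use the node its device hangs off, e.g. from the device's numa_node attribute in sysfs.

#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not supported on this system\n");
        return 1;
    }

    /* Keep the thread and the buffer on the same node so device DMA and
       CPU accesses both stay local. Node 0 is illustrative only. */
    int node = 0;
    numa_run_on_node(node);

    void *buf = numa_alloc_onnode(1 << 20, node);   /* 1 MiB on node 0 */
    if (!buf) { perror("numa_alloc_onnode"); return 1; }

    /* ... hand buf to the device / IOMMU mapping here ... */

    numa_free(buf, 1 << 20);
    return 0;
}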
Each node in an AMD Family 15h system consists of compute units attached to an integrated
memory controller and up to four HyperTransport™ links. Models 00h–0Fh consist of four compute
units per node, models 20h–2Fh consist of up to five compute units, and models 10h–1Fh consist of
one or two compute units.
http://support.amd.com/us/Processor_TechDocs/47414_15h_sw_opt_guide.pdf
The memory controllers in models 00h–0Fh and 10h–1Fh each have two channels to DDR3 memory;
the memory controller in models 20h–2Fh has four channels to DDR3 memory.
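For a rough sense of what the extra channels buy, assume DDR3-1866 purely for illustration: one 64-bit channel peaks at about 1866 MT/s × 8 bytes ≈ 14.9 GB/s, so a two-channel node tops out around 29.9 GB/s while a four-channel node reaches roughly 59.7 GB/s.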