Overclock.net › Forums › Industry News › Hardware News › [Intel] Intel Discloses New Architecture Features of Next Generation Itanium...

[Intel] Intel Discloses New Architecture Features of Next Generation Itanium... - Page 3

post #21 of 34
Quote:
Originally Posted by Scrappy View Post
I thought Itanium was originally intended as a desktop replacement because x86 has so many underused instructions; it just never really caught on.
This is partly correct. The Itanium, I believe, has an IA-64 RISC-based architecture (Reduced Instruction Set Computing),
whereas the conventional x86 architectures are CISC-based (Complex Instruction Set Computing).

EDIT:

here is a good read for people who don't know the differences and pros/cons
Edited by cdesewell - 8/23/11 at 7:09am
post #22 of 34
I remember a lot of people had stopped supporting Itanium just because of its incompatibility. I don't remember Intel saying it was going to drop the architecture.
post #23 of 34
Quote:
Originally Posted by Zero4549 View Post
itaniums are vastly more reliable than even xeons, but then you already knew that
Where does it say Itaniums are "vastly more reliable" than Xeons?

"Xeon's reliability and performance is now equal —and in some cases better than—Itanium"
-Kirk Skaugen, Intel VP, Intel Architecture Group, April 2011
post #24 of 34
Quote:
Originally Posted by Zero4549 View Post
itaniums are vastly more reliable than even xeons, but then you already knew that
Having worked with both, I found Itaniums no more reliable than Xeons. The failure rates were pretty even.

I don't know anyone running Windows on Itaniums either. Not that you couldn't, but there really is no reason to. This is a chip meant for the HPC and high-end database market, not your normal data center. Oracle has even announced it will drop support for this chip.
post #25 of 34
I want one. What I would do with it, hmm, who knows.
post #26 of 34
Quote:
Originally Posted by cdesewell View Post
This is partly correct. The Itanium, I believe, has an IA-64 RISC-based architecture (Reduced Instruction Set Computing),
whereas the conventional x86 architectures are CISC-based (Complex Instruction Set Computing).

EDIT:

here is a good read for people who don't know the differences and pros/cons
Itanium uses the EPIC architecture (epic fail?). EPIC is a type of VLIW, not RISC or CISC. The only other VLIW chips I am familiar with are the ATI Radeon series (9xxx through HD6xxx, with the 7xxx supposedly moving to a new non-VLIW architecture).

EPIC offers great potential for increasing performance/mm^2, but its compiler-dependent nature makes these gains disappear in practice (due to inefficient compilers and branch prediction problems). The new Itanium architecture has taken on a lot of RISC and CISC ideas and has somewhat de-emphasized the VLIW components (esp. explicit, compiler-specified scheduling and branching) in favor of other methods (esp. decisions made at run time in hardware).

Quote:
Originally Posted by DuckieHo View Post
Where does it say Itaniums are "vastly more reliable" than Xeons?

"Xeon's reliability and performance is now equal —and in some cases better than—Itanium"
-Kirk Skaugen, Intel VP, Intel Architecture Group, April 2011
Based on what I have read, it seems that Itanium will leapfrog Xeon again. A server-only architecture can do things that a workstation/server/home user/HPC architecture cannot.

RealWorldTech (5-18-2011) did the best job I've seen of summarizing the changes made in Poulson. What follows are small excerpts from a fascinating but lengthy article. The rest is certainly worth the read.

Quote:
Itanium was originally conceived in the early 1990’s by the architects and engineers who had worked on HP’s PA-RISC. Many of them were convinced that dynamic instruction scheduling and out-of-order execution would ultimately prove to be too complex and power hungry. They believed that single threaded performance would not scale in the future. It is certainly true that many of the circuits in out-of-order designs can be power hungry - the re-order buffer, schedulers and renaming logic are fairly complicated and do not scale well to very large sizes. Instead of relying on extensive scheduling and renaming logic, the architects from HP and Intel took a different approach – embracing a VLIW (Very Long Instruction Word) philosophy. Itanium pushed the instruction scheduling burden onto the compiler and designed a number of ISA features that would assist software scheduling. The hardware was intended to be extremely simple with totally static scheduling. In theory, removing all the complicated scheduling and out-of-order logic would reduce power and scale better to smaller process nodes.

However, these gloomy predictions about out-of-order execution were not entirely accurate. The scheduling windows of modern CPU cores like Bulldozer or Sandy Bridge are 3-4X larger than aggressive x86 designs like the Pentium Pro (40 entry ROB) and larger still than the K5 (16 entry). The execution width of out-of-order designs has grown more slowly. Early microarchitectures were 2 and 3-issue wide, and have grown to 4-issue, but each uop in a modern core is much more powerful than before. Considering these factors, the execution width has probably grown by a factor of 2 – and more if a workload can be vectorized. In terms of single threaded performance, dynamic scheduling and out-of-order designs have significantly improved over the last decade, contrary to expectations from the early Itanium architects.

Poulson is a radical departure from the initial Itanium philosophy, and takes into account years of experience, and technology and market changes. Poulson abandons the idea of simple hardware controlled by the compiler and is the first dynamically scheduled Itanium design, with modest out-of-order execution. The microarchitecture was rebalanced to favor server workloads, rather than HPC and workstations. Poulson has a more sophisticated multi-threading and multi-core architecture, recognizing the need for tolerating memory latency and technical changes in the industry that have occurred since the first Itaniums debuted on 180nm in 2000. For all the changes though, some things remain the same. Poulson focuses on wide execution and instructions-per-cycle (IPC) rather than frequency, and has excellent reliability features. The die size is a substantial 544mm^2 for massive on-die caches and scalability features for large servers.

Poulson has already taped out, which is a requirement for ISSCC papers. But products are slated for release in 2012 (most likely in the first half), reflecting the extremely long test and validation process for mission critical systems...

The changes to Poulson’s microarchitecture are comprehensive and encompass every part of the pipeline, but instruction fetch is perhaps the least impacted. Fine grained multi-threading is the biggest change for the fetch part of the front-end. Previously, fetching was essentially single threaded, while for Poulson, it must be shared between two threads dynamically. In all likelihood, the two threads alternate cycles based on priority counters with the goal of keeping the further parts of the pipeline full.

The Itanium instruction set was influenced by the RISC philosophy, with an emphasis on simple instructions and relying on the compiler for complex operations. The ISA is a strict load-store model and specifically designed to avoid any complex instructions that would have to be decoded into multiple uops – unlike x86, zArch and even the ostensibly simple Power and ARM. Itanium also has no microcode, and instead stole a page from Alpha. The firmware uses a Processor and System Abstraction Layers (PAL/SAL) to create a standard software interface to the outside world and handle tasks like booting, power management and machine check error handling. Lack of virtualization was an oversight in the original ISA, but it was later added through hardware and PAL code.

Decoding takes two stages and is where Poulson begins to significantly deviate from Tukwila and resemble a more conventional in-order pipeline. Rather than preserve Itanium’s VLIW semantics, Poulson actually breaks bundles apart into constituent instructions. These individual instructions, instead of bundles, form the basis of further execution.
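
A side note on the excerpt above, for anyone wondering what "breaking a bundle apart" looks like at the bit level. This is only a rough Python sketch. It leans on the published IA-64 bundle layout (128 bits: a 5-bit template in the low bits, followed by three 41-bit instruction slots), which the excerpt itself does not spell out. The names `split_bundle` and `make_bundle` are made up for illustration, and none of this claims to describe how Poulson's decoder is actually built.

```python
# IA-64 bundle layout (from the architecture manuals, not from the excerpt):
# bits 0-4 = template, bits 5-45 = slot 0, bits 46-86 = slot 1, bits 87-127 = slot 2.
TEMPLATE_BITS = 5
SLOT_BITS = 41
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def split_bundle(bundle):
    """Split a 128-bit bundle (a Python int) into (template, [slot0, slot1, slot2])."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & TEMPLATE_MASK
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    ]
    return template, slots


def make_bundle(template, slots):
    """Hypothetical helper: pack a template and three slots back into one int."""
    bundle = template & TEMPLATE_MASK
    for i, slot in enumerate(slots):
        bundle |= (slot & SLOT_MASK) << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


# Round trip with arbitrary placeholder values (not real instruction encodings).
template, slots = split_bundle(make_bundle(0x10, [1, 2, 3]))
print(template, slots)
```

The template is what tells the hardware which execution unit type each slot needs and where the stops between instruction groups fall, so a decoder that issues individual instructions still has to read it first.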

Tukwila and all earlier Itanium designs were VLIW microarchitectures; compiled bundles formed the basis of execution and instructions were statically scheduled. Any dependencies were resolved by global stalls. The global stall microarchitecture would halt the entire pipeline until the problem had been resolved.

Poulson is fundamentally different and much more akin to traditional RISC or CISC microprocessors. Instructions, rather than explicitly parallel bundles, are dynamically scheduled and executed. Dependencies are resolved by flushing bad results and replaying instructions; no more global stalls. There is even a minimal degree of out-of-order execution – a profound repudiation of some of the underlying assumptions behind Itanium.

Poulson has 3 branch units, 2 simple ALUs, 2 integer units, 2 FPUs and 2 memory pipelines. Tukwila had 3 branch units and 2 FPUs, but no pipelines for simple ALU instructions, which could execute on any of the 4 memory pipelines or 2 integer units. While Poulson’s FPU latency is unknown, most integer operations are single cycle latency for dependent integer operations. In addition, there is a new 4-cycle, 64-bit integer multiplier on at least one of the two integer pipelines, used for both multiply and multiply-add instructions.

Tukwila has an incredible 4 load/store pipelines tightly integrated with the cache and TLB hierarchy to achieve low latency and high bandwidth. The L1D cache and L1 DTLB are only used for integer load instructions, while all stores and floating point loads rely on the L2 D-cache. This is a great example of microarchitecture and circuit co-design with impressive results. The overall cache system is quad-ported, with single cycle latency for integer loads and high bandwidth for floating point data accesses. Only the first two of the memory pipelines can access the L1 D-cache, although they can also issue FP loads to the L2D. The second set of memory pipelines are specialized for integer stores and any FP memory accesses; they generally interface with the L2D.

Poulson’s cache hierarchy was glossed over at ISSCC and remains somewhat of a mystery...

From its conception, the goal of Itanium was to address the entire server and workstation market – from HPC to mainframes. In contrast, notebooks and desktops make up the overwhelming majority of x86 microprocessors from AMD and Intel. While x86 designs have grown up and can tackle most of the workloads meant for Itanium, they have stayed true to their roots. There is a limit to how much additional hardware Intel and AMD can put into mainstream x86 designs, without compromising the volume economics. The system architecture for Itanium has a much greater focus on system scalability and reliability. As Figure 7 shows, both Tukwila and Poulson have more QPI links than Westmere-EX for scalability.

Poulson is socket compatible with Tukwila and relies on a similar system architecture. Both processors use a variant of the QuickPath Interconnect found in x86 designs, which is tuned for scalability and reliability. All x86 microprocessors rely on snoop-based cache coherency; whenever a core misses in the last level cache and reads from memory, it must also send a request to the caches in all other sockets to check for copies of the cache line. Snooping is very low latency for 1-4 sockets, but is inefficient for larger systems.

In contrast, Tukwila and Poulson have a directory-based coherency protocol that scales much better. For every cache line, the directory lists which cores have a copy. When a memory access misses in the L3, it first checks the directory to determine which other cores have the cache line and whether it should get the data from memory or another cache. Either way only a single request and response are sent, compared to N requests and N responses in a snooping system. Checking the directory adds a small bit of latency, but for 4 or 16 socket system, the bandwidth savings are huge. To accelerate the whole process, Tukwila and Poulson also include specialized caches for the directory.
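
To put rough numbers on the snooping vs. directory comparison above, here is a small Python sketch. It assumes the "N" in "N requests and N responses" means the number of other sockets (the excerpt does not define N), and it ignores the directory lookup itself and any data transfer. The function names are hypothetical.

```python
def snoop_messages(sockets):
    """Messages per last-level-cache miss when every other socket is snooped.

    Assumption: one request to and one response from each *other* socket.
    """
    if sockets < 1:
        raise ValueError("need at least one socket")
    return 2 * (sockets - 1)


def directory_messages():
    """One request plus one response, as the excerpt describes."""
    return 2


for sockets in (2, 4, 8, 16):
    print(f"{sockets:>2} sockets: snoop={snoop_messages(sockets):>2} "
          f"directory={directory_messages()}")
```

Under these assumptions the snoop traffic grows linearly with socket count while the directory traffic stays flat, which is the bandwidth saving the article is pointing at for large systems.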

Poulson's performance has not been discussed, but there are enough clues to put together some intelligent estimates. Given the scope of the changes, performance per core could improve by 25-40%, through a combination of higher frequency and IPC. On top of that, the core count has doubled, so the net gain could be as high as 2.8X. For workloads that are memory and I/O bandwidth limited, the gains will be substantially smaller, but still significant.
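
The "as high as 2.8X" figure above is just the per-core estimate multiplied by the doubled core count. A quick Python check, assuming perfect scaling across cores (which the excerpt itself says will not hold for memory- or I/O-bound workloads); `net_gain` is a made-up name:

```python
def net_gain(per_core_gain, core_ratio=2.0):
    """Overall speedup from a fractional per-core gain and a core-count ratio."""
    return (1.0 + per_core_gain) * core_ratio


low = net_gain(0.25)   # 25% faster per core, twice the cores
high = net_gain(0.40)  # 40% faster per core, twice the cores
print(f"estimated net gain: {low:.1f}x to {high:.1f}x")
```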

Poulson's microarchitecture (Figure 8) should increase instructions per cycle by 10-15%. Dynamic scheduling will boost IPC, although to a lesser extent than full blown out-of-order execution; and removing the NOPs is also fairly helpful. The 12-wide back-end can swiftly clear all the stalled instructions when a cache miss is resolved; helping average IPC, even if the core is only 6-wide due to fetch and decode constraints. Poulson's better multithreading and replicated DTLBs will raise utilization of the execution pipelines and data caches significantly and help hide low latency events (e.g. L1 or L2 cache misses). The only loss of IPC in the core should come from scaling back to 2 memory pipelines - but for most software, this is a small factor.


Edited by hajile - 8/23/11 at 8:40am
post #27 of 34
So what software will run on Itanium? Microsoft, Oracle, and Red Hat have already dropped Itanium support.

What good is hardware that no software provider will support?
post #28 of 34
Quote:
Originally Posted by hajile View Post
Based on what I have read, it seems that Itanium will leapfrog Xeon again. A server only architecture can do things that a workstation/server/home user/HPC architecture cannot.
I was questioning "more reliability" statements, not performance.

Quote:
Originally Posted by Riou View Post
So what software will run on Itanium? Microsoft, Oracle, and Red Hat have already dropped Itanium support.

What good is hardware that no software provider will support?
In-house custom software.
post #29 of 34
Quote:
Originally Posted by Riou View Post
So what software will run on Itanium? Microsoft, Oracle, and Red Hat have already dropped Itanium support.

What good is hardware that no software provider will support?
The companies that spend millions on this kind of hardware are often large enough to handle support internally.

edit: beat by duckie

Quote:
Originally Posted by DuckieHo View Post
I was questioning "more reliability" statements, not performance.
In-house custom software.
If reliability is approximately equal and performance is much higher, there is still a market. Also worth noting: if a 700mm^2 chip (current Itanium) is as reliable as a 260mm^2 chip (current Nehalem), then when Itanium drops to 540mm^2 (Poulson) its reliability should (likely) improve further, since a smaller die has less area in which something can fail.
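
A back-of-the-envelope version of the die-size argument, using only the sizes quoted in this post. It assumes the chance of a failure scales roughly in proportion to die area, which is this post's premise and not an established figure; process node, RAS features and voltage are all left out. `relative_area` is a made-up name.

```python
def relative_area(new_mm2, old_mm2):
    """Ratio of the new die area to the old one."""
    return new_mm2 / old_mm2


ratio = relative_area(540, 700)  # Poulson vs. current Itanium, sizes from the post
print(f"Poulson die area is about {ratio:.0%} of the current Itanium's")
print(f"area reduction: about {1 - ratio:.0%}")
```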
Edited by hajile - 8/23/11 at 9:03am
post #30 of 34
Quote:
Originally Posted by Riou View Post
So what software will run on Itanium? Microsoft, Oracle, and Red Hat have already dropped Itanium support.

What good is hardware that no software provider will support?
bespoke software