Overclock.net › Forums › Industry News › Software News › [PCW] Multicore Myths Revealed

[PCW] Multicore Myths Revealed

post #1 of 47
Thread Starter 
More than charity lies behind Microsoft and Intel's announcement this week that they will donate US$20 million to a pair of U.S. colleges in the hope of spurring advances in parallel, or multicore, programming research, as a Microsoft research scientist readily acknowledged.
"There is a worldwide shortage of people experienced in parallel computing experience, for sure," said Dan Reed, director of scalable and multicore computing at Microsoft. "One of the collateral reasons is to raise awareness in the academic community, because that's where the next generation of developers will come from."
While for years, ever-higher clock speeds almost guaranteed that application code would run faster and faster, the rules are different for the multicore processors of today.
The difference has been compared to a sports car and a school bus. While the first is capable of blazing speed, the other moves more slowly but can move far more people at once.
The problem is, simply adding more cores to a computer's CPU doesn't increase the speed or power of conventional application code, as a recent Forrester Research report notes.
"To gain performance from quad-core processors and prepare for the denser multicore CPUs that will follow, application developers need to write code that can automatically fork multiple simultaneous threads of execution (multithreading) as well as manage thread assignments, synchronize parallel work, and manage shared data to prevent concurrency issues associated with multithreaded code," the authors wrote.
In other words, complex work is required to fill all those seats on the bus.
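To make the bus metaphor concrete, here is a minimal Java sketch of the pattern the report describes: fork one task per core, then synchronize on the partial results. The `BusSeats` class and `parallelSum` method are invented for illustration.

```java
import java.util.List;
import java.util.concurrent.*;

public class BusSeats {
    // Sum a large array by splitting it into one chunk per core:
    // fork the chunks as tasks, then synchronize by collecting futures.
    static long parallelSum(long[] data) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        int chunk = (data.length + cores - 1) / cores;   // ceil(len / cores)
        List<Future<Long>> parts = new java.util.ArrayList<>();
        for (int i = 0; i < data.length; i += chunk) {
            final int lo = i, hi = Math.min(i + chunk, data.length);
            parts.add(pool.submit(() -> {                // fork a task
                long s = 0;
                for (int j = lo; j < hi; j++) s += data[j];
                return s;
            }));
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get();   // synchronize: wait for all
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = 1;
        System.out.println(parallelSum(data)); // prints 1000000
    }
}
```

Even this toy version shows where the "complex work" lives: choosing chunk sizes, managing thread assignments, and merging shared results safely.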
And the quad-core processors common today will soon give way to radically more advanced designs, Forrester notes. "Expect x86 servers with as many as 64 processor cores in 2009 and desktops with that many by 2012."

The situation has had chip makers and major software vendors making broad-based efforts to raise awareness of both the promise and challenges of programming for multiple cores.
TopCoder, a software development company that invites its membership to work on various aspects of a project through competitions, just began a series of special contests, along with chipmaker AMD, that focuses on multithreading.
Mike Lydon, TopCoder's chief technology officer, said multicore programming remains the province of an elite few. "What we've seen from the skill set perspective is, it varies quite a bit," he said. "As you would expect, the high-end developers are familiar with threading. After that it drops off pretty quickly."
"It's surprising to me because multithreading programming isn't new," he added. Indeed, one instructional article available on a Microsoft's MSDN Web site dates to 1993.
"I think it stems primarily from the collegiate level," Lydon said. "I've heard very little about colleges teaching multithreaded programming, but I would think and hope that it's changing very quickly."
However, Forrester's report suggests the urgency isn't being felt across the board. It notes that major operating systems and most middleware products are already prepared for multithreaded operation and for "near term" multicore processors, and that corporate development shops may look to ISVs (independent software vendors) to solve the problem through development tools and platforms that can better handle multicore-related tasks.
But Microsoft's Reed believes that multithreading over time will become "part of the skill set of every professional software developer."
In the meantime, most of the parallel computing resources available now don't necessarily hide the complexity of coding for multiple threads. "Development pros have options today, but most of them are low-level language extensions and libraries," Forrester said.
For example, in February AMD open-sourced more than 3,200 software routines under a project called Framewave, which it said will help coders build multithreaded applications for x86-type processors.
"Libraries can't provide a complete answer, but we see these as iterative steps," said Margaret Lewis, director of commercial solutions and software strategy at AMD. "There's things that you can do today as you're waiting for those [more advanced] tools that can increase the multi-threadedness of your applications," she said.
There are some higher-level products already on the market, such as the platform sold by RapidMind, which takes single-threaded C++ code and then, through an abstraction layer, "parallelizes" it across a number of cores.
However, it would be "fairly idealistic" to think that better tools alone will be enough, Lydon argued. "When you actually get into the points in code where you're going to leverage performance by spawning multiple threads, it takes a human mind to see where the benefits could take place."


http://www.pcworld.com/article/id,14...1/article.html

***


[EETimes] Salesmen don't tell you it's more than you need....

The job is left to advertisers to tell you there is more you don't have.

WARNING: This Situation Has Not Changed

The tools needed to program and debug multicore ICs are in the "dark ages," according to a keynoter at last week's Multicore Expo here. Solutions are emerging, but the dearth of parallel-programming tools and lack of expertise among embedded designers threaten to slow the progress of multicore architectures.
Although designs studded with two, four or even eight processor cores on the same die are fast becoming commonplace in embedded applications, tool support for multicore programming is about where VLSI design tools were in the 1980s--"still in the dark ages," said Anant Agarwal, professor of electrical engineering and computer science at the Massachusetts Institute of Technology.
In his keynote here, Agarwal called for new tools, standards and ecosystems. "Who will be the Microsoft, the Cadence or the Synopsys of multicore?" he asked.
Driven by performance and power concerns, multicore ICs are fast finding favor among designers--so much so that observers warn that in a few years, multicore ICs will have hundreds of cores. Meanwhile, programmers are struggling to cope with today's designs.
"Multicore is hard," said Tomas Evensen, CTO of Wind River Systems. "There are ways to make it easer, but there's a lot of history around sequential programming that makes it hard to move to multicore. A lot of code is written in a single-threaded way, and people don't want to start from scratch and rewrite."
Multicore architectures involve multiprocessing, and to take advantage of that, parallel programming is needed. But few embedded designers have the expertise. "Parallel programming was hot 15 years ago in academic circles, and then it wasn't," said Michael McCool, chief scientist at RapidMind Inc. "There's a whole generation of programmers who don't know how to program in parallel. All programmers will have to become parallel programmers, and quickly, because all programs will be parallel."
McCool noted, however, that "compilers do a terrible job extracting parallelism." Multicore debugging is also challenging, because programmers must track interactions between cores and ferret out deadlocks, data races, memory corruption and stalls. Different processors typically come with their own debugging environments, making it tough to get one view of what's going on in the system.
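The lost-update data race that makes this debugging so hard is easy to reproduce in a few lines of Java. In this hypothetical sketch, two threads increment a plain counter (racy) and an atomic counter (safe) side by side; only the atomic one is guaranteed to reach the full total.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RaceDemo {
    static long unsafeCount = 0;                       // shared, unsynchronized
    static final AtomicLong safeCount = new AtomicLong();

    // Two threads each increment both counters perThread times.
    static void run(int perThread) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                unsafeCount++;                         // read-modify-write: a data race
                safeCount.incrementAndGet();           // atomic: no lost updates
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
    }

    public static void main(String[] args) throws InterruptedException {
        run(100_000);
        // safeCount is always 200000; unsafeCount can come up short,
        // because increments from the two threads may interleave and be lost.
        System.out.println(unsafeCount + " vs " + safeCount.get());
    }
}
```

The racy counter may happen to reach the full total on some runs, which is exactly why such bugs are hard to ferret out in testing.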
Solutions, however, are emerging. New and existing companies at the Multicore Expo presented compilers, software development platforms, analysis tools and debugging architectures that claim to ease--though not fully automate--the transition to multicore application development. New multicore development capabilities will also be shown at this week's Embedded Systems Conference.
Various multicore architectures pose different programming and debugging challenges. Homogeneous multicore ICs, such as the ARM11 MPCore, use very similar or identical processor cores. Heterogeneous multicore architectures, like Texas Instruments Inc.'s OMAP, use different types of processors.
Some homogeneous multicore ICs use symmetric multiprocessing (SMP), in which there's shared memory and a single operating system that automatically assigns processes to different cores. With asymmetric multiprocessing (AMP), the user manually assigns tasks.
Heterogeneous multicore ICs raise a raft of programming challenges, noted Greg Davis, technical lead for compilers at Green Hills Software. Different CPUs may require different compilers, dialects and pragmas, he said, and some have "flaky tools." Auxiliary cores may have limited memory banks and must interact with a master core to swap in memory.
SMP is an attractive programming model, because some existing prepartitioned code will "just run faster," Davis said. But SMP systems may exhibit nondeterminism, inefficiency and latent race conditions. AMP provides more user control over efficiency and determinism, but results in less portable software with higher up-front costs, he said.
Frank Schirrmeister, vice president of marketing for stealth-mode startup Imperas Inc., presented four "axes" for categorizing multicore systems: processors, communications, memory architectures and "specificity" for applications. All affect programming. For some types of designs, the big challenge is mapping tasks to the right processor; for others, it's run-time mapping to determine available compute space.
The shared-bus systems used for many multicore ICs are difficult to program and debug and prone to deadlocks and data races, Schirrmeister said. And the choice of memory architecture affects task execution times, he said.
Multiprocessing presents three major challenges, Schirrmeister said: partitioning, parallelization and optimization. What's needed, he said, is a programming model that makes it possible to create parallel applications, optimize the mapping of those applications onto parallel hardware and gather data to guide the optimization decisions.
Programming models
Providers are promoting varying approaches to multicore programming. For SMP systems, Posix threads and processes provide a way to add concurrency to programs, said David Kleidermacher, Green Hills Software CTO. He advocated "partition scheduling" at the application level rather than the thread level as a way of managing CPU execution time.
MIT's Agarwal said that Posix threads will do in the short term, but they offer no encapsulation or modularity. A more promising concept, he said, is one that's already used for ASIC design: streaming data from one compute unit to another.
Streaming is fast and efficient and is similar to the sockets used for networking applications, Agarwal said. A "socketlike" stream-based application programming interface could benefit multicore devices, Agarwal said, noting that the Multicore Association's proposed Communications API standard is such an interface.
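As a rough plain-Java analogy to such a socket-like stream (this is only an illustration, not the Multicore Association's actual Communications API), a bounded `BlockingQueue` can serve as the one-way channel between a producer and a consumer compute unit:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class StreamChannel {
    static final int END = -1;   // end-of-stream marker

    // Producer pushes n squares through a bounded queue; consumer sums them.
    // The queue acts as a one-way "stream" between two compute units.
    static long streamSquares(int n) throws InterruptedException {
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(16);
        long[] sum = new long[1];  // one-element array so the lambda can write to it

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) channel.put(i * i);
                channel.put(END);
            } catch (InterruptedException ignored) {}
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int v; (v = channel.take()) != END; ) sum[0] += v;
            } catch (InterruptedException ignored) {}
        });

        producer.start(); consumer.start();
        producer.join(); consumer.join();
        return sum[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(streamSquares(5));   // 1+4+9+16+25 = 55
    }
}
```

The appeal of the streaming style is visible even here: each unit touches only its own state, and all sharing happens through the channel's well-defined put/take boundary.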
RapidMind, which provides a software development platform for the IBM Cell Broadband Engine and Nvidia graphics processor, advocates a programming model called "single program, multiple data." It includes single-instruction, multiple-data concepts, but unlike SIMD, it doesn't assume a constant time per kernel.
"This model lets you think in parallel and express locality," said McCool. "It's deterministic and safe. You can't get deadlock or race conditions."
Regardless of the programming approach, multicore developers will need analysis and debugging tools. According to Wind River's Evensen, they will need hierarchical profiling tools to partition code and find bottlenecks. And run-time analysis tools will help identify race conditions that can occur when multiple threads have access to the same data.
Limited visibility makes multiprocessor debugging difficult, said Jakob Engblom, business development manager at Virtutech. Memory caches hide data, he said, and there's "time-sensitive, chaotic behavior" and a lack of determinism to contend with. Also, the system will keep running even if one core has stopped.
ARM's multicore debug solution is CoreSight, a technology that uses ARM's embedded trace macrocells. CoreSight includes a debug access port, an embedded cross-trigger mechanism, a "trace funnel" that converts multiple trace sources onto a single debug register bus, an embedded trace buffer and a trace port interface unit. Andrew Swaine, CoreSight team lead at ARM, said the technology is independent of the ARM architecture and is available for royalty-free use.
CoreSight was just one of a number of multicore solutions presented at the expo. Virtutech offers a "virtualized" software development environment that's said to ease multicore debugging. David Stewart, CEO of CriticalBlue, showed how his company's Cascade product can generate application-specific coprocessors from legacy software. Martijn de Lange, founder and CEO of Associated Compiler Experts, discussed multicore applications for his company's CoSy compiler generation system.
Hint: Try to find a list of the top ten multicore optimized applications.


http://www.eetimes.com/showArticle.j...leID=198701494
post #2 of 47
Thread Starter 
Some pervasive myths about running Java applications on multi-core systems are misleading developers, and it's time to shine the bright light of truth on these falsehoods.

If you're like many engineering-minded folks, chances are you've heard of the Discovery Channel show "Mythbusters." In a typical episode, the Mythbusters team takes a popular urban myth, such as the drop-a-penny-from-the-top-of-the-Empire-State-building-and-kill-somebody myth, and uses hard scientific experiments to prove it either true or false. It's a fascinating display of special effects wizardry and science. Plus, let's admit it, they love to blow things up and what hard-core geek doesn't like to see stuff get blown up?
Unfortunately, application developers have accepted several myths in our own industry. In particular, myths about running Java applications on multi-core systems are pretty pervasive amongst developers. While not as popular or ubiquitous as some of the ones debunked on Mythbusters, they're no less false or misleading.
We admittedly don't have the same kinds of experience in blowing stuff up, nor do we have a crash test dummy to subject to our experiments (poor Buster). However, we can still analyze some of the more common assumptions in the multi-core space today and show why they are, in fact, pure myths.
The "My App's Poor Performance Will Be Saved by Moore's Law" Myth
No other concept indirectly contributes more to poorly performing applications than Moore's Law (commonly cited as "processor speeds double every eighteen months"). To the developer who wants to focus on "getting it to work before getting it fast," Moore's Law is the magic that will make any application—no matter how slow on today's hardware—run twice as fast in just a year and a half. This is not only a common misrepresentation of Moore's Law, but as many developers are starting to notice, it doesn't hold true the way it once did.
Moore's Law actually states that it is the number of transistors that can be placed on a particular processor or chip that doubles every eighteen months. For years, this indirectly led to the doubling of processor speeds, as any historical analysis of processor specifications will show. But since 2001 or so, the steadily climbing speeds have suddenly flattened, and it doesn't take a lot of empirical evidence to see this. In August of that year, the average CPU speed in a high-end desktop or low-end server was around 2GHz, meaning that six years and four doublings of Moore's Law later we should be seeing processors at around 32GHz. A quick glance through online product descriptions from any major PC supplier will show that similar-ranged machines are hovering around the 3.0-3.6GHz range, far short of that mark.
Interestingly enough, Moore's Law hasn't failed but the widespread perception is that it has. The number of transistors continues to double roughly every year and a half, but instead of trying to increase the raw speed of the processors themselves, the chip manufacturers are essentially "scaling out"—putting more CPU cores into the chip. In a sense, this means that developers are staring down the barrel of a multi-CPU machine every time they build an application. Even some of the lowest-end server hardware and many laptops are now multi-CPU systems.
So barring any major scientific breakthroughs in the next couple of years, developers can expect to see systems with multiple 2GHz processors rather than single-CPU machines with 16, 32, or 64GHz processors. This means they will have to deal with concurrency in order to deliver greater performance.
The "My App Server Will Scale My Java Code for Me" Myth
This popular myth emerged almost simultaneously with the release of the J2EE specification. Although it is true that Java app servers provide a degree of concurrency, many Java developers are under the mistaken impression that the app server will simply take care of all of their scalability needs. The attraction is undeniable: in this worldview, a developer need never think about hard problems like concurrency, transactional boundaries, or parallel processing. They can just think in terms of traditional Java applications and objects, and the app server will simply take whatever steps are necessary to ensure that everything just works and just scales. (For C++ developers, the lack of a standard app server means that there is even more work to do to ensure proper application concurrency.)
This is probably one of the most cherished viewpoints of the J2EE server space. As such, it is one of the most tenacious in the face of arguments to the contrary. Fortunately, it doesn't take a great deal of logical reasoning to see its inherent flaws. The belief begins with the basic assumption that when a particular J2EE application does not scale, it is due to the CPU not running fast enough. For most of today's applications, however, the CPU is not being kept busy. A large part of the application server execution time is spent transmitting data back and forth across components, taking out locks, or waiting for locks to be released. Spending some quality time with a profiler can show you just how much time your application spends on these tasks, which cannot benefit from a faster CPU.
This is an issue that an app server has little control over. In fact, an app server can sometimes even contribute to the problem if it tries to maintain shared state across multiple machines, as that state will need to be transferred back and forth across the various nodes in the cluster in order to maintain the illusion of zero hardware affinity.
Thus neither the CPU nor the app server can save the developer from the need to design concurrency into the application architecture.
The "My Cluster, Bus Operating System, or App Server Will Automatically Provide Ordered Control" Myth
At the heart of any clustering or web farm approach to scalability lies the belief that if you can't get better performance and scalability by running on faster hardware, you can at least get better scalability by running a bunch of machines in parallel and executing the code on all of them. This works only up to a point.
Amdahl's Law is less well known than Moore's Law but equally important. It loosely states that the greater the percentage of sequential operations in a particular application or program, the less benefit can be derived from parallel execution. For example, even if only one percent of a particular program must be executed in sequence (an astoundingly small number; for most business applications the sequential share is much greater), the maximum benefit that can be derived from parallelism is 100X. This ceiling drops dramatically with even a small increase in sequential code: the maximum speedup for code that is 90-percent parallel (meaning 10 percent of it must be executed sequentially) is 10X, and given a number like 30-percent sequential code (which is still pretty good), then regardless of the number of processors thrown at the problem, the maximum speedup is only about 3X.
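Those ceilings follow from Amdahl's formula, speedup = 1 / (s + (1 - s)/n), where s is the sequential fraction and n the processor count; as n grows, speedup approaches 1/s. A small Java sketch of the limits:

```java
public class Amdahl {
    // Amdahl's Law: with sequential fraction s and n processors,
    // speedup = 1 / (s + (1 - s) / n); as n grows it approaches 1/s.
    static double speedup(double s, int n) {
        return 1.0 / (s + (1.0 - s) / n);
    }

    public static void main(String[] args) {
        System.out.printf("s=0.01, n=1024: %.1fx (limit 100x)%n",  speedup(0.01, 1024));
        System.out.printf("s=0.10, n=1024: %.1fx (limit 10x)%n",   speedup(0.10, 1024));
        System.out.printf("s=0.30, n=1024: %.1fx (limit ~3.3x)%n", speedup(0.30, 1024));
    }
}
```

Note that even at 1024 processors the s=0.01 case reaches only about 91x of its theoretical 100x limit; the last few multiples of speedup are the most expensive to buy with hardware.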

The implication of Amdahl's Law is that the diminishing returns of adding more processors accelerate as the percentage of sequential code grows (see Figure 1).

Figure 1. Speedup per Processor Is Limited by the Percent of Serialized Code
The top line shows that when there is only 0.1-percent serialization, speedup per processor is relatively consistent for large numbers of processors. However, when you look at the bottom line with 30-percent serialization, diminishing returns are quite obvious.

The thinking behind this myth is that the app server (or OS or some other piece of middleware) can automatically do some kind of analysis to discover the sequential operations, rearrange them in some fashion, and thus optimize the sequential operations in order to minimize the cost. But this implies that the application server could rewrite your application to reorder its execution to suit its own needs. Obviously this could have disastrous effects on the consistency and correctness of your code—something no application server vendor is going to risk. The app server must execute your code exactly as it sees it. And while it might be able to spin off certain threads in certain places to improve the parallelism, it will always be hamstrung by the basic requirement that it not break the order of execution of your code.
The "Concurrent Computing Has To Be Hard" Myth
Almost nothing scares a developer like the idea of explicitly dealing with the concurrency problems of multiple threads (or multiple machines), because concepts such as "deadlock," "livelock," "starvation," or "deadly embrace" are outside the core expertise of most application developers. And even if a skilled engineer knows how to handle these situations, it's not the ideal use of his or her time.
This has sparked resurgent interest in new ways to deal with concurrency, ranging from new language-based approaches (e.g., OpenMP or JR) to new approaches to languages altogether (e.g., functional languages like F# or Scala). While creating an entirely new language offers some unique opportunities to deal with concurrency in elegant ways, the thought of learning an entirely new programming language—much less a new platform—can be overwhelming to a developer.
Fortunately, building applications for concurrency doesn't have to be a task only rocket scientists, brain surgeons, and astrophysicists can perform. By understanding even some of the basic principles of how parallel processing best works (immutable objects rather than objects with mutable state and guarded code regions, making copies of shared state, and so on), developers can build applications to take advantage of multiple processors with relatively little work.
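One of those basic principles, immutable objects with defensive copies, takes only a few lines of Java. `OrderSnapshot` below is a hypothetical class for illustration: all fields are final, mutable inputs are copied on the way in, and "modification" returns a new instance, so objects can be shared across threads without locks.

```java
import java.util.List;

// An immutable value object: all fields final, defensive copies of
// mutable inputs, no setters. Instances can be shared freely across
// threads without synchronization.
public final class OrderSnapshot {
    private final String customer;
    private final List<Long> itemIds;

    public OrderSnapshot(String customer, List<Long> itemIds) {
        this.customer = customer;
        // Copy on the way in so callers can't mutate our state later.
        this.itemIds = List.copyOf(itemIds);
    }

    public String customer() { return customer; }
    public List<Long> itemIds() { return itemIds; }   // already unmodifiable

    // "Modification" returns a new object instead of mutating this one.
    public OrderSnapshot withCustomer(String newCustomer) {
        return new OrderSnapshot(newCustomer, itemIds);
    }
}
```

Because no thread can ever observe a half-updated `OrderSnapshot`, whole categories of locking and race-condition bugs simply cannot occur with objects built this way.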
Of course, if one has a library or tool available to help ensure and enforce some of these design decisions, all the better. With the appropriate tools, a developer can not only mitigate the risks associated with rewriting multi-threaded code, but preserve the flexibility and agility of the application environment as well.
The Not-So-Mythical Java App on a Multi-Core System
Now that you can discern between myth and fact, taking advantage of the parallelism implicit in today's and tomorrow's multi-core CPUs doesn't have to mean subtle and irreproducible threading bugs, which consume thousands of man-hours only to disappear at debug time. Taking some straightforward steps to design more parallelism-friendly applications and using tools that help guide and enforce those design principles should make building a parallelism-friendly application as simple as building, say, an application that stores data into a relational database.

Oh, and if you're considering busting that Mentos-and-Diet-Coke-produce-spectacular-gushing-fountains myth, don't try it at home. Trust us; that one's a scientific fact. Hopefully this warning saves you from having to explain to your significant other why there's Diet Coke on the floor... and the walls... and the ceiling. Now, if you'll pardon us, we're curious to know if the metal-in-the-microwave-will-blow-it-up myth is true, and there happens to be an old microwave oven in the garage...

About the Authors
Cory Isaacson was most recently president of Rogue Wave Software. Cory has been actively involved in leading information technologies since 1985, starting as the founder and chief executive of Compuflex International. He has been providing guidance to IT professionals in the development and implementation of powerful business applications throughout his career.
Ted Neward is the principal at Neward & Associates, a consulting and mentoring firm that focuses on enterprise applications, big and small, in .NET and/or Java. He also teaches with Pluralsight, speaks at numerous conferences, and writes on a variety of subjects, including programming languages, virtual machine platforms, architecture, and security.


http://www.devx.com/Java/Article/35087/1954
post #3 of 47
Eyes...bleeding...so...much...pain
    
CPUMotherboardGraphicsRAM
Core 2 Duo E6400 MSI P6N 650i eVGA GeForce 8800GTX 2GB OCZ PC2-6400 
Hard DrivePower
74GB 10,000RPM SATA150 Raptor 650W Antec PowerTrio 
  hide details  
Reply
    
CPUMotherboardGraphicsRAM
Core 2 Duo E6400 MSI P6N 650i eVGA GeForce 8800GTX 2GB OCZ PC2-6400 
Hard DrivePower
74GB 10,000RPM SATA150 Raptor 650W Antec PowerTrio 
  hide details  
Reply
post #4 of 47
Just so you know for next time, just give us the link to the article.
post #5 of 47
Thread Starter 
Quote:
Originally Posted by GuardianOdin View Post
just so you know next time,just give us the link to the article.
or just a member who doesn't like to read?

if the link had what my post had in it,
I could just link
but it doesn't
post #6 of 47
So if I would buy a cpu tomorrow, which would be the smarter buy?
E8400 or Q6600?
post #7 of 47
you mean like this one?

http://www.devx.com/Java/Article/35087/1954
Please give credit to the sites you "borrow" from.
post #8 of 47
Quote:
Originally Posted by kpo6969 View Post
So if I would buy a cpu tomorrow, which would be the smarter buy?
E8400 or Q6600?
Depends on what your needs are... Video encoding demands quads.
post #9 of 47
Quote:
Originally Posted by DuckieHo View Post
Depends on what your needs are... Video encoding demands quads.
I would probably go with the 8400. I was curious what the OP thought.
post #10 of 47
Thread Starter 
Quote:
Originally Posted by kpo6969 View Post
So if I would buy a cpu tomorrow, which would be the smarter buy?
E8400 or Q6600?
Many forum members talk about subjects that need data to support their conclusions.

I'd rather not speculate, but the data suggests that 4 cores do not raise performance over two as much as the cost difference would imply.

That is for the data contained in my own posts.