Updated 11:00am PT: Corrected the article to reflect that the tests were conducted with a single V100 GPU.
Computer scientists from Rice University, in collaboration with Intel Labs, have announced a breakthrough deep learning algorithm – called SLIDE – that trains AI models faster on CPUs than traditional algorithms do on GPUs. For some types of computation, this effectively hands CPUs the performance crown for training.
In particular, the researchers benchmarked a system with 44 “Xeon-class cores” against a $100,000 system with eight Nvidia Volta V100 GPUs with tensor cores, although only a single V100 was used for the tests. The Xeon system completed the task in one hour using SLIDE, compared to 3.5 hours for a single Volta V100 running a TensorFlow implementation. The researchers also noted that the algorithm can likely be optimized further, as it is competing against a mature hardware and software platform; for example, it does not yet use Intel's DLBoost acceleration.
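The article doesn't explain how SLIDE achieves its speedup; the researchers' published paper describes using locality-sensitive hashing (LSH) to pick a small subset of neurons to compute for each input, replacing dense matrix math with sparse lookups. The following is a minimal, hypothetical SimHash-style sketch of that general idea – toy layer sizes, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes for illustration; the article gives no model details.
n_in, n_out, n_bits = 128, 1024, 12

W = rng.standard_normal((n_out, n_in))        # layer weights (one row per neuron)
planes = rng.standard_normal((n_bits, n_in))  # random hyperplanes for SimHash

def simhash(v):
    """SimHash signature: the sign pattern of v against the random hyperplanes."""
    bits = (planes @ v) > 0
    return int(bits.dot(1 << np.arange(n_bits)))  # pack bits into a bucket id

# Build a hash table: bucket id -> neuron indices whose weight rows hash there.
table = {}
for i, row in enumerate(W):
    table.setdefault(simhash(row), []).append(i)

# For a given input, hash it and compute only the neurons in its bucket,
# instead of the full dense matrix-vector product.
x = rng.standard_normal(n_in)
active = table.get(simhash(x), [])
sparse_out = {i: W[i] @ x for i in active}    # forward pass on active neurons only
print(f"computed {len(sparse_out)} of {n_out} neurons")
```

Because similar vectors tend to land in the same bucket, the active neurons are biased toward those with large activations for this input, which is why the sparse pass can approximate the dense one at a fraction of the cost.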
Processors are going to be in much more demand than they already are.
Neat. Wonder if AMD has a similar competing technology.
This algorithm should also work on AMD's processors, since they stated in the article they aren't using DLBoost yet. ASICs are great for inference, but until this breakthrough GPUs were the standard for training.
The researchers say there are further performance improvements to come, as they have “just scratched the surface”. To that end, they note that they have not yet used vectorization – such as AVX SIMD instructions or Intel’s DLBoost acceleration – and claimed “there are a lot of other tricks we could still use to make this even faster.”
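As a rough illustration of what vectorization buys (a toy comparison, not SLIDE's code): NumPy's dot product dispatches to a BLAS backend that uses SIMD instructions such as AVX on supported CPUs, while a pure-Python loop does one scalar multiply-add per iteration.

```python
import time
import numpy as np

# Toy comparison: the same dot product computed element-by-element in pure
# Python vs. through NumPy, whose BLAS backend uses SIMD (e.g. AVX) where
# the CPU supports it. Sizes are arbitrary, chosen only for illustration.
n = 300_000
a = np.random.default_rng(1).standard_normal(n)
b = np.random.default_rng(2).standard_normal(n)

t0 = time.perf_counter()
scalar = sum(a[i] * b[i] for i in range(n))  # one multiply-add per iteration
t_scalar = time.perf_counter() - t0

t0 = time.perf_counter()
vectorized = a @ b                           # SIMD-accelerated under the hood
t_vector = time.perf_counter() - t0

print(f"scalar {t_scalar:.3f}s vs vectorized {t_vector:.5f}s")
```

The two results agree to floating-point tolerance; the vectorized version is typically orders of magnitude faster, which gives a sense of the headroom the researchers are describing.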
Interesting and really cool to see. I wonder what the power consumption difference is, though. As a guess, I'd say the 44-core Intel chip is pulling twice as much power as the Tesla, but who knows.