Originally Posted by DuckieHo
Not really.... processes that use FP64 on the CPU are not that common on mobile.
Most ARM processors include NEON, a 128-bit SIMD unit that handles floating-point (and integer) vector math. While few programs use NEON directly, they call libraries that use it more often than one might think.
Originally Posted by Gungnir
From the looks of it, Apple will be the first company with a 64-bit ARM chip, and from the sound of it, I'm doubting it's just A57, so they had to design/modify their own architecture and chip to do this. That's not exactly "without any trouble"... Also, considering how many people agree that the iP5 feels smoother/better than current flagship Android phones, and the fact that (according to Apple, at least) the 5s is twice as fast as the 5, I doubt they're having much trouble with staying relevant.
Also, 64 bit tends to be easier to program for than huge numbers of threads in common mobile tasks *cough*quadandoctacoresmartphones*cough*
Apple devices feel smoother primarily because iOS doesn't carry extra code for all the edge cases that arise from multiple screen/device types, doesn't rely on Java, and doesn't use garbage collection (and the graphics subsystem is a little better overall too). ARM itself only claims a 50% improvement from A15 to A57, so I seriously doubt Apple is being above board with its performance claims. 64-bit support mostly comes down to a compiler flag. Lastly, programmers (of all but the most trivial apps) should be threading their programs whether the device has 2 cores or 8.
Originally Posted by RAND0M1ZER
Except you just said so yourself that Android has been announced as 64-bit and I have yet to see anyone "going nuts like it was the most perfectly awesome amazing achievement ever".
The switch to 64-bit is inevitable, I just don't see why this is an important feature to advertise to consumers. Maybe Apple feels it would be advantageous to be the first to release a 64-bit mobile phone for publicity reasons but the reality is that there won't be any performance improvements from going 64-bit. In fact there is more overhead from using 64-bit over 32-bit due to the extra length of each instruction, which is curious since Apple devices usually have less memory and processing power than their Android equivalents.
I also thought it was interesting to see Apple advertise something this technical, which most consumers do not understand; that is certainly a change.
It's not more useful than advertising the additional dev APIs or talking about how much more powerful the GPU is (that really only matters to developers, since they're the ones who decide whether or not to use the extra power). Marketing works with whatever it's given, whether it's actually useful to a consumer or not (though, like that GPU, 64-bit is useful even if the customer doesn't realize it).
A note on ARM64 instruction size: instructions are still 32 bits long, so program size doesn't increase (in fact it tends to decrease by a couple percent, largely because the extra registers mean fewer load/store and spill instructions). Even after giving up 3 more bits to the wider register ID fields, there's still plenty of encoding space left for opcodes. For some quick information on ARM64, look at these slides (slide 6 mentions the 32-bit instruction length).
Originally Posted by Seronx
x86 assembly to x86-64 assembly saw a 30% perf improvement. The user might not feel any difference but benchmarks will.
x86 saw that huge performance improvement for reasons that don't apply as much to ARM. First (and most impactful), x86-64 went from 8 to 16 general-purpose (GP) registers. The extra registers let compilers optimize a few things, and the most important is loads and stores. Say a function uses 10 named variables on a machine with 8 registers: the compiler can load the first 5 or so into registers, but the remaining 3 registers are needed for temporary results. For example, 'x = 5 * z + 12 / 16' requires loading 5, z, 12, and 16, multiplying 5 by z and holding that result, dividing 12 by 16 and holding that result, then adding the two before assigning the sum to the register that will hold x. Once you account for these registers holding constants and temporaries, the total number of register slots the function would like to have climbs to 25 or so.
With only 8 registers, that means pulling 5 variables from cache into registers, then emitting instructions to spill some of those values to the stack, pulling more values from cache, spilling some of those so you can reload one of the values you spilled the first time, and so on...
Adding those extra 8 registers eliminated more than half of these load/store operations, providing a dramatic increase in performance. Going from 16 to 32 registers (as ARM did) provides a more moderate gain (though still a bigger one than moving x86 from 16 to 32 registers would give, since some of ARM's existing registers are not available for general-purpose storage).
Second, when in 64-bit mode, x86 drops a lot of its old baggage, allowing it to perform many operations much more quickly and efficiently.
Edited by hajile - 9/12/13 at 11:12am