The Qualcomm Snapdragon S4 (Krait) Preview Part II
by Anand Lal Shimpi on February 22, 2012 11:40 PM EST
Posted in: Smartphones, Snapdragon, Qualcomm, Krait, Mobile, Tegra 3, Tablets, NVIDIA
Yesterday we presented the first results of Qualcomm's Krait based MSM8960 SoC. While we still await the first Krait based phones (widely expected to begin shipping sometime in Q2), courtesy of Qualcomm's MSM8960 Mobile Development Platform we were able to get a good idea of the upper bound for Krait and MSM8960 performance. I mention it's the upper bound because, at least in the past, MDP performance hasn't corresponded directly to shipping device performance. There was a pretty big delta between MSM8660 MDP performance and phones that used the MSM8660. Qualcomm tells us that this time around things are going to be different. Qualcomm is expecting a much narrower (nonexistent?) gap between the MSM8960 development platform and phones that use MSM8960 silicon.
One major difference between the MSM8960 MDP and our earlier MSM8660 MDP was the state of the CPU governor. In the earlier MDP the governor was set to max performance, always delivering the CPU's maximum clock frequency. With the MSM8960 platform the governor was set to ondemand, allowing for variable CPU speeds depending on what the OS requests of the device. The ondemand setting is in line with what we can expect device manufacturers to use when they ship phones.
All of this goes to say that while we have a good handle on what Krait and the MSM8960 are capable of, there are still a lot of unknowns.
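For readers who want to check which governor a Linux-based device is running, the sketch below reads it from the kernel's standard cpufreq sysfs files. This is only an illustration, not how we configured the MDP: the `read_governors` helper is a hypothetical name, and it assumes the device exposes `scaling_governor` under `/sys/devices/system/cpu` and that the files are readable from your shell.

```python
from pathlib import Path

# Standard Linux cpufreq sysfs root. Whether it is present and readable on a
# particular Android device is an assumption.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_governors(root=CPUFREQ_ROOT):
    """Return {core_name: governor} for every core that exposes cpufreq."""
    governors = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not sibling dirs like cpufreq/.
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        try:
            core_name = gov_file.parent.parent.name
            governors[core_name] = gov_file.read_text().strip()
        except OSError:
            # Offline cores or restricted permissions: skip this core.
            continue
    return governors


if __name__ == "__main__":
    for core, governor in read_governors().items():
        print(f"{core}: {governor}")
```

A core reporting `performance` is pinned at its maximum clock, while `ondemand` scales the clock with load, which is the distinction drawn above between the two development platforms.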
While it's true that shipping performance remains to be seen, some of the deltas we saw between MSM8960 and the current competition were so great that even a much slower implementation in a shipping phone would still be significantly faster than anything else out today.
We left our MSM8960 investigation with two major unknowns. The first was power consumption. We still haven't been able to get Qualcomm's Trepn tool running on the MSM8660 MDP, which has always been a bit finicky. To get a true feel for MSM8960 battery life we will have to wait for shipping devices. The other major unknown was really how MSM8960 stacks up against NVIDIA's Tegra 3.
Tegra 3 was everything Tegra 2 should have been. We got higher clocks, NEON support and a much faster GPU. The only thing missing from Tegra 3 was a dual channel memory interface. We were happy with Tegra 3 on ASUS' Eee Pad Transformer Prime, but in less than a week we'll get to meet some of the first smartphones based on T3 silicon.
Armed with the Eee Pad Transformer Prime (updated to Ice Cream Sandwich) we're able to get a rough idea of how these two heavyweights will compare. The same caveats that applied to the MDP apply to our Tegra 3 platform as well. Since we are using a tablet we're obviously dealing with a higher TDP than what you'll find in a phone. The comparison today is largely academic and naturally shipping devices may be better or worse than these two representatives. With the disclaimers out of the way, let's get to the comparison.
CPU Performance: Preferring Single vs. Multithreaded Performance
The MSM8960 features two Krait cores compared to the four ARM Cortex A9 cores in NVIDIA's Tegra 3. While the A9 is a very power efficient core, Krait offers a much wider front end, wider execution back end, faster FPU and an improved cache/memory interface. All of these factors together combined with similar clock speeds to what Tegra 3 is able to hit should result in better absolute performance in single or lightly threaded applications. As video decode and transcode are both fully offloaded in all modern SoCs, finding workloads that scale well across more than two cores is difficult. We noted this in our Eee Pad Transformer Prime review - it's just not easy coming up with current apps that scale well to four ARM cores. That's not to say that there are no advantages to more than two cores, but you're more likely to get a benefit from two faster cores vs. four slower ones.
NVIDIA's saving grace is the fact that it did ramp up A9 clock speed very high in Tegra 3, and it has that handy companion core 4-PLUS-1 architecture to keep power consumption low throughout very light workloads. There's also the fact that while very few smartphone apps will peg four cores constantly, there are periods of time when you'll see more than two cores in use. Multitasking, although more likely to happen in significant amounts on a tablet, can also increase usage of the third and fourth cores on Tegra 3.
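To make the two-fast-cores versus four-slow-cores tradeoff concrete, here is a rough sketch using Amdahl's law. This is a textbook model we are bringing in for illustration, not something measured on either SoC, and the 1.5x per-core advantage for the faster cores is a made-up number rather than a Krait or Cortex A9 figure.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over a single core of the same speed."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(per_core_speed, cores, parallel_fraction):
    """Throughput relative to one baseline core (per_core_speed = 1.0)."""
    return per_core_speed * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # Hypothetical chips: two cores that are 1.5x faster per core,
    # versus four baseline cores.
    for parallel_fraction in (0.0, 0.5, 0.8, 0.9, 1.0):
        two_fast = relative_throughput(1.5, 2, parallel_fraction)
        four_slow = relative_throughput(1.0, 4, parallel_fraction)
        print(f"parallel={parallel_fraction:.0%}  "
              f"2 fast cores: {two_fast:.2f}  4 slow cores: {four_slow:.2f}")
```

With these assumed numbers the two faster cores stay ahead until 80% of the work can run in parallel, and the four slower cores only pull ahead beyond that point. Real workloads also depend on memory bandwidth, thermals and scheduling, none of which this model captures.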
We'll start with Linpack, our heaviest floating point/cache/memory bandwidth test:
Single threaded floating point performance is obviously a strength of the MSM8960 and Krait. Qualcomm tells us that Krait is able to multi-issue floating point instructions, something that the Cortex A9 cannot do. The MSM8960 memory controller also appears to be more efficient than previous designs, contributing to the magnitude of the win here.
Move to more threads and the situation doesn't change dramatically, although Tegra 3 is obviously far more competitive thanks to its sheer core count:
JavaScript performance can be multithreaded at times but most of the benchmarks we run don't scale incredibly well beyond two cores. Making matters worse is the fact that SunSpider performance regressed on the Eee Pad Transformer with the latest update to ICS. I've included the old Honeycomb results as a reference for where things should be. Keep in mind that the Honeycomb browser on the Eee Pad Transformer was very heavily optimized for Tegra 3. It's possible that the same degree of optimization just isn't present in the ICS version yet.
Browsermark tells a different story. Here the Tegra 3 based Transformer Prime is actually able to be slightly faster than the MSM8960. The margin of victory is small enough to be a wash, but the fact that NVIDIA is able to remain competitive is important.
Basemark OS echoes more of what we'd expect. In the overall score the MSM8960 is around 50% faster than the Tegra 3 based tablet. Even if the MSM8960 MDP is unrealistically fast for a Krait platform, it's likely that we'll still see a Krait advantage.
Basemark OS - System
Test | HTC Rezound | Galaxy Nexus | ASUS Transformer Prime | MDP MSM8960 |
System Overall Score | 658 | 538 | 602 | 907 |
Simple Java 1 | 298 loops/s | 210 loops/s | 240 loops/s | 375 loops/s |
Simple Java 2 | 7.28 loops/s | 8.61 loops/s | 7.27 loops/s | 10.8 loops/s |
SMP Test | 35.3 loops/s | 49.2 loops/s | 81.2 loops/s | 64.4 loops/s |
100K File (eMMC->SD) | 6.49 MB/s | 9.52 MB/s | 11.0 MB/s | 8.64 MB/s |
100K File (SD->eMMC) | 33.0 MB/s | 17.8 MB/s | 14.5 MB/s | 39.8 MB/s |
100K File (eMMC->eMMC) | 37.8 MB/s | 34.5 MB/s | 29.7 MB/s | 48.9 MB/s |
100K File (SD->SD) | 8.47 MB/s | 8.30 MB/s | 8.06 MB/s | 12.7 MB/s |
Database Operation | 10.0 ops/s | 5.73 ops/s | 4.56 ops/s | 19.4 ops/s |
Zip Compression | 0.509 s | 0.848 s | 0.637 s | 0.561 s |
Zip Decompression | 0.097 s | 0.206 s | 0.089 s | 0.073 s |
Most of the Basemark tests are lightly threaded, but looking at the SMP test gives you another example of Tegra 3's strengths given the right workload. With the right application, Tegra 3 can be faster than the MSM8960. However, it's still our opinion that you're more likely to find a lightly threaded workload on a smartphone than you are to encounter something that scales well to four cores.
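As a quick sanity check on the percentages, the sketch below recomputes the MSM8960's margin over the Transformer Prime for a few rows of the Basemark OS table above. The scores are copied from that table; the `percent_faster` helper and the choice of rows are ours, purely for illustration.

```python
# Higher-is-better rows copied from the Basemark OS table above:
# (ASUS Transformer Prime, MDP MSM8960)
SCORES = {
    "System Overall Score": (602, 907),
    "SMP Test (loops/s)": (81.2, 64.4),
    "Database Operation (ops/s)": (4.56, 19.4),
}


def percent_faster(score, baseline):
    """How much faster `score` is than `baseline`, in percent.

    Negative values mean `score` is slower than `baseline`.
    """
    return (score / baseline - 1.0) * 100.0


if __name__ == "__main__":
    for test_name, (tegra3, msm8960) in SCORES.items():
        delta = percent_faster(msm8960, tegra3)
        print(f"{test_name}: MSM8960 vs. Tegra 3 = {delta:+.1f}%")
```

The overall score works out to roughly a 51% advantage for the MSM8960, in line with the "around 50%" figure, while the SMP test comes out negative, reflecting Tegra 3's lead in the one heavily threaded test.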
49 Comments
mutil0r - Saturday, February 25, 2012
While true, outside of rare exceptions (Xperia Play) where the OEM specifically asks the manufacturer for optimized drivers, rarely do OEMs call for anything beyond baseline drivers because of massive carrier testing and validation cycles.
We haven't reached desktop GPU type maturity and cadence to have drivers bump up performance, yet.
Wishmaster89 - Saturday, February 25, 2012
It would depend on relations between Qualcomm and the ODM, but I'd suspect that after last year's fiasco with msm8x60 they'll try their best to ensure that final devices are as good as they can get, and that would mean upgrading drivers for their chipsets.
In the worst case we'll have to put our faith in custom ROMs to always use the most recent drivers from newer devices, because it was proven that both Adreno 205 and 220 got faster with more mature drivers.
mutil0r - Saturday, February 25, 2012
I think it is important to remind ourselves that we are still comparing a development platform (the MDP) to a shipping device (Transformer Prime). To see what I'm trying to say here, please have a look at the previous MDP8660 numbers vs. those of shipping 8x60 devices. I understand manufacturers are trying to close this gap, but I would be wary of simply taking their word for it.
Next, I would not give Electopia much weight because it is a Qualcomm developed benchmark. I'm surprised AT even published those numbers.
IMHO the only benchmark in the above list where the 225 has an advantage, on paper, is Basemark.
mutil0r - Saturday, February 25, 2012
Based on what I understand, Basemark tests have unrealistically long shader calls. While it is good to know that the Qualcomm architecture is better equipped to handle this, real world implications are far less impressive, given the current state of mobile graphics in the industry.
Simply put, the comparison is not correct and therefore to draw conclusions based on this would also not be right.
As an aside, I'm interested in knowing what sort of memory type/clocks the MDP is running. I'm willing to make a calculated guess that this is probably not what we'll be seeing on shipping devices because of BoM, packaging and thermal concerns.
Also, I read (I don't remember where exactly though) that the Tegra 3 CPU clocks have been bumped from 1.3/1.4 to 1.4/1.5. Again, I'll believe it only when I see it, but I'm curious if this supposed new revision also includes a GPU clock bump.
Eneq - Tuesday, March 13, 2012
Regarding Electopia... What you say is not quite true. It's developed on contract from Qualcomm, but the engine itself is a commercial engine that's been used in multiple titles.
That said, it is slightly skewed by not focusing too much on things that are known to be a slight problem for Adreno (FBOs and pixel shaders for instance); however, that's not a big concern for modern games.
You can just compare the results from an Adreno run with Imagination, which are comparable; however, Tegra 2 has always had issues. But Tegra 2 has other issues as well, so it's unlikely to be due to this specific app (the Tegra 2 devices I have been working on show some problems with either fillrate or bus bandwidth and that doesn't seem to be changing...)
ChronoReverse - Thursday, February 23, 2012
It seems to me that there's a serious problem with this benchmark.
For instance, with Exynos you get 34.6 fps @ 800x480 but somehow you get 42.5 fps @ 1280x720 (offscreen).
This really doesn't make a lick of sense and cannot be explained by vsync either.
dcollins - Thursday, February 23, 2012
"Today we're focused on the SoC comparisons however the first MSM8960 devices will also benefit from having integrated 28nm LTE baseband as well."
This to me is the most important factor. Tegra 3 SoCs will be forced to use a discrete baseband chip while the MSM8960 has an integrated baseband. I think this fact alone will be sufficient to give Krait the lead in terms of battery life while allowing for slimmer devices.
I have an upgrade coming in March and I cannot wait to get my hands on a new Krait based phone. I have been itching to own an HTC Android phone for some time now; these new devices cannot come soon enough!
jwcalla - Thursday, February 23, 2012
It's pretty clear -- and exciting -- to see where the future is going with all of this. The consistent improvements being made in these chips are both impressive and rapid.
Somehow -- and I'm still scratching my head a bit on this one -- the announcement of Ubuntu for Android didn't make it to the front page of AT. But that concept kind of ties into where these higher-performing chips are really going to shine. It might be an instance where a quad-core could offer benefits over a higher-clocked dual-core.
Kidster3001 - Thursday, February 23, 2012
SunSpider performance will go down on all devices with that switch to ICS. The Crankshaft engine has some startup overhead that cannot be overcome during the extremely short test times of SunSpider. It will however do much better than the old V8 engine in longer running JavaScript such as the V8 benchmark or Kraken. SunSpider has been good for a long time but it runs too quickly on modern hardware/JavaScript engines to be meaningful any more. I suggest you retire it gracefully and move to either V8 or Kraken for pure JavaScript performance benchmarking.
Lucian Armasu - Friday, February 24, 2012
I think we should stop using the SunSpider benchmark. Google said last year that they aren't focusing so much on it because they don't find it relevant anymore, and they even used a "50x SunSpider" test to have a better idea of where the browsers are today. But either way their point was that the SunSpider benchmark is obsolete, and it doesn't really give a feel for real browser performance anymore.