Yesterday we presented the first results of Qualcomm's Krait-based MSM8960 SoC. While we still await the first Krait-based phones (widely expected to begin shipping sometime in Q2), Qualcomm's MSM8960 Mobile Development Platform gave us a good idea of the upper bound for Krait and MSM8960 performance.

I call it the upper bound because, at least in the past, MDP performance hasn't corresponded directly to shipping device performance. There was a pretty big delta between MSM8660 MDP performance and phones that used the MSM8660. Qualcomm tells us that this time around things are going to be different: it expects a much narrower (nonexistent?) gap between the MSM8960 development platform and phones that use MSM8960 silicon.

One major difference between the MSM8960 MDP and our earlier MSM8660 MDP was the state of the CPU governor. In the earlier MDP the governor was set to max performance, always running the CPU at its maximum clock frequency. With the MSM8960 platform the governor was set to ondemand, which lets CPU speed vary with the load the OS places on the device. The ondemand setting is in line with what we can expect device manufacturers to use when they ship phones. All of this goes to say that while we have a good handle on what Krait and the MSM8960 are capable of, there are still a lot of unknowns.
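For readers who want to check the governor on their own hardware, here is a minimal sketch. It assumes the device exposes the standard Linux cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`) and that you have shell access with a Python interpreter available; the helper names `read_governors` and `is_pinned_to_max` are our own, not part of any Qualcomm tool.

```python
from pathlib import Path

# Assumption: the standard Linux cpufreq sysfs location is present on the device.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_governors(root=CPUFREQ_ROOT):
    """Return {cpu_name: governor} for every core that exposes a cpufreq directory."""
    governors = {}
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def is_pinned_to_max(governor):
    """The 'performance' governor holds a core at its maximum frequency."""
    return governor == "performance"


if __name__ == "__main__":
    for cpu, gov in read_governors().items():
        note = "pinned to max clock" if is_pinned_to_max(gov) else "not pinned to max clock"
        print(f"{cpu}: {gov} ({note})")
```

A platform reporting `performance` on every core behaves like the earlier MSM8660 MDP, while `ondemand` matches the MSM8960 MDP configuration described above.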

While it's true that shipping performance remains to be seen, some of the deltas we saw between MSM8960 and the current competition were so great that even a much slower implementation in a shipping phone would still be significantly faster than anything else out today.

We left our MSM8960 investigation with two major unknowns. The first was power consumption. We still haven't been able to get Qualcomm's Trepn tool running on the MSM8660 MDP, which has always been a bit finicky. To get a true feel for MSM8960 battery life we will have to wait for shipping devices. The other major unknown was really how MSM8960 stacks up against NVIDIA's Tegra 3.

Tegra 3 was everything Tegra 2 should have been. We got higher clocks, NEON support and a much faster GPU. The only thing missing from Tegra 3 was a dual channel memory interface. We were happy with Tegra 3 on ASUS' Eee Pad Transformer Prime, but in less than a week we'll get to meet some of the first smartphones based on T3 silicon.

Armed with the Eee Pad Transformer Prime (updated to Ice Cream Sandwich) we're able to get a rough idea of how these two heavyweights will compare. The same caveats that applied to the MDP apply to our Tegra 3 platform as well. Since we are using a tablet we're obviously dealing with a higher TDP than what you'll find in a phone. The comparison today is largely academic and naturally shipping devices may be better or worse than these two representatives. With the disclaimers out of the way, let's get to the comparison.

CPU Performance: Preferring Single vs. Multithreaded Performance

The MSM8960 features two Krait cores compared to the four ARM Cortex A9 cores in NVIDIA's Tegra 3. While the A9 is a very power efficient core, Krait offers a much wider front end, wider execution back end, faster FPU and an improved cache/memory interface. All of these factors, combined with clock speeds similar to what Tegra 3 is able to hit, should result in better absolute performance in single or lightly threaded applications. As video decode and transcode are both fully offloaded in all modern SoCs, finding workloads that scale well across more than two cores is difficult. We noted this in our Eee Pad Transformer Prime review - it's just not easy coming up with current apps that scale well to four ARM cores. That's not to say that there are no advantages to more than two cores, but you're more likely to get a benefit from two faster cores vs. four slower ones.
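The two-fast-versus-four-slow tradeoff can be put into rough numbers with Amdahl's law. The sketch below is illustrative only: the 1.5x per-core speed advantage for the dual-core part is a made-up figure for the sake of the example, not a measured Krait vs. Cortex A9 ratio, and `relative_throughput` is our own helper.

```python
def relative_throughput(cores, per_core_speed, parallel_fraction):
    """Amdahl's law: throughput relative to a single core of speed 1.0.

    The serial part of the work runs on one core; the parallel part is
    split evenly across all cores.
    """
    serial_fraction = 1.0 - parallel_fraction
    time = (serial_fraction + parallel_fraction / cores) / per_core_speed
    return 1.0 / time


if __name__ == "__main__":
    # Hypothetical chips: per-core speeds are assumptions, not benchmark results.
    FAST_DUAL = {"cores": 2, "per_core_speed": 1.5}
    SLOW_QUAD = {"cores": 4, "per_core_speed": 1.0}

    for p in (0.0, 0.5, 0.8, 0.95):
        dual = relative_throughput(parallel_fraction=p, **FAST_DUAL)
        quad = relative_throughput(parallel_fraction=p, **SLOW_QUAD)
        print(f"parallel fraction {p:.2f}: dual {dual:.2f}, quad {quad:.2f}")
```

Under that assumed 1.5x per-core advantage, the quad-core design only pulls ahead once more than 80% of the work runs in parallel. A different per-core ratio moves the break-even point, but the shape of the argument stays the same.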

 

 

NVIDIA's saving grace is the fact that it did ramp up A9 clock speed very high in Tegra 3, and it has that handy companion core 4-PLUS-1 architecture to keep power consumption low throughout very light workloads. There's also the fact that while very few smartphone apps will peg four cores constantly, there are periods of time when you'll see more than two cores in use. Multitasking, although more likely to happen in significant amounts on a tablet, can also increase usage of the third and fourth cores on Tegra 3.

We'll start with Linpack, our heaviest floating point/cache/memory bandwidth test:

Linpack - Single-threaded

Single threaded floating point performance is obviously a strength of the MSM8960 and Krait. Qualcomm tells us that Krait is able to multi-issue floating point instructions, something that the Cortex A9 cannot do. The MSM8960 memory controller also appears to be more efficient than previous designs, contributing to the magnitude of the win here.

Move to more threads and the situation doesn't change dramatically, although Tegra 3 is obviously far more competitive thanks to its sheer core count:

Linpack - Multi-threaded

Javascript performance can be multithreaded at times but most of the benchmarks we run don't scale incredibly well beyond two cores. Making matters worse is the fact that SunSpider performance regressed on the Eee Pad Transformer with the latest update to ICS. I've included the old Honeycomb results as a reference for where things should be. Keep in mind that the Honeycomb browser on the Eee Pad Transformer was very heavily optimized for Tegra 3. It's possible that the same degree of optimization just isn't present in the ICS version yet.

SunSpider Javascript Benchmark 0.9.1 - Stock Browser

BrowserMark tells a different story. Here the Tegra 3 based Transformer Prime is actually slightly faster than the MSM8960. The margin of victory is small enough to be a wash, but the fact that NVIDIA is able to remain competitive is important.

BrowserMark

Basemark OS echoes more of what we'd expect. In the overall score the MSM8960 is around 50% faster than the Tegra 3 based tablet. Even if the MSM8960 MDP is unrealistically fast for a Krait platform, it's likely that we'll still see a Krait advantage.

Basemark OS - System

| Test | HTC Rezound | Galaxy Nexus | ASUS Transformer Prime | MDP MSM8960 |
|---|---|---|---|---|
| System Overall Score | 658 | 538 | 602 | 907 |
| Simple Java 1 (loops/s) | 298 | 210 | 240 | 375 |
| Simple Java 2 (loops/s) | 7.28 | 8.61 | 7.27 | 10.8 |
| SMP Test (loops/s) | 35.3 | 49.2 | 81.2 | 64.4 |
| 100K File, eMMC->SD (MB/s) | 6.49 | 9.52 | 11.0 | 8.64 |
| 100K File, SD->eMMC (MB/s) | 33.0 | 17.8 | 14.5 | 39.8 |
| 100K File, eMMC->eMMC (MB/s) | 37.8 | 34.5 | 29.7 | 48.9 |
| 100K File, SD->SD (MB/s) | 8.47 | 8.30 | 8.06 | 12.7 |
| Database Operation (ops/s) | 10.0 | 5.73 | 4.56 | 19.4 |
| Zip Compression (s) | 0.509 | 0.848 | 0.637 | 0.561 |
| Zip Decompression (s) | 0.097 | 0.206 | 0.089 | 0.073 |

Most of the Basemark tests are lightly threaded, but looking at the SMP test gives you another example of Tegra 3's strengths given the right workload. With the right application, Tegra 3 can be faster than the MSM8960. However, it's still our opinion that you're more likely to find a lightly threaded workload on a smartphone than something that scales well to four cores.
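To make the size of these gaps concrete, the short sketch below computes the MSM8960's lead over the Transformer Prime for a few rows of the Basemark OS table above. The numbers are copied from that table; the `msm8960_advantage` helper and the choice of rows are ours, and only "higher is better" rows are included so the ratios all read the same way.

```python
# Selected rows from the Basemark OS table: (Transformer Prime, MDP MSM8960).
# All of these are "higher is better" results.
RESULTS = {
    "System Overall Score": (602, 907),
    "Simple Java 1 (loops/s)": (240, 375),
    "SMP Test (loops/s)": (81.2, 64.4),
    "Database Operation (ops/s)": (4.56, 19.4),
}


def msm8960_advantage(tegra3, msm8960):
    """Ratio of the MSM8960 result to the Tegra 3 result (>1 means MSM8960 leads)."""
    return msm8960 / tegra3


if __name__ == "__main__":
    for name, (tegra3, msm8960) in RESULTS.items():
        ratio = msm8960_advantage(tegra3, msm8960)
        leader = "MSM8960" if ratio > 1 else "Tegra 3"
        print(f"{name}: {ratio:.2f}x ({leader} leads)")
```

The overall score works out to roughly 1.51x, which is the "around 50% faster" figure quoted earlier. The SMP test comes out near 0.79x, meaning Tegra 3 is about 26% faster there.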


49 Comments

  • lancedal - Thursday, February 23, 2012 - link

    Hi Anand,
    What is the CPU voltage for the 1.5GHz?
  • boostern - Thursday, February 23, 2012 - link

    It's been almost 10 years that I'm following you.
    It's always a joy to read one of your articles.
    Thank you Anand, really.
  • Black1969ta - Thursday, February 23, 2012 - link

    Tegra 2 was out before other dual cores and fell short of those later designs, it is not surprising that Tegra 3 is in the same position performance wise.

    Any one have a link to more news on Kal-El+, other than just name?
    Is Kal-El+ a tock, to Kal-El, from Intel's Playbook?

    If it is then I could see not only a process shrink down to 28nm or 32nm, but "tweaks" also.

    Perhaps with the smaller process they could add a 2nd Memory channel.

    The HTC One X is rumored to be a 1.5 GHz Tegra 3, instead of the 1.3Ghz in the Prime.
  • Lucian Armasu - Saturday, February 25, 2012 - link

    If they actually release the Tegra 3+ this year (they were supposed to release the Tegra 2 3D too last year but didn't because they were late), it will probably be a quad core with at least the first core at 1.8 GHz or even 2 GHz, and the others a little lower.

    It should be at least at 2 Ghz if they want to compete with Krait. The problem is, although they were already very late with Tegra 3 to market, they also only release the tablet version first, and the phone version months later. So Krait phones will be available in Q2 this year, Tegra 3+ tablets probably in Q3, and Tegra 3+ phones late Q3 or early Q4.

    If Nvidia actually managed to deliver their chips when they promised they would deliver them, I think they would be in a much better position today, because for example it would be understandable if Krait is more powerful than Tegra 3 in Q2 2012, if phones with Tegra 3 started appearing 2 quarters earlier, like they promised. But that didn't happen, so now once again Tegra 3 is late to market, just like Tegra 2 was, and the competition is already better by the time it starts to get a foothold in the market.
  • fteoath64 - Friday, February 24, 2012 - link

    Nvidia better get to A15 very quickly! They are getting creamed by the strong competition. Here is my suggestion. Stop with the quadcore nonsense. Do a 1+1 (big.LITTLE at 2GHz, dual channel RAM), 2+1 and 3+1 for good measure. If they can make the 3+1 turbo to 2.2GHz with a single core, it would be great.
    Also to find a way to retask the small core to be I/O processor when it is inactive.
  • curtisas - Saturday, February 25, 2012 - link

    Can Qualcomm just make that device have a little less of a lip on the bottom and sell me it? It's running stock Android which is awesome, and the reference devices always have the top of the line hardware!
  • gamoniac - Saturday, February 25, 2012 - link

    Javascript performance can be multithreaded at times...


    I am reposting my comment on an earlier article by Brian --

    The browsers are multi-threaded but javascript does not support multi-threading until the advent of web worker in HTML 5. Although the browser could load images/files with multi threads, javascript snippets on a pre-HTML 5 page only runs in one thread. Was SunSpider's benchmark written in HTML 5?
  • thebeastie - Monday, February 27, 2012 - link

    Now I got a Sony HMZ-T1 all I really care about is how well it can handle a Bluray 13GB MKV rip of Avatar 3D that I may have on local file server or off the device it self so I can go into my bedroom and watch a 3D movie at highest quality possible with no mess or fuss of a PC.

    ATM iPad2 can't handle my really high end rips and I don't want to buy a laptop etc for my bedroom, I want something simple like a tablet.
  • superg05 - Saturday, March 10, 2012 - link

    these so called standard benchmarks only show so much let me see a benchmark that shows how many cores are running for tegra 3 for example for browsing only suppose to use the companion lower core how many cores and gpu cores during the test is each different system running?