Titan’s Compute Performance (aka Ph.D Lust)

Because GK110 is such a unique GPU from NVIDIA when it comes to compute, we’re going to shake things up a bit and take a look at compute performance first before jumping into our look at gaming performance.

On a personal note, one of the great things about working at AnandTech is all the people you get to work with. Anand himself is nothing short of fantastic, but what other review site also has a Brian Klug or a Jarred Walton? We have experts in a number of fields, and as a computer technology site that of course includes experts in computer science.

What I’m trying to say is that for the last week I’ve been having to fend off our CS guys, who upon hearing I had a GK110 card wanted one of their own. If you’ve ever wanted proof of just how big a deal GK110 is – and by extension Titan – you really don’t have to look too much farther than that.

Titan, its compute performance, and the possibilities it unlocks are a very big deal for researchers and other professionals who need every last drop of compute performance that they can get, for as cheap as they can get it. This is why on the compute front Titan stands alone; in NVIDIA’s consumer product lineup there’s nothing like it, and even AMD’s Tahiti based cards (7970, etc), while potent, are very different from GK110/Kepler in a number of ways. Titan essentially writes its own ticket here.

In any case, as this is the first GK110 product that we have had access to, we couldn’t help but run it through a battery of tests. The Tesla K20 series may have been out for a couple of months now, but at $3500 for the base K20 card, Titan is the first GK110 card many compute junkies are going to have real access to.

To that end I'd like to introduce our newest writer, Rahul Garg, who will be leading our look at Titan/GK110’s compute performance. Rahul is a Ph.D student specializing in the field of parallel computing and GPGPU technology, making him a prime candidate for taking a critical but nuanced look at what GK110 can do. You will be seeing more of Rahul in the future, but first and foremost he has a 7.1B transistor GPU to analyze. So let’s dive right in.

By: Rahul Garg

For compute performance, we first looked at two common benchmarks: GEMM (which measures the performance of dense matrix multiplication) and FFT (Fast Fourier Transform). These numerical operations are important in a variety of scientific fields. GEMM is highly parallel and typically compute heavy, and it is one of the first tests of performance and efficiency on any parallel architecture geared towards HPC workloads. FFT is typically memory bandwidth bound but, depending upon the architecture, can be influenced by inter-core communication bandwidth. Vendors and third parties typically supply optimized libraries for these operations; for example, Intel supplies MKL for Intel processors (including Xeon Phi), while AMD supplies ACML and OpenCL-based libraries for its CPUs and GPUs respectively. Thus, these benchmarks measure the combined performance of the hardware and the software stack.

For GEMM, we tested the performance of NVIDIA's CUBLAS library supplied with CUDA SDK 5.0 on SGEMM (single-precision/fp32 GEMM) and DGEMM (double-precision/fp64 GEMM) with square matrices of size 5k by 5k. For SGEMM on Titan, the data reported here was collected with boost disabled. We also ran the experiments with boost enabled on Titan, but found that performance was effectively equal to the non-boost case; we suspect this is because our test ran for a very short period of time and may not have triggered boost. Therefore, for the sake of simpler analysis, we report the data with boost disabled on Titan. If time permits, we may return to the boost issue for this benchmark in a future article.
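For readers curious about what such a measurement involves, the sketch below shows roughly how an SGEMM timing run can be put together against CUBLAS. This is a hypothetical illustration rather than our actual harness: the 5120-element dimension, the event-based timing, and the omission of error checking and warm-up runs are all our own simplifications.

```cpp
// Minimal SGEMM timing sketch using CUBLAS (CUDA 5.0-era v2 API).
// Matrices are left uninitialized since GEMM timing does not depend on values.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int N = 5120;                          // assumed "5k by 5k" dimension
    const float alpha = 1.0f, beta = 0.0f;

    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, sizeof(float) * N * N);
    cudaMalloc((void**)&dB, sizeof(float) * N * N);
    cudaMalloc((void**)&dC, sizeof(float) * N * N);

    cublasHandle_t handle;
    cublasCreate(&handle);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    // C = alpha * A * B + beta * C; all matrices are N x N, column-major.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                &alpha, dA, N, dB, N, &beta, dC, N);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gflops = 2.0 * N * N * N / (ms * 1e6);   // GEMM does ~2*N^3 FLOPs
    printf("SGEMM: %.1f GFLOPS\n", gflops);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

A single short run like this is also consistent with the boost behaviour described above: the kernel can finish before the card ever ramps up its boost clocks.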

Apart from the results we collected for the GTX Titan, GTX 680 and GTX 580, we refer to experiments on GEMM on the Radeon 7970 conducted by Matsumoto, Nakasato and Sedukhin, reported in a technical report from the University of Aizu. Their exact parameters and testbed are different from ours, and we include their results for illustrative purposes, as a ballpark estimate only. The results are below.

DGEMM

Titan rules the roost amongst the three listed cards in both SGEMM and DGEMM by a wide margin. We have not included Intel's Xeon Phi in this test, but Titan's achieved performance is higher than the theoretical peak FLOPS of the current crop of Xeon Phi. Sharp-eyed readers will have observed that Titan achieves about 1.3 teraflops on DGEMM, while the listed fp64 theoretical peak is also 1.3 TFlops; we were not expecting 100% of peak from Titan in DGEMM. NVIDIA clarified that the fp64 rating for Titan is a conservative estimate. At 837MHz, the calculated fp64 peak of Titan is 1.5 TFlops (896 fp64 ALUs x 2 FLOPs per clock for FMA x 837MHz works out to roughly 1.5 TFlops). However, under heavy load in fp64 mode, the card may underclock below the listed 837MHz to remain within its power and thermal specifications. Thus, the fp64 ALU peak can vary between 1.3 TFlops and 1.5 TFlops, and our DGEMM results are within expectations.

Next, we consider the percentage of fp32 peak achieved by the respective SGEMM implementations. These are plotted below.

Percentage of peak achieved on SGEMM

Titan achieves about 71% of its peak, while the GTX 680 only achieves about 40% of its peak. It is clear that while both the GTX 680 and Titan are Kepler architecture chips, Titan is not just a bigger GTX 680. Architectural tweaks have been made that enable it to reach much higher efficiency than the GTX 680 on at least some compute workloads. The GCN-based Radeon 7970 obtains about 63% of peak on SGEMM using Matsumoto et al.'s algorithm, and the Fermi-based GTX 580 also obtains about 63% of peak using CUBLAS.

For FFT, we tested the performance of 1D complex-to-complex in-place transforms of size 2^25 using the CUFFT library. Results are given below.
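For reference, setting up a transform like this takes only a few CUFFT calls. The following is a hypothetical sketch rather than our actual harness; the event-based timing and the lack of error checking are our own simplifications.

```cpp
// Minimal sketch: a 2^25-point single-precision complex-to-complex
// in-place forward FFT using CUFFT.
#include <cufft.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int N = 1 << 25;                       // 2^25-point 1D transform
    cufftComplex* data;
    cudaMalloc((void**)&data, sizeof(cufftComplex) * N);   // ~256MB of VRAM

    cufftHandle plan;
    cufftPlan1d(&plan, N, CUFFT_C2C, 1);         // a single 1D C2C plan

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);   // in-place: input == output
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("2^25-point C2C FFT: %.2f ms\n", ms);

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
```

The double-precision case is analogous, using cufftDoubleComplex, a CUFFT_Z2Z plan, and cufftExecZ2Z.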

FFT single precision

FFT double precision

Titan outperforms the GTX 680 in FFT by about 50% in single precision. We suspect this is primarily due to Titan's increased memory bandwidth over the GTX 680, but we have not verified this hypothesis. The GTX 580 holds a slight lead over the GTX 680 here. Again, if time permits, we may return to this benchmark for a deeper analysis. In double precision, Titan achieves about 3.4x the performance of the GTX 680, but this is not surprising given the GTX 680's poor fp64 execution resources.

We then looked at an in-house benchmark called SystemCompute, developed by our own Ian Cutress. The benchmark tests the performance on a variety of sample kernels that are representative of some scientific computing applications. Ian described the CPU version of these benchmarks in a previous article. Ian wrote the GPU version of the benchmarks in C++ AMP, which is a relatively new GPGPU API introduced by Microsoft in VS2012.

Microsoft's implementation of AMP compiles down to DirectCompute shaders. These are all single-precision benchmarks and should run on any DX11-capable GPU. The benchmarks include 2D and 3D finite difference solvers, 3D particle movement, an n-body benchmark, and a simple matrix multiplication algorithm. Boost was enabled on both Titan and the GTX 680 for this benchmark. We give the score reported by the benchmark for each card and report the speedup of Titan over the GTX 680; a speedup greater than 1 means Titan is faster, while less than 1 means a slowdown.
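To give a flavour of what these kernels look like, here is a minimal, hypothetical C++ AMP sketch of a naive single-precision matrix multiply. This is our own illustration, not Ian's actual benchmark code; a tuned kernel would typically use a tiled formulation with tile_static memory.

```cpp
// Naive N x N single-precision matrix multiply in C++ AMP (VS2012).
// The lambda is compiled down to a DirectCompute shader by the AMP runtime.
#include <amp.h>
#include <vector>
using namespace concurrency;

void matmul_amp(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, int N) {
    array_view<const float, 2> a(N, N, A);
    array_view<const float, 2> b(N, N, B);
    array_view<float, 2> c(N, N, C);
    c.discard_data();                            // C is write-only on the GPU

    // One GPU thread per output element of C.
    parallel_for_each(c.extent, [=](index<2> idx) restrict(amp) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += a(idx[0], k) * b(k, idx[1]);
        c[idx] = sum;
    });
    c.synchronize();                             // copy the result back to the host
}
```

Because AMP targets DirectCompute, code like this runs on any DX11-capable GPU without vendor-specific code paths.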

SystemCompute scores (higher is better)
Benchmark   GTX 580   GTX 680   GTX Titan   Titan speedup over GTX 680
2D FD          9053      8445       12461    1.47
3D FD          3133      3827        5263    1.37
3DPmo         41722     26955       40397    1.49
MatMul          172       197         229    1.16
nbody           918      1517        2418    1.59

The benchmarks show between a 16% and 59% improvement over the GTX 680, with the largest gain coming from the relatively FLOP-heavy n-body benchmark. Interestingly, the GTX 580 beats Titan in 3DPmo, and beats the GTX 680 in both 3DPmo and 2D FD.

Overall, GTX Titan is an impressive accelerator from a compute perspective and posts large gains over its predecessors.

337 Comments

  • ronin22 - Thursday, February 21, 2013 - link

    That's the point, it's not a gamerz card
  • Finally - Thursday, February 21, 2013 - link

    "Titan delivers the kind of awe-inspiring performance we have come to expect from NVIDIA’s most powerful video cards."
    If you hear unfiltered Nvidia marketing speak like this, you know that AT isn't fooling around when it comes to earning their PR dollars. Well done!
  • Scritty - Thursday, February 21, 2013 - link

    Paper launch? Fine. I get that. But I suspect stock levels will be seriously limited. Rumour has it that only 10,000 of these will be made - which seems very odd, as even with a substantial profit margin, the ROI on development costs is going to be hard to recoup with a potential sales level as low as that.

    I'm looking to buy a couple of these as soon as they are available for SLI - maybe 3 for a triple setup if possible, but I can see there being real issues with stock. A decent solution for 3 screens at 2560x1440 for sure - if you can get hold of them anywhere.
  • Ryan Smith - Thursday, February 21, 2013 - link

    Note that NVIDIA specifically shot down the 10K card rumor. As far as we've been advised and as best as we can tell, card availability will be similar to what we saw with the GTX 690. Which is to say tight at first, but generally available and will continue to be available.
  • Egg - Thursday, February 21, 2013 - link

    The chart on page 1 is missing a 'GB' under GTX Titan's VRAM listing. There aren't any 5760*1200 non-GE 7970 benchmarks. Also, on the Power, Temperature, and Noise page, "temperate" should be "temperature" just before the first chart.

    Additionally, the voltage issue HollyDOL and the strange Crysis Warhead 1080p E Shader/G Quality issue silverblue mentioned should be clarified as well. (I'm just repeating them here so they have a higher chance of being seen.)

    Also, Wolfram|Alpha interprets "gigaflops" as "billion floating point operations per second" by default, while offering an alternative interpretation that doesn't have the seconds unit. Wikipedia also defines flops as already having the time unit. By their standards, flops/s is technically incorrect. I'm not a scientist, and I actually didn't notice this until I typed gigaflops into Wolfram|Alpha, so take this for what little it's worth.

    It's silly to suggest that this card needs a voltmod and a waterblock. Very few people doing scientific work are going to have time to do that. This card isn't intended to be a gaming card. Yes, there undoubtedly will be people on hwbot who would love to do such a thing, but relative to the population of scientists living on meager grants, their numbers are small.

    It's also silly to say that Titan is a bad card because it isn't as efficient as other cards at password hashing or bitcoin mining. These embarrassingly parallel workloads aren't representative of scientific workloads. Besides, the most dedicated people have custom FPGAs or ASICs for those workloads.

    Saying that it shows Nvidia jacking up prices on its flagship is misleading. Yes, it's technically true. But I saw someone say that the GTX 680 was only a "midrange" card. The GTX 680 still competes with the Radeon 7970 GE. It isn't outright winning anymore - in certain games, it loses - and it's often substantially more expensive. But it's still reasonably competitive. Why did anyone expect Titan to push down GTX 680 prices? If anything, it might push down Tesla K20X prices, but I'm not holding my breath.
    Would anyone have complained about Nvidia being outrageously greedy if Titan didn't exist in the consumer space at all?

    Moreover, the GTX 580 had FP64 performance at 1/8 of FP32 performance, not Titan's 1/3 (http://www.anandtech.com/show/4008/nvidias-geforce... ).

    Simply looking at the specs partially explains why the card is so damn expensive. It's 7.1 billion transistors, compared to the GTX 690's 2*3.5 billion transistors. (Page 1 on this article). Going purely by transistor count, Titan is underpriced, because it's just as expensive as the GTX 690. Looking at die area instead is less forgiving, but don't forget that squeezing 7 billion transistors on a single die is more difficult than having two 3.5 billion transistor dies. Titan also has 2 extra gigabytes of GDDR5.

    The only valid criticism I've seen is that Titan can be outperformed by two 7970 GEs in certain, mostly FP32 compute workloads, which are a cheaper solution, especially for scientists who probably aren't as concerned with heat production as those working on the Titan supercomputer. After all, you can fit bigger fans in an EATX case than in most racks. 689 Gflops is still greater than 50% of 1309 Gflops; it's 53%. When you can find the cheapest 7970 GEs at a bit over $400, two 7970s will be about $200 cheaper.
    But figure in power: http://www.wolframalpha.com/input/?i=200+W+*+1+yea... . After a year of continuous usage (or two years of 50% utilization), and assuming that two 7970 GEs will use 200 more watts than a Titan - a fairly reasonable estimate in my opinion - Wolfram|Alpha informs us that we'll have saved $216.
    Not to mention the fact that two 7970s occupy around twice as much space as a Titan. That means you need more individual systems if you're looking to scale beyond a single workstation.
    And finally, anyone who needs as much memory per GPU as they can get will need Titan.
    It's hard to draw any real conclusions right now, though, with DirectCompute dubious and OpenCL broken. Great work on Nvidia's part, getting the drivers working...

    There's also the fact that Nvidia is marketing this as a gaming card, which is disingenuous and poor practice. But come on, we all read Anandtech for a reason. Overhyped marketing is nothing new in technology.

    So in conclusion - treat the GTX 680 as the flagship single-GPU consumer card. (They did call it a 680; see the GTX 580, 480, and 280.) It's roughly in the 7970 GE's ballpark when it comes to price and performance. For gamers, Titan can effectively be ignored.
    If you need FP32 compute performance, consider multiple 7970 GEs as well as Titan.
    If you need FP64 compute performance, Titan is unparalleled, assuming you run it for a decent amount of time.
    And if you're trying to set a world record, well, I guess you can pay through the nose for Titan too.
  • Insomniator - Thursday, February 21, 2013 - link

    Thank you, so many here just sound like butt hurt kids that do not understand these concepts or maybe didn't even read the article. Few of them would buy it at the $700 they cry about wanting it to be.

    This card is not just for gamers, and even if it were, performance-wise it crushes the next closest single-GPU competitor. Remember when Intel EE editions were $1k? The best always costs extra... and in this case the card isn't even being marketed solely for gamers anyway.

    Until AMD puts out a new card that can beat it for cheaper, this will remain a $1k card. In the meantime, the 680, 670, and 660 are all competitive products.
  • CeriseCogburn - Tuesday, February 26, 2013 - link

    Don't expect the crybaby fools to respond. They'd prefer to pretend your post isn't here.

    If they do say anything, it will just be another repetitious pile of tinfoil hat lies Charlie D will be proud of.
  • Olaf van der Spek - Thursday, February 21, 2013 - link

    Still only average framerates? :(
    I had hoped you'd move to minimum framerate / max frametime based benchmarking. Averages are (and were) kinda meaningless.
  • Ryan Smith - Thursday, February 21, 2013 - link

    Actually we have some FRAPS data for a few of our games as a trial of some ideas. Unfortunately you won't see it for this article as there simply wasn't enough time to put that together on top of everything else. But keep your eyes peeled.
  • GiantPandaMan - Thursday, February 21, 2013 - link

    The Titan was a compute part, first and foremost. Gamers have much better alternatives in the 7970/680 route.

    Personally I think it's a pretty impressive piece of hardware, though there's no way in hell I'd ever buy it. That's because I'm a value oriented buyer and I don't have that much disposable income.

    I just don't get all the indignation and outrage. It's not like nVidia screwed you over in some way. They had an expensive piece of hardware designed for compute and said to themselves, what the hell, why not release it for gamers?
