What’s in a Benchmark? This is a pertinent question that all users need to ask themselves, because if you don’t know what a benchmark actually tests and how that relates to the real world, the scores are meaningless. Today, AMD announced that it is resigning from BAPCo after a long-standing dispute over the weighting of scores within the SYSmark suite. AMD specifically references SYSmark 2012 (SM12), but there have been complaints in the past, and the latest release is apparently the proverbial straw that broke the camel’s back.

You can read more about the decision on Chief Marketing Officer (CMO) Nigel Dessau’s blog, but this announcement comes at an interesting time since BAPCo just shipped us copies of the final SM12 release. We haven’t had a chance to run the suite yet, and we’ll still have a look at the results and see how AMD and Intel platforms compare at some point, but it looks like we have a foregone conclusion: Intel will come out ahead. What we really need to examine is why Intel gets a better score.

If you’ve been reading AnandTech for any length of time, you’ll know that we place a lot more weight on real-world benchmarks than on synthetic tests, but certain tasks can be very difficult to test in a meaningful way. How do you measure everyday tasks like surfing the web when most CPUs are 95% idle performing that task? When we really look at the market right now, in many cases we can conclude that just about any current computer will be fast enough for 90% of users. If you want to surf the Internet, write email, work in Office applications, watch some movies, listen to music, etc., you can do that on anything from a lowly AMD Brazos netbook to a hex-core monster system. Yes, we did leave out Atom, because there are certain areas where it falls short: specifically, certain movie formats prove to be too much for the current Atom platform, particularly if you’re looking at HD H.264 content (e.g. YouTube and Hulu).

Reading through AMD’s announcement and Nigel’s blog, it’s pretty clear what AMD is after: they want the GPU to play a more prominent role in measurements of overall system performance. On the one hand, we could say that AMD is simply trying to get benchmarks to favor their APUs, since Brazos and Llano easily surpass the Intel competition when it comes to graphics and video prowess. This would certainly be true, but then we also have to consider what users are actually doing with their PCs. SYSmark has always included a variety of tests, and certainly knowing how fast your computer is with regard to Excel performance can be useful. However, AMD claims that a disproportionate weight is given to some tests, with mention of optical character recognition and file compression activities in particular.
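
To make the weighting concern concrete, here is a minimal sketch of how the weights in a composite score can change the size of the gap between two systems. Everything in it is hypothetical: the test names, the per-test scores, the two weight sets, and the choice of a weighted geometric mean are our own assumptions for illustration, not BAPCo's scoring methodology, which we have not seen.

```python
import math


def composite_score(scores, weights):
    """Weighted geometric mean of per-test scores (illustrative assumption,
    not the actual SYSmark formula)."""
    total_weight = sum(weights.values())
    log_sum = sum(weights[name] * math.log(scores[name]) for name in weights)
    return math.exp(log_sum / total_weight)


# Hypothetical per-test scores, normalized so a reference system scores 100.
system_a = {"ocr": 200.0, "compression": 180.0, "web": 100.0, "video": 80.0}
system_b = {"ocr": 100.0, "compression": 100.0, "web": 120.0, "video": 160.0}

# Two hypothetical weightings: one dominated by CPU-bound tests, one even.
cpu_heavy = {"ocr": 0.4, "compression": 0.4, "web": 0.1, "video": 0.1}
balanced = {"ocr": 0.25, "compression": 0.25, "web": 0.25, "video": 0.25}

for label, weights in (("cpu_heavy", cpu_heavy), ("balanced", balanced)):
    a = composite_score(system_a, weights)
    b = composite_score(system_b, weights)
    print(f"{label}: A={a:.1f} B={b:.1f} ratio A/B={a / b:.2f}")
```

With these made-up numbers, system A leads under both weightings, but its margin is much larger when the two CPU-bound tests dominate the score. The hardware is identical in both cases; only the weights change, which is why knowing them matters when reading a single composite number.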

We don’t have the full SM12 whitepaper yet, but we can look at the list of applications that are tested, and a few things immediately stand out. There are two web browsers in the list, but both versions are now outdated. Internet Explorer 8 has been replaced by Internet Explorer 9, and Firefox 3.6 is replaced by Firefox 4.0—with Firefox 5 just around the corner. Without newer browsers, HTML5 is basically untested by SM12, and while we understand that SM12 has been in development for a while, for something calling itself 2012 to include mostly 2010 applications feels out of place. Considering IE9 and FF4 both shift to GPU-accelerated engines, AMD would certainly have benefited from the use of the latest versions. The remaining applications look reasonable, but again we have no information on weighting of scores, so we’ll have to see how the results pan out.

Ultimately, the main thing to take away from all of this is that, just like the PCMark, 3DMark, Cinebench, SunSpider, etc. benchmarks we routinely refer to, SYSmark 2012 is merely one more tool to analyze system performance. It will be interesting to see how other elements—like the presence or lack of an SSD—impact the score. In our opinion most users would benefit far more from running something like Llano with an SSD as opposed to Sandy Bridge with an HDD, so the CPU/GPU/APU are not the only factors, but it still depends on your intended use. If you’re running a server, obviously the demands placed on the system will be far different from the average home computer. Multimedia professionals that spend a lot of time in Adobe Photoshop and/or Premiere likewise have different needs.

Is AMD right? Is heterogeneous (e.g. CPU and GPU working together) computing more important now than raw CPU performance, or is SYSmark 2012 merely proving what we already know: Sandy Bridge is really fast? Let us know what you think, but as always, when you’re looking at benchmark charts, take a minute to think about what the bars actually represent. The full news release is below, but again you can find substantially more detail in Dessau’s blog.

Update: It turns out AMD is not the only party to have left the BAPCo consortium recently. We've just confirmed with NVIDIA that they have also left the BAPCo consortium. No reason was given.

Update 2: BAPCo has released a statement in return. The consortium notes that AMD approved 80% of the development milestones and that AMD was never threatened with expulsion. The full statement is attached below.

Update 3: We've finally gotten official confirmation (as rumored earlier) that VIA has also left the consortium. They have sent a short statement to SemiAccurate, which we have included below. The basis of their complaints is much the same as AMD's: they don't consider SYSmark 2012 to reflect real-world usage.


AMD Will Not Endorse SYSmark 2012 Benchmark

— AMD Separates from Association with Industry Group BAPCo —

SUNNYVALE, Calif. — June 21, 2011 — AMD (NYSE: AMD) today announced that it will not endorse the SYSmark 2012 Benchmark (SM2012), which is published by BAPCo (Business Applications Performance Corporation). Along with the withdrawal of support, AMD has resigned from the BAPCo organization.

“Technology is evolving at an incredible pace, and customers need clear and reliable measurements to understand the expected performance and value of their systems,” said Nigel Dessau, senior vice president and Chief Marketing Officer at AMD. “AMD does not believe SM2012 achieves this objective. Hence AMD cannot endorse or support SM2012 or remain part of the BAPCo consortium.”

AMD will only endorse benchmarks based on real-world computing models and software applications, and which provide useful and relevant information. AMD believes benchmarks should be constructed to provide unbiased results and be transparent to customers making decisions based on those results. Currently, AMD is evaluating other benchmarking alternatives, including encouraging the creation of an industry consortium to establish an open benchmark to measure overall system performance.

AMD encourages anyone wanting more details about the construction and scoring methodology of the SM2012 benchmark to contact BAPCo. For more details on AMD’s decision to exit BAPCo, please read AMD’s Executive Blog authored by Nigel Dessau.


BAPCo® Reaffirms Open Development Process For SYSmark® 2012

SAN MATEO, Calif.—(BUSINESS WIRE)—Business Applications Performance Corporation (BAPCo®) is a non-profit consortium made up of many of the leaders in the high tech field, including Dell, Hewlett-Packard, Hitachi, Intel, Lenovo, Microsoft, Samsung, Seagate, Sony, Toshiba and ARCintuition. For nearly 20 years BAPCo has provided real world application based benchmarks which are used by organizations worldwide. SYSmark® 2012 is the latest release of the premiere application based performance benchmark. Applications used in SYSmark 2012 were selected based on market research and include Microsoft Office, Adobe Creative Suite, Adobe Acrobat, WinZip, Autodesk AutoCAD and 3ds Max, and others.

Advanced Micro Devices (AMD) was, until recently, a long standing member of BAPCo. We welcomed AMD’s full participation in the two year development cycle of SYSmark 2012, AMD’s leadership role in creating the development process that BAPCo uses today and in providing expert resources for developing the workload contents. Each member in BAPCo gets one vote on any proposals made by member companies. AMD voted in support of over 80% of the SYSmark 2012 development milestones, and were supported by BAPCo in 100% of the SYSmark 2012 proposals they put forward to the consortium.

BAPCo also notes for the record that, contrary to the false assertion by AMD, BAPCo never threatened AMD with expulsion from the consortium, despite previous violations of its obligations to BAPCo under the consortium member agreement.

BAPCo is disappointed that a former member of the consortium has chosen once more to violate the confidentiality agreement they signed, in an attempt to dissuade customers from using SYSmark to assess the performance of their systems. BAPCo believes the performance measured in each of the six scenarios in SYSmark 2012, which is based on the research of its membership, fairly reflects the performance that users will see when fully utilizing the included applications.


VIA's Statement About Leaving The BAPCo Consortium

VIA today confirmed reports that we have tendered our resignation to BAPCo. We strongly believe that the benchmarking applications tests developed for SYSmark 2012 and EEcoMark 2.0 do not accurately reflect real world PC usage scenarios and workloads and therefore feel we can no longer remain as a member of the organization.

We hope that the industry can adopt a much more open and transparent process for developing fair and objective benchmarks that accurately measure real world PC performance and are committed to working with companies that share our vision.

116 Comments

  • Spoelie - Tuesday, June 21, 2011

    There's no "checking the story" necessary; this is an official press release from AMD. Neither NVIDIA nor VIA has issued a press release as of yet, so any other information is conjecture.
  • JarredWalton - Tuesday, June 21, 2011

    What exactly am I supposed to do? AMD sent me a news release saying they are leaving BAPCo and I wrote an article about that. The fact that NVIDIA is apparently leaving as well (with absolutely no reason given) is nice to know, but that's not necessary for this piece of news. Checking the story? Um... which part of "AMD PR sent it to me" do you not get? AMD sent it, and it's about AMD, so I'm pretty sure I have the story I need. It took NVIDIA several hours to get back to us with, "Yes, we are leaving." VIA, we still haven't heard from. I'd better pull this story for six hours while we wait for more details, because other people leaving changes... nothing.
  • Brunk - Wednesday, June 22, 2011

    it changes a lot. as was already stated before now it seems AMD is leaving due to issues with their own product instead of the benchmark itself.

    When you know EVERYONE is leaving it tells a different story altogether
  • Donnie Darko - Wednesday, June 22, 2011

    To start, as a review site (not micro-blogging or news [such as DailyTech]), I would never publish a press release verbatim or write an 'article' with one as the sole source of info. Otherwise you end up with this: http://semiaccurate.com/2011/06/16/intel-declares-...

    which turns into this quietly: http://www.tomshardware.com/reviews/intel-motherbo...

    by which time most sentient life has already stopped reading your site (which wouldn't matter) and you lose page views and so ad dollars (which does matter).

    Now that I've answered your question, I'll take the time to refute your assertion. You did not write an article about 'AMD leaving BAPCo'. This information showed up in your article, but what you wrote was an article about benchmarking software; your opening statement "What’s in a Benchmark? This is a pertinent question that all users need to ask themselves, because if you don’t know what a benchmark actually tests and how that relates to the real world, the scores are meaningless."

    You go on to discuss the limitations of benchmarking software and do some editorializing on different chips and platforms (Atom vs Brazos, Llano vs SB and Discrete etc) and then attempt to justify your continued use of the inappropriate tools. All of this is fine.

    Scattered throughout, though, you leave journalistic integrity (and facts) behind and begin to make assumptions about an event that you've obviously not investigated or understood. "Reading through AMD’s announcement and Nigel’s blog, it’s pretty clear what AMD is after: they want the GPU to play a more prominent role in measurements of overall system performance."

    That quote is what is called a lie through omission. You picked one of four key things to focus on (heterogeneous computing) and failed to even mention the others. The real list is: failure to have an open benchmark (to review what's being tested), failure to use representative workloads (heterogeneous computing), bias to Intel designs, and generation of misleading results.

    That you picked the least important of the listed reasons is telling (neither option is kind so I won't call them out explicitly), as while AMD would love to have a major benchmarking software focus on what they do well to the exclusion of their competition, this hurts them much less than the other three.
    Failure to be open: If the result says CPU X is 20% faster than CPU Y but doesn't tell you what it's benching then it's meaningless. If CPU X is 100% faster than CPU Y at task N and task N gets 65% of the weighting but task N has no real world relevance to customer P then CPU X isn't better than CPU Y for customer P. This is very important.
    Bias to Intel designs: this shouldn't need additional explanation.
    Generation of misleading results: this ties into the above two, but has more to do with the overall packaging of the benchmarks than anything else, so it gets its own category.

    At the beginning of the article you also mention that Intel will be faster than AMD in SYSmark and then spend a good chunk of the article defending the future use of SYSmark. Despite all the other text, this is editorial bias and doesn't happen with good/careful authors. It leads to people thinking that Intel is a better CPU regardless of surrounding details and suggests that AMD is complaining only because they are worse.

    Here's where we get to the 'it's important to do your research before publishing a story'. The fact that AMD and NVIDIA and VIA dropped out of BAPCo is of critical importance. NVIDIA is a graphics manufacturer in this space (no CPU interest) and VIA has no graphics interest. This goes from being a case of a poorly performing product being massaged by PR to a legitimate concern about SYSmark. 2/3 parties with interest in x86 compute left over concerns about the validity of the product, and 2/3 parties with interest in x86 gfx left over concerns about the validity of the product. It needs to be a cold day in hell for NVIDIA to line up and say AMD is right.
    Others have been very critical of BAPCo too, such as the guys at opensourcemark, who have documented examples of SYSmark heavily biasing results to fit Intel's designs.

    As for why any of this matters beyond personal integrity (which I freely admit doesn't mean anything on the internet; you pay for the server so you get to say what you want), we have to look past Intel's behaviour (which I think is fine; they are in it to sell chips, so by all means make 'tools' that make your product look good) and at what AnandTech does.

    You benchmark and review hardware. If a key tool that allows you to carry out your jobs and run your business comes under question you normally do everything in your power to check the veracity of the tool. If someone in the pharmaceutical industry tells you the cholesterol test your doctor just gave you is really only designed to sell more medication and the results are really biased, you'd expect your doctor to have checked into this for you. While you are thankfully not responsible for anyone's life, people still trust you with their purchasing decisions based on the work done at Anandtech. Given how much time the site spends parroting about how unbiased and fair you are, how you try to use tests that give meaningful results and aren't swayed by PR, these issues are serious for you.

    So what are you supposed to do? Do your job. That means a little research, maybe running some tests of your own. If SYSmark says Intel CPU X outperforms AMD CPU Y at Excel by N%, you can test that. It turns out that you do this for a living, and have access to the gear to try CPU X vs CPU Y. Run a Monte Carlo simulation, sort a very large data set, run some macros. These are easy things to do while confirming facts about a story. That way when you sit down and write an article, your readers get the story, the facts and reliable conclusions.

    If you want to write for Engadget (what you presented in this article), go write for Engadget. If you want to be a PR rep for a company, then go work in PR. If you want to write hardware reviews, then you need to actually stick to the tenets of your job all the time. You don't publish press release performance predictions, so why publish press release benchmark predictions?

    Daniel

    PS: VIA has also publicly confirmed it has left BAPCo.
  • JarredWalton - Thursday, June 23, 2011

    Amazingly enough, we are a technical site that often runs news stories, and the fact that AMD left BAPCo is pretty big news. Yes, even AnandTech has tech stories similar to what you might find on Engadget or DailyTech. You might look here, for instance:
    http://www.anandtech.com/news/

    I don't generally try to read what everyone else writes about a particular piece of news and then echo the thoughts of the market; I think for myself and provide my own technical analysis. So when AMD says that they don't like the latest SYSmark, immediately the first thing that comes to mind is, "Gee, I bet it doesn't favor their APUs as much as they would like." Whether that's good or bad is a different story, but to pretend that AMD isn't politicking is ludicrous.

    My assertion is that you need to know what every benchmark does in order to determine whether the results are meaningful or not. I don't care if it's SYSmark, PCMark, Cinebench, 3DMark, SunSpider, or whatever. That AMD is leaving because they disagree with how SYSmark 2012 works is fine by me. I disagree with lots of benchmarks as being meaningful (Sandra and SuperPi immediately come to mind). Running benchmarks for a news story, especially when I may not have appropriate hardware on hand, doesn't work.

    So now VIA and NVIDIA confirm they have left, but no one is really saying why other than VIA apparently saying they don't feel the workload SM12 measures represents a modern user or whatever. I still won't run SM12 on laptops, just like I didn't run SM07, because it's a royal pain to do so. You need to do a clean install (no service packs or other patches), and even then it doesn't always work. Anand can do it on desktops because he doesn't have to change them up every single review. Even so, including SM12 doesn't make an article any worse, unless the article were to then conclude that because SM12 favors CPU x, you should buy CPU x.

    If the results from SM12 correlate with what we see in other CPU-centric tests, that's fine by me. As long as you understand it's a CPU/general performance metric and makes no demands of the GPU, you know what you're testing and what the results mean. Testing Cinebench and then complaining that it doesn't measure SSD performance would make as much sense to me. When I think of system performance (which is what SYSmark purports to measure), mostly I'm looking at CPU, storage, and perhaps a bit of GPU.

    As I mention elsewhere, most of the people I know (i.e. not computer enthusiasts or gamers) still have no need of a good GPU. GPGPU isn't even remotely mainstream, video works fine on SNB (good enough for everyone besides HTPC purists), and the only thing Llano really does substantially better than Intel is running games on the integrated graphics. Frankly, Llano really isn't a good GPU; it's just a good integrated GPU that's only as fast as a $35 discrete GPU. If the drivers get worked out, Llano could make for a good HTPC setup as well, but right now that's not happening either. So, Llano is good for laptops, but on desktops it just doesn't mean a lot unless you absolutely refuse to buy a dGPU.
  • whatthehey - Friday, June 24, 2011

    Hey Donnie, can you comment on this?
    http://www.brightsideofnews.com/news/2011/6/24/amd...

    Sounds like the lack of GPU aspects in SYSmark 2012 isn't actually the problem if that article is true (and let's be honest: it has plenty of parts that ring true). The problem with SM12 is that Bulldozer is going to suck, and SYSmark just points this out.

    So pull your head out of AMD's ass. This announcement comes from AMD's MARKETING DEPARTMENT. Do you need anything more to prove that this has nothing to do with engineering and architectural superiority? Marketing is trying to get people LIKE YOU to ignore any benchmark that shows how badly AMD's CPUs are falling behind. And it's working. Enjoy your new Bulldozer system, to replace your amazing Phenom II system. Me, I'm going to continue running my Core i7 for a while longer, and probably upgrade to Ivy Bridge or its successor, because Bloomfield is already going to beat Bulldozer, never mind Sandy Bridge and Ivy Bridge.
  • saywut - Sunday, June 26, 2011

    That article is pure garbage; the tone of the writing and the "anonymous source" should make it pretty obvious that what you're looking at is somewhere between propaganda-grade and tabloid-grade. Besides, compare Intel's SYSmark superiority to any other real-world benchmark: the SYSmark score is always disproportionate to real life, and always in Intel's favor. To say AMD doesn't have a legitimate reason for leaving is absurd, and then you blame the victim by accusing AMD of trying to do exactly what Intel IS doing.

    Donnie hit the nail on the head, as evidenced by Jarred's back-pedalling reply being longer than the actual article. Besides, the lesson we've learned from this is that AMD's "old" K10.5.2 architecture in Llano isn't that far behind SB, unless you build machines to run Sysmark. If Bulldozer makes any improvement at all, then it'll be just that much more competitive.
  • alpha754293 - Tuesday, June 21, 2011

    I think that ALL benchmarking is subject to the eye of the beholder.

    For example, in my current work, we run finite element models using Nastran and LS-DYNA. I also used to run computational fluid dynamics codes such as Fluent and CFX.

    But most CPU benchmarks are rarely as intensive as those programs, and even with LINPACK, how you write/compile/run it will have an impact on the results.

    It is for this very reason that those software vendors have developed "standard" benchmark cases that test a variety of things, including hardware and software improvements/features.

    The downside is that a lot of those programs are a) expensive and b) the benchmarks themselves are time consuming (~25000 seconds for a 3-car crash model, or about 7 hours for one run). So, if you're testing a bunch of new processors, and you're testing the scalability of the additional cores (for example), you can easily spend anywhere from 2 weeks to a month just with that one program, running that ONE test.

    Another example is in the realm of graphics processing. While most people test with computer games, again, because of what I do, we use Altair Hypermesh/Hyperview. When you're looking at a model with 2.2 million nodes, that's a very heavy load for a GPU to handle. And I can almost assure you that even the best, top-of-the-line current generation consumer cards won't be able to take that kind of loading in stride while getting insane framerates. And while the point about those cards not being designed for such a workload is a valid one, why should you pick a benchmark that caters to what the card does well?

    That's like measuring 0-60 performance against a drag racer, and then complaining that it isn't designed to turn, so you're not going to put it on the Nuerburgring.
  • Targon - Tuesday, June 21, 2011

    If you look back at the initial release of Windows Vista (as much as some people hated and still hate it), it did bring 3D to the desktop with the Aero theme. This is where having a better GPU really stood out, and still does. Move a window around, and visually, a better GPU with Aero does improve the experience.

    Yes, it's minor, but the fact that we have GPU power improving the desktop DOES mean that GPU power should not be discounted. There are other areas where GPU power comes into play, and the overall feel of using the system SHOULD be a part of what benchmarking is about. As the article stated, Firefox 4 (and 5 beta) plus IE 9 and other applications also use GPU power to improve performance. If you want to do anything involving graphics, the GPU can really come into play, so why not make it as important as how quickly a spreadsheet calculation can be run?
  • Alexvrb - Tuesday, June 21, 2011

    Yeah, it doesn't exactly instill you with confidence when no major GPU vendor will endorse the software.
