50 Comments

  • jjj - Tuesday, January 23, 2018 - link

    Don't forget that GPU also means server, cars and whatnot and that requires additional investments as GPUs and machine learning diverge.

    Navi should tape out this year so doubt much can change - we can't blame the new guys for it, good or bad.
  • eva02langley - Tuesday, January 23, 2018 - link

    Definitely good news, it gives a good impression that AMD definitely takes things more seriously. Let's just hope this proves right and gives AMD the opportunity to expand into broader horizons.
  • rahvin - Tuesday, January 23, 2018 - link

    Not just diverge, completely separate. I'd be willing to bet that within 5 years machine learning and AI will be using custom CPUs and ASICs. No one will be doing it on graphics chips or general CPUs. There are a dozen firms working on AI chips, heck, Intel has their own version. In the end it's going to have its own accelerator, just like GPUs.
  • jjj - Tuesday, January 23, 2018 - link

    Actually it's well beyond that in the long run, as Moore's Law and Von Neumann compute are reaching their limits. The end goal (nothing is really an end goal, everything just buys time) is a non-volatile switch for brain-like devices where compute and memory are intermixed.
  • Kevin G - Wednesday, January 24, 2018 - link

    We could explore things like the classic Harvard architecture or twists on that. The big factors that contributed to the creation of the Von Neumann architecture don't exert the same pressure today. Instructions could not only be separated into their own memory domain and bus but also use their own distinct memory type, like SRAM, to lower latency, while bulk data gets the benefit of greater DRAM density or even 3D XPoint.

    I've been curious what the effects would be by segmenting caches by data type. So a chip would have its traditional L1 instruction cache but it would have a data cache for integer compute, scalar FP and then SIMD FP. Bandwidth and latency can be tuned for a particular data type.
  • mode_13h - Wednesday, January 24, 2018 - link

    Cool, so you burn a lot of transistors on cache that's only used some of the time? I think that's probably why nobody does it.

    If you really wanted to tweak cache allocation between different data types, then the way to do it would be by tweaking the eviction policy. Then, you could still share the underlying data cache & potentially the whole thing could be used by any data type when no others are in use. It could also get around the implied requirement of each cacheline having to contain homogeneous data types.

    All that being said, I'm still damn skeptical.
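    The eviction-policy idea above can be sketched in a few lines. This is only a toy model, not how any shipping cache works: the class name `TypeAwareCache`, the per-type soft quotas, and the "evict the oldest line of an over-quota type first" rule are all assumptions made for illustration. The point it shows is that one shared pool can still be fully used by a single data type when nothing else is competing for it.

```python
from collections import OrderedDict


class TypeAwareCache:
    """Toy shared cache: LRU eviction biased by per-type soft quotas.

    Hypothetical model for illustration only; real hardware tracks
    recency and ownership very differently.
    """

    def __init__(self, capacity, quotas):
        self.capacity = capacity          # total number of cache lines
        self.quotas = quotas              # soft share per data type
        self.lines = OrderedDict()        # address -> data type, oldest first

    def _count(self, dtype):
        """Number of resident lines currently holding this data type."""
        return sum(1 for t in self.lines.values() if t == dtype)

    def access(self, address, dtype):
        """Touch a line. Returns True on a hit, False on a miss."""
        if address in self.lines:
            self.lines.move_to_end(address)   # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self._evict()
        self.lines[address] = dtype
        return False

    def _evict(self):
        # Prefer the oldest line whose type is over its soft quota.
        for address, dtype in self.lines.items():
            if self._count(dtype) > self.quotas.get(dtype, 0):
                del self.lines[address]
                return
        # Nobody is over quota: fall back to plain LRU.
        self.lines.popitem(last=False)


cache = TypeAwareCache(capacity=4, quotas={"int": 2, "simd": 2})
for addr in range(4):
    cache.access(addr, "int")     # integer data may fill the whole cache
cache.access(100, "simd")         # SIMD data now reclaims over-quota lines
cache.access(101, "simd")
```

    Because the quotas are soft, no capacity sits idle while only one type is active, which is the objection raised against physically separate per-type caches.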
  • Kevin G - Friday, January 26, 2018 - link

    The reason why no one attempts a true Harvard architecture today is due to legacy software: nothing would work that is currently out there of any significance. Compilers would have to be rewritten and even then, commodity open source may need further modification to even work. Never underestimate the power of legacy code base as a resistance to change.

    Caching based upon data type was attempted only in one chip, which never saw commercial release: the Alpha EV8. Its 2048-bit-wide vector unit would pull data directly from the L2 cache due to the size of the data being moved. The L2 cache itself was tuned to the size of the SIMD unit but was not exclusively for SIMD usage. There were still L1 data caches to feed standard integer and floats.
  • tuxRoller - Wednesday, January 24, 2018 - link

    There was a paper from a few months ago that showed how a memristor-type memory could also be used for compute. Most importantly, they also indicated that such an architecture would massively increase both performance and efficiency.
    Now, all we need is the memristor....
  • mode_13h - Tuesday, January 23, 2018 - link

    Except it's not like AI is a solved problem. To address the widest range of applications, you still need flexibility. And GPUs are great for that.

    I'm not saying there won't be purpose-built AI chips, but I think AMD and Nvidia are best-positioned to deliver on that. For instance, imagine V100 without the fp64 or graphics units. Perhaps you could simplify in a few other areas, as well, and maybe replace compressed texture support with support for compressed weights.

    Given the maturity of their toolchain, their installed base, and the solid foundation of an efficient, scalable, programmable architecture, GPUs are actually pretty hard to beat.
  • Kevin G - Wednesday, January 24, 2018 - link

    The biggest thing going for GPUs in this context is that they rarely get true binary code explicitly written for them. Shaders get passed through a just-in-time compiler in the majority of cases. This has allowed GPU vendors to change their designs seemingly on a whim. For example, AMD had several generations of VLIW5 chips, then two VLIW4 designs in the middle of a generation before switching over to GCN 1.0.

    Adding dense processing units like nVidia's Tensor Cores for matrix multiplication can easily be done with the GPU philosophy as the pressure to support legacy code isn't there like it is in the CPU world.
  • mode_13h - Wednesday, January 24, 2018 - link

    Not really. Basically, nobody serious is using GPUs for AI without architecture-optimized libraries in the stack. Most major frameworks have support for multiple different backends, or there are at least forks available with vendor-specific optimizations.

    And that goes especially for the tensor cores, which must be programmed via special intrinsics (or inline asm). Even packed 8-bit integer arithmetic is probably coded explicitly, in inferencing paths, since it's a method one uses very much intentionally.

    But leaving aside special features, simply sizing work units to optimally exploit on-chip memory and tuning kernels to use the right amount of unrolling and pipelining is probably more than enough to justify the use of vendor-optimized libraries.
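    To make "sizing work units" concrete, here is a minimal sketch of loop tiling for a matrix multiply. It is plain Python, so it says nothing about real GPU performance and does not model on-chip memory; the function name `matmul_tiled` and the `tile` parameter are hypothetical. In a vendor-optimized library, the equivalent of `tile` is what gets tuned per architecture so that each block of work fits in registers or shared memory.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (n x k) and b (k x m) block by block.

    `tile` is a hypothetical tuning knob: it sets the edge length of the
    sub-blocks that are processed together.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):              # block row of the output
        for j0 in range(0, m, tile):          # block column of the output
            for k0 in range(0, k, tile):      # block along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
result = matmul_tiled(a, b, tile=2)
```

    The result is the same for any tile size; only the order in which elements are touched changes, and that ordering is exactly what a tuned kernel optimizes.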

    That said, it's nice that, to write a custom layer-type, you needn't be concerned with the particulars of the specific hardware generation just to have it run reasonably fast and be reasonably portable.
  • Kevin G - Wednesday, January 24, 2018 - link

    Intel has at least two different AI chips. Knights Mill is out now and then there is the Nervana technology which will be appearing on package with some Xeons later this year.

    This excludes FPGA which also has a niche in this market.
  • mode_13h - Wednesday, January 24, 2018 - link

    Knights Mill was too little, too late. It was already outmatched by P100, and now left in the dust by V100.

    It's just big egos at Intel trying to succeed by doubling down on a bad idea. Intel is big enough to have succeeded in brute forcing x86 into the server & cloud markets, but embedded & AI are too sensitive to its weaknesses.
  • BenSkywalker - Wednesday, January 24, 2018 - link

    On so many different levels this is highly improbable.

    This is an area a lot of companies are going to scramble like mad to catch up in, only to face significantly greater challenges then were faced going up against that third rate cheap computing solution- x86.

    First major obstacle- CUDA is already x86 in AI. People can throw fits, turn blue in the face, be upset about it however they so choose, but it is already a done deal. Despite the idealism behind open standards being appreciable, having a company that has a vested and exclusive reason to invest enormous resources improving one platform simply has enormous real world benefits. Now we are seeing companies waking up to the market that, quite frankly, nVidia built from the ground up.

    Second segment of the first problem- development in this field is taught on nV hardware using CUDA. Idealism and practicality don't match up all that often, and this is another instance of that happening. Obviously they aren't the only option, but they are far more dominant now then x86 was in say 1988. What would it have taken to stop that train?

    Second major problem- this type of AI is simply fundamentally different in how it is being built. The development end is simply figuring out how to give the processor enough information to learn on its own. This is not to be underestimated.

    If Volta started out with a five year advantage over the other players(which may be a bit conservative) that rift grows with every passing day as Volta parts are 'learning' how to run faster on Volta chips running code built for Volta. With Intel devoting billions into the R&D side it may be optimistic thinking they will get a Volta class part out within the next three years, at which point nVidia will have tens of millions of hours worth of machine learning optimized for their significantly faster then Volta parts.

    Don't underestimate how huge of a factor this is going to be. Remember when multi core CPUs first started coming out and great lengths were gone through documenting how fundamentally different it was to try and get your code base to multi thread properly- nVidia has pushed out hundreds of thousands of GPUs that are crunching code to figure out how to span that code base to run on thousands of cores.

    'Bombe'- Turing's Enigma cracking machine could handle one trillion operations in five years. This machine was built because it was significantly faster then the smartest people in the world working in tandem. One trillion operations in five years. We are dealing with one hundred trillion operations per second *per GPU* now. When you consider the data analysis structure for how to thread thousands of concurrent threads concurrently and make it work- it would be a problem not unlike Enigma on a fundamental level. A machine is going to beat a human at this, this truly is thematically much like the first Turing machine.
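    Taking the two figures quoted above at face value (they are the commenter's numbers and are not verified here), the gap can be worked out with simple arithmetic. The helper name `ops_per_second` and the 365.25-day year are assumptions made for this sketch.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def ops_per_second(total_ops, years):
    """Average rate for a workload spread evenly over a number of years."""
    return total_ops / (years * SECONDS_PER_YEAR)


# Figures as quoted in the comment, not independently checked.
bombe_rate = ops_per_second(1e12, 5)     # one trillion operations in five years
gpu_rate = 100e12                        # one hundred trillion operations per second

speedup = gpu_rate / bombe_rate          # how many times faster the GPU figure is
seconds_for_gpu = 1e12 / gpu_rate        # time for the GPU to match five years of work

print(f"Bombe: {bombe_rate:.0f} ops/s, ratio: {speedup:.3g}, GPU time: {seconds_for_gpu} s")
```

    On those numbers the quoted GPU rate is roughly ten orders of magnitude higher, and the five-year workload would take about a hundredth of a second.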

    Now if we were going to see this market wouldn't be mature for another twenty years, then maybe it would be a more level playing field. Reality is we are going to see a logarithmic increase in throughput of operations as GPUs get better at 'programming' themselves and in five years this market will be far larger then data centers are today.

    This is fundamentally different then what we had even in science fiction as a notion as to what AI was going to entail. Creation wasn't ever something that was contemplated for a machine, but we are already at that point, in a rudimentary fashion at least.

    Market wise, this becomes a much bigger issue as even if Intel shows off some amazing new technology that can directly go toe to toe with nV's biggest and baddest three years from now- the workload actual performance rift will be staggering- thousands of EFLOPS worth of combined computing power for years is simply going to be a staggering obstacle to overcome.

    Now, I'm not saying it is impossible- but we would have to assume Intel would execute perfectly- use their superior fabrication capabilities, hire all of the best AI low level guys they can and get that code base up and running *NOW* to get some level of optimization hammered out years before their product actually hits(the logarithmic element here is their only possible saving grace).

    Unfortunately, AMD is pretty much DoA here. Intel is throwing more money then AMD's net worth at trying to compete with nVidia in this segment already- and they aren't making much progress so far(yes, they have all this R&D coming- so does nV). They may be able to do OK if they can come up with AI chips for handling your toaster, or maybe the thermostat in your house, but they simply don't have the money to even attempt to play in this field. The best they can hope for is that nVidia gets distracted enough for them to stumble in the consumer GPU space(possible- their top talent is going to be AI/Datacenter focused now, we know this because they aren't idiots).

    Google already came up with a dedicated ASIC- Volta beats it at the singular task it can do while being a complete compute solution. Yes, for particular tasks, specifically simple ones, ASICs are going to be completely viable and very efficient to boot. For the big dollar, big iron tasks? nVidia is an absurdly prohibitive favorite right now.(Check the markets, there is a legit discussion on if Apple or nVidia will be the first trillion dollar company).
  • mode_13h - Wednesday, January 24, 2018 - link

    Damn, dude. You sure like pontificating, for someone who's obviously not a practitioner. Several errors in your statements/thinking, but you'd probably just waste my time arguing with me if I did you the favor of pointing them out.

    Protip: https://en.wiktionary.org/wiki/then https://en.wiktionary.org/wiki/than
    (sorry I can't post a better link, but it gets flagged as spam)
  • BenSkywalker - Thursday, January 25, 2018 - link

    Damn, dude. You sure like pontificating, for someone who's obviously not a practitioner. Several errors in your statements/thinking


    Why not take the time in pointing out the 'errors'? Should be interesting. How much positive revenue has your input on this market generated over the last few years? You are saying my market analysis makes it clear I'm not a practitioner, just want to set some clear guidelines for what matters in this conversation.
  • mode_13h - Friday, January 26, 2018 - link

    > CUDA is already x86 in AI

    HPC? yes. AI is different in that people use deep learning frameworks. While Nvidia succeeded in getting popular frameworks optimized for their GPUs, first, they do not dominate with their proprietary frameworks, and most vendors have optimized backends for popular frameworks like Caffe, TensorFlow, etc. or at least support their models.

    > Volta parts are 'learning' how to run faster on Volta chips running code built for Volta.

    It's not the parts that are learning - it's the developers. But I assume you're just employing a rhetorical flourish here. Not advisable, as less knowledgeable readers will take it literally.

    > get some level of optimization hammered out years before their product actually hits

    You overestimate how much code there is that needs to be optimized. Just take a look at the cuDNN API - it's not huge. The latest version has just a single header file < 2k lines.

    The reason so many players think they can compete in this space is that the core of deep learning tech is really rather simple. Of course, there's an ever-increasing number of details between that and a usable and competitive product, but optimistic entrepreneurs and investors are clearly drawn to the simplicity at the heart of it.

    > AMD is pretty much DoA here.

    Vega is competitive against the P100, for training, and has even more strength in inferencing. The problem for them is the V100, but it could be answerable if they build something analogous to tensor cores. That they're planning a 7 nm machine learning processor would give them a chance to deliver a chip even better focused on AI than V100 (which also targets HPC and graphics applications). The big question is what competition it'll have, by the time it launches.
  • BenSkywalker - Friday, January 26, 2018 - link

    You seem to be focusing almost entirely on the semantics side, not the actual implications the statements were making.

    CUDA is already x86- Going back to 1988 you wouldn't have the situation with CUDA comparable to where x86 was? HPC is looking more like 1999- that fight is over. I cited a year I thought the overall markets were comparable.

    Saying it's not the parts that are learning- it was shorthand for what real world implications there are.

    AMD is DoA in this market, that's not semantics(this is the one point I would very strongly argue that you took issue with), that's every logical conclusion.

    https://instinct.radeon.com/en/product/mi/radeon-i...

    Seven months now, where is it? If you were a project manager and *this* is the company you were building our mission critical platform around- I would fire you in an instant. I can assure you I'm not alone. They are simply too inept to trust to something that important as they have repeatedly and loudly demonstrated.

    Vega's hype train was 'Poor Volta'. The MI 25, if it actually shipped, was supposed to be one quarter the speed of Volta with a higher power draw- and they couldn't even manage to ship that. Not only are they, on a technology basis, significantly behind- even in a hypothetical situation where they did manage to release a competitive part- who is going to risk their jobs buying into what has been a long running history of overhyped BS?

    With Intel and IBM, if you bet 'wrong' you can still make a strong case for yourself, and obviously nV is the prohibitive favorite, so that is a safe bet. If you are throwing down millions on a platform that is much slower, more power hungry and, what is likely their biggest issue, nine months late coming online? I think only a fool of a PM would even consider AMD, and that would be if they were actually remotely competitive (which we know they aren't).
  • mode_13h - Saturday, January 27, 2018 - link

    If you're talking about where AMD is today, then I agree that they're not a significant player. My original comment stated that they're well-positioned to deliver deep learning solutions vs. the ASIC players, due to the inherent flexibility and scalability of GPUs.

    In point of fact, their software situation has been a major hurdle for them. ROCm was the right thing to do, but a couple years too late. Had Vega arrived when it was originally rumored, this would've been disastrous. As it is, it kept us from considering Polaris for a more cost-sensitive application.

    I think AMD's CPU division has demonstrated an ability to execute, and I believe what we're seeing are the winds of change now sweeping through their GPU division. I think their execution on Vega shows the lingering effects of years worth of budget and staff cuts. We'll see if they can right that ship, but I believe in Lisa Su.
  • edzieba - Thursday, January 25, 2018 - link

    Why hello there Fermat.
  • tuxRoller - Wednesday, January 24, 2018 - link

    Yup! That's been my impression for awhile now. For whatever reason people just couldn't shake the idea that anything could be better than a gpu.
    They remain the best solution that is easily available but they have far too much complexity that just does nothing for these problems.
  • TheJian - Tuesday, January 23, 2018 - link

    We can blame management for repeatedly wasting R&D on things that don't max profit margins! Almost all of the money is made above $200 (cpu or gpu). Designing for low margin crap is dumb and should only come later or as a way to use bad chips etc.
  • RandSec - Tuesday, January 23, 2018 - link

    It is easy to sniff at Semicustom margins, but remember that the customers have paid for almost all of the engineering in those chips (albeit based on existing AMD IP). Since AMD has little invested, they get a good Return On Investment from Semicustom, even with low production margins.

    In the future, the expansion of chiplet-based Multi-Chip Module products could lead to a renaissance in the Semicustom business, since customers may not need to design an entire huge chip on their own.
  • mode_13h - Tuesday, January 23, 2018 - link

    It seems like they also get the benefit of very close collaboration with MS and Sony, which no doubt guides and influences their hardware architecture decisions. I remember reading about features of the PS4 and XBox One GPUs that were specifically requested, which would later be reflected in their mainstream GPUs.
  • TheJian - Tuesday, January 23, 2018 - link

    Consider me sniffing and puking on those margins. If you aren't MAXIMIZING profits from your product you're wasting your engineering. PERIOD.

    See AMD went low on gpu - NV went high (1080/1070 etc). NV didn't even bother with the low end until high end slows (and it really hasn't...LOL, they just had time to make cheaper stuff now). Never produce for the poor first, you will FAIL. You might be able to find the odd one out that breaks that rule, but as a rule if you aren't targeting people with fat wallets first, you've already made a mistake.

    See Q reports for both sides for ages (check Intel too). Dirk said it when they fired him. PRODUCE A KING FIRST, or die. Funny they tried to follow his advice 5yrs too late, after firing him for saying it...ROFL. APU/custom was a complete waste of R&D that could have kept them in both the CPU race for real, and not keep getting killed on watts/heat/bad launches etc on the gpu side. Even the cpu side sucked at launch for brand new ryzen chips (should have waited for board crap/mem stuff to work out etc). A few months more baking ryzen boards would have changed reviews from "well...maybe for some people" to something more like "pretty darn great". Review after review mentioned the problems. You usually only get one chance at a first impression ;)

    AMD's dumb pricing is another issue. Price for max you can get or you're fired (especially when you're the smaller guy - make hay...). You are not in business to be our friend...LOL. If you produce a winner people buy it even if it stings.
  • Manch - Tuesday, January 23, 2018 - link

    I think that semicustom is actually great for RTG. They have their chips in both the Sony & MS platforms, and the way consoles are going we will just see more versions of the "same" platform, i.e. Pro, X. I don't see either jumping ship to Intel or Nvidia. Certainly not MS.
  • mode_13h - Tuesday, January 23, 2018 - link

    Why "Certainly not MS"? Is there bad blood between them and Nvidia or Intel?
  • Makaveli - Tuesday, January 23, 2018 - link

    I believe MS and Nvidia had some beef over the first xbox back in the day.
  • Manch - Wednesday, January 24, 2018 - link

    Yeah, MS wanted to continue to make the XBOX but bc they used off the shelf parts from NVidia and Intel, both companies refused to keep making them. So MS decided that its next console would use a semi custom design that they "own"
  • mikato - Sunday, January 28, 2018 - link

    Ha, they stopped making the parts that MS was buying and putting into the XBoxes. That’s rich. Way to alienate a customer, one that could be a good long term customer. AMD reaps the benefits by catering to the customer, and is a better fit with their APUs to begin with.
  • RandSec - Tuesday, January 23, 2018 - link

    Consoles are single-chip for lowest cost. Intel does not have competitive graphics technology and Nvidia does not have an x86 processor. AMD does.
  • HStewart - Tuesday, January 23, 2018 - link

    But the Xbox 360 was not x86 based. It does not matter; it's whoever gives Microsoft quality at the cheapest price.
  • RandSec - Tuesday, January 23, 2018 - link

    A fundamental equipment change starts a new gaming generation, and at that time there are few games for the new equipment to run and so few reasons for gamers to buy that equipment.

    At the start of a new gaming generation, there also is no installed base, so not many game sales are possible or expected. That means there is little urgency to write games for the new equipment, so the situation is slow to turn around.

    All of which means there is considerable interest from both hardware and software producers in extending the current generation with compatible designs. That means x86.
  • Manch - Wednesday, January 24, 2018 - link

    The 360 had a custom Power PC and an ATI GPU.

    Ease of porting between the current gen and PCs (Win 10 based) lowers costs. Backwards compatibility, which requires a good bit of work from MS, is made easier by keeping things x86/DirectX based. I doubt we will see consoles diverge from the x86 platform again. Portables are a different story....for now.
  • mode_13h - Tuesday, January 23, 2018 - link

    I sure hope they learned from Vega that power efficiency has to be their #1, #2, and #3 priorities. I wonder if Lisa has been giving Navi the same TLC she gave to Zen.

    Post-Navi, I think we're due a ground-up redesign. GCN was a step forward, but it's getting a bit long in the tooth. It had some simplicity and elegance in the same ways as Bulldozer, but I'm doubtful of how much more tuning potential it has, and they can't get where they need to go by merely tweaking around the margins and reaping the benefits of process improvements.
  • KenLuskin - Tuesday, January 23, 2018 - link

    1) A mistake by author who writes: "AMD is bringing in someone from outside the industry altogether". This is NOT true! Per Mike Rayfield's Linkedin page: 2005-2012 General Manager, Mobile Business Unit, Nvidia
    Rayfield is NOT from outside the GPU industry at all. He worked for AMD's direct competitor for 7 years.
    2) Semi-custom is still an important area for AMD, because the GPU deal with Intel was structured as a Semi-custom deal.
    GPUs are the key differentiating IP that AMD will use to take away massive amounts of market share from Intel over the next few years

  • Ryan Smith - Tuesday, January 23, 2018 - link

    Rayfield may have worked at NVIDIA, but his job there was overseeing the SoC business, not the GPU business. For the purposes of both GPU architectural development and selling discrete GPUs, he's very much an outsider. Which in this case is likely a good thing for AMD.
  • HStewart - Tuesday, January 23, 2018 - link

    "AMD Reassembles the Radeon Technologies Group: New Leadership Hired"

    Also called a re-organization.
  • ET - Wednesday, January 24, 2018 - link

    The Intel-AMD CPU's are a semi-custom graphics only solution, and I think that's one reason semi-custom is being folded into RTG: there is no semi-custom that doesn't include a GPU element (far as I know) but there is semi-custom that doesn't include a CPU element.
  • mikato - Sunday, January 28, 2018 - link

    That’s how I interpret it too. It isn’t consoles-only basically as the article said.
  • Pork@III - Wednesday, January 24, 2018 - link

    Off, how many engineers and various "smart" people wiped their legs to play for different teams. How many positive articles were written in different media and comments in forums. However, we are now the end of the second decade of the 21st century and there is no real progress, there is no technology revolution that will lead to increased hardware performance, with hundreds or thousands of percent within just one generation of processors and / or video cards.
  • Hurr Durr - Wednesday, January 24, 2018 - link

    Yeah, because an order of magnitude or two of performance jump is just so easy to achieve in a mature technology.
  • Pork@III - Wednesday, January 24, 2018 - link

    And why not be easy? For example, the two-fold reduction in linear transistor size when changing the 14 to 7-nm process would theoretically reduce geometry dimensions four times. At least in theory it is. Reducing linearly twice decreases the occupied area of an element by 4 times. This would allow four times as many transistors in the same die area compared with the older process. Of course, these are purely theoretical considerations: only the dimensions of the transistor itself are actually halved, not the spacing between the transistors, so their density unfortunately is not four times higher. It takes a lot of work not only to reduce the size of the transistors but also to reduce the spacing between them, and the technologists' successes there are obviously negligible. The gate pitch and fin pitch dimensions are the real problems in the core.
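    The scaling argument above can be put into numbers. This is a back-of-the-envelope sketch: the function names are hypothetical, the model simply treats a cell's area as gate pitch times fin pitch, and the pitch values are made-up units chosen for illustration, not figures from any real process.

```python
def ideal_density_gain(old_nm, new_nm):
    """Density gain if every linear dimension shrank with the node name."""
    linear = new_nm / old_nm
    return 1.0 / (linear * linear)


def pitch_density_gain(old_gate_pitch, old_fin_pitch, new_gate_pitch, new_fin_pitch):
    """Density gain when cell area is modeled as gate pitch x fin pitch."""
    return (old_gate_pitch * old_fin_pitch) / (new_gate_pitch * new_fin_pitch)


# Halving every linear dimension gives four times the density.
ideal = ideal_density_gain(14, 7)

# Hypothetical pitches (arbitrary units) that shrink by only 30 percent each.
realistic = pitch_density_gain(100, 100, 70, 70)

print(f"ideal: {ideal:.2f}x, with slower pitch scaling: {realistic:.2f}x")
```

    With both pitches shrinking by 30 percent instead of 50 percent, the gain is about 2x rather than 4x, which is the shortfall the comment describes.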
  • Hurr Durr - Wednesday, January 24, 2018 - link

    I`m sure they are ready and waiting for you at every fab there is to implement this simple reduction.
  • Pork@III - Wednesday, January 24, 2018 - link

    Is it not right for them to take care of justifying their salaries instead of waiting for another person to do their job?
  • jjj - Wednesday, January 24, 2018 - link

    Interesting fact: neural networks inspired the creation of Synaptics, so it would be interesting to ask David Wang how he views machine learning after his years at Synaptics.
  • jjj - Wednesday, January 24, 2018 - link

    ugh my comment wasn't intended as a reply
  • edzieba - Thursday, January 25, 2018 - link

    "For example, the two-fold reduction in linear transistor size when changing the 14 to 7-nm process would theoretically reduce geometry dimensions four times. At least in theory it is. Reducing linearly twice decreases the occupied area of an element by 4 times."

    Don't use marketing terms for engineering estimations.
  • mode_13h - Friday, January 26, 2018 - link

    Not only that, it's like he doesn't even know about leakage.
  • Pork@III - Friday, January 26, 2018 - link

    Ok 7nm is not exactly 7nm but 14nm is not exactly 14nm :D
    However, after ordinary 7nm come the 7+ and 7++ processes, with higher density than their predecessors. What do you say about those "twins"?
