AndreiLux - Tuesday, June 3, 2014 - link
Hello, two main (related) questions here:

My first question is about ARM's case for synchronous clock and voltage domains as an actual advantage over asynchronous ones. ARM seems firmly convinced that going synchronous is the right choice, but apart from one whitepaper on big.LITTLE that Samsung released some time ago (http://mobile.arm.com/files/downloads/Benefits_of_...) there has never been any in-depth technical justification for it.

How far has ARM actually researched this to reach that conclusion? What are the arguments against it? Not only on an architectural basis, but also on a platform basis (increased PMIC voltage-rail power-conversion overhead, for example)?

Current power-management mechanisms seem extremely "dumb" in the sense that they are almost never able to control DVFS and idle states in a way that makes fully efficient use of the hardware in terms of power efficiency and performance. Current Linux kernel discussions are (finally) trying to merge these mechanisms into the scheduler to improve their functioning. My understanding is that the upcoming A53/A57 also have more advanced retention states than the current generation of cores, allowing better use of idle states, much as the A15 generation of SoCs got rid of legacy hot-plugging of cores in favor of power-gated C-states.

My second question: why are hardware PPMU controllers in charge of DVFS basically nonexistent in the current mobile ARM space? ARM's reasoning until now has been that a hardware controller cannot have a full "overview" of system load on which to base its decisions, but that doesn't explain why a hardware controller driven by software policy could not achieve this; it would have the advantages of both worlds with none of the disadvantages, and would allow much finer granularity, orders of magnitude beyond what current software-based mechanisms run at. Any comments on the matter?
KFlautner - Thursday, June 5, 2014 - link
Some of what you are asking about are implementation choices that our silicon partners would make, which I cannot really comment on. But roughly speaking, for cores in an MP cluster we would expect them to run at the same voltage and at synchronous frequencies, and communication with the L2 cache would be synchronous as well (the L2 typically runs at half the speed of the L1). The interface between the CPU and the bus fabric is often asynchronous. Different cores within the cluster can be power-gated, but the primary supplies (sometimes different supplies for RAM and logic) would be the same.

The following document states some of these assumptions:
http://www.arm.com/files/pdf/big_LITTLE_technology...
"The CPU clusters are connected to the cache coherent interconnect through an asynchronous bridge to enable each CPU cluster to be scaled independently in frequency and voltage. The independent DVFS operation of the CPU clusters allows the system to more closely track the performance requirements in loads where there is a mix of high-performance and background activity."
As for controlling DVFS state, some of our system designs use a dedicated Cortex-M3 processor in the system to manage the various power states. The reason for the dedicated processor is that there is much more going on in a SoC than just activity on the application processors, and it's easier to observe this activity from different positions on the SoC. And you'd like to be able to make power-related decisions without having to wake up the big CPUs all the time - let them rest as much as they can. M-class processors have much lower latency and consume less power, so they are a natural place in the system for making power-related decisions.
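To make the division of labour concrete, below is a minimal sketch of the kind of policy loop a dedicated power-management microcontroller might run: sample activity from around the SoC and pick an operating point without waking the application cores. This is purely illustrative - it is not ARM code, and every helper and table name in it is hypothetical.

```c
/* Hypothetical DVFS policy loop on a power-management MCU.
 * All helper names are invented for illustration only. */
#include <stdint.h>

#define NUM_OPP 4
static const struct { uint32_t mhz; uint32_t mv; } opp_table[NUM_OPP] = {
    { 400, 800 }, { 800, 900 }, { 1200, 1000 }, { 1600, 1100 },
};

extern uint32_t read_cluster_utilization(void);       /* 0..100, from PMU counters */
extern void     apply_opp(uint32_t mhz, uint32_t mv); /* program PLL and PMIC rail */
extern void     wait_for_tick(void);                  /* e.g. a 1 ms timer */

void power_policy_loop(void)
{
    uint32_t opp = 0;
    for (;;) {
        uint32_t util = read_cluster_utilization();
        if (util > 85 && opp < NUM_OPP - 1)
            opp++;                      /* cluster busy: raise frequency/voltage */
        else if (util < 30 && opp > 0)
            opp--;                      /* cluster mostly idle: drop an OPP */
        apply_opp(opp_table[opp].mhz, opp_table[opp].mv);
        wait_for_tick();                /* fine-grained, no big-core wakeup needed */
    }
}
```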
maltanar - Tuesday, June 3, 2014 - link
The computer architecture research community seems to increasingly lean towards hardware accelerators and heterogeneity (not just big.LITTLE and GPGPU, but all sorts of mixes of architectural and microarchitectural techniques) for increasing energy efficiency/performance, as Moore's law and Dennard scaling don't seem to work that well anymore.

My question is... does ARM R&D see some future for widely applicable accelerator-rich architectures, or do they deem it to be a passing trend? If they are here to stay, there will probably be quite a few software-related issues that follow, and I'm quite curious as to how ARM R&D thinks these may be addressed.
KFlautner - Wednesday, June 4, 2014 - link
It’s not a passing fad: SoCs have been accelerating various functions for many decades now. Do I think that this will become more central to our strategy? No. There are a number of reasons for this. The functions initially accelerated by external hardware often end up being incorporated into the architecture over time. This usually makes them easier to use and provides a more stable software target. Coordinating the activities of a large number of accelerators is as hard a problem as the general parallelization problem; you can have some good application-specific solutions, but it is very difficult to make them broadly applicable… And there is always Amdahl’s law to worry about - it applies to accelerators as well.

I do see a role for increasing heterogeneity in systems, but this is about how we best cater to supporting different styles of parallelism and activity levels in different applications rather than speeding up the individual compute kernels.
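To put rough numbers on the Amdahl's-law point (figures chosen purely for illustration): if an accelerator handles a fraction f of the work at a speedup of s, the overall speedup is

    speedup = 1 / ((1 - f) + f / s)

so offloading 80% of a workload to an accelerator that is 20x faster gives only 1 / (0.2 + 0.8/20) = ~4.2x overall, and even an infinitely fast accelerator cannot exceed 1 / (1 - f) = 5x. The un-accelerated portion quickly dominates.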
Ikefu - Tuesday, June 3, 2014 - link
What do you see coming down the road from ARM that is relevant to the DIY Maker and the Internet of Things? I'm intrigued by the forthcoming Arduino Zero and am an avid user of Linux boards like the Raspberry Pi and BeagleBone Black for my robotics projects. How do you see ARM being able to improve this space in the future?

KFlautner - Wednesday, June 4, 2014 - link
What we can do is reduce the barrier of entry for the "long tail" of developers who are looking to build prototypes and then take them to product. Availability of hardware and modules is one important factor but equally important is the software / toolchain story around them. We have some exciting plans for mbed (check out mbed.org) that we’ll be talking about later this year.

BMNify - Wednesday, June 4, 2014 - link

"Availability of hardware and modules is one important factor"

That's good if the IP that people really want is actually there; mbed.org seems interesting if you can cater for the novice as well as the semi-pro.

For instance, none of the current vendors produce what many people want. Looking at the long tail of people who design/make/perhaps produce PCBs in the tens or hundreds, I'm optimistic that most interested people will want access to:

USB 3.1,

PCIe 3/4,

dual/quad DRAM controllers with compatible, tested IP for the likes of the 3.2+ GB/s non-volatile Everspin ST-MRAM DDR3 modules

http://www.everspin.com/PDF/ST-MRAM_Presentation.p...

and even mbed.org slices http://www.cnx-software.com/2013/12/29/xmos-xcore-...
Khenglish - Tuesday, June 3, 2014 - link
The fabrication companies are having much difficulty achieving high enough yields to mass-produce 20nm processors. Yield problems will increase as companies attempt to shrink to even smaller processes and adopt FinFETs. How concerned are you that we are near the end of the road for continual CMOS process shrinks for increased transistor counts and performance improvements? How confident are you that for the next 5 years or so we will continue to see transistor-level device improvements that can be implemented into future processors?

KFlautner - Wednesday, June 4, 2014 - link
Don’t forget that Moore’s Law is an economic law, and the primary reason it will stop is economics (not physics). If a manufacturer doesn’t get enough bang for their buck, they will get off the treadmill. My team has been working on predictive technology models to understand the issues with future generations of processes: there are conceivable combinations of technologies to create new process nodes for the next decade or more. The real question is whether it is actually worth doing so, as the improvements may not be commensurate with the expense.

PeteH - Tuesday, June 3, 2014 - link
Will 2014 be the year Michigan football finally returns to the top of the B1G?

KFlautner - Wednesday, June 4, 2014 - link
I’m pretty sure for the fans it never actually left the top spot ...

nemi2 - Tuesday, June 3, 2014 - link
Will clock-less ARM cores like the "Amulet" and "ARM996HS" become mainstream? Will we hear more about them? Do they have a significant power-savings advantage for mobile and other low-power applications?

KFlautner - Wednesday, June 4, 2014 - link
No. We’ve been involved in many asynchronous implementation projects but none of the potential benefits ultimately panned out. This is not necessarily the fault of the people involved. The ideas were interesting, but as they weren’t mainstream, there wasn’t an ecosystem there to support them. The implementers usually had to make do with poorer-quality tools and cores that were originally designed for synchronous operation. So it’s one of those things: the ideas may have been sound in theory, but it takes much more than that to build a successful product that delivers on the promises.

JCP2014 - Tuesday, June 3, 2014 - link
My questions are about big.LITTLE, which ARM has been championing for the last few years:

First, given the early issues with big.LITTLE designs (infamously the Exynos 5410), what will ARM do to prevent problems like this in the future? Now that the A15/A7 issues seem to have been ironed out, might similar issues arise with the A57/A53 in the future?

Second, how much optimization does big.LITTLE require on the kernel/OS side to get the best efficiency/performance results? Do you currently work with Google, Microsoft, etc. to integrate this capability into their stock kernels?
KFlautner - Wednesday, June 4, 2014 - link
Regarding your first question, we can't comment on specific partner SoCs or their implementations, so any questions on the Exynos 5410 should be directed to Samsung.

What we can say is that big.LITTLE and coherency are fully supported in the ARM CPU and system IP. In fact, there are now several shipping examples of big.LITTLE MP (global task scheduling) in the market, and we expect big.LITTLE SoCs to be predominantly big.LITTLE MP enabled from this point forward.
There are specific enhancements in the latest ARM CPUs that improve CPU performance, introduce 64-bit support, and will additionally improve big.LITTLE performance in the following ways:
+ Even faster coherency transactions
+ A higher performance LITTLE core (Cortex-A53) means the big cores (Cortex-A57) can remain asleep for a larger percentage of workloads
big.LITTLE software continues to evolve as well, in the optimizations and tuning of big.LITTLE SoCs, and in coordination with other power-management frameworks and other system components.
Expect more advances in the software to come as big.LITTLE continues to improve over time.
Regarding your second question, perhaps the best way to answer is to point you to a blog post that my colleague Brian Jeff posted last year, entitled “Ten things to know about big.LITTLE”: http://community.arm.com/groups/processors/blog/20...
In this blog, Brian talks about how big.LITTLE software effectively operates underneath Android in the kernel and you don't have to directly change Android middleware. The Linux kernel is patched with code that handles the big.LITTLE thread scheduling.
ARM keeps its ecosystem partners updated on big.LITTLE software techniques and approaches, but for comments on a specific OS, you'll need to ask the OS vendor directly.
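For readers curious what "the Linux kernel is patched with code that handles the big.LITTLE thread scheduling" looks like in practice, here is a heavily simplified sketch of the threshold-based up/down migration idea behind big.LITTLE MP. It is not the actual kernel code, and the helper names are invented for illustration; the real patches hook into the scheduler's per-task load tracking.

```c
/* Simplified sketch of big.LITTLE MP-style task placement.
 * Helper names are hypothetical; real patches live inside the Linux scheduler. */
#include <stdbool.h>

#define UP_THRESHOLD   80   /* % of a LITTLE core's capacity */
#define DOWN_THRESHOLD 30

struct task { int tracked_load; bool on_big; };

extern void migrate_to_big(struct task *t);     /* e.g. to a Cortex-A57 */
extern void migrate_to_little(struct task *t);  /* e.g. to a Cortex-A53 */

void place_task(struct task *t)
{
    if (!t->on_big && t->tracked_load > UP_THRESHOLD) {
        migrate_to_big(t);      /* demanding task: move it to a big core */
        t->on_big = true;
    } else if (t->on_big && t->tracked_load < DOWN_THRESHOLD) {
        migrate_to_little(t);   /* light task: let the big core power down */
        t->on_big = false;
    }
}
```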
TylerGrunter - Tuesday, June 3, 2014 - link
The first time I heard about A15 cores they were designed for microservers; the first time I heard about A57 cores they were targeting microservers (with what looks like better luck this time). Both ended up, or will end up, in mobile devices. Two questions arise from this:

Does ARM have a plan to develop a bigger core with higher IPC that can compete with Intel cores? (Apple's Cyclone comes to mind.)

If yes: will you ensure those cores can be used later in mobile devices? Can you give us some insight into how? I guess it's quite challenging to design for ultra performance and low power consumption at the same time.

What's your view on wearables? I see them as a huge promise, but made up of a lot of niches, which makes them difficult to monetize. Are you planning anything ultra-low-power for them?

One last question. I have the feeling Moore's law is coming to an end, seeing that the 20nm node is presenting serious challenges and cost per transistor is rising. What's the next step once the end of the node-shrinking slope is reached? What technology do you think will substitute for silicon? Is there anything in the pipeline already?
Krysto - Friday, June 6, 2014 - link
They could do what Apple did, and build some cores that only go in two-core setups in mobile, at 1-1.2 GHz speeds, then take those cores and run them at 2-2.5 GHz in PCs. However, I don't think this is ARM's plan. As we've seen with Apple's Cyclone, it's not among the most efficient. It will be interesting to see Nvidia's Denver cores, which aren't just huge, but also clocked at 2.5-3 GHz.

But as I mentioned in another comment at the end of the comments page, ARM doesn't need to do this. They can just wait it out, and will eventually reach parity with Intel in performance anyway (since Intel doesn't care about improving performance as much with the latest generations).
Wreckage - Tuesday, June 3, 2014 - link
Is there a future for a desktop ARM chip (non-SoC)? There had been a rumor of Apple moving iOS to the desktop, and I could see Android making a similar move. It sounds like Project Denver and the move towards 64-bit are in line with this kind of transition.

Is this something that ARM is working on?
ImSpartacus - Tuesday, June 3, 2014 - link
Are you talking about something socketed?

BMNify - Thursday, June 5, 2014 - link
Yes, I think he is, but my thinking has always been longer term. Given that Cortex is destined for hyperscale eventually, the efforts so far are odd in their choices to advance ARM everywhere.

Meanwhile, the Cavium ThunderX server SoC features up to 48 ARM 64-bit cores:
http://www.cnx-software.com/2014/06/04/cavium-thun...
http://www.cnx-software.com/wp-content/uploads/201...
That seems a nice step up. I don't know the size of the SoC (you probably do), but it seems to me that the way they all pack the ARM cores into a generic sled has been very wasteful to date, plus the masses of redundant steel don't help either. Could that ThunderX fit on a generic SoDIMM module?

If so, I'd really like ARM to get behind one of the existing SoDIMM module formats, or better yet, use your long-tail information to create a brand new, free, generic ARM(64) SoDIMM SoM industry standard, with a base carrier/daughter board that takes multiple (4+) SoMs - something far better than the existing limited-scope, barely-good-enough standards today. See http://www.cnx-software.com/2014/05/24/aaeon-annou... for the state of the art, and do far better, with an eye to putting 100+ ARM SoMs on a generic sled. I'd be perfectly happy to take one of those server SoMs and put it in a small ARM STB box if you like: cross-compatibility at the core, with no real subdividing into pro segments as such - don't throw it away, recycle it if you will...
tipoo - Tuesday, June 3, 2014 - link
What's your take on the impact of ISA on the overall CPU architecture? Is it still a big deal these days with millions of transistors surrounding it?

KFlautner - Thursday, June 5, 2014 - link
That depends on what you mean by "big deal"? :)

The ISA matters because it's a contract between a software ecosystem and silicon vendors. There are many ways of burdening an ISA with features that make one or both parties' lives difficult. To avoid this, ARM runs an Architecture Review Board which consists of both internal and external members. We keep finding that simplicity matters and that deprecating features is as important as adding new ones.
Krysto - Friday, June 6, 2014 - link
This is exactly why you should be trying to deprecate the ARMv7 ISA as soon as possible! Then you could build pure ARMv8 chips, without the baggage of ARMv7. Instead, you keep building cores like the Cortex-A12 and A17. You need to be committed to ARMv8, and forget ARMv7.

ComputationalScientist - Tuesday, June 3, 2014 - link
One advantage of Intel architectures over ARM for scientific computing is higher-precision, wider-range floating point using the extended precision recommended by IEEE 754. Floating-point computing often requires internal computations at a higher precision and range than double to cater for ill-conditioned data and unstable algorithms, e.g. tiny values in the probability models in my work. Are there any plans to improve floating-point precision in ARM cores, e.g. with IEEE 128-bit binary floating point, to cater for scientific computing in the future?

KFlautner - Wednesday, June 4, 2014 - link
As far as I know Intel supports extended precision in the legacy sense but they don’t necessarily make it fast. At ARM, we don’t get requests for higher precision than what we already offer, even from people interested in building supercomputers. Also, it's important to note that ARM supports subnormal floating point operation in hardware, which is typically faster than what you find in most competitors’ chips.
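A side note for readers doing scientific work on hardware without extended precision: compensated (Kahan) summation is a standard software technique for recovering much of the accuracy lost when accumulating many doubles. A minimal sketch in C:

```c
/* Kahan (compensated) summation: reduces rounding error when summing many
 * doubles, without needing extended-precision hardware. Note that aggressive
 * fast-math compiler options can optimize the correction away. */
#include <stddef.h>

double kahan_sum(const double *x, size_t n)
{
    double sum = 0.0, c = 0.0;          /* c accumulates the lost low-order bits */
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;            /* apply the previous correction */
        double t = sum + y;             /* low-order bits of y may be lost here */
        c = (t - sum) - y;              /* recover what was lost */
        sum = t;
    }
    return sum;
}
```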
name99 - Tuesday, June 3, 2014 - link

Very simple question.

Now that ARM is on track for 64-bit, IMHO the next obvious requirement (in terms of visible API) is hardware TM. Can you let us know ARM's plans for this? Do you already have the instructions and their semantics defined?
KFlautner - Wednesday, June 4, 2014 - link
ARM is interested in many new technologies, such as hardware support for TM, but wants them to have a proven benefit commensurate with their cost to implement for ARM's target markets before we deploy the technology. For ARM's markets, the benefit of the technology is still to be proven, but we remain interested in the technology.
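For readers unfamiliar with the programming model being asked about, here is roughly what hardware transactional memory looks like on an ISA that already exposes it (Intel's RTM extension). This is purely illustrative of the concept - it is not an ARM API, and ARM had announced no equivalent instructions at the time of this Q&A.

```c
/* Hardware transactional memory illustrated with Intel RTM intrinsics
 * (<immintrin.h>, compile with -mrtm). Shown only to explain the concept;
 * the fallback-lock helpers are hypothetical. */
#include <immintrin.h>

long shared_counter;
extern void take_fallback_lock(void);
extern void release_fallback_lock(void);

void increment(void)
{
    unsigned status = _xbegin();        /* start a hardware transaction */
    if (status == _XBEGIN_STARTED) {
        shared_counter++;               /* speculative update, no lock taken */
        _xend();                        /* commit atomically */
    } else {
        take_fallback_lock();           /* transaction aborted: fall back to a lock */
        shared_counter++;
        release_fallback_lock();
    }
}
```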
BMNify - Thursday, June 5, 2014 - link

By "hardware TM" I assume he means hardware transactional memory (controlling access to shared memory in concurrent computing - not the David May way :) ), not TM Hardware as in the ugly variety of door seals and closers on their site. :)

"ARM is interested in many new technologies" - good. I just noticed that Plastic Logic in the UK did some new flexible OLED screens and finally has a partner program. Perhaps ARM can nip down the road and give them a leg up by doing controller IP for them, and even help them with their questionable advertising. :)

http://the-digital-reader.com/2014/06/03/plasticlo...

Perhaps even get a few pro bono Plastic Logic kits, make a slice to use them, and give one away as a prize or some such. They have needed help getting the word out for a long time now. I'm a fan, but it's taking far too long for them to provide super-cheap consumer plastic displays for even DIY projects...
sinPiEqualsZero - Tuesday, June 3, 2014 - link
You hear a lot about how other companies are trying to replicate ARM's success in the power-efficient arena. Is the reverse true? That is, does ARM have any interest in broadening their product offerings to include processors of comparable power to desktop PCs?

KFlautner - Thursday, June 5, 2014 - link
I would argue that everyone in the processor industry is now on the efficiency bandwagon. It's just that ARM has evolved under that efficiency pressure - due to its focus on the mobile industry - for longer than others. What differs between products is the market and the power budget available. Having a larger power budget doesn't mean that one can afford inefficient designs: the goal is still to get as much performance out of a given power budget as possible. If you take a look at our high-end cores, they are already desktop-class. However, the core isn't everything: our silicon partners need to see the business case for building desktop-oriented processors (with appropriate amounts of on-chip cache, memory system, etc.) ... you can see this starting to happen in the server market now.

lada - Tuesday, June 3, 2014 - link
What security measures does ARM have against industrial espionage and the now-infamous NSA meddling with firmware, software, and anything computing-related? Has ARM HQ been infiltrated? Are ARM designs safe, or do they contain hardware "backdoors", e.g. for elevating privileges (userspace to kernel, TrustZone, etc.)? Are the sources of ARM processors open to security audits? Does ARM do security audits on its own? Are sources versioned in a way that would detect "hacks" - changes (backdoors) to the sources?

And vice versa: does ARM have access to partners' IP - that is, third-party ARM processor sources, or modified commodity ARM designs - to review them for security purposes?
These are all questions I've always wished to ask.
mercurylife - Wednesday, June 4, 2014 - link
+1

Good Question
Krysto - Friday, June 6, 2014 - link
I've asked that before, and ARM doesn't seem interested in answering it. Shame.

Netmsm - Tuesday, June 3, 2014 - link
Hello,

Would you state your opinion about the evolution of x86 versus ARM embedded devices? What is your analysis of that? Does it show that, in the long term, ARM won't be able to compete with x86 in terms of response to market needs? Or is embedding ARM in x86-based processors solely a competitive advantage?
Thanks
Jonathan_Rung - Tuesday, June 3, 2014 - link
Hello! Compared to the great tech-savvy questions posed here, this may be a little too... mainstream to warrant your attention, but I'd love to hear your thoughts. Thanks!

About four years ago, Intel's attempts to cut into the consumer SoC market amounted to a big flop, but sometime around the release of the Razr i, they seemed to be getting their act together. More recently, the Intel Celeron 2955U is arguably the best choice for Chromebooks when considering the dual and quad Exynos alternatives. As we've seen with Haswell, it's clear that Intel is really focused on improving their mobile processors as consumers gravitate towards mobile products and ARM eats up market share. Assuming Intel and AMD continue to prioritize low-voltage/thermally constrained consumer devices, will ARM eventually yield to Intel/AMD in the mobile OS space and focus on embedded (boring) systems? I think ARM SoCs are still clearly the best choice for smartphones, but in about four years' time, Intel's ability to compete in the mobile OS space has gone from "complete joke" to "somewhat viable." The Cortex-A50 series sounds great and I can't wait, and I know that nobody can predict the future, but what's your best guess for five, ten years down the line?
KFlautner - Thursday, June 5, 2014 - link
We don't have any intention of giving up on the mobile market. ;) What you see playing out with competitive offerings is not really about technology but about business models: ARM is a much lower-cost, ecosystem-based play, not an old-school high-margin vertical. I've placed my bets on which one I think will win long term. ;)
Thank you for responding!

Factory Factory - Tuesday, June 3, 2014 - link
Lately we've seen CPUs and GPUs looked at for different general compute tasks - CPUs for latency-sensitive, complex, and/or single-threaded calculations and GPUs for throughput-based, shallow, highly-threaded calculations. Is there a third type of processor with a different balance of deep vs. wide, or perhaps one going in a different direction altogether, that you see becoming more relevant in the future?

KFlautner - Thursday, June 5, 2014 - link
There are some interesting new workloads that are becoming relevant and some of these may warrant specific architectural support and/or new microarchitectures. An area I find interesting from an architecture point-of-view is machine learning. But we'll need some time for the algorithms to settle before we'd design in too much deep support for new workloads.

Jaybus - Monday, June 9, 2014 - link
Are you talking about adding neurosynaptic cores alongside ARMv8 cores? Sort of a right-brain to go with the current left-brain? I know IBM Research has talked of going in that direction as well.

lada - Wednesday, June 4, 2014 - link
Whatever happened to the ARM BIOS and/or other I/O standardization efforts? ("One Linux for all ARM systems.") Will we see a BIOS of some sort, standard peripherals to rely on for boot, or standard SATA controllers? I think it's mandatory for ARM to have standardized components beyond the instruction set in order to grow into the PC/server space (and with the A5x 64-bit ARM cores and up I see the best outcome in the cleanup of the ISA, but what about peripherals? Timers, interrupts, something for one kernel to rule them all?). To make Linus happy? ;)

Some auto-discovery of peripherals akin to PCI or USB? It wouldn't eat so many high-frequency transistors and could speed up kernel development many times over, IMO.
Writing from RPi, best regards and thanks for the answers and your opinion.
KFlautner - Thursday, June 5, 2014 - link
Check out linaro.org. This is an organization we've set up to cater to the needs of the ARM-based Linux ecosystem. I think they've taken many of the right steps to make Linus happier.... As far as standardization, we have also been working on various platform design documents that help our partners deploy functionality in a common way to reduce unnecessary fragmentation.

A great example of this is the Server Base System Architecture (SBSA) specification we collaborated on with our silicon and software partners.
Johan from AnandTech had a good write up on it earlier this year http://www.anandtech.com/show/7721/arm-and-partner...
mercurylife - Wednesday, June 4, 2014 - link
Part 1: How about a teaser about the next 64-bit 'Big' core?

Part 2: What can we expect from the next iteration of ARM Trustzone?
Thanks!
KFlautner - Thursday, June 5, 2014 - link
a) It's going to be 64 bit! ;)

b) It's capitalized differently: TrustZone
Sorry - I cannot really say much about future ARM products and roadmaps.
aryonoco - Wednesday, June 4, 2014 - link
1) Which one gives you more sleepless nights, Intel or Imagination Technologies?

2) Is ARM likely to make a big core SoC like Apple's Cyclone?
KFlautner - Thursday, June 5, 2014 - link
Not sure either of these cause me sleepless nights but they do give my subconscious some story lines to work on when asleep. ;) I don't think competition is a bad thing ...

Check out: http://www.anandtech.com/show/6420/arms-cortex-a57... ... The A57 is our current "big" core.
Dmcq - Thursday, June 5, 2014 - link
Do you think there might be a comeback for ThumbEE-type facilities in the 64-bit architecture to support JIT code or run-time checking of things like overflows?

It seems like everything is being virtualized; does this stop some types of development that are incompatible with good virtualized performance?
What is the main thing you know now that you really wish you knew or fully understood five years ago?
KFlautner - Thursday, June 5, 2014 - link
Thanks for the questions.

a) We do look at JIT performance as one of the metrics we optimize for. But we often find that it's better to cater to JITs at the microarchitecture level rather than at the architecture (instruction set). We've found that there are way too many different approaches to writing good JITs, and there are conflicts on when and how much the well-intentioned JIT-oriented instructions can be exploited.
b) I am not aware of any architectural development that we didn't do because of virtualization issues. However, I agree with you, there are ways of catering to virtualizability and others that make it harder. Architects do think about these issues.
c) It's a long list... But not much of it is related to computers. ;) Not because I claim to have known everything about the subject - but the objectives have been clear: how do you increase the efficiency of your designs (Performance / W) while keeping the costs down. It's a tall order and the means of achieving it change over time. But this metric has been front-and-center at ARM over much of its existence and I don't expect this to change radically.
twotwotwo - Thursday, June 5, 2014 - link
Hrm, chances are I'm too late, but an oddball question:

OS X, Android, and Chrome OS all use some kind of compressed swap, running a cheap algorithm like LZO or Snappy on not-recently-used pages of RAM--it's a neat trick to pretend you've got more memory than you really do, and it works well because CPUs are so stupid-fast now.
Early slides for AMD's Seattle SoC advertise a dedicated compression coprocessor. I don't know how much dedicated hardware even helps with compression (from what little I know of the algorithms, it seems like you'd mainly need fast random access to a smallish memory cache, which the CPU should already be really good at). But if you could make some sort of compression "free", that could boost devices' effective RAM for the buck (or improve effective I/O speed for the buck, as the SandForce SSD controller's compression does).
So, I suppose you can't really talk plans, but is dedicated compression IP an interesting area?
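To put rough numbers on the "pretend you've got more memory" point (figures purely illustrative): on a 2 GB device, compressing 512 MB of cold pages at a typical ~2:1 LZO ratio stores them in ~256 MB, freeing roughly 256 MB - about 12% more usable memory - at the cost of a decompression step whenever a compressed page is touched again. Dedicated compression hardware would mainly change where those cycles are spent, not this arithmetic.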
OreoCookie - Thursday, June 5, 2014 - link
Intel covers its markets with variants of essentially two cores (Atom-class cores and Core-class cores) while ARM already has many more (at least 3 current A-series cores, discounting Cortex-M and Cortex-R cores). Is ARM comfortable with this strategy, or is there a need for more specialized cores that address only a certain market? (E.g. does it make sense to bring a dedicated server core to market which is no longer suitable for smartphone and tablet applications?)

KFlautner - Thursday, June 5, 2014 - link
Which cores get used for which market is ultimately up to our silicon partners.

What ARM does is make sure that we offer all the features they need to take ARM cores into the appropriate segments.
OreoCookie - Friday, June 6, 2014 - link
Sure, but it seems to me that there is a burgeoning ARM server market; demand seems to be there. Since the requirements for server cores are different from those in mobile applications, I was wondering how ARM will address that market. (Apart from offering custom cores for that market, bringing suitable interconnects to market could be another approach.)

KFlautner - Friday, June 6, 2014 - link
ARM does offer server-class interconnects. Take a look at this story on our CCN interconnect technologies: http://www.enterprisetech.com/2014/05/08/arm-serve...

chrone - Thursday, June 5, 2014 - link
Hi Anand and Krisztian,

When will we see a high-performance single-thread CPU and a powerful GPU from ARM?

4/8 cores do not make sense in a mobile platform, since they only throttle a lot and make transition animations stutter or feel janky on Android due to the low performance when throttling.
Krysto - Friday, June 6, 2014 - link
What do you mean? Cortex-A57 not powerful enough for mobile? Or do you mean in PCs?

ARM doesn't need to target PCs with a higher-performance/higher-power-consumption CPU and GPU. They will get there in a few years, "there" being around the same level of performance as Intel's mainstream Core chips. Why? Because Intel will forego chasing performance because:
1) it's getting increasingly more expensive for them to do so
2) chasing lower power consumption is much easier at this point, and also a pretty marketable feature.
Plus ARM CPUs are increasing in performance with each generation much more than Intel's CPUs do. So in a few years we'll see parity. ARM should be more concerned with Performance/Watt, since that's really where they will have to worry about Intel, since Intel is a node and a half ahead of ARM chips right now (although this gap will shrink with the arrival of FinFET, but still).
samirotiv - Thursday, June 5, 2014 - link
Hello,

Could you please tell me a little about the kind of branch predictors that are used in ARM CPUs?
Is there scope to use the more modern predictors, such as TAGE, or are more primitive predictors, such as Gshare, used because of their simplicity?
Krysto - Friday, June 6, 2014 - link
Where's the Cortex-A55, and why didn't we see it launch at the same time as the A53 and A57?

findx - Saturday, June 7, 2014 - link
It was mentioned earlier that:

"An area I find interesting from an architecture point-of-view is machine learning. But we'll need some time for the algorithms to settle before we'd design in too much deep support for new workloads."
Beyond SIMD for matrix routines, what types of new architecture level support do you envision helping machine learning workloads?
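For context, "SIMD for matrix routines" refers to vectorized kernels like the dot product below, the building block of most machine-learning inner loops (matrix-vector and matrix-matrix products). This is just a generic illustration using standard NEON intrinsics, not a statement about any planned ARM feature; n is assumed to be a multiple of 4 for brevity.

```c
/* 4-wide NEON single-precision dot product (standard ARM NEON intrinsics). */
#include <arm_neon.h>

float dot(const float *a, const float *b, int n)
{
    float32x4_t acc = vdupq_n_f32(0.0f);            /* four partial sums */
    for (int i = 0; i < n; i += 4)
        acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    return vgetq_lane_f32(acc, 0) + vgetq_lane_f32(acc, 1)
         + vgetq_lane_f32(acc, 2) + vgetq_lane_f32(acc, 3);
}
```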
Wardrive86 - Sunday, June 8, 2014 - link
Does the increase in single-precision NEON performance in the Cortex-A57 (vs. the A15) come from just a clock-speed increase? Can you talk about any enhancements that have been made to the FPU and NEON?

martinez.lopez.alvaro - Monday, June 9, 2014 - link
Is there any interest from ARM in targeting the HPC market? Or more specifically, does ARM want to be in the Top500 list with ARM cores alone (i.e. without third-party accelerators, something like the Intel Xeon Phi)?

quadrivial - Wednesday, June 11, 2014 - link
Apple Cyclone (A7) hard-launched in September of last year. ARM's A57 won't hard launch until the middle of next year (close to 18 months later).

How does the first-party ARM design team fall so far behind when foreknowledge of the new ISA gives them a head start in micro-architecture design and implementation? What is ARM R&D doing to catch up with their licensees?
sverre_j - Wednesday, June 11, 2014 - link
Back to the question about the new 64-bit ISA.

I was surprised to learn that ARMv8 instructions translate into micro-ops. Was this done to cater for instructions that need to be broken up into pieces? Were there other reasons for doing so?
Thanks.
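As a generic illustration of why an implementation might crack instructions into micro-ops (this is not a statement about any particular ARM core): an instruction such as LDP x0, x1, [sp], #16 does three things - two register loads plus a base-register update - so a pipeline built around simpler internal operations would naturally split it into, say, two load micro-ops and an add micro-op, keeping each operation within the register-port and memory-port budget of the datapath.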