Intel Unveils Lunar Lake Architecture: New P and E cores, Xe2-LPG Graphics, New NPU 4 Brings More AI Performance

Author: Gavin Bonshor

by Gavin Bonshor on June 3, 2024 11:00 PM EST

Intel this morning is lifting the lid on some of the finer architectural and technical details about its upcoming Lunar Lake SoC – the chip that will be the next generation of Core Ultra mobile processors. Once again holding one of their increasingly regular Tech Tour events for media and analysts, Intel this time set up shop in Taipei just before the beginning of Computex 2024. During the Tech Tour, Intel disclosed numerous facets of Lunar Lake, including their new P-Core design codenamed Lion Cove and a new wave of E-cores that are a bit more like Meteor Lake's pioneering Low Power Island E-Cores. Also disclosed was the Intel NPU 4, which Intel claims delivers up to 48 TOPS, surpassing Microsoft's Copilot+ requirements for the new age of AI PCs.

Intel's Lunar Lake represents a strategic evolution in their mobile SoC lineup, building on last year's Meteor Lake launch with a focus on enhancing power efficiency and optimizing performance across the board. Lunar Lake dynamically allocates tasks to efficiency cores (E-cores) or performance cores (P-cores) based on workload demands, leveraging advanced scheduling mechanisms that are designed to ensure optimal power usage and performance. Once again, Intel Thread Director, along with Windows 11, plays a pivotal role in this process, guiding the OS scheduler to make real-time adjustments that balance efficiency with computational power depending on the intensity of the workload.

Intel CPU Architecture Generations
	Alder/Raptor Lake	Meteor Lake	Lunar Lake	Arrow Lake	Panther Lake
P-Core Architecture	Golden Cove/ Raptor Cove	Redwood Cove	Lion Cove	Lion Cove	Cougar Cove?
E-Core Architecture	Gracemont	Crestmont	Skymont	Crestmont?	Darkmont?
GPU Architecture	Xe-LP	Xe-LPG	Xe2	Xe2?	?
NPU Architecture	N/A	NPU 3720	NPU 4	?	?
Active Tiles	1 (Monolithic)	4	2	4?	?
Manufacturing Processes	Intel 7	Intel 4 + TSMC N6 + TSMC N5	TSMC N3B + TSMC N6	Intel 20A + More	Intel 18A
Segment	Mobile + Desktop	Mobile	LP Mobile	HP Mobile + Desktop	Mobile?
Release Date (OEM)	Q4'2021	Q4'2023	Q3'2024	Q4'2024	2025

Lunar Lake: Designed By Intel, Built By TSMC (& Assembled By Intel)

While there are many aspects of Lunar Lake to dive into, perhaps it's best we start with what's sure to be the most eye-catching: who's building it.

Intel's Lunar Lake tiles are not being fabbed using any of their own foundry facilities – a sharp departure from historical precedent, and even from the recent Meteor Lake, where the compute tile was made using the Intel 4 process. Instead, both tiles of the disaggregated Lunar Lake are being fabbed over at TSMC, using a mix of TSMC's N3B and N6 processes. In 2021 Intel set about freeing their chip design groups to use the best foundry they could – be it internal or external – and there's no place where that's more apparent than here.

Overall, Lunar Lake represents their second generation of disaggregated SoC architecture for the mobile market, replacing the Meteor Lake architecture in the lower-end space. At this time, Intel has disclosed that it uses a 4P+4E (8 core) design, with hyper-threading/SMT disabled, so the total thread count supported by the processor is simply the number of CPU cores, e.g., 4P+4E/8T.

Lunar Lake pairs the work of Intel's architectural design team with TSMC's manufacturing process nodes to bring the latest Lion Cove P-cores to the platform, which boosts Intel's architectural IPC as you would expect from a new generation. At the same time, Intel also introduces the Skymont E-cores, which replace the Low Power Island Crestmont E-cores of Meteor Lake. Notably, however, these E-cores don't connect to the ring bus like the P-cores do, which makes them a sort of hybrid LP E-core, combining the efficiency gains of the more advanced TSMC N3B node with double-digit gains in IPC over the previous Crestmont cores.

The entire compute tile, including the P and E-cores, is built on TSMC's N3B node, while the SoC tile is made using the TSMC N6 node.

At a higher level, Intel is once again using their Foveros packaging technology here. Both the compute and SoC (now the "Platform Controller") tiles sit on top of a base tile, which provides high-speed/low-power routing between the tiles, and further connectivity to the rest of the chip and beyond.

In another first for a mainstream Intel Core product, the Lunar Lake SoC platform also includes up to 32 GB of LPDDR5X memory on the chip package itself. This is arranged as a pair of 64-bit memory chips, offering a total 128-bit memory interface. As with other vendors using on-package memory, this change means that users can't just upgrade DRAM at-will, and the memory configurations for Lunar Lake will ultimately be determined by what SKUs Intel opts to ship.
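As a quick sanity check on the memory figures above, the sketch below adds up the bus width and derives a theoretical peak bandwidth. The two 64-bit packages come from Intel's disclosure; the transfer rate is not stated in this section, so the 8533 MT/s value is an assumption used purely for illustration, and the function name is hypothetical.

```python
def peak_bandwidth_gb_s(device_widths_bits, transfer_rate_mts):
    """Theoretical peak bandwidth in GB/s (decimal) for a set of memory devices."""
    total_bits = sum(device_widths_bits)
    bytes_per_transfer = total_bits / 8
    # MT/s -> transfers per second, then bytes per second -> GB/s
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# From the article: a pair of 64-bit LPDDR5X packages.
PACKAGE_WIDTHS_BITS = [64, 64]

# ASSUMPTION: transfer rate is not given in this section; 8533 MT/s is illustrative.
ASSUMED_RATE_MTS = 8533

total_width = sum(PACKAGE_WIDTHS_BITS)
bandwidth = peak_bandwidth_gb_s(PACKAGE_WIDTHS_BITS, ASSUMED_RATE_MTS)

print(f"Total interface width: {total_width}-bit")
print(f"Theoretical peak bandwidth at the assumed rate: {bandwidth:.1f} GB/s")
```

Real-world throughput will land below any theoretical peak, so this is only useful for comparing configurations on paper.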

With Lunar Lake, Intel is also strongly focusing on AI, as the architecture integrates a new NPU called NPU 4. This NPU is rated for up to 48 TOPS of INT8 performance, thus making it Microsoft Copilot+ AI PC ready. This is the bar all of the PC SoC vendors are aiming for, including AMD and Qualcomm.

Intel's integrated GPU will also be a contributing player here. While not the highly efficient machine that the dedicated NPU is, the Arc Xe2-LPG brings dozens of additional T(FL)OPS of performance with it, along with some additional flexibility that an NPU doesn't offer. That is why you'll also see Intel rating the performance of these chips in terms of total platform TOPS – in this case, 120 TOPS.
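To put the two headline numbers side by side, here is a minimal sketch of the arithmetic. The 48 TOPS NPU rating and the 120 TOPS platform total are taken from the article; the 40 TOPS Copilot+ NPU floor is the widely reported Microsoft requirement and is not stated in this piece, so treat it as an assumption. The exact GPU versus CPU split is not given here, so the code only derives the combined remainder.

```python
NPU_TOPS = 48         # from the article: NPU 4, INT8
PLATFORM_TOPS = 120   # from the article: total platform rating

# ASSUMPTION: widely reported Copilot+ NPU minimum, not stated in the article.
COPILOT_PLUS_MIN_NPU_TOPS = 40


def non_npu_tops(platform_tops, npu_tops):
    """TOPS that must come from the GPU and CPU combined."""
    if npu_tops > platform_tops:
        raise ValueError("NPU rating cannot exceed the platform total")
    return platform_tops - npu_tops


remainder = non_npu_tops(PLATFORM_TOPS, NPU_TOPS)
npu_share = NPU_TOPS / PLATFORM_TOPS
meets_copilot_plus = NPU_TOPS >= COPILOT_PLUS_MIN_NPU_TOPS

print(f"GPU + CPU contribution: {remainder} TOPS")
print(f"NPU share of platform total: {npu_share:.0%}")
print(f"Meets assumed Copilot+ NPU floor: {meets_copilot_plus}")
```

Bear in mind that peak figures for different engines may be quoted at different precisions, so a summed platform number is best read as a marketing aggregate rather than a throughput any single workload will reach.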

Intel's collaboration with Microsoft further enhances workload management through the fabled Intel Thread Director, optimized for applications such as the Copilot assistant. Given the time of the introduction of Lunar Lake, it somewhat sets the stage for a Q3 2024 launch, which coincides with the holiday 2024 market.

Intel Lunar Lake: Updating Intel Thread Director & Power Management Improvements

To say that energy efficiency is a key goal for Lunar Lake would be an understatement. For as much as Intel is riding high in the mobile PC CPU market (AMD's share there is still but a fraction), the company has been feeling the pressure from customer-turned-rival Apple, whose M-series Apple Silicon has been setting the bar for power efficiency over the last few years. And now with Qualcomm attempting to do the same thing for the Windows ecosystem with their forthcoming Snapdragon X chips, Intel is preparing to make their own power play.

Intel's Thread Director and power management updates for Lunar Lake show various and significant improvements compared to Meteor Lake. The Thread Director uses a heterogeneous scheduling policy, initially assigning tasks to a single E-core and expanding to other E-cores or P-cores as and when needed. OS containment zones are designed to limit tasks to specific cores, which directly improves power efficiency and delivers the performance needed by the right core for the workload at hand. Integration with power management systems and a quad array of Power Management Controllers (PMC) further allows the chip, in concert with Windows 11, to make context-aware adjustments, ensuring optimal performance with minimal power usage and wastage.
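The placement order described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the real Thread Director supplies hardware hints to the Windows scheduler and weighs far more than core occupancy, whereas this example simply fills idle E-cores first, spills to P-cores once they are busy, and treats a containment zone as a list of permitted core types. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Core:
    name: str
    kind: str                    # "E" (efficiency) or "P" (performance)
    task: Optional[str] = None   # None means the core is idle


def build_cores(p_count: int = 4, e_count: int = 4) -> List[Core]:
    """4P+4E layout from the article; E-cores are listed first so they are tried first."""
    cores = [Core(f"E{i}", "E") for i in range(e_count)]
    cores += [Core(f"P{i}", "P") for i in range(p_count)]
    return cores


def place(task: str, cores: List[Core], zone: Tuple[str, ...] = ("E", "P")) -> Optional[str]:
    """Put the task on the first idle core whose kind is allowed by the zone.

    Returns the chosen core name, or None if every permitted core is busy
    (a real scheduler would queue or time-slice instead).
    """
    for core in cores:
        if core.kind in zone and core.task is None:
            core.task = task
            return core.name
    return None


# Unrestricted placement: starts on one E-core, expands across E-cores, then P-cores.
cores = build_cores()
placements = [place(f"task{i}", cores) for i in range(6)]
print(placements)

# Containment zone limited to E-cores: a fifth task finds no permitted idle core.
zone_cores = build_cores()
for i in range(4):
    place(f"call{i}", zone_cores, zone=("E",))
overflow = place("call4", zone_cores, zone=("E",))
print(overflow)
```

The point of the zone argument is that contained work never wakes a P-core, which is the behavior the article credits for the power savings.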

Lunar Lake's scheduling strategy effectively handles power-sensitive applications. One example Intel gave is that video conferencing tasks are kept within the efficiency core cluster, utilizing the E-cores to maintain performance while reducing power consumption by up to 35%, as shown by Intel's provided data. These improvements are achieved through collaboration with OS developers such as Microsoft, with the aim of seamless integration and the best balance between power consumption and performance.

Focusing on the power management system for Lunar Lake, Intel uses its SoC power management, operating in efficiency, balance, and performance modes designed to adapt to the demands of the workload at the time of operation. This multi-layered approach allows the Lunar Lake SoC to operate efficiently. Again, much like the Intel Thread Director, the PMCs can balance power usage with performance needs.

Intel further plans to enhance the Thread Director by increasing scenario granularity, implementing AI-based scheduling hints, and enabling cross-IP scheduling within Windows 11. These enhancements essentially equate to workload management designed to boost overall power efficiency and deliver performance across various applications when needed without wasting power budget by allocating lighter tasks to the higher power P-cores.

Over the next few pages, we'll explore the new P and E cores and Intel's update to their integrated Arc Xe (Xe2-LPG) graphics.

90 Comments

Silver5urfer - Tuesday, June 4, 2024
Disaster for Intel. Finally they folded. Intel fabs are now not even used for their high volume BGA junk processors. Instead using TSMC.

Second thing is as everyone pointed out they are comparing LP-E to E cores lol to inflate the graphs. Also the IPC is meager at best, Raptor Cove is faster than Meteor one and they are using that figure.

ARL will lack HT on top of this reduced clockrate, interesting times ahead for Desktop battle.
Drumsticks - Tuesday, June 4, 2024
They aren’t comparing LP E-Cores to E-Cores. LNL E-cores are separated from the LLC, same as MTL island cores. It’s an apt comparison.

On the flip side, the comparison to Raptor cove is with E-cores connected to the LLC and ring bus, just as Raptor cove would be. It’s also an apt comparison. You’ll see island E-cores only on LNL (because of the power advantages) and ring bus connected E-cores on Arrow Lake (because of the performance advantages).
Kangal - Wednesday, June 5, 2024
I don't know, but I am pretty underwhelmed.
Intel is the least trusted tech giant, even Nvidia look better when it comes to honesty.

Here it seems like Intel took two steps forward, and three steps back. They are probably at a loss in either pricing, efficiency, or performance. Or more likely all three. That's why they use smoke and mirrors and try to trick the viewers/shareholders with the technicalities.

It's not like AMD didn't do the same, but they stand behind their technology, and actually showcased real products. And they also gave benchmarks. That's how you know they are confident.

It seems the CPU and GPU space is going to be a bloodbath for Intel. And we need all the competition we can get. But it is a little amusing to seeing Intel squirm. Ironically Intel is going the way of Bulldozer (shared cores) whilst AMD is sticking with Hyperthreading (extra bits per core) design. It's only amusing because Intel did unethical and illegal business practices that led to AMDs bankruptcy more than a decade ago. Microsoft is also complicit in that.
Terry_Craig - Wednesday, June 5, 2024
Sounds like an intel employee. People care about performance, not excuses, the problem with the comparison is that the LP-E cores are much inferior to the already deficient E-Cores.

https://chipsandcheese.com/2024/05/20/comparing-cr...
Drumsticks - Tuesday, June 11, 2024
Not sure if this was a reply to me because of page breaks, but if it was, what about what I said is untrue or biased?

From the (excellent, by the way) Chips article: "I wonder if Intel could give low power Crestmont a larger L2 cache, or even drop some blocks on Meteor Lake’s SoC tile to make room for a system level cache." - this is exactly what was done in Lunar Lake. The LNL E-Cores don't access the same L3 as the P-Cores, but there's an 8MB System level cache that they can access (that the rest of the chip also can I think, P-Cores, GPU, and NPU included). That probably is a big part of the giant 40-70% performance gain they show.

And E-Cores connected to the ring bus ARE much better, by Intel's own admission and by, again, the Chips article. Skymont E-Cores coming to ARL are (presumably) on the ring bus, and should punch much better than LNL E-Cores because of it.

None of this means that Intel's design is the best, or that it's not going to fall flat. That devil is still in the details, which Intel still needs to give to us. But I'm not sure how we can argue that the explicit details of the implementation are somehow biased or an excuse. That IS how Intel designed the chip; whether or not it is a good design remains to be seen. IMO, it seems like a pretty decent concept, but we'll have to see how much power the new P-Cores are really saving. With a 4P+4e design, they will need to be pretty efficient to match what Zen 5 will be up to, even in low power setups. (I assume 15W and above will get an arrow lake design that has more p cores and/or E cores on the ring bus).
Drumsticks - Tuesday, June 11, 2024
One other thought - based on the Chips and Cheese article, LP E-Cores seem to be anywhere from 10-30% slower without access to an L3 cache. That Intel is calling out a 40-70% gain in Skymont LPE core performance over Crestmont LP-E is pretty noteworthy if nothing else. Even at their 10% (which is nuts) margin of error, the LPE core Skymont cores (albeit at least with access to a system cache) are as fast as Crestmont cores with a full blown 24MB L3 cache.

Again, benchmarks are king, but assuming Skymont LP-E is bad because Crestmont LP-E was bad seems like a poor assumption given the underlying conditions are completely different.
GeoffreyA - Tuesday, June 4, 2024
On the P side, most interesting is Lion Cove's moving to a split-scheduler design, saying good-bye to their classic unified approach there since the P6. AMD, always thinking ahead, has been using the split scheduler since the Athlon.
Blastdoor - Tuesday, June 4, 2024
This really looks like a SOC made for a MacBook Air.
lmcd - Wednesday, June 12, 2024
Or intended to beat out Snapdragon Elite if its date didn't slip.
NextGen_Gamer - Tuesday, June 4, 2024
With confirmation that the entire compute tile is made on TSMC's N3B process, I guess we can take that to mean Intel was not super confident in mass yields on its own 20A process. Intel's 20A will be used in Arrow Lake, the desktop equivalent to Lunar Lake. Desktop shipments are a small fraction of laptop chips nowadays, so that makes sense. This does create a really interesting opportunity that I hope Anandtech will explore, where you could take a desktop Arrow Lake processor, disable enough P-cores and E-cores to make it equal to Lunar Lake, and see how they compare. Same architectures, but one on TSMC N3B versus Intel 20A.
