Evaluation Setup and Testing Methodology

The Supermicro SuperServer E302-9D is not a run-of-the-mill server, and its evaluation has to focus on aspects beyond the regular generic testing of the CPU capabilities. The system's focus is on applications requiring a large number of high-speed network interfaces, and our evaluation setup with the server as the device-under-test (DUT) also reflects this.

Testbed and DUT Configuration

The E302-9D sports eight network interfaces with four gigabit copper ports, and four 10 gigabit ones. Our testing focuses on the 10 gigabit interfaces. These are connected to the stimulus source and sink in our test network topology. Out of the four gigabit ports, one is connected to the management network, while the other three are left idle. The management network is used to send test commands to the source and the sink, while remotely controlling the DUT configuration.

The stimulus source is the Supermicro SuperServer 5019D-4C-FN8TP, which is the actively cooled 1U rackmount version of the DUT. It uses the same Intel Xeon D-2123IT SoC and the same motherboard. Only the cooling solution and chassis are different. The sink is the Supermicro Superserver SYS-5028D-TN4T, which uses the Xeon D-1540 Broadwell-DE SoC. The conductor (a Compulab fitlet-XA10-LAN unit) is the PC that acts as the master for the framework testing these distributed systems, and acts to synchronize various operations of the members and collect results over the management network. The systems in the above configuration all run FreeBSD 12.1-RELEASE, except for the DUT running pfSense 2.4.5 (based on FreeBSD 11.3). In our initial setup, the sink's native 10GBASE-T ports were connected to the DUT. These ports worked fine with Windows Server 2019 Standard running on the sink. However, with FreeBSD 12.1, only one of the 10GBASE-T ports got initialized successfully, with the other suffering a hardware initialization failure. To circumvent this issue, we installed a spare Intel X540-T2 half-height PCIe 2.0 x8 card in the system's PCIe slot. Strangely, FreeBSD again showed a initialization failure for one of the two new ports. Fortunately, we did end up with two working 10G BASE-T ports in the sink, and I did not have to spend any additional time debugging FreeBSD's refusal to activate those specific interfaces in the Xeon D-1540-based system.

On the DUT side, the interfaces are configured in the pfSense installation as shown in the screenshot below. DHCP servers are activated on all the four 10 gigabit interfaces of the DUT. This configuration is persistent across reboots, and helps in minimizing the setup tasks for each of the performance evaluation runs described further down.

For certain benchmarking scenarios, minor modifications of the interface characteristics are needed. These tweaks are done via shell scripts.

Packet Forwarding Benchmarks

Throughput benchmarks tell only a part of the story. Evaluation of a firewall involves determination of how enabling various options affects the packet processing capabilities. Monitoring the DUT's resource usage and attempting to maximize it with artificial scenarios doesn't deliver much actionable information to end-users. At AsiaBSDCon 2015, a network performance evaluation paper was presented that brought out the challenges involved in creating consistently reproducible benchmarks for firewalls such as pfSense.

The scripts and configuration files for different scenarios in the scheme described above are available under a BSD-2-Clause license in the freebsd-net/netperf github repo. The benchmarks presented in this review are based on this methodology. However, we only take a subset of relevant scenarios for a multitude of reasons - Some of the tests are only relevant to the firewall kernel developers, while some others (such as the comparison between fast-forwarding tunred off and on) are no longer relevant in recent releases of pfSense.

The described methodology makes use of two open-source performance evaluation tools:

  • iPerf3
  • pkt-gen

While iPerf3 enables quick throughput testing, pkt-gen helps in evaluating how the firewall performs under worst-case conditions (read, processing of packets much smaller than the MTU).

Evaluation is done in the following scenarios:

  • Router Mode - The firewall is completely disabled and packet forwarding between all LANs (OPT interfaces in our DUT configuration) is enabled. In this configuration, we essentially benchmark a router
  • PF (No Filters) - The packet filter is enabled, but the rule set involves allowing all traffic
  • PF (Default Ruleset) - The packet filter is enabled with the default rule-set and a few modifications to allow for the benchmark streams
  • PF (NAT Mode) - The packet filter is configured with NAT enabled across two of the interfaces to simulate a multi-WAN scenario
  • IPSec - The packet filter is enabled with the default rule-set and a few modifications to allow for the benchmark streams, and a couple of different encryption / hashing algorithm sets are evaluated.

In benchmarking configurations, it is customary to ensure that the stimulus-generating hardware is powerful enough to not be the testing bottleneck. One of the fortunate aspects we are dealing with is that networking performance (particularly at 10G+ speeds) hardly benefits from high core-count or multi-socket systems - the performance penalties associated with moving the packet processing application associated with a particular interface to another core or socket becomes unacceptable. Hardware acceleration on the NICs matter more than CPU performance, though higher per-core/single-threaded performance is definitely welcome. In this context, a look at the suitability of the two testbed machines for packet generation and driving is warranted first.

Setup and Usage Impressions Packet Generation Options - A Quantitative Comparison
POST A COMMENT

34 Comments

View All Comments

  • eastcoast_pete - Tuesday, July 28, 2020 - link

    Thanks, interesting review! Might be (partially) my ignorance of the design process, but wouldn't it be better from a thermal perspective to use the case, especially the top part of the housing directly as heat sink? The current setup transfers the heat to the inside space of the unit and then relies on passive con
    vection or radiation to dispose of the heat. Not surprised that it gets really toasty in there.
    Reply
  • DanNeely - Tuesday, July 28, 2020 - link

    From a thermal standpoint yes - if everything is assembled perfectly. With that design though, you'd need to screw attach the heat sink to the CPU via screws from below, and remove/reattach it from the CPU every time you open the case up. This setup allows the heatsink to be semi-permanently attached to the CPU like in a conventional install.

    You're also mistaken about it relying on passive heat transfer, the top of the case has some large thermal pads that will make contact with the tops of the heat sinks. (They're the white stuff on the inside of the lid in the first gallery photo; made slightly confusing by the lid being rotated 180 from the mobo.) Because of the larger contact area and lower peak heat concentration levels thermal pads are much less finicy about being pulled apart and slapped together than the TIM between a chip and the heatsink base.
    Reply
  • Lindegren - Tuesday, July 28, 2020 - link

    Could be Solved by having the CPU on the opposite side og the board Reply
  • close - Wednesday, July 29, 2020 - link

    Lower power designs do that quite often. The MoBo is flipped so it faces down, the CPU is on the back side of the MoBo (top side of the system) covered by a thick, finned panel to serve as passive radiator. They probably wanted to save on designing a MoBo with the CPU on the other side. Reply
  • eastcoast_pete - Tuesday, July 28, 2020 - link

    Appreciate the comment on the rotated case; those thermal pads looked oddly out of place. But, as Lindegren's comment pointed out, having the CPU on the opposite site of this, after all, custom MB, one could have the main heat source (SoC/CPU) facing "up", and all others facing "down".
    For maybe irrational reasons, I just don't like VRMs, SSDs and similar getting so toasty in an always-on piece of networking equipment.
    Reply
  • YB1064 - Wednesday, July 29, 2020 - link

    Crazy expensive price! Reply
  • Valantar - Wednesday, July 29, 2020 - link

    I think you got tricked by the use of a shot of the motherboard with a standard server heatsink. Look at the teardown shots; this version of the motherboard is paired with a passive heat transfer block with heat pipes which connects directly to the top chassis. No convection involved inside of the chassis. Should be reasonably efficient, though of course the top of the chassis doesn't have that many or that large fins. A layer of heat pipes running across it on the inside would probably have helped. Reply
  • herozeros - Tuesday, July 28, 2020 - link

    Neat review! I was hoping you could offer an opinion on why they elected to not include a SKU without quickassist? So many great router scenarios with some juicy 10G ports, but bottlenecks if you’re trafficing in resource intensive IPSec connections, no? Thanks! Reply
  • herozeros - Tuesday, July 28, 2020 - link

    Me English are bad, should read “a SKU without Quickassist” Reply
  • GreenReaper - Tuesday, July 28, 2020 - link

    The MSRP of the D-2123IT is $213. All D-2100 CPUs with QAT are >$500:
    https://www.servethehome.com/intel-xeon-d-2100-ser...
    https://ark.intel.com/content/www/us/en/ark/produc...
    And the cheapest of those has a lower all-core turbo, which might bite for consistency.

    It's also the only one with just four cores. Thanks to this it's the only one that hits a 60W TDP.
    Bear in mind internals are already pushing 90C, in what is presumably a reasonably cool location.

    The closest (at 235% the cost) is the 8-core D-2145NT (65W, 1.9Ghz base, 2.5Ghz all-core turbo).
    Sure, it *could* do more processing, but for most use-cases it won't be better and may be worse. To be sure it wasn't slower, you'd want to step up to D-2146NT; but now it's 80W (and 301% the cost). And the memory is *still* slower in that case (2133 vs 2400). Basically you're looking at rack-mount, or at the very least some kind of active cooling solution - or something that's not running on Intel.

    Power is a big deal here. I use a quad-core D-1521 as a CPU for a relatively large DB-driven site, and it hits ~40W of its 45W TDP. For that you get 2.7Ghz all-core, although it's theoretically 2.4-2.7Ghz. The D-1541 with twice the cores only gets ~60% of the performance, because it's _actually_ limited by power. So I don't doubt TDP scaling indicates a real difference in usage.

    A lower CPU price also gives SuperMicro significant latitude for profit - or for a big bulk discount.
    Reply

Log in

Don't have an account? Sign up now