Evaluation Setup and Testing Methodology

The Supermicro SuperServer E302-9D is not a run-of-the-mill server, and its evaluation has to focus on aspects beyond generic CPU benchmarking. The system is aimed at applications requiring a large number of high-speed network interfaces, and our evaluation setup, with the server as the device-under-test (DUT), reflects this.

Testbed and DUT Configuration

The E302-9D sports eight network interfaces: four gigabit copper ports and four 10 gigabit ones. Our testing focuses on the 10 gigabit interfaces, which are connected to the stimulus source and sink in our test network topology. Of the four gigabit ports, one is connected to the management network, while the other three are left idle. The management network is used to send test commands to the source and the sink, and to remotely control the DUT configuration.

The stimulus source is the Supermicro SuperServer 5019D-4C-FN8TP, the actively cooled 1U rackmount version of the DUT. It uses the same Intel Xeon D-2123IT SoC and the same motherboard; only the cooling solution and chassis are different. The sink is the Supermicro SuperServer SYS-5028D-TN4T, which uses the Xeon D-1540 Broadwell-DE SoC. The conductor (a Compulab fitlet-XA10-LAN unit) is the PC that acts as the master for the distributed testing framework, synchronizing the various operations of the members and collecting results over the management network. All systems in the above configuration run FreeBSD 12.1-RELEASE, except for the DUT, which runs pfSense 2.4.5 (based on FreeBSD 11.3).

In our initial setup, the sink's native 10GBASE-T ports were connected to the DUT. These ports worked fine with Windows Server 2019 Standard running on the sink. However, with FreeBSD 12.1, only one of the 10GBASE-T ports got initialized successfully, with the other suffering a hardware initialization failure. To circumvent this issue, we installed a spare Intel X540-T2 half-height PCIe 2.0 x8 card in the system's PCIe slot. Strangely, FreeBSD again showed an initialization failure for one of the two new ports. Fortunately, we did end up with two working 10GBASE-T ports in the sink, and we did not have to spend any additional time debugging FreeBSD's refusal to activate those specific interfaces in the Xeon D-1540-based system.

On the DUT side, the interfaces are configured in the pfSense installation as shown in the screenshot below. DHCP servers are activated on all four 10 gigabit interfaces of the DUT. This configuration is persistent across reboots, and helps minimize the setup tasks for each of the performance evaluation runs described further down.

For certain benchmarking scenarios, minor modifications of the interface characteristics are needed. These tweaks are done via shell scripts.
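
As an illustration, the sketch below shows the kind of interface tweak such a script might apply. The interface name (ix0), the offload toggles, and the MTU value are assumptions for illustration; they are not necessarily the exact settings used in our runs.

    #!/bin/sh
    # Hypothetical tweak script for a benchmark run (values are examples).
    IF=ix0

    # Disable hardware offloads that can mask small-packet behavior.
    ifconfig ${IF} -tso -lro -rxcsum -txcsum

    # Set the MTU expected by the test scenario.
    ifconfig ${IF} mtu 1500

    # Verify the resulting interface flags before starting the run.
    ifconfig ${IF}

Scripts along these lines can be pushed to the testbed members over the management network before each run.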

Packet Forwarding Benchmarks

Throughput benchmarks tell only part of the story. Evaluating a firewall involves determining how enabling various options affects its packet processing capabilities. Monitoring the DUT's resource usage and attempting to maximize it with artificial scenarios doesn't deliver much actionable information to end-users. At AsiaBSDCon 2015, a network performance evaluation paper was presented that brought out the challenges involved in creating consistently reproducible benchmarks for firewalls such as pfSense.

The scripts and configuration files for the different scenarios in the scheme described above are available under a BSD-2-Clause license in the freebsd-net/netperf GitHub repo. The benchmarks presented in this review are based on this methodology. However, we take only a subset of the relevant scenarios, for a number of reasons: some of the tests matter only to firewall kernel developers, while others (such as the comparison between fast-forwarding turned off and on) are no longer relevant in recent releases of pfSense.

The described methodology makes use of two open-source performance evaluation tools:

  • iPerf3
  • pkt-gen

While iPerf3 enables quick throughput testing, pkt-gen (the netmap-based packet generator) helps in evaluating how the firewall performs under worst-case conditions (read: processing of packets much smaller than the MTU).
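
For reference, representative invocations of the two tools are sketched below. The interface name, IP addresses, stream count, and packet size are illustrative assumptions, not the exact parameters of our runs.

    # iPerf3: quick TCP throughput check against a sink running "iperf3 -s".
    # Address, stream count, and duration are assumptions for illustration.
    iperf3 -c 172.16.20.2 -P 4 -t 60

    # pkt-gen: worst-case flood of 64-byte packets from the stimulus source.
    # Interface and addresses are assumptions for illustration.
    pkt-gen -i ix0 -f tx -l 64 -s 172.16.10.2 -d 172.16.20.2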

Evaluation is done in the following scenarios:

  • Router Mode - The firewall is completely disabled and packet forwarding between all LANs (OPT interfaces in our DUT configuration) is enabled. In this configuration, we essentially benchmark a router (a command sketch for this mode follows the list).
  • PF (No Filters) - The packet filter is enabled, but the rule set allows all traffic.
  • PF (Default Ruleset) - The packet filter is enabled with the default rule set and a few modifications to allow the benchmark streams.
  • PF (NAT Mode) - The packet filter is configured with NAT enabled across two of the interfaces to simulate a multi-WAN scenario.
  • IPSec - The packet filter is enabled with the default rule set and a few modifications to allow the benchmark streams, and a couple of different encryption / hashing algorithm sets are evaluated.
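
As a rough sketch of how the DUT can be switched between the first two modes from a shell, consider the commands below. They use the standard FreeBSD pfctl and sysctl tools; on pfSense these changes are normally made through the web GUI, so the snippet is an assumption for illustration rather than our actual procedure.

    #!/bin/sh
    # Hypothetical sketch: toggle between Router Mode and PF (No Filters).

    # Router Mode: disable the packet filter entirely; keep IP forwarding on.
    pfctl -d
    sysctl net.inet.ip.forwarding=1

    # PF (No Filters): re-enable pf with a rule set that passes all traffic.
    echo "pass all" > /tmp/pass-all.conf
    pfctl -f /tmp/pass-all.conf
    pfctl -e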

In benchmarking configurations, it is customary to ensure that the stimulus-generating hardware is powerful enough not to be the bottleneck. Fortunately, networking performance (particularly at 10G+ speeds) hardly benefits from high core counts or multi-socket systems: the performance penalty associated with moving the packet processing for a particular interface to another core or socket quickly becomes unacceptable. Hardware acceleration on the NICs matters more than raw CPU performance, though higher per-core/single-threaded performance is definitely welcome. In this context, a look at the suitability of the two testbed machines for packet generation and driving is warranted first.


