Link aggregation is said to be one of the most widely misunderstood network-related protocols. The basic premise is that you can combine individual network links - gigabit ethernet or otherwise - to increase the total throughput. I've seen many tech "enthusiasts" jump on the LAG bandwagon, only to discover it doesn't work the way they imagined. Well, this time I ended up like them - except I actually made it work and learned a bunch of new stuff along the way.

 

History time!

I first began experimenting with LAG soon after buying the used Powerconnect 5324 switch. I quickly ran into obstacles due to OpenWRT config difficulties and the fact that my former router, a TP-Link Archer C7 v5, has toaster-grade hardware that is unsuitable for this task. Upgrading to 10G at that time was exorbitantly expensive (in fact, it still is) and therefore out of the question. I gave up on LAG for a while and moved on.

Fast forward to 2023: having learnt my lesson, I thought maybe setting up a link aggregation group between my main PC and the NAS would work. I was determined to do this, since the gigabit link between those two presented a bottleneck and the slow transfers were driving me crazy. So I went and purchased two used PCIe 2.0 x4 4-port server NICs, an Intel i350 and an HP (Broadcom) T331. Since then I had neither the time nor the patience to work on this, so the NICs were left unused until last week. I have set myself a deadline to make LAG work by the end of the month, because that's when I want to migrate the NAS ZFS pool onto larger hard drives, and with a 4-link LAG this would take significantly less time.

 

Expectations and reality

One of the most important concepts (and one of the hardest pills to swallow) regarding LAG is the value of the xmit_hash_policy parameter. Splitting up traffic across LAG links without any controlling mechanism would introduce TCP segment reordering, which would actually decrease performance. The xmit_hash_policy parameter controls how outgoing traffic is split between the links. There are a few options for this parameter, but every device on the ethernet chain has to support the chosen option for LAG to work as intended. The most common options for this setting are:

  • 0, layer2: The hash formula to determine the link: (SRC_MAC ^ DEST_MAC) % NSLAVES
  • 1, layer3+4: The hash formula to determine the link: ((SRC_PORT ^ DEST_PORT) ^ ((SRC_IP ^ DEST_IP) & 0xFFFF)) % NSLAVES
  • 2, layer2+3: The hash formula to determine the link: (((SRC_IP ^ DEST_IP) & 0xFFFF) ^ (SRC_MAC ^ DEST_MAC)) % NSLAVES

The caret (^) denotes the XOR operation; NSLAVES refers to the number of links participating in bonding.

As we can see, modes 0 and 2 will only push traffic through multiple links if the target MAC and/or IP differs. My use case is to speed up transfers between the NAS and the PC, thus these two options won't do the job. Mode 1 is more interesting: port numbers are also taken into consideration when choosing the link for outgoing traffic. At first glance this is just as useless, because I transfer data using SFTP on port 22 - unless I set up the OpenSSH server to listen on multiple ports! Open up /etc/ssh/sshd_config and add some extra ports:

(...)
Port 10000
Port 10001
Port 10002
Port 10003
#AddressFamily any
(...)
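
Don't forget to reload sshd afterwards (on FreeBSD: service sshd restart). To convince yourself that four ports are enough to spread flows across four links, here's a quick sanity check of the layer3+4 formula in shell; the IP addresses and the source port are made-up placeholders:

# 0xC0A80164 / 0xC0A80165 stand in for 192.168.1.100 (PC) and 192.168.1.101 (NAS)
ipxor=$(( (0xC0A80164 ^ 0xC0A80165) & 0xFFFF ))
for dport in 10000 10001 10002 10003; do
    printf 'port %d -> link %d\n' "$dport" $(( ((50000 ^ dport) ^ ipxor) % 4 ))
done

Since the four port numbers differ only in their lowest two bits, they always land on four distinct links (here 1, 0, 3 and 2), no matter which source port the client picks.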

 

The actual tutorial

Before you continue, I highly suggest watching this video if you have doubts about the terminology or how LAG and LACP are supposed to work. The following section is a quick and informative tutorial explaining what I've done.

My setup consisted of the following:

  • A Linux PC running Manjaro, housing the T331 NIC
  • A NAS running FreeBSD 13.1-RELEASE-p5, housing the I350 NIC
  • A Dell Powerconnect 5324 switch, connected to both NICs with 4 UTP cables (8 in total)
  • A router connected to the switch, running a DHCP server

You will also need access to the Powerconnect serial console. If you have it already configured you can use SSH or Telnet; otherwise you'll need a serial null-modem cable and an RS-232 -> USB adapter.
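
Assuming the switch is still at its factory console settings (9600 baud, 8N1) and the adapter shows up as /dev/ttyUSB0, any terminal emulator will do, for example:

# screen /dev/ttyUSB0 9600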

 

Configuring FreeBSD

Compared to Linux, FreeBSD userspace tools are permanently stuck in the 90's. The only redeeming feature of this operating system is that ZFS is built into the kernel, so there's no need for any DKMS shenanigans.

First, you need to identify the interface names of the PCIe NIC. Just run ifconfig and note the names; in my case those were igb0, igb1, igb2 and igb3.

Then open /etc/rc.conf and add the following lines:

defaultrouter="192.168.1.1"
ifconfig_igb0="up"
ifconfig_igb1="up"
ifconfig_igb2="up"
ifconfig_igb3="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="up laggproto lacp laggport igb0 laggport igb1 laggport igb2 laggport igb3 lagghash l3,l4 DHCP"

Restart the computer to make sure everything is applied - I had little luck with just restarting netif. If everything worked, you should see a new aggregated interface in the ifconfig output that has received an IP address from the DHCP server.
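
To double-check that LACP actually negotiated, look at the laggport flags: a healthy port shows ACTIVE, COLLECTING and DISTRIBUTING. The output should look roughly like this (trimmed):

# ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
(...)
        laggproto lacp lagghash l3,l4
        laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>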

 

Configuring Linux

I don't remember exactly how I set up the bond interface on Loonix, but it was similar to the following commands. Obtain the interface names from the ip a output.

# nmcli connection add type bond ifname bond0 bond.options "mode=802.3ad,miimon=100,xmit_hash_policy=layer3+4"
# nmcli connection add type ethernet ifname enp4s0f0 master bond0

Repeat the last command for every interface. Then bring up every slave interface with the following command:

# nmcli connection up bond-slave-enp4s0f0
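
NetworkManager usually brings the bond up on its own once the slaves are active; if not, activate it manually (assuming nmcli's default type-ifname connection naming):

# nmcli connection up bond-bond0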

If you have issues with the xmit_hash_policy not getting applied, try to pass it directly to the kernel:

# echo 1 > /sys/class/net/bond0/bonding/xmit_hash_policy

You can view an overview of bonding configuration by issuing:

# cat /proc/net/bonding/bond0
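
The fields worth checking are the bonding mode, the transmit hash policy and the MII status of each slave; expect output along these lines (trimmed, exact wording varies by kernel version):

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
(...)
Slave Interface: enp4s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full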

 

Configuring the switch

Now that Linux and FreeBSD are all set up and running, it's time to configure the switch. The easiest way to do this is using the web interface. However, the web UI only works with Internet Explorer, because it was written in 2004. I tried setting up LAG with the switch running the factory (1.0.0.47) firmware, only to find it doesn't work: the stock firmware supports only L2 hashing and there is no setting to change this. Through sheer luck I noticed in a firmware upgrade document that the latest firmware does allow you to select different hashing algorithms.

So before you continue, make sure your 5324 is running the latest 2.0.1.4 firmware from 2010 (lol). Using TFTP to flash the firmware image and the bootloader is the quickest and least painful way to upgrade. You will lose the configuration, so back it up somewhere.
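
From memory, the CLI side of the upgrade boils down to pulling both files from a TFTP server and rebooting; the server IP and file names below are placeholders, so double-check them against Dell's upgrade guide before flashing:

console# copy tftp://192.168.1.10/powerconnect-5324.rfb boot
console# copy tftp://192.168.1.10/powerconnect-5324.ros image
console# reload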

When all is done, log in to the web UI, then head to Switch -> Link Aggregation -> LAG Membership and set up LAG groups. You can define a total of 8 groups. In the following image I have defined two groups, with 4 ports each. The letter "L" above a port means that it uses LACP.

Then click on Switch -> Ports -> LAG Configuration and make sure Load Balance is set to Layer 2-3-4.

 

Testing the connection

Using FileZilla, I opened two tabs and connected to the NAS on two different ports, having first set the number of parallel transfers to 4. Downloading two large files at once lets you verify whether LAG is working:

The combined transfer speed exceeds 1 Gbit/s. The same can be verified using glances:
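
Alternatively, iperf3 makes for a quick synthetic test: run iperf3 -s on the NAS, then fire off several parallel streams from the PC. Each stream uses a different ephemeral source port, so with layer3+4 hashing the streams should spread across the links (the hostname is a placeholder):

# iperf3 -c nas.local -P 4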