RK3588 Cluster Part 1: Project planning and hardware comparison

Dec 19th, 2023

I’ve been having some issues with my home Kubernetes cluster lately. One of my servers has been consistently triggering a kernel panic on reboot, one is constantly OOM killing processes, and my rack sucks an enormous amount of power at idle.

My rack draws a mean of 900W consistently, costing about $100 per month
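As a quick sanity check on that figure: 900 W running continuously is about 648 kWh per month, which lines up with ~$100 at a typical US residential rate of roughly $0.15/kWh (the rate is my assumption, not stated in the post):

```python
# Back-of-the-envelope cost of a 900 W continuous draw.
# The electricity rate (~$0.15/kWh) is an assumed figure.
power_kw = 0.9
hours_per_month = 24 * 30
rate_usd_per_kwh = 0.15

kwh = power_kw * hours_per_month   # 648 kWh/month
cost = kwh * rate_usd_per_kwh      # ~$97/month
print(f"{kwh:.0f} kWh/month -> ${cost:.0f}/month")
```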

I have also been neglecting update maintenance over the past year or so. I use a combination of Renovate, GitHub, and Flux to automatically deploy updates to the services I host, but breaking changes require manual intervention. Rather than trying to update my existing stack, I’ve decided to replace (most of) it.

Project scope

Before getting into the fun technical details, I first need to identify what I want to accomplish, and what my constraints are. In no particular order, I want to:

I have several constraints that affect the design decisions for this project:

New cluster, new hardware

I’ve been researching options for the nodes for a few months now. Most machines with an x86 processor eat a ton of power, and those that don’t are either very slow or very expensive compared to other options. There are some cheap “micro” PCs on eBay, such as the Dell OptiPlex series, but none of the ones I saw could be retrofitted with a 5Gbps (or faster) NIC, 32 GB of RAM, and two drives that meet my requirements. Used servers are also out due to size, aesthetic, and noise requirements. A stack of laptops isn’t terribly pleasing to look at either, and few would meet my other requirements. This primarily leaves ARM64 single board computers (SBCs).

ARM processors have historically trailed far behind x86 processors in raw performance. However, recent ARM64 CPUs actually beat high-end x86 processors in some workloads. While purchasing an AWS Outposts rack with Graviton processors is not remotely feasible for several reasons, those datacenter advances have been mirrored by advances in ARM64 processors used in other applications.

The Rockchip RK3588 System on Chip (SoC) is the latest and greatest in economical SBC processors. While there are faster options, they are either only available in expensive, locked-down products, or are ultra-expensive and specialized [1]. The RK3588 touts four Cortex-A76 cores, four Cortex-A55 cores, a GPU with transcoding support, quad-channel RAM, two 2.5Gbps MACs, and several PCIe and SATA III buses. While the SoC doesn’t inherently address most of my requirements, it does at least support them.

I found several options for RK3588-based SBCs. Here’s a comparison I made of their features [2]:

| Name | Price | RAM | NICs | M.2 slots | SATA III | eMMC |
| --- | --- | --- | --- | --- | --- | --- |
| ROCK Pi 5B | $189 | 16 GB LPDDR4 | 1x 2.5GBase-T | 1x M key PCIe 3.0 x4, 1x E key PCIe 2.1 x1 | 1x via M.2 E key slot | Proprietary socket, not included |
| DB3588V2 | $229 minimum, very unclear | 32 GB LPDDR4 | 2x 1000Base-T | 1x M key PCIe 3.0 x2 | 2x standard port | 256 GB eMMC 5.1 |
| BPI-RK3588 | $160 | 8 GB LPDDR4 | 1x 2.5GBase-T | 1x M key PCIe 3.0 x4, 1x E key PCIe 2.1 x1 | 1x via M.2 E key slot | 32 GB |
| ROC-RK3588-RT | $319 | 32 GB LPDDR5, very unclear | 1x 2.5GBase-T, 2x 1000Base-T | 2x, one E key, very unclear | 1x via M.2 E key slot | 128 GB |
| Orange Pi 5 Plus | $189, presale price | 32 GB LPDDR4X | 2x 2.5GBase-T | 1x M key PCIe 3.0 x4, 1x E key PCIe 2.1 x1 | None | Proprietary socket, not included |
| Turing RK1 [3] | $260 | 32 GB LPDDR4 | 1x 802.3ab MDI [4] | 1x M key PCIe 3.0 x4 | 2x standard port, only on node 3 | 32 GB eMMC 5.1 |
| Blade 3 [5] | $439 | 32 GB LPDDR4 | 2x 2.5GBase-T, 1x 16 Gbps custom TCP/IP over PCIe implementation | 1x M key PCIe 3.0 x2 | 1x standard port | 256 GB eMMC 5.1 |

Every option has its pros and cons. In the end, two options stood out to me: the Turing RK1 and the Blade 3. Both of these boards can attach to a larger cluster board (optional for the Blade 3) for additional features, which results in a much cleaner, less “jury-rigged” end result than a bunch of bare boards stacked up with wires running everywhere. From a more technical perspective, both options have a basic baseboard management controller, which supports things like switching the nodes on and off, and remotely accessing the nodes as if physically connected to them. The key differentiator is a really cool feature the Blade 3 gains when installed in a Cluster Box, the Blade 3’s clustering solution: the nodes can communicate via a network that uses PCIe plus software for layer 1 of the stack, instead of 802.3 Ethernet over copper/fiber:

Block diagram from the Cluster Box page

Unfortunately this does have some drawbacks, which I’ll cover later.

Despite the high price (compared to the other options), the Blade 3 is the clear winner for my specific use case. The Turing RK1 comes close to matching it in features, but its inter-node network link is just too slow. Additionally, the lack of a SATA port for each node is a major drawback. If I didn’t have that requirement, I probably would have used the Turing RK1 or the Orange Pi 5 Plus for my nodes. Instead, I’m moving forward with four Blade 3s installed in a Cluster Box.
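To put “too slow” in perspective, here’s a rough comparison of how long a bulk 50 GB node-to-node transfer would take at each link’s raw line rate (the payload size is my own illustrative number, and real throughput will be lower due to protocol overhead):

```python
# Naive transfer-time comparison at raw line rate.
# 50 GB is an arbitrary illustrative payload size.
payload_gbit = 50 * 8  # 50 GB expressed in gigabits

links = {
    "Turing RK1 inter-node (1GbE)": 1.0,
    "Blade 3 NIC (2.5GbE)": 2.5,
    "Blade 3 PCIe link (16 Gbps)": 16.0,
}
for name, gbps in links.items():
    print(f"{name}: {payload_gbit / gbps:.0f} s")
```

At these rates the PCIe link finishes in about 25 seconds versus over six minutes for the RK1’s gigabit link, which is why the link speed mattered so much to my decision.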

An interesting alternative

One other option that I saw was the Zimablade. Cringy website aside, it actually meets most of my requirements. Its low price means I could relax my RAM requirements by purchasing additional semi-dedicated nodes for in-cluster storage. However, the out-of-the-box form factor is awful for a cluster setup with a full PCIe card installed; making these look halfway decent would require designing and manufacturing an enclosure for them. I may revisit these at some point in the future if I want some x86 nodes and/or nodes with Intel Quick Sync, but for now they are not an option.

Undocumented (until now) Blade 3 information

While the Blade 3 has significantly better documentation than most of the other options I evaluated, there is still some technical product information that is (to my knowledge) not yet documented anywhere online. Prior to placing my order, I reached out to the company for clarification on a few things. Here’s what I asked, and what Mixtile told me:

Note that some of the questions and responses have been lightly edited (formatting, context, removing unrelated/irrelevant information) where appropriate.

Some documented, harder to find information

Somebody else evaluating the Blade 3 might find some of this useful, although it is already documented elsewhere:

Blade 3 drawbacks

I’ve covered just about everything I’ve researched regarding the Mixtile Blade 3 and Cluster Box. I’ve outlined neat features and benefits of the products, so now I’d like to share some of the downsides that I’ve noticed.

Note that at the time of writing I have the products in hand, but I have not even powered them on yet. I may find additional issues as I begin working with them, or I may find that some of the problems I currently think exist don’t.

Wrapping up

While the Mixtile products don’t appear perfect, they seem pretty great and I’m excited to move forward with them. I’ve already ordered a fully loaded Cluster Box, and it’s in hand and unboxed. In the next post I’ll share the photos I’ve taken and list a couple of things that aren’t really obvious until you hold the units in your hand.

  1. The last time I checked, Zynq MPSoCs and RFSoCs cost 5 figures for the chip alone. 

  2. I found a handful of other options but did not include them due to lack of pricing and/or information. 

  3. The specs for this are listed as if it were installed in the yet-to-be-released Turing Pi 2. This item alone is just a System on Module (SoM)! 

  4. This is basically a fancy way of saying 1000Base-T, but with a board-to-board connector. 

  5. The specs for this are listed as if installed in a Cluster Box. There are other configurations available if using a Breakout Board or Blade 3 Case.

Tags: RK3588, Kubernetes, Cluster Computing