postmarketOS // Hardware testing automation: a status update

We have been talking about hardware testing and CI integration for quite a while. It featured in our FOSDEM blog post, and also in our ongoing focus for the year. Since then, a considerable amount of work has gone into related tasks, part of which will be paid for with donation money!

Why hardware-testing automation?

As the postmarketOS community grows, so does the number of supported devices and the number of working features each of them has. However, manual testing cannot scale at the same pace. As devices mature, it is no longer only about basic features: we also want to make sure that all systems work reliably. Manually testing that connecting to WiFi still works after 10 reboots becomes infeasible quite fast. We need a solution for this problem, and we believe running automated tests on real hardware can greatly help us here. Indeed, we had already made a previous attempt. Even if it didn't result in a production-ready system, it provided some groundwork and greatly helped inform the requirements for this second attempt.

Setting the requirements

So we know we want a system that allows us to automate tests of device features. But what should the specific requirements be? Any system we develop or pay for should:

Help ensure the reliability of specific functionality on a variety of devices. It should not be tailored to a single device or device family.
Enable a crowd-sourced and decentralised board farm in the long term, so that testing can scale up. This has some further implications:
Documentation should be available for people to replicate the system.
We should aim to lower the barrier to entry as much as possible. Nobody should need a several-thousand-euro lab to get started.
No dependency on devices owned by a limited number of infrastructure owners.
Be FOSS, and be reusable by other FOSS communities. It should not be a postmarketOS-only system.
Reduce friction for contributions through easier and broader automated testing. It should be possible for people to run tests on repositories other than the postmarketOS-approved ones (given they have access to the hardware).
Be safe to run at scale, so that building a farm for 8 devices is not exponentially harder than building a farm for 2 devices.

Overall, a desired solution would mean something like this:

From requirements to development

From our first attempt some years ago we learned two important lessons: such a project is less likely to succeed if done by a single person, since it requires knowledge in quite a few different fields; and designing such a test setup requires a lot of previous experience, and should be done by somebody senior in the field. These lessons led to splitting the project into two parts:

The hardware design

To run the devices at scale, we need to be able to control them through software (flash, power on and off, push buttons). We also need to design a PCB that supports controlling devices in a rack without their batteries, so that they do not catch fire.

The software control was solved for Android devices in the previous project with custom firmware on a Raspberry Pi Pico microcontroller. Other kinds of devices might need different firmware, but this would be a start.
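To make the idea of software control more concrete, here is a minimal Python sketch of what a host-side interface to such a controller could look like. Everything in it is an assumption for illustration: the names `DeviceController`, `RecordingController`, and `power_cycle`, as well as the button name, are hypothetical and are not taken from the actual firmware or its protocol. The stand-in controller only records actions, so the sketch runs without any hardware attached.

```python
from abc import ABC, abstractmethod
import time


class DeviceController(ABC):
    """Hypothetical host-side interface for the actions named in the text."""

    @abstractmethod
    def set_power(self, on: bool) -> None:
        """Connect or cut the supply that replaces the battery."""

    @abstractmethod
    def press_button(self, name: str, seconds: float) -> None:
        """Hold a physical button (e.g. power, volume-down) for a while."""


class RecordingController(DeviceController):
    """Stand-in that only records actions instead of talking to hardware."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def set_power(self, on: bool) -> None:
        self.log.append("power on" if on else "power off")

    def press_button(self, name: str, seconds: float) -> None:
        self.log.append(f"press {name} for {seconds}s")


def power_cycle(ctl: DeviceController, off_seconds: float = 0.0) -> None:
    """Cut power, wait, and restore it, so a hung device can be recovered."""
    ctl.set_power(False)
    time.sleep(off_seconds)
    ctl.set_power(True)


if __name__ == "__main__":
    ctl = RecordingController()
    power_cycle(ctl)
    ctl.press_button("power", 2.0)  # hypothetical button name
    print(ctl.log)
```

A real implementation would replace `RecordingController` with a class that talks to the microcontroller; keeping the interface abstract is what would allow other kinds of devices to use different firmware behind the same calls.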

For the PCB side, we have been trying to develop one for a while. We went over multiple iterations and ended up with a requirements document. The initial PCB is now ready for a first test assembly order, and will soon be in the hands of Federico and ncorna! We expect multiple iterations of adjusting the design and re-ordering to verify improvements. Once that is done, the firmware will need to be adjusted to the current PCB design.

Although it will still take a while, we believe that these two components give us solid ground on which to start connecting devices at scale.

The software CI integration

Unfortunately, there are not that many FOSS systems for integrating CI with real-world devices, nor do we have the money to pay somebody skilled to develop one for us, so we are left with little choice. One of the best known in the Linux world is LAVA, but real-world experience from some of our developers led us to conclude that it does not fit our needs. Another alternative is CI-tron. Although less widely used than LAVA, it has seen production usage in Mesa CI (possibly one of the most complex FOSS CI systems ever built) by Valve for their Steam Deck. We also learnt that it was slowly being adopted by other companies in Mesa while they phased out custom HW testing infra. It seemed to fit many of our requirements, so Casey and Pablo had a meeting with the maintainer, Martin, who explained the architecture and demoed it for us. We were very satisfied with how it fit the requirements. We knew some changes would be needed to get it working for us, but paying for them would fit within our budget, and in the end we would have a reliable system, relatively easy to configure, and built for very demanding requirements!

The CI-tron project

A project running under CI-tron consists of 3 different parts:

Device Under Test (DUT): the platforms on which the tests run. These are the phones we are interested in, but they can also be QEMU VMs, so we do not need phones to run basic tests.
CI-tron gateway: a complete operating system image which packs all CI-tron services into a secure and ready-to-deploy system. It communicates with GitLab, fetches GitLab jobs, downloads all artifacts and images to run on devices, talks to and controls the DUTs, etc. It is a great beast.
Power Delivery Unit (PDU): an intermediate component between the DUTs and the gateway that allows the gateway to control the devices in software. This is the role that our PCB plays in the system.

Despite the complexity of a system like this, from the user perspective, only DUTs exist. The DUTs show up in GitLab just as any other GitLab runner. They simply get tagged with information related to their hardware. For example, a OnePlus 6 could have tags cpu:arch:aarch64, soc:sdm845, and compatible:oneplus-enchilada. To run jobs on that device, it would be as simple as setting the appropriate tags in the GitLab CI! The gateway takes care of the rest and makes the setup completely transparent. This is one of the greatest strengths we found in CI-tron.
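The tag mechanism can be illustrated with a small Python sketch. It models the general GitLab rule that a job is only picked up by a runner that carries every tag the job requests. The OnePlus 6 tags come from the example above; the second runner, its tags, and the function names are hypothetical, and the sketch does not model how GitLab treats jobs without any tags.

```python
def runner_can_take(job_tags: set[str], runner_tags: set[str]) -> bool:
    """A runner is eligible only if it has every tag the job asks for."""
    return job_tags <= runner_tags


# Runner name -> tags. Only the OnePlus 6 tags are taken from the text;
# the QEMU runner and its tag are made up for this example.
RUNNERS: dict[str, set[str]] = {
    "oneplus-6": {
        "cpu:arch:aarch64",
        "soc:sdm845",
        "compatible:oneplus-enchilada",
    },
    "qemu-x86_64": {"cpu:arch:x86_64"},
}


def eligible_runners(job_tags: set[str],
                     runners: dict[str, set[str]] = RUNNERS) -> list[str]:
    """Return the names of all runners that could run a job with these tags."""
    return sorted(
        name for name, tags in runners.items()
        if runner_can_take(job_tags, tags)
    )


if __name__ == "__main__":
    # Any SDM845 device will do:
    print(eligible_runners({"soc:sdm845"}))
    # Contradictory tags match nothing:
    print(eligible_runners({"soc:sdm845", "cpu:arch:x86_64"}))
```

Asking for a broad tag such as `soc:sdm845` lets a job run on any matching device in the farm, while a narrow tag such as `compatible:oneplus-enchilada` pins it to one model.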

The paid work

Getting the CI-tron project ready for our needs also required highly specialized work that we know from experience would be hard to do with volunteers alone. So we talked with Martin, who was available to provide some support, and who recommended Samuel to fill in the technical gap. Although it is always a bit more risky to pay people who are not already part of the team, we all have enough to do, and getting outside support is a great way to move things forward. To make sure things stayed on track, Pablo acted as project manager, providing feedback from the postmarketOS side and making sure the project fulfilled our requirements. He was also tasked with writing this blog post.

The project we commissioned had the following goals:

Get our CI ready to run hardware tests on merge requests that might benefit from them. Our main target here was to test initramfs and kernel changes in pmaports. It is far from testing everything, but it would be a solid start.
Allow testing both in QEMU and on real hardware. Since the timing between the PCB design and the CI-tron project did not work out as desired, this was later scaled down to just QEMU. The setup with real hardware is roughly equivalent and can be done as a follow-up once the PCB is ready!

In the project plan, Martin was in charge of providing recommendations on how CI-tron should be changed to allow testing the initramfs, providing a sample project, and writing documentation on how to create a production-ready CI-tron setup. He was also asked to support Samuel throughout the development.

Martin used this time to tackle long-term wishes in CI-tron, resulting in a better project for all. We can highlight documentation improvements that resulted in much better installation workflows. He also overhauled considerable parts of the GitLab integration, which resulted in far more detailed documentation on how to run workflows in CI-tron, including the initramfs workflow we commissioned.

Samuel was tasked with the bulk of the work on the postmarketOS side of things: changes in the pmaports and ci-common repositories to run CI-tron tests when the initramfs changes, and documentation on how to develop further tests. The largest part of that work would be the dynamic selection of tags, so that we could decide to run different jobs on different devices. He was also tasked with making the CI-tron changes needed for our initramfs, which in the end resulted in providing Martin with detailed architectural feedback to support postmarketOS-specific workflows.
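As a rough illustration of what such dynamic selection involves, the following Python sketch maps the paths changed by a merge request to the runner tags that tests should be scheduled on. It is not the implementation used in pmaports or ci-common: the rule table, the path patterns, the `cpu:arch:x86_64` tag, and the function name `select_tag_sets` are all assumptions made up for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: glob pattern over changed paths -> tag sets to test on.
# Each tag set stands for one job that should run on a matching runner.
RULES: list[tuple[str, list[tuple[str, ...]]]] = [
    # An initramfs change affects every device, so test on each architecture.
    ("main/postmarketos-initramfs/*",
     [("cpu:arch:x86_64",), ("cpu:arch:aarch64",)]),
    # A kernel change only needs devices using that SoC.
    ("device/*/linux-postmarketos-qcom-sdm845/*",
     [("soc:sdm845",)]),
]


def select_tag_sets(changed_paths: list[str]) -> list[tuple[str, ...]]:
    """Return the de-duplicated tag sets whose rules match a changed path."""
    selected: list[tuple[str, ...]] = []
    for pattern, tag_sets in RULES:
        # fnmatchcase's "*" also matches "/", so one "*" spans subdirectories.
        if any(fnmatchcase(path, pattern) for path in changed_paths):
            for tags in tag_sets:
                if tags not in selected:
                    selected.append(tags)
    return selected


if __name__ == "__main__":
    print(select_tag_sets(["main/postmarketos-initramfs/init.sh"]))
    print(select_tag_sets(["docs/README.md"]))
```

Keeping the rules in a table separates the question of which changes deserve hardware tests from the question of which devices exist, so adding a device to the farm would not require touching the rules as long as it carries the right tags.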

Samuel's work has started rolling into our GitLab, but has been a bit delayed due to personal reasons. However, the core of the tasks should only need some last touches before MRs are created. Beyond that, Samuel has also contributed considerably to architecting the changes that Martin worked on for a better integration with low-level testing. Samuel also pushed for a more modular design that could benefit more than just postmarketOS, in keeping with the upstream mentality we like to foster. That resulted in better interface changes by Martin, which are as of now already in use in Mesa! He has also polished multiple small things in CI-tron while setting up the project.

Finally, Oliver deployed CI-tron to our infrastructure as part of his automated testing improvements work for the upcoming v25.06 release of postmarketOS, which is funded by NGI Zero Core within the NLnet postmarketOS daemons project. Even though there was now a lot of documentation, deployment was not a trivial task, since we were the first to try deploying CI-tron on a server on the Internet instead of on a local machine with full hardware access. We are glad to report that everything is working now: we have 4 QEMU runners ready to be used, and we can add more or remove existing ones with a few button presses in the CI-tron dashboard.

The project was initially budgeted at between 13.000€ and 15.000€. However, it will likely stay a bit below that range. The main reason is the scale-down of the project to not include real-hardware testing (we will do that afterwards, outside the scope of the project at hand). For the parts that we ended up executing, we went a bit over budget due to the additional work done in architecting CI-tron. We are happy to have contributed to that effort, and that some of our money can support other projects adopting hardware testing in the future.

Conclusion

We are very happy that the hardware setup keeps moving forward. We are also very happy that, for the first time, we have spent a significant amount of the donations we received on a project of this scale that will directly contribute to making postmarketOS more reliable. Invoices for finished work will start appearing in Open Collective in the coming weeks. We are very thankful to everybody who has contributed to this project, including less visible work like the reviewing and approving of project milestones by Luca, Stefan, and Oliver.

If you would like to see postmarketOS become more reliable and appreciate the work we are doing towards this, consider donating via our Open Collective!