January 24, 2022 - by Santina Russo
Maxime and Alberto, why should users and HPC providers work with Sarus?
Alberto Madonna: Sarus is a container engine that improves the performance of portable containers for high-performance computing (HPC). It provides users with a straightforward way to adapt containers to the system they run on in order to fully tap its power. In fact, with Sarus, a container image, which is something like a blueprint for containers, can be created once and then used on various machines within the same processor family, from a laptop to small computer clusters and all the way up to HPC facilities.
Maxime Martinasso: This capability is actually what sets Sarus apart from all other available container solutions for HPC. In the traditional approach to containers in HPC, to fully leverage a supercomputer’s performance, users build a container image of their software stack specifically for the machine they want to access. If they want to switch to a different machine and exploit its full performance, they have to create a new container image. This, however, goes against the nature of container technology itself. It means forgoing one of the main advantages that containers provide, namely easy transport of software between systems by decoupling the software stack from its environment. Sarus, on the other hand, achieves portability without compromising on performance. This is unique.
This sounds as though the usage of containers might be somewhat misunderstood in the community.
MM: Yes, in the HPC community it certainly is. There is this misconception that containers always have to be built on the HPC system they will run on. So, more often than not, container technology is viewed as just another tool for packaging software rather than as a technology offering portability among systems.
AM: I often observe that in HPC, container engines are still valued mostly as a means to achieve reproducibility. This is, of course, an important requirement: Researchers need their simulations to run the same regardless of the machine they are using, and the sealed environment of containers helps with this. But in the process, users lose performance if their container engine does not use the system’s hardware and architecture optimally.
How does Sarus achieve this balancing act — portability as well as native performance?
MM: In a way, the beauty of Sarus is its smart way of outsourcing tasks by using external plugin programs, namely open standard OCI hooks. These hooks can customize containers to unlock machine-specific HPC features at runtime. Some OCI hooks we developed ourselves, while others are provided by HPC vendors. Sarus then provides the link between standard portable container images — created, for instance, by the popular container solution Docker — and the OCI hooks that enable access to an HPC system’s hardware and architecture, its filesystems, accelerators, libraries and workload manager.
AM: For instance, one of the OCI hooks provides access to a machine’s Nvidia GPUs, another provides the libraries that take care of the communication between different running processes, and so on. This way, containers can tap native performance, while the container images themselves remain untouched and portable. This container customization to a specific HPC environment happens on the fly at runtime.
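The hook mechanism described above can be illustrated with a minimal Python sketch. It is not Sarus code, and it only assumes what the OCI runtime specification defines: a container's runtime configuration may carry a `hooks` section, in which each hook point (here `prestart`) lists external programs by `path`, with optional `args` and `env`. The helper name `add_hook`, the hook paths and the environment variable are hypothetical and chosen purely for illustration.

```python
import copy
import json

# Hook points defined by the OCI runtime specification.
OCI_HOOK_POINTS = (
    "prestart", "createRuntime", "createContainer",
    "startContainer", "poststart", "poststop",
)


def add_hook(config, point, path, args=None, env=None):
    """Return a copy of an OCI runtime config with one more hook registered.

    The original config is left untouched, mirroring the idea that the
    portable image description does not change when a site adds hooks.
    """
    if point not in OCI_HOOK_POINTS:
        raise ValueError(f"unknown OCI hook point: {point}")
    entry = {"path": path}
    if args:
        entry["args"] = list(args)
    if env:
        entry["env"] = list(env)
    new_config = copy.deepcopy(config)
    new_config.setdefault("hooks", {}).setdefault(point, []).append(entry)
    return new_config


# A stripped-down runtime config for a portable container (illustrative values).
base = {
    "ociVersion": "1.0.2",
    "process": {"args": ["./simulation"]},
    "root": {"path": "rootfs"},
}

# Hypothetical site-specific hooks: one for GPUs, one for MPI libraries.
with_gpu = add_hook(base, "prestart", "/opt/hooks/gpu-hook",
                    args=["gpu-hook", "prestart"])
with_mpi = add_hook(with_gpu, "prestart", "/opt/hooks/mpi-hook",
                    env=["HOOK_MPI_LIBS=/usr/lib/libmpi.so"])

print(json.dumps(with_mpi["hooks"], indent=2))
```

In this sketch the description of the container itself (`process`, `root`) stays identical on every machine. Only the list of hooks differs from site to site, which corresponds to the customization at runtime that the interviewees describe.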
Who can profit from Sarus the most?
AM: Sarus generally makes life easier for HPC users, both for those who want to start adopting containers and for scientists who are new to HPC altogether. But it is probably particularly useful for researchers who use very complex software stacks, such as large codes that are highly specific, unique even, to a certain research field and that have a multitude of dependencies on other programs and libraries. When employing containers and Sarus, users can reuse their customized software stacks in different computing environments without having to compromise on performance.
MM: The same incidentally applies to machine learning environments and cloud applications. Although these kinds of applications were not built for HPC, with Sarus they can easily be adapted to use the full potential of supercomputers. This was actually one of the main drivers for developing Sarus.
AM: Another important point is that, up to now, industry standards were hardly ever adopted in HPC container solutions. With Sarus, we wanted to change this. With its use of OCI hooks, Sarus builds on industry standards and vendor-provided code, making life easier for HPC support teams.
Can you give some examples of projects that use Sarus?
AM: Sure. For instance, the artificial intelligence benchmarking organisation MLPerf.org recently performed its first HPC training benchmark at a selection of facilities worldwide. These benchmark runs were intended to measure machine learning (ML) performance on large-scale HPC systems. For the runs on “Piz Daint”, the CSCS team employed Sarus to rapidly deploy and run CosmoFlow, an ML application to predict parameters of the universe, and thereby easily accessed full system performance.
MM: Another example is the work we did together with CERN. To perform the elaborate computations of particle collisions, CERN works with a certified software stack on a variety of HPC systems. Here, container technology comes in very handy, and CERN has deployed a multitude of containers per day on “Piz Daint” at CSCS using Sarus. Also, Amazon Web Services is very interested in Sarus and has already held workshops promoting its use within its own HPC infrastructure.
Actually, talking about US giants like Amazon: In contrast to other popular container engines for HPC, Sarus is a European product. Do you see an advantage in this for users?
MM: It’s certainly easier for European users to get in touch with us if they have questions or concerns. In addition, Sarus is not proprietary to a company, but based on an open-source philosophy and open-source technology. This comes with advantages for the users, also because of the way Sarus is built: It is modular, meaning that HPC providers and developers can contribute to it or reconfigure it for their purposes. For instance, providers can easily swap the core module of Sarus to benefit from security updates from the OCI community.
What are your next steps to improve Sarus?
AM: We are currently working towards making Sarus more accessible by providing official packages for popular HPC package managers like Spack and EasyBuild. Currently, a standalone package is available on the product website and on GitHub. Also, over the course of the next year, we plan to implement more updates and develop additional OCI hooks to provide more features. For instance, we will add a hook for AMD GPUs, which will be part of the next-generation CSCS machine “Alps” as well as “LUMI”, one of the new European flagship HPC systems being built in Finland.
MM: Our aim is to continuously make Sarus and its concept of providing performance and portability more diverse and complete. In addition, we have already started to promote it to users and HPC facilities in Europe.
With what means are you promoting Sarus?
MM: We have taken part in different workshops and events in Europe and the US, and we regularly hold webinars to present and explain the idea and capabilities of Sarus. Plus, we have started discussions with other European HPC facilities with the aim of making Sarus available to more European users. Also, Sarus should naturally be considered as part of a European HPC software stack, since it answers needs that no other European product in the stack addresses. We are willing to contribute to such an effort. In addition, we are talking to vendors of HPC machines. If vendors recognise the possibilities of Sarus for HPC facilities and their users, they will increasingly add it to their software stacks. This will go a long way toward making Sarus and its usage more ubiquitous.
Image above: Alberto Madonna and Maxime Martinasso (Image: Matteo Aroldi, CSCS)
More information:
Sarus is a user-friendly container engine for HPC systems that connects the portability of containers with the performance of supercomputers. It uses open standards and a command line interface similar to that of Docker, the leading non-HPC container engine. The latest Sarus release was issued in November 2021. Please find further information on the product’s website.