July 1, 2020 - Interview by Simone Ulmer
Nur and Andreas, you both work as software engineers at CSCS. What is your job there?
Nur & Andreas: Our job at CSCS is to build libraries that make it easier and faster to develop codes with our partners at the universities. We focus our work on providing an easy interface to a complex hardware architecture, so that scientists can concentrate on the correctness of their algorithms while we focus on the execution time of these algorithms on different hardware architectures.
How did you become a software engineer? What kind of background do you have?
Andreas: I studied applied mathematics in Kaiserslautern, Germany, which is a mixture of mathematics and computer science. During my time as a student, I worked as a part-time software engineer in one of the research centres. This awakened my curiosity about this field, and I am still fascinated by it.
Nur: I studied physical engineering at Politecnico di Milano. During my thesis, I discovered that I like to develop software. Subsequently, I remained at Politecnico di Milano and completed a PhD in applied mathematics and HPC.
Software engineering has little to do with geothermal energy or earthquakes. Do you have to learn the basics or are the underlying codes and algorithms generally applicable to all research disciplines?
Nur: We didn't need to learn anything about geology. To optimise the code, we only had to learn a bit about the numerical side, about the algorithm used in the software. I have a small background in geology, because I helped with similar software during my PhD thesis, but background knowledge was not really relevant. In the end, we were dealing with matrices to optimise the software.
Andreas: From my side, very similar to Nur, I had zero experience with geothermal codes when I was working on this code. I didn’t really need to know that this is a geothermal code, because I was working on a higher level in optimising algorithmic structures and looking at where things are slow. Here it is absolutely irrelevant what kind of code it is, because the tools we use are always the same. There are general algorithmic structures that we can use and replace in codes, independent of whether it is a geothermal code or some chemistry code. There are a couple of, let’s say building bricks, that we can stick together, and they apply to problems of different fields. I mean, it is relevant what type of problem it is, but, even from very different fields, the type of problem can be the same.
How did you get involved in this project?
Nur: These little bricks, for example, are software libraries. They are compiled from different projects ranging from chemistry to geology to hemodynamics. One of these libraries is called Trilinos, in which I am an expert, and that is the reason why I came to this project.
Andreas: I got involved in this project because I was already working in optimization of codes.
And code optimization seems to have been the most important thing in the project.
Andreas: Exactly.
What exactly was the problem you had to solve?
Nur: The code was serial, not parallel, and was using a very old version of Trilinos. We split the duties: Andreas worked mainly on optimisation, and I worked on porting to the new infrastructure of Trilinos, which can also run on GPUs, because the Piz Daint supercomputer is a machine that mainly consists of GPUs. It is crucial now and for the future to run the code on GPUs.
Andreas: I replaced the data structures that were used within the code with more appropriate ones. The former data structures were correct from a functional point of view, but they were a bad choice from an algorithmic point of view.
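Editor's illustration: the interview does not say which data structures were replaced, so the following Python sketch is purely hypothetical and not taken from the project's code. It only shows the general idea Andreas describes: two containers can give identical results while differing greatly in cost. The function names and test data are assumptions made for this example.

```python
import time


def count_hits_list(haystack, needles):
    # Functionally correct, but every membership test scans the list: O(len(haystack)).
    return sum(1 for n in needles if n in haystack)


def count_hits_set(haystack, needles):
    # Same result, but a hash set makes each membership test O(1) on average.
    lookup = set(haystack)
    return sum(1 for n in needles if n in lookup)


# Hypothetical data: 2000 even numbers to search in, 2000 values to look up.
haystack = list(range(0, 4000, 2))
needles = list(range(2000))

start = time.perf_counter()
hits_list = count_hits_list(haystack, needles)
elapsed_list = time.perf_counter() - start

start = time.perf_counter()
hits_set = count_hits_set(haystack, needles)
elapsed_set = time.perf_counter() - start

# Both variants agree on the answer; only the running time differs.
print(f"list: {hits_list} hits in {elapsed_list:.4f} s")
print(f"set:  {hits_set} hits in {elapsed_set:.4f} s")
```

The behaviour seen from outside is unchanged, which is why such a replacement can be made without touching the science in the code.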
How long did you work on the project?
Andreas: Both of us in total a bit less than one year.
What were the special challenges?
Nur: For me, one of the most challenging things was to map the old version of the code to the new interface of Trilinos and to port it to the library Utopia.
Andreas: For me, the biggest challenge was not to break the existing code. I changed a lot of things, and it wasn't clear whether the changes I made would break any behaviour of the code or disrupt the results of the simulation, since there weren't many tests. I spent quite some time together with Dimitrios Karvounis from ETH in figuring out how we can make sure that changes we are committing are not changing the behaviour, or if we change the behaviour that the outcome is still an acceptable one.
How can you test this?
Andreas: We are running simulations, and basically, we figure out whether the outcome of the simulation is within some expected range. One part of the problem is that it’s a nondeterministic problem. There are a couple of unknowns, and these unknowns are modelled by random numbers. If we are using random numbers, we are getting different results for different simulations. That’s why we need the expertise of the scientists of ETH to confirm that the outcome is realistic and something that can happen in the real world.
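Editor's illustration: a minimal sketch of how a range check on a randomised simulation might look. This is not the project's test suite. The `simulate` function is a toy stand-in, and the acceptance bounds 45.0 and 55.0 are invented for the example; in practice, as Andreas says, the acceptable range has to come from the domain scientists.

```python
import random
import statistics


def simulate(seed):
    """Hypothetical stand-in for a stochastic simulation returning one scalar outcome."""
    rng = random.Random(seed)  # a seeded generator makes each individual run reproducible
    # Toy model: the sum of 100 uniform draws in [0, 1), which has an expected value of 50.
    return sum(rng.random() for _ in range(100))


def outcome_within_range(run, seeds, low, high):
    """Run the simulation once per seed and check that the mean outcome lies in [low, high]."""
    outcomes = [run(seed) for seed in seeds]
    mean = statistics.mean(outcomes)
    return low <= mean <= high, mean


# Assumed acceptance range for the toy model; real bounds would be supplied by the experts.
ok, mean = outcome_within_range(simulate, range(30), 45.0, 55.0)
print("accepted" if ok else "rejected", f"(mean outcome {mean:.2f})")
```

Checking an aggregate over many seeds, rather than comparing single runs bit for bit, allows the implementation to change as long as the results stay within the accepted range.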
I suppose this requires a close exchange with the researchers? Who was your contact person and how did you proceed?
Andreas: I worked basically with Dimitrios on the code, because he was the original author.
Nur: My main anchor point was also Dimitrios. He is the expert of the code and developed the relevant part of it. The code is the basis of Dimitrios’ PhD thesis.
How much could you improve the code in terms of how much faster it computes now?
Andreas: Well, in the beginning one simulation took about eight hours, and now we are down to two minutes. We measured the speedup, and the code is 600 times faster now. It came down in a lot of steps in between. We did at least ten to twenty iterations, during which the code became faster and faster, until there was no further benefit in optimizing the code any more.
And you can be sure that the code running 600 times faster works well and reflects the real world?
Andreas: Exactly. By the quality of the results, we are at the same level as before.
What was the most exciting aspect of the PASC project for you?
Nur: I was very surprised about this performance gain. When we started working, we thought there was the possibility to improve the performance for sure, but I never thought that we could have such a big improvement. I was very excited at the end that they are testing the simulation in a real case, and the fact that this project could have a real benefit for society. If it works, everybody would benefit from this, and Switzerland could also reduce our national CO2 footprint. This means that some part of the energy that we are using for our simulations could come from this type of geothermal power plant.
Andreas: For me, that’s also basically the most exciting part of it, to see that it actually is used for real-world examples, where they actually try to make use of it, and it is not only something theoretical that we are working on.
It sounds as if you often lack a real application in your work. Isn't this somewhat at odds with the broad application of simulations in science?
Andreas: Very often our work is still on a theoretical level. I mean, it is always a question what you consider a real-world application, right? What happens very often is that things we are improving are used for further research, and it can take quite some time until it actually reaches a point where you can see that it is useful for real application. Usually you don’t have such a short time from development to real-world application.
What are you working on at the moment?
Nur: We are both working mainly on PASC projects, actually on the project HPC-Predict, which means High-Performance Computing for the Prognosis of Adverse Aortic Events. It is a project led by Dominik Obrist, professor at the University of Bern. The other one is StagBL (A Scalable, Portable, High-Performance Discretization and Solver Layer for Geodynamic Simulation), a project led by ETH professor Paul Tackley. We are using the software library Utopia on all the projects, as I mentioned before. We are trying to create a repository where multiple software packages used in exascale computing can live and interact together, to give the user the possibility to use different libraries.
Andreas: Basically, Nur and I are working on the same projects. In the HPC-Predict project, I am also working on providing a kind of encryption so that we can work with medical data on Piz Daint without revealing the personal patient data. We need to make sure that the data is encrypted all the time while still being able to run simulations on the encrypted data. This code is crucial for all medical simulations and could be applied to all types of simulations that need data protection. Our work at CSCS is to develop building bricks, and inside the project, we just stick together the bricks we have and find a solution. If there is a brick missing, then we develop it for this project additionally, but with the idea that it can be reused for other projects too. We try to provide general solutions.
Further Information:
The two software engineers Nur Aiman Fadel and Andreas Fink from CSCS supported the researchers from the SED and ETH Zurich in solving the problem of code optimization. Find out more about the scientific goals in the article about the project FASTER.