April 1, 2019 – by CSCS

The data collected from experiments at the Large Hadron Collider (LHC) is fed into a global computing network for data analysis and simulations. Part of this network is the “Phoenix” cluster at CSCS. Until now, the ATLAS, CMS and LHCb particle detectors have delivered their data to “Phoenix” for analysis and comparison with the results of previous simulations. The “Piz Daint” supercomputer will now take over this role from “Phoenix”.

A unique approach
For the first time, the Worldwide LHC Computing Grid (WLCG) will integrate one of the world’s most powerful supercomputers – one that is also available for general research – to take on the functions of what is known as a Tier 2 system. Until now, this role has been filled exclusively by dedicated clusters distributed around the world within the WLCG. Based on their performance and characteristics, these clusters are categorised on a scale ranging from Tier 0 systems – available only at the CERN data centre in Geneva and the Wigner Research Centre for Physics in Budapest – to smaller Tier 3 systems.
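For readers unfamiliar with the tier model, the short Python sketch below illustrates how the roles described above relate to one another. The site names and experiments are taken from this article; the class structure itself is purely illustrative and is not part of any WLCG software.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """WLCG tiers, ordered from the central CERN systems down to small local clusters."""
    TIER_0 = 0  # CERN data centre (Geneva) and Wigner Research Centre (Budapest)
    TIER_1 = 1  # large national centres: long-term storage and reprocessing
    TIER_2 = 2  # regional centres such as "Phoenix"/"Piz Daint" at CSCS: simulation and analysis
    TIER_3 = 3  # small institute-level clusters


@dataclass
class Site:
    name: str
    tier: Tier
    experiments: tuple  # detectors whose data the site processes


# The two Swiss Tier 2 systems mentioned in the article.
phoenix = Site("Phoenix (CSCS)", Tier.TIER_2, ("ATLAS", "CMS", "LHCb"))
piz_daint = Site("Piz Daint (CSCS)", Tier.TIER_2, ("ATLAS", "CMS", "LHCb"))

if __name__ == "__main__":
    for site in (phoenix, piz_daint):
        print(f"{site.name}: Tier {site.tier.value}, serving {', '.join(site.experiments)}")
```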

CSCS has been operating the “Phoenix” cluster, which accounts for about 3% of the WLCG’s capacity, as a Tier 2 system for about 12 years. The fact that “Piz Daint” will now be able to handle the data analysis of one of the world’s most data-intensive projects in addition to its daily tasks is thanks to a long-standing partnership between CSCS and the Swiss Institute of Particle Physics (CHIPP). The project participants have been working on the integration since 2014.

A host of benefits – and challenges
In addition to the advantages that classic supercomputers offer physicists, such as scalability, high performance and efficiency, other properties have taken some getting used to on the part of the high-energy physicists. “The integration of ‘Piz Daint’ with the high-energy physics data processing environments has been a highly laborious process that started with the integration of a previous generation supercomputer at CSCS,” says Gianfranco Sciacca, who works on the ATLAS experiment. “The data access patterns and rates of our workflows are not typical of a supercomputer environment. In addition, for some workflows, certain resource requirements exceed what a general-purpose supercomputer typically provides; a lot of tuning therefore needs to be put in place.”

It was also a challenge for CSCS: “Such a move away from the status quo, with all five parties – CHIPP, ATLAS, CMS, LHCb and CSCS – sharing the same view, both on conceptual and practical grounds, was probably the biggest challenge that we have had to face in this project,” says Pablo Fernandez, Service and Business Manager at CSCS. Fernandez coordinated a whole range of activities during the project.

The migration of tasks from “Phoenix” to “Piz Daint” took place gradually over two years. Only at the end of 2017, when enough comparative data was available to make an informed decision, did CSCS and CHIPP decide to use “Piz Daint” for future data analysis. “Piz Daint” has been used actively for simulation and data analysis in the LHC experiments since April 2018. In April 2019, the transfer will finally be complete and “Phoenix” will be decommissioned. In the meantime, the project participants have also succeeded in testing the externalisation of Tier 0 workloads onto “Piz Daint” and have provided CERN with direct computing capacity for highly complex data processing. The project participants look back on what they have achieved with a certain degree of awe: “The project was far more complex to manage than any technical limitation or problem we have had so far,” says Miguel Gila, HPC System Engineer at CSCS.

More work on the project to come
The comparison between “Piz Daint” and “Phoenix” has shown that both systems perform their functions with similar efficiency, but that “Piz Daint” is slightly more cost-effective. According to the project participants, however, the CHIPP community will for the time being use only the compute nodes equipped with two CPUs, and will therefore not yet exploit the full capabilities of “Piz Daint”. In future, it may be possible to use the hybrid compute nodes, each combining one CPU and one GPU, to run calculations much more efficiently. A big advantage for the researchers is that they can tap into much more computing power at short notice – at the “push of a button” – if needed.
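To give an idea of what moving from the CPU-only nodes to the hybrid CPU/GPU nodes could look like in practice, here is a minimal, hypothetical Python sketch that runs the same dense linear-algebra kernel with NumPy on the CPU and, if the optional CuPy library happens to be installed, on a GPU. It is not code from the LHC experiments; the kernel and the library choice are stand-ins for the far more specialised workloads the article refers to.

```python
import time

import numpy as np

try:
    import cupy as cp  # optional GPU stand-in; only useful on nodes that have a GPU
except ImportError:
    cp = None


def cpu_kernel(n: int = 2000) -> float:
    """Dense matrix multiplication on the CPU, as on a CPU-only compute node."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    t0 = time.perf_counter()
    np.dot(a, b)
    return time.perf_counter() - t0


def gpu_kernel(n: int = 2000) -> float:
    """The same kernel offloaded to the GPU of a hybrid CPU/GPU node."""
    a = cp.random.rand(n, n)
    b = cp.random.rand(n, n)
    t0 = time.perf_counter()
    cp.dot(a, b)
    cp.cuda.Stream.null.synchronize()  # wait for the asynchronous GPU work to finish
    return time.perf_counter() - t0


if __name__ == "__main__":
    print(f"CPU time: {cpu_kernel():.3f} s")
    if cp is not None:
        print(f"GPU time: {gpu_kernel():.3f} s")
    else:
        print("CuPy not available: running on a CPU-only node")
```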

The current upgrade of the LHC to the High-Luminosity LHC, which is scheduled to run from 2025 to 2034 and is expected to enable breakthroughs in particle physics, will require up to 50 times more computing power for data analysis and simulations. The high-energy physicists predict, however, that with the current computing models a significant shortfall of resources will occur. Closing this gap will require various software and hardware optimisations. Since HPC systems such as “Piz Daint” already have optimised hardware, the researchers are confident that it will be possible to make savings there. They expect the project to serve as a model for the WLCG, as it has opened doors, for instance, towards new computing models and software optimisation.
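The reasoning behind the predicted shortfall can be made concrete with a back-of-the-envelope calculation. The growth and improvement rates in the sketch below are hypothetical placeholders chosen only to show the shape of the argument; the only figure taken from this article is the factor of 50.

```python
# Hypothetical back-of-the-envelope estimate of the HL-LHC resource gap.
# All rates below are illustrative assumptions, not official projections.

YEARS = 2025 - 2019          # rough lead time until the HL-LHC comes online
DEMAND_FACTOR = 50           # "up to 50 times more computing power" (from the article)
ANNUAL_HW_GAIN = 0.15        # assumed yearly performance gain per franc at a flat budget

supply_factor = (1 + ANNUAL_HW_GAIN) ** YEARS
shortfall = DEMAND_FACTOR / supply_factor

print(f"Hardware alone (flat budget): ~{supply_factor:.1f}x more capacity")
print(f"Gap to close with software and new computing models: ~{shortfall:.0f}x")
```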