December 7, 2016 - by CRAY

Running larger deep learning models opens a path to new scientific possibilities, but conventional systems and architectures limit the problems that can be addressed because models take too long to train. Cray worked with Microsoft and CSCS, a world-class scientific computing center, drawing on decades of high performance computing expertise to scale the Microsoft Cognitive Toolkit (formerly CNTK) on “Piz Daint”, a Cray® XC50™ supercomputer at CSCS.

With an accelerated training process, data scientists can obtain results within hours or even minutes instead of waiting weeks or months. With the introduction of supercomputing architectures and technologies to deep learning frameworks, customers can now take on a whole new class of problems, such as moving from image recognition to video recognition, and from simple speech recognition to natural language processing with context.

Deep learning problems share algorithmic similarities with applications traditionally run on a massively parallel supercomputer. By optimizing inter-node communication using the Cray® XC™ Aries network and a high performance MPI library, each training job can leverage significantly more compute resources – reducing the time required to train an individual model.
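One common way this kind of scaling works is synchronous data-parallel training: each node computes gradients on its own slice of the data, and a collective operation averages them before every model update. The Python sketch below simulates that pattern in a single process. It is an illustration under stated assumptions, not the Cognitive Toolkit's or Cray's implementation: the toy model, the data, and the names `local_gradient`, `allreduce_mean`, and `train` are all hypothetical, and `allreduce_mean` merely stands in for the allreduce that an MPI library would run across the network.

```python
# Illustrative only: simulates data-parallel training in one process.
# Nothing here is taken from the Cognitive Toolkit or from Cray's MPI library.

def local_gradient(w, shard):
    """Gradient of the mean squared error for y = w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for an MPI allreduce: every worker receives the same average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def train(shards, steps=200, lr=0.01):
    """Run synchronous data-parallel gradient descent over the simulated workers."""
    w = 0.0
    for _ in range(steps):
        # On real hardware, each worker would compute its gradient concurrently.
        grads = [local_gradient(w, shard) for shard in shards]
        # The averaging step is the inter-node communication on a real system.
        w -= lr * allreduce_mean(grads)[0]
    return w


# Hypothetical data drawn from y = 3x, split evenly across four simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
w_fit = train(shards)
```

In this pattern the averaging step runs once per update, so its cost grows in importance as more nodes are added. That is why the speed of the interconnect and of the MPI library bears directly on how far a single training job can scale.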

“Cray’s proficiency in performance analysis and profiling, combined with the unique architecture of the XC systems, allowed us to bring deep learning problems to our Piz Daint system and scale them in a way that nobody else has,” said Prof. Dr. Thomas C. Schulthess, director of the Swiss National Supercomputing Centre (CSCS). “What is most exciting is that our researchers and scientists will now be able to use our existing Cray XC supercomputer to take on a new class of deep learning problems that were previously infeasible.”

“Applying a supercomputing approach to optimize deep learning workloads represents a powerful breakthrough for training and evaluating deep learning algorithms at scale,” said Dr. Xuedong Huang, distinguished engineer, Microsoft AI and Research. “Our collaboration with Cray and CSCS has demonstrated how the Microsoft Cognitive Toolkit can be used to push the boundaries of deep learning.”

A team of experts from Cray, Microsoft, and CSCS has scaled the Microsoft Cognitive Toolkit to more than 1,000 NVIDIA® Tesla® P100 GPU accelerators on the Cray XC50 supercomputer at CSCS. The result of this deep learning collaboration opens the door for researchers to run larger, more complex, and multi-layered deep learning workloads at scale, harnessing the performance of a Cray supercomputer.

To simplify building and deploying deep learning environments on supercomputers, Cray is supporting its Cray XC customers with deep learning toolkits, such as the Microsoft Cognitive Toolkit, that allow customers to run deep learning applications to their fullest potential – at scale on a Cray supercomputer. Fusing high performance computing capability with deep learning is another step forward in Cray’s vision of the convergence of supercomputing and big data.

“Only Cray can bring the combination of supercomputing technologies, supercomputing best practices, and expertise in performance optimization to scale deep learning problems,” said Dr. Mark S. Staveley, Cray’s director of deep learning and machine learning. “We are working to unlock possibilities around new approaches and model sizes, turning the dreams and theories of scientists into something real that they can explore. Our collaboration with Microsoft and CSCS is a game changer for what can be accomplished using deep learning.”

About Cray Inc.

Global supercomputing leader Cray Inc. (Nasdaq:CRAY) provides innovative systems and solutions enabling scientists and engineers in industry, academia and government to meet existing and future simulation and analytics challenges. Leveraging more than 40 years of experience in developing and servicing the world’s most advanced supercomputers, Cray offers a comprehensive portfolio of supercomputers and big data storage and analytics solutions delivering unrivaled performance, efficiency and scalability. Cray’s Adaptive Supercomputing vision is focused on delivering innovative next-generation products that integrate diverse processing technologies into a unified architecture, allowing customers to meet the market’s continued demand for realized performance. Go to www.cray.com for more information.