Exploring machine learning
We investigate whether machine learning can improve traditional high performance computing. We also explore scalable ways of training neural networks, for instance for image recognition. Machine learning means that a computer learns autonomously from data and input, rather than following explicitly programmed rules.
Deep-learning-enhanced HPC applications
Traditionally, the main workloads run on a supercomputer have consisted of various forms of numerical simulation. Recently, scientists have started exploring the use of machine learning techniques to enhance traditional simulations, such as weather prediction. Early results indicate that these hybrid models, which combine machine learning and traditional simulation, can improve accuracy, shorten time to solution and significantly reduce costs.
In this project we investigate whether and how machine learning and deep learning can augment, accelerate or replace scientific workloads such as numerical simulations. In that context, does machine learning serve as a pre- or post-processing step that helps filter and understand the input data or the final simulation results, or is it poised to (partly) replace the decades-old codes that make up many high performance computing (HPC) workloads?
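One possible direction, sketched below purely as an illustration, is a neural surrogate model: a small network is trained on input/output pairs of an expensive solver and then stands in for (part of) that solver in a larger workflow. The toy simulator, network architecture and hyperparameters here are assumptions made for the sake of the example, not the project's actual models.

```python
import numpy as np
import tensorflow as tf

# Illustrative sketch only: a small neural "surrogate" trained to emulate an
# expensive simulation code. The analytic stand-in below replaces what would
# normally be a costly numerical solver.
def expensive_simulation(params):
    # Placeholder for an expensive solver (assumption for this example).
    return np.sin(params[:, :1]) * np.exp(-params[:, 1:2])

rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=(10_000, 2)).astype("float32")
y_train = expensive_simulation(x_train).astype("float32")

# A small fully connected network mapping simulation inputs to outputs.
surrogate = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
surrogate.compile(optimizer="adam", loss="mse")
surrogate.fit(x_train, y_train, epochs=10, batch_size=256, verbose=0)

# Once trained, the surrogate can replace (part of) the solver, trading some
# accuracy for much cheaper evaluation.
prediction = surrogate.predict(x_train[:4])
```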
Use cases
To validate this approach and its potential, we stimulate and support new, advanced use cases that enhance traditional HPC simulations with machine learning algorithms. We do so in close collaboration with scientific research groups. Four research proposals have been selected and granted in various scientific domains:
- Chiel van Heerwaarden (WUR): Machine-Learned turbulence in next-generation weather models
- Sascha Caron (Radboud University): The Quantum-Event-Creator: Generating physics events without an event generator
- Alexandre Bonvin (Utrecht University): 3DeepFace: Distinguishing biological interfaces from crystal artifacts in biomolecular complexes using deep learning
- Simon Portegies Zwart (Leiden University): Machine learning for accelerating planetary dynamics in stellar clusters
Further reading
- Whitepaper: Deep-learning enhancement of large scale numerical simulations
- Presentation for the workshop 'Deep learning for high performance computing' on 15 October 2019: Deep Learning for HPC - Experiences of SURF & project partners
- Article on The Next Platform: Transforming HPC research with AI approaches
- Blog: How machine learning can improve HPC applications
- Whitepaper: Deep Learning for HPC - Experiences of SURF & project partners
Scalable high performance training of deep neural networks
Caffe is one of the most popular frameworks for image recognition. Intel has contributed to this framework by improving its performance on Intel Xeon processors. The goal of this project is to improve the scalability of Intel's Caffe on supercomputing systems for large-scale neural network training.
Our focus is on highly scalable, high-performance training of deep neural networks and its application to various scientific challenges, such as diagnosing lung disease, plant classification, and high-energy physics. For example, we are working on porting large-batch Stochastic Gradient Descent (SGD) training techniques to the popular TensorFlow framework. We also pay particular attention to the rapidly developing field of medical imaging, which, because of the high dimensionality of its data, requires large-scale compute as well as high memory bandwidth and capacity.
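As an illustration of the large-batch recipe, the sketch below shows the linear learning-rate scaling rule with gradual warmup that is commonly used for large-batch SGD (for example by Goyal et al., 2017), expressed against the stock Keras SGD optimizer in TensorFlow. The batch size, warmup length, decay boundaries and other values are illustrative assumptions, not the project's exact settings.

```python
import tensorflow as tf

# Sketch of the large-batch SGD recipe: scale the learning rate linearly with
# the global batch size and ramp it up gradually over the first few epochs.
# All concrete values below are illustrative assumptions.
BASE_LR, BASE_BATCH = 0.1, 256          # reference setting for a small batch
GLOBAL_BATCH = 8192                     # aggregate batch size over all workers
SCALED_LR = BASE_LR * GLOBAL_BATCH / BASE_BATCH

STEPS_PER_EPOCH = 1_281_167 // GLOBAL_BATCH   # ImageNet-1K training images
WARMUP_STEPS = 5 * STEPS_PER_EPOCH            # warm up over the first 5 epochs

def lr_at(step: int) -> float:
    """Learning rate at a given global step: linear warmup, then step decay."""
    if step < WARMUP_STEPS:
        return SCALED_LR * step / WARMUP_STEPS
    epoch = step / STEPS_PER_EPOCH
    decay = 0.1 ** sum(epoch >= boundary for boundary in (30, 60, 80))
    return SCALED_LR * decay

# In a training loop the schedule is applied to a stock SGD optimizer, e.g. by
# assigning lr_at(step) to optimizer.learning_rate before each step.
optimizer = tf.keras.optimizers.SGD(learning_rate=SCALED_LR, momentum=0.9)
```

The warmup phase exists because the linearly scaled learning rate is too aggressive while the randomly initialised weights are still far from a good solution; ramping it up over the first epochs keeps training stable at very large batch sizes.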
We have already succeeded in drastically reducing the time-to-train of several deep convolutional neural networks on state-of-the-art computer vision datasets such as ImageNet. Highlights of 2017 include a training time of under 30 minutes on the popular ImageNet-1K dataset, as well as state-of-the-art accuracy on other datasets such as the full ImageNet and Places-365 datasets.
Further reading
- Article: Changing Course: Rethinking How AI Can Interpret X-Rays
- Article: When Dense Matrix Representations Beat Sparse
- Blog: Achieving Deep Learning Training in less than 40 Minutes on ImageNet-1K & Best Accuracy and Training Time on ImageNet-22K & Places-365 with Scale-out Intel® Xeon®/Xeon Phi™ Architectures
- Article: Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
- Conference paper: Deep-learning enhancement of large scale numerical simulations
- Presentation Teratec Forum 2018: Towards the recognition of the world’s flora
- Presentation IXPUG workshop – ISC 2018: Deep Learning for fast simulation
- Article: Diagnosing Lung Disease Using Deep Learning
Project team SURF
Valeriu Codreanu
Damian Podareanu
Caspar van Leeuwen