Scalable Algorithms

In this field of activity we work on new algorithms for distributed parallelization of optimization methods used to train large Machine Learning models.

Data Analysis and Machine Learning

In recent years, Machine Learning (ML) has become one of the most dynamic research areas, with great impact on our current and future everyday life. Astonishing progress has been made in applying ML algorithms to areas like speech recognition, automatic image analysis and scene understanding. Machine Learning enables computers to drive cars autonomously or to learn how to play video games, pushing the frontier towards abilities that used to be exclusive to humans.

This development is driven by vastly increasing compute power and by growing datasets, which are used to train more and more complex ML models. Hence, Machine Learning is becoming a High Performance Computing task.

In the field of activity Data Analysis and Machine Learning, we respond to this ongoing development with increased research on the scalability of large ML problems. Currently we focus on the distributed parallelization of optimization methods used to train large ML models. Our approaches, e.g. the Asynchronous Stochastic Gradient Descent (ASGD) [1] solver, build on existing CC-HPC tools such as our asynchronous communication framework GPI 2.0 and the distributed file system BeeGFS.


DLPS: Deep Learning in the Cloud

Our pre-configured and optimized Caffe instances make Deep Learning available on demand and provide custom data layers optimized for shared BeeGFS storage of large training data and models. With DLPS, we introduce a scalable and failsafe automatic meta-parameter optimization for Caffe Deep Learning models in the cloud.

Key features are:

  • Automatic launching and scaling of Caffe in the cloud
  • Automatic meta-parameter search
  • Optimized data layers for distributed on-demand BeeGFS storage
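
To illustrate what an automatic, failure-tolerant meta-parameter search can look like, here is a minimal Python sketch of a parallel random search. It is not the DLPS implementation: `validation_loss` is a hypothetical stand-in for training a model and measuring its validation loss, the search ranges are arbitrary assumptions, and the simulated divergence for large learning rates only serves to show how a failed trial can be skipped instead of aborting the whole search.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def validation_loss(params):
    """Hypothetical stand-in for a training run; smallest at lr=1e-2, momentum=0.9."""
    if params["lr"] > 0.5:
        raise RuntimeError("training diverged")  # simulated failed run
    return (math.log10(params["lr"]) + 2.0) ** 2 + (params["momentum"] - 0.9) ** 2

def safe_trial(params):
    """Run one trial; a failed run gets an infinite loss instead of stopping the search."""
    try:
        return validation_loss(params), params
    except RuntimeError:
        return math.inf, params

def random_search(n_trials=64, n_workers=4, seed=0):
    """Sample meta-parameters at random and evaluate the trials in parallel."""
    rng = random.Random(seed)
    candidates = [
        {"lr": 10 ** rng.uniform(-4, 0), "momentum": rng.uniform(0.5, 0.99)}
        for _ in range(n_trials)
    ]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(safe_trial, candidates))
    # Keep the trial with the smallest loss; failed trials carry loss = inf.
    return min(results, key=lambda r: r[0])

best_loss, best_params = random_search()
print(best_loss, best_params)
```

In a real deployment, each trial would be a full training job on its own cloud instance, and the thread pool would be replaced by whatever job scheduler the platform provides.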

SGD and ASGD - Scalable Deep Learning Built on HPC Technology

Stochastic Gradient Descent (SGD) is the standard numerical method used to solve the core optimization problem for the vast majority of machine learning algorithms. In the context of large scale learning, as utilized by many Big Data applications, the efficient parallelization of SGD on distributed systems is a key performance factor.
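
As a reminder of what the sequential method does, the following Python sketch runs plain SGD on a toy least-squares problem. The data, the learning rate and the function name `sgd_linear_regression` are assumptions chosen for illustration only; real training uses mini-batches and far larger models.

```python
import random

def sgd_linear_regression(data, lr=0.05, epochs=200, seed=0):
    """Fit y = w*x + b by plain SGD on the squared error, one sample per step."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)        # visit the samples in random order
        for x, y in samples:
            err = (w * x + b) - y   # residual of the current sample
            w -= lr * err * x       # gradient of 0.5*err**2 with respect to w
            b -= lr * err           # gradient of 0.5*err**2 with respect to b
    return w, b

# Toy data generated from y = 2x + 1 without noise (illustration only).
points = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_linear_regression(points)
print(w, b)
```

Each update depends on the parameters produced by the previous one, and this sequential dependency is what makes efficient parallelization non-trivial.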

We offer scalable implementations of state-of-the-art synchronous SGD algorithms for the distributed CPU- and GPU-based training of large Caffe models on HPC infrastructure. With our Asynchronous Stochastic Gradient Descent (ASGD) optimization algorithm, we introduced a new algorithm that is able to efficiently parallelize SGD on distributed systems. ASGD outperforms current, mostly MapReduce-based, parallel SGD algorithms in solving the optimization task for large-scale machine learning problems in distributed memory environments. We were able to show that ASGD is faster, has better convergence and scaling properties, and leads to better error rates than other state-of-the-art methods. With ASGD, non-convex optimization problems in high-dimensional parameter spaces can be parallelized effectively over hundreds or thousands of CPU and GPU nodes.
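
The basic idea of asynchronous updates can be sketched with a small single-process simulation in Python. This is a generic stale-gradient model and not the ASGD algorithm from [1], which relies on asynchronous communication between nodes. The round-robin arrival order, the toy data and all names below are assumptions made for illustration.

```python
import random

def grad(params, sample):
    """Gradient of 0.5*(w*x + b - y)**2 for a single sample."""
    w, b = params
    x, y = sample
    err = (w * x + b) - y
    return (err * x, err)

def simulated_async_sgd(data, workers=4, lr=0.02, steps=20000, seed=0):
    """Simulate workers that update shared parameters without a barrier.

    Each worker computes its gradient on the snapshot it read at its previous
    turn, so that snapshot is stale by the updates of the other workers.
    """
    rng = random.Random(seed)
    shared = [0.0, 0.0]
    snapshots = [tuple(shared) for _ in range(workers)]
    for step in range(steps):
        k = step % workers                         # worker whose update arrives now
        g = grad(snapshots[k], rng.choice(data))   # gradient on a stale snapshot
        shared[0] -= lr * g[0]                     # applied immediately, no waiting
        shared[1] -= lr * g[1]
        snapshots[k] = tuple(shared)               # worker re-reads the shared state
    return tuple(shared)

# Toy data generated from y = 2x + 1 without noise (illustration only).
points = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = simulated_async_sgd(points)
print(w, b)
```

In a synchronous scheme, all workers would wait for each other and average their gradients before every update. Here no worker waits, at the price of computing gradients on slightly outdated parameters.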

Our version of Caffe is built on top of our HPC core technologies:

  • the asynchronous RDMA-based communication in GPI-2
  • automatic parallelization
  • data and workflow management within GPI-SPACE
  • scalable distributed filesystems on demand with BeeGFS

All of our HPC tools are open source.

Example Projects



With the open source multi-user software stack Carme, several users can manage the available resources of a computing cluster.



With our software tool TensorQuant, developers can now simulate Deep Learning models and thus significantly accelerate the development.



Further information on the project SafeClouds is available on our project page »Distributed Infrastructure for Data Analysis in Aviation«.


Fraunhofer-Cluster of Excellence CIT

Cognitive Internet Technologies

The cluster focuses on the three main topics »IoT-COMMs«, »Fraunhofer Data Spaces« and »Machine Learning«.


BMBF Project »High Performance Deep Learning Framework«

The project provides easy access to current and future high-performance computing systems.


  • Keuper, Janis; Pfreundt, Franz-Josef, Asynchronous Parallel Stochastic Gradient Descent: A Numeric Core for Scalable Distributed Machine Learning Algorithms. Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments, ACM, ISBN 978-1-4503-4006-9.
  • Keuper, Janis; Pfreundt, Franz-Josef, Balancing the Communication Load of Asynchronously Parallelized Machine Learning. CoRR, abs/1510.01155.
  • Keuper, Janis; Pfreundt, Franz-Josef, Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability. Proceedings of the Workshop on Machine Learning in High Performance Computing Environments (MLHPC '16), IEEE Press, Piscataway, NJ, USA, 2016, pp. 19-26. DOI: