In recent years, Machine Learning (ML) methods have evolved into one of the most dynamic research areas, with great impact on our current and future everyday life. Astonishing progress has been made in applying ML algorithms to areas like speech recognition, automatic image analysis, and scene understanding. Machine Learning enables computers to drive cars autonomously or to learn how to play video games, pushing the frontier towards abilities that were once exclusive to humans.
This development is driven by a vast increase in available compute power and by the growth of the datasets used to train ever more complex ML models. Machine Learning is thus becoming a High Performance Computing task.
In our Data Analysis and Machine Learning field of activity, we address this ongoing development with increased research on the scalability of large ML problems. Currently, we focus on the distributed parallelization of the optimization methods used to train large ML models. Our approaches, e.g. the Asynchronous Stochastic Gradient Descent (ASGD) solver, build on existing CC-HPC tools such as our asynchronous communication framework GPI 2.0 and the distributed file system BeeGFS.
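As a rough illustration of the update pattern behind asynchronous SGD, the following Python sketch runs several workers that compute mini-batch gradients and update a shared parameter vector without per-step synchronization (a Hogwild!-style toy on a single node). This is only a minimal sketch of the idea, not the production solver, which runs distributed across nodes via GPI 2.0; the toy least-squares problem and all names here are illustrative assumptions.

```python
# Minimal single-node sketch of asynchronous SGD: several workers compute
# mini-batch gradients and update a shared parameter vector without locking,
# so updates may interleave and use slightly stale parameters. Illustrative
# only; the real ASGD solver is distributed via GPI 2.0.
import threading
import numpy as np

# Toy least-squares problem: minimize ||X w - y||^2 (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true

w = np.zeros(10)   # shared parameters, updated asynchronously by all workers
lr = 0.01
batch = 32

def worker(seed, steps):
    global w
    local_rng = np.random.default_rng(seed)  # per-worker RNG (thread safety)
    for _ in range(steps):
        idx = local_rng.integers(0, X.shape[0], size=batch)
        # Gradient of the mini-batch loss at the current (possibly stale) w
        grad = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w = w - lr * grad  # no lock: concurrent updates may be lost/interleaved

threads = [threading.Thread(target=worker, args=(s, 500)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("residual norm:", np.linalg.norm(X @ w - y))
```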
DLPS: Deep Learning in the Cloud
Our pre-configured and optimized Caffe instances make Deep Learning available on demand, providing custom data layers optimized for shared BeeGFS storage of large training data and models. With DLPS, we introduce a scalable and failsafe automatic meta-parameter optimization for Caffe Deep Learning models in the cloud; a sketch of such a search loop follows the feature list below.
Key features are:
- Automatically launching and scaling Caffe in the cloud
- Automatic Meta-Parameter Search
- Optimized data layers for BeeGFS distributed on-demand storage
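The meta-parameter search can be pictured as a loop that samples solver settings, writes a solver.prototxt per trial, and launches independent Caffe training runs. The Python sketch below shows this pattern in its simplest local, random-search form; the parameter ranges, file paths, and template fields are illustrative assumptions, while DLPS itself distributes such trials across cloud instances with shared BeeGFS storage and fault tolerance.

```python
# Hedged sketch of a random meta-parameter search for a Caffe model: sample
# learning rate and momentum, write a solver.prototxt per trial, and launch
# each trial as a separate `caffe train` process. Paths, ranges, and the
# solver template values are assumptions for illustration, not DLPS itself.
import random
import subprocess

SOLVER_TEMPLATE = """net: "train_val.prototxt"
base_lr: {lr}
momentum: {momentum}
lr_policy: "fixed"
max_iter: 1000
snapshot_prefix: "trial_{trial}"
solver_mode: GPU
"""

def run_trial(trial):
    params = {
        "lr": 10 ** random.uniform(-4, -1),    # log-uniform learning rate
        "momentum": random.uniform(0.5, 0.99),
        "trial": trial,
    }
    path = "solver_trial_%d.prototxt" % trial
    with open(path, "w") as f:
        f.write(SOLVER_TEMPLATE.format(**params))
    # In DLPS each trial would run on its own cloud instance against shared
    # BeeGFS storage; here we simply launch Caffe locally for illustration.
    subprocess.run(["caffe", "train", "--solver=" + path], check=True)
    return params

for t in range(8):
    print("trial", t, run_trial(t))
```

In the DLPS setting, selecting the best trial would be based on the validation metrics each run produces, with failed instances simply re-launched, which is what makes the search both scalable and failsafe.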