Machine Learning

Our department develops machine-learning solutions for a range of industrial problems. We have gathered significant experience in several industrial projects, in both supervised and unsupervised learning.

Our main tools include:

  • feature selectors based on Shannon's information theory (a small sketch follows after this list)
  • neural networks
  • Bayesian networks
  • Chow-Liu trees
  • Markov random fields
  • subspace clustering methods
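
As an illustration of the first item, here is a minimal feature-selection sketch based on mutual information (Shannon's measure of the information a feature shares with the class label). The use of scikit-learn, the toy data set and the choice of keeping five features are our own assumptions, not a description of the department's implementation:

  # Hypothetical sketch: rank features by their estimated mutual information
  # I(X_i; Y) with the class label and keep the top k.
  import numpy as np
  from sklearn.datasets import make_classification
  from sklearn.feature_selection import mutual_info_classif

  # Toy data: 20 features, of which only 5 are informative.
  X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                             random_state=0)

  mi = mutual_info_classif(X, y, random_state=0)   # estimated I(X_i; Y)
  top_k = np.argsort(mi)[::-1][:5]                 # indices of the 5 strongest features
  print("selected features:", top_k)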

Deep Learning

The use of deep-learning methods, i.e. of deep architectures in the representation graphs of functions, is motivated as follows: theoretical results show that there are function families (high-level abstractions), e.g. in object recognition, speech recognition or text mining, whose deep representations are exponentially more efficient than shallow ones.

If a function family can be approximated with fewer parameters (smaller VC dimension), statistical learning theory (Vapnik 1998) says that fewer data points are required. This is advantageous both for evaluation efficiency (fewer neurons) and for statistical efficiency (fewer parameters to estimate, reuse of the same parameters for different inputs).
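
For orientation (our own addition, stated in one common form of the sample-complexity bound rather than taken from the text): to reach excess error ε with probability at least 1 - δ, a hypothesis class of VC dimension h needs on the order of

  n = O( (h + ln(1/δ)) / ε² )

training examples, so the required amount of data grows roughly linearly with the effective number of parameters.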

Håstad (1986) showed that O(2^d) parameters and data points are needed to approximate the so-called parity function with d-dimensional input using traditional machine-learning models such as Gaussian-kernel SVMs or shallow feed-forward neural networks. A deep neural network can approximate the same function with O(d) parameters and neurons in O(log2 d) hidden layers.
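
To make the parity example concrete (our own illustration, not taken from Håstad's paper): a shallow, table-like representation has to enumerate all 2^d input patterns, whereas a balanced XOR tree reuses one simple 2-input operation over roughly log2 d levels.

  # Hypothetical sketch: the d-bit parity function, shallow vs. deep.
  from itertools import product

  d = 8

  # "Shallow" view: enumerate all 2^d inputs and store their parity.
  parity_table = {bits: sum(bits) % 2 for bits in product((0, 1), repeat=d)}
  print(len(parity_table))  # 2^d = 256 stored patterns

  # "Deep" view: pairwise XOR, layer by layer -> about log2(d) = 3 layers, O(d) gates.
  def deep_parity(bits):
      layer = list(bits)
      while len(layer) > 1:
          pairs = [a ^ b for a, b in zip(layer[0::2], layer[1::2])]
          layer = pairs + ([layer[-1]] if len(layer) % 2 else [])
      return layer[0]

  assert all(deep_parity(b) == p for b, p in parity_table.items())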

Until 2006, practical experiments with multilayer neural networks led to poorer results on test data (local minima, saddle points, overfitting, ...) than shallow neural networks (with one or two hidden layers). This changed with the work of Hinton (2006) and Bengio (2007), who introduced the greedy layer-wise pre-training algorithm:

First, each layer of the model is trained with unsupervised learning (representation learning); the representation produced by one layer serves as the input of the next. Finally, the parameters of all layers are fine-tuned with supervised learning via back-propagation, e.g. for classification.
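
A compressed sketch of this procedure follows (our own simplified illustration: plain sigmoid auto-encoders, full-batch gradient descent and the toy layer sizes are assumptions, not the original Hinton/Bengio implementation; the final supervised fine-tuning stage is only indicated):

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def pretrain_layer(X, n_hidden, lr=0.5, epochs=200, seed=0):
      """Train one auto-encoder layer unsupervised; return its encoder weights."""
      rng = np.random.default_rng(seed)
      n_vis = X.shape[1]
      W  = rng.normal(0.0, 0.1, (n_vis, n_hidden))   # encoder
      b  = np.zeros(n_hidden)
      W2 = rng.normal(0.0, 0.1, (n_hidden, n_vis))   # decoder
      c  = np.zeros(n_vis)
      for _ in range(epochs):
          H = sigmoid(X @ W + b)                      # code (representation)
          R = sigmoid(H @ W2 + c)                     # reconstruction
          g_out = (R - X) * R * (1 - R) / len(X)      # d(MSE)/d(pre-activation)
          g_hid = (g_out @ W2.T) * H * (1 - H)
          W2 -= lr * H.T @ g_out;  c -= lr * g_out.sum(0)
          W  -= lr * X.T @ g_hid;  b -= lr * g_hid.sum(0)
      return W, b

  def greedy_pretrain(X, layer_sizes):
      """Each layer is trained on the representation produced by the one below."""
      encoders, rep = [], X
      for n_hidden in layer_sizes:
          W, b = pretrain_layer(rep, n_hidden)
          encoders.append((W, b))
          rep = sigmoid(rep @ W + b)                  # input of the next layer
      return encoders, rep

  # Toy run on random data; supervised fine-tuning of all layers with
  # back-propagation (e.g. for classification) would follow and is omitted here.
  X = np.random.default_rng(1).random((200, 30))
  encoders, top_rep = greedy_pretrain(X, layer_sizes=[20, 10])
  print(top_rep.shape)                                # (200, 10)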

Representation Learning

Representation learning (RL) covers a family of methods that automatically generate representations from an input vector and thus enable supervised learning tasks such as classification. Deep-learning methods are RL methods with multiple layers, in which each layer transforms the previous representation into a more abstract one, starting from the raw data.

In other words: each RL layer of a deep neural network tries to generate features that are easier to classify. This rests on the validity of the manifold hypothesis (complex data concentrate near manifolds that are themselves low-dimensional): variations along the manifold are captured, while variations orthogonal to its tangent space are ignored. Furthermore, Bengio (2013) has shown that the RL method "auto-encoder" is able to disentangle complex data manifolds in each layer of a deep neural network.

 

RL Approaches

Currently, two parallel approaches to RL exist:

  • the first has its origin in probabilistic graphical models; its main representative is the Restricted Boltzmann Machine (RBM),
  • the other comes from neural networks; its main representative is the so-called auto-encoder.

The corresponding higher-level (stacked) architectures are Deep Belief Networks and Deep Neural Networks, respectively. The learning algorithm for the RBM, contrastive divergence (CD), allows incremental updates from mini-batch to mini-batch, similar to stochastic gradient descent. The choice of the hyper-parameters plays a vital role here. We use Sequential Model-Based Global Optimization with Gaussian processes and, depending on the application, Markov chain Monte Carlo (MCMC).
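
The following minimal CD-1 update for a binary RBM is our own sketch; the learning rate, mini-batch handling and the use of Bernoulli units are assumptions, not the department's implementation:

  import numpy as np

  rng = np.random.default_rng(0)
  n_visible, n_hidden, lr = 20, 8, 0.05
  W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
  b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def cd1_update(v0, W, b_v, b_h):
      """One contrastive-divergence (CD-1) step on a mini-batch of binary vectors."""
      p_h0 = sigmoid(v0 @ W + b_h)                        # positive phase
      h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # sample hidden units
      p_v1 = sigmoid(h0 @ W.T + b_v)                      # one Gibbs step back
      p_h1 = sigmoid(p_v1 @ W + b_h)                      # negative phase
      n = len(v0)
      W   += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n       # approximate gradient
      b_v += lr * (v0 - p_v1).mean(0)
      b_h += lr * (p_h0 - p_h1).mean(0)
      return W, b_v, b_h

  # Incremental updates, mini-batch by mini-batch, as with stochastic gradient descent.
  data = (rng.random((256, n_visible)) < 0.3).astype(float)
  for batch in np.array_split(data, 8):
      W, b_v, b_h = cd1_update(batch, W, b_v, b_h)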

Other Publications

  • Y. Bengio et al., Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
  • Y. Bengio et al., Greedy layer-wise training of deep networks. NIPS 2006.
  • G. E. Hinton and R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science, 2006.
  • V. N. Vapnik, Statistical Learning Theory. 1998.
  • J. Håstad, Almost optimal lower bounds for small depth circuits. STOC 1986.