Autoencoders, Tied Weights and Dropout¶
Training deep neural networks (i.e., networks with several hidden layers) is challenging, because normal training easily gets stuck in undesired local optima. This prevents the lower layers from learning useful features. The problem can be partially circumvented by pre-training the layers in an unsupervised fashion, thereby initialising them in a region of parameter space from which the network is easier to train (or fine-tune) using steepest-descent techniques.
One of these unsupervised learning techniques is the autoencoder. An autoencoder is a feed-forward neural network with one hidden layer which is trained to map its input to itself via the representation formed by the hidden units. For input data \(\vec{x}_1,\dots,\vec{x}_N\) and an autoencoder \(f_{\theta}\) with parameters \(\theta\), the optimisation problem is stated as:

\[\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \|\vec{x}_i - f_{\theta}(\vec{x}_i)\|^2\]
Of course, without any constraints this is a simple task, as the model will just learn the identity. It becomes more challenging when we restrict the size of the intermediate representation (i.e., the number of hidden units). An image with several hundred input values cannot be squeezed into a representation of only a few hidden neurons, so the intermediate representation is forced to learn something meaningful about the problem. This simple technique only works if the number of hidden neurons is smaller than the input dimension of the image; more advanced regularisation techniques are needed to work with overcomplete representations (i.e., when the size of the intermediate representation is larger than the input dimension). Especially for images it is clear that a good intermediate representation may need to be more complex than the input itself: the number of distinct objects that can appear in an image is far larger than its number of pixels.
Shark offers a wide range of training algorithms for autoencoders. This basic tutorial will show you:
- How to train autoencoders
- How to use the :doxy:`Autoencoder` and :doxy:`TiedAutoencoder` classes
- How to use the dropout technique
In tutorials building on this one, we will show how to use these building blocks in conjunction with training an :doxy:`FFNet` in an unsupervised fashion.
As a dataset for this tutorial, we use a subset of the MNIST dataset, which needs to be unzipped first. It can be found in examples/Supervised/data/mnist_subset.zip.
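Reading the data in is not shown step by step later on; a minimal, self-contained sketch of the loading code might look as follows. The path data/mnist_subset.libsvm is an assumption about where the unzipped data ends up and may differ on your system.

#include <shark/Data/Dataset.h> //LabeledData
#include <shark/Data/SparseData.h> //importSparseData reads LIBSVM-formatted files
#include <iostream>

using namespace shark;

int main(){
    //load the MNIST subset; the labels are not needed for training the autoencoder itself
    LabeledData<RealVector, unsigned int> train;
    importSparseData( train, "data/mnist_subset.libsvm" ); //hypothetical path to the unzipped data
    std::cout << "number of images: " << train.numberOfElements() << std::endl;
}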
The following includes are needed for this tutorial:
#include <shark/Data/Pgm.h> //for exporting the learned filters
#include <shark/Data/SparseData.h>//for reading in the images as sparseData/Libsvm format
#include <shark/Models/Autoencoder.h>//normal autoencoder model
#include <shark/Models/TiedAutoencoder.h>//autoencoder with tied weights
#include <shark/ObjectiveFunctions/ErrorFunction.h>
#include <shark/Algorithms/GradientDescent/Rprop.h>// the RProp optimization algorithm
#include <shark/ObjectiveFunctions/Loss/SquaredLoss.h> // squared loss used for regression
#include <shark/ObjectiveFunctions/Regularizer.h> //L2 regularisation
Training autoencoders¶
Training an autoencoder is straightforward in Shark. We just create an instance of the :doxy:`Autoencoder` class, initialise its structure and perform a simple regression task where we set the labels to be the same as the inputs.
We start by creating a small function which creates and trains a given type of autoencoder model. This will allow us to compare different structures and model types during the evaluation:
template<class AutoencoderModel>
AutoencoderModel trainAutoencoderModel(
UnlabeledData<RealVector> const& data,//the data to train with
std::size_t numHidden,//number of features in the autoencoder
std::size_t iterations, //number of iterations to optimize
double regularisation//strength of the regularisation
){
The parameters are the dataset we use as inputs for the autoencoder, the number of hidden units (i.e., the size of the intermediate representation), the number of iterations to train, and the strength of the two-norm regularisation. The template parameter is the type of autoencoder we want to use, for example Autoencoder<LogisticNeuron,LogisticNeuron>. Its two template parameters define the type of activation function used in the hidden and output layer, respectively.
Next we create the model:
//create the model
std::size_t inputs = dataDimension(data);
AutoencoderModel model;
model.setStructure(inputs, numHidden);
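//initialise the weights uniformly at random in a small range scaled by the input dimension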
initRandomUniform(model,-0.1*std::sqrt(1.0/inputs),0.1*std::sqrt(1.0/inputs));
All autoencoders need only two parameters for setStructure: the number of input dimensions and the number of hidden units. The number of outputs is implicitly given by the input dimensionality.
Next, we set up the objective function. This should by now be looking quite familiar. We set up an :doxy:`ErrorFunction` with the model and the squared loss. We create the :doxy:`LabeledData` object from the input data by setting the labels to be the same as the inputs. Finally we add two-norm regularisation by creating an instance of the :doxy:`TwoNormRegularizer` class:
//create the objective function
LabeledData<RealVector,RealVector> trainSet(data,data);//labels identical to inputs
SquaredLoss<RealVector> loss;
ErrorFunction error(trainSet, &model, &loss);
TwoNormRegularizer regularizer(error.numberOfVariables());
error.setRegularizer(regularisation,&regularizer);
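With the regulariser in place, the objective that is minimised becomes the data error plus a penalty on the squared two-norm of the weights (up to the scaling convention used by :doxy:`TwoNormRegularizer`):

\[E_{\text{reg}}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \|\vec{x}_i - f_{\theta}(\vec{x}_i)\|^2 + \lambda \|\theta\|_2^2\]

where \(\lambda\) is the regularisation strength passed to our function.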
Lastly, we optimize the objective using :doxy:`IRpropPlusFull`, a variant of the Rprop algorithm with full weight backtracking, which is more stable on the complicated error surfaces formed by neural networks:
IRpropPlusFull optimizer;
optimizer.init(error);
std::cout<<"Optimizing model: "+model.name()<<std::endl;
for(std::size_t i = 0; i != iterations; ++i){
optimizer.step(error);
std::cout<<i<<" "<<optimizer.solution().value<<std::endl;
}
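The snippets above do not show the end of the function; presumably it concludes by copying the best parameters found by the optimizer back into the model and returning it, roughly like this:

//copy the solution found by the optimizer back into the model and return it
model.setParameterVector(optimizer.solution().point);
return model;
}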
Experimenting with different architectures¶
We want to use the code above to train different architectures. We start with the standard autoencoder, which has the formula:

\[f(\vec{x}) = \sigma_2\left(W_2 \sigma_1(W_1 \vec{x} + \vec{b}_1) + \vec{b}_2\right)\]
This is the usual equation of a feed-forward neural network with a single hidden layer. The activation functions \(\sigma_1\) and \(\sigma_2\) of the hidden and output layer can be chosen among the same types as used for :doxy:`FFNet`. The problem with this architecture is that, since the weight matrices \(W_1\) and \(W_2\) are independent, the autoencoder can easily learn the identity given a big enough hidden layer. A way around this is to use a :doxy:`TiedAutoencoder`, which has the formula:

\[f(\vec{x}) = \sigma_2\left(W_1^T \sigma_1(W_1 \vec{x} + \vec{b}_1) + \vec{b}_2\right)\]

Here we set \(W_2 = W_1^T\), eliminating a large number of degrees of freedom.
Additionally, we can use a relatively new technique called dropout [SrivastavaEtAl2014]. It works purely on the level of the activation functions by randomly setting the output of each hidden neuron to 0 with probability 0.5; dropout acts only on the hidden units. This makes it harder for individual hidden units to specialise, so redundant features have to be learned instead.
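In formula form (a sketch of the standard dropout mechanism during training, not a description of Shark internals), each component of the hidden representation is multiplied by an independent binary mask drawn anew for every presentation of an input:

\[h_i = m_i \, \sigma_1(W_1 \vec{x} + \vec{b}_1)_i, \qquad m_i \sim \text{Bernoulli}(0.5)\]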
We will now use the 4 combinations of using tied weights and dropout and compare the features that were generated on the MNIST dataset (we omit loading and preprocessing of the dataset for brevity):
typedef Autoencoder<LogisticNeuron, LogisticNeuron> Autoencoder1;
typedef TiedAutoencoder<LogisticNeuron, LogisticNeuron> Autoencoder2;
typedef Autoencoder<DropoutNeuron<LogisticNeuron>, LogisticNeuron> Autoencoder3;
typedef TiedAutoencoder<DropoutNeuron<LogisticNeuron>, LogisticNeuron> Autoencoder4;
Autoencoder1 net1 = trainAutoencoderModel<Autoencoder1>(train.inputs(),numHidden,iterations,regularisation);
Autoencoder2 net2 = trainAutoencoderModel<Autoencoder2>(train.inputs(),numHidden,iterations,regularisation);
Autoencoder3 net3 = trainAutoencoderModel<Autoencoder3>(train.inputs(),numHidden,iterations,regularisation);
Autoencoder4 net4 = trainAutoencoderModel<Autoencoder4>(train.inputs(),numHidden,iterations,regularisation);
exportFiltersToPGMGrid("features1",net1.encoderMatrix(),28,28);
exportFiltersToPGMGrid("features2",net2.encoderMatrix(),28,28);
exportFiltersToPGMGrid("features3",net3.encoderMatrix(),28,28);
exportFiltersToPGMGrid("features4",net4.encoderMatrix(),28,28);
Visualizing the autoencoder¶
After training the different architectures, we printed the feature maps (i.e., the input weights of the hidden neurons ordered according to the pixels they are connected to). Let’s have a look.
Normal autoencoder:

Autoencoder with tied weights:

Autoencoder with dropout:

Autoencoder with dropout and tied weights:

Full example program¶
The full example program is :doxy:`AutoEncoderTutorial.cpp`.
Attention
The parameter settings of the program reproduce the filters shown above. However, the program takes some time to run, which might be too long for weaker machines.
References¶
[SrivastavaEtAl2014] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research 15:1929-1958, 2014.