Training Feed-Forward Networks¶
This tutorial serves as a primer for using the Shark implementation of feed-forward multi-layer perceptron neural networks [Bishop1995]. The full functionality is documented on the pages of the :doxy:`FFNet` class. It is recommended to read the getting started section first, especially the introduction about General Optimization Tasks.
For this tutorial the following includes are needed:
#include<shark/Models/FFNet.h> //the feed forward neural network
#include<shark/Algorithms/GradientDescent/Rprop.h> //resilient propagation as optimizer
#include<shark/ObjectiveFunctions/Loss/CrossEntropy.h> // loss during training
#include<shark/ObjectiveFunctions/ErrorFunction.h> //error function to connect data model and loss
#include<shark/ObjectiveFunctions/Loss/ZeroOneLoss.h> //loss for test performance
//evaluating probabilities
#include<shark/Models/Softmax.h> //transforms model output into probabilities
#include<shark/Models/ConcatenatedModel.h> //provides operator >> for concatenating models
Defining the Learning Problem¶
In this tutorial, we want to solve the infamous xor problem using a feed-forward network. First, we define the problem by generating the training data. We consider two binary inputs that are to be mapped to one if they have different values and to zero otherwise. Input patterns and corresponding target patterns are stored in a container for labeled data after generation.
For this part, we need to include the Dataset and define the problem in a function, which we will use shortly:
LabeledData<RealVector,unsigned int> xorProblem(){
	//the 2D xor problem has 4 patterns: (0,0), (0,1), (1,0), (1,1)
	vector<RealVector> inputs(4,RealVector(2));
	//the label is 1 if both inputs have a different value, and 0 otherwise
	vector<unsigned int> labels(4);
	unsigned k = 0;
	for(unsigned i=0; i < 2; i++){
		for(unsigned j=0; j < 2; j++){
			inputs[k](0) = i;
			inputs[k](1) = j;
			labels[k] = (i+j) % 2;
			k++;
		}
	}
	LabeledData<RealVector,unsigned int> dataset = createLabeledDataFromRange(inputs,labels);
	return dataset;
}
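A quick, purely illustrative way to convince ourselves that the generator produces the intended patterns is to print the dataset; the following snippet is our own sketch and not part of the tutorial program:

//hypothetical sanity check: print every input pattern together with its label
LabeledData<RealVector,unsigned int> data = xorProblem();
for(std::size_t i = 0; i != data.numberOfElements(); ++i){
	cout<< data.element(i).input <<" -> "<< data.element(i).label <<std::endl;
}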
Defining the Network Topology¶
After we have defined our problem, we can define our feed-forward network. For this, we have to decide on the network topology: we have to choose activation functions, the number of hidden layers and neurons, and how the layers are connected. This is quite a lot of decisions, but Shark makes the task straightforward.
The easiest part is the choice of neurons. Shark offers several different types of neurons named after their activation function [ReedMarks1998]; a short declaration sketch follows the list:
- :doxy:`LogisticNeuron`: a sigmoid (S-shaped) function with outputs in the range [0,1], defined as \(f(x)=\frac{1}{1+e^{-x}}\)
- :doxy:`TanhNeuron`: the hyperbolic tangent, which can be viewed as a rescaled version of the logistic function with outputs ranging over [-1,1]. It has the formula \(f(x)=\tanh(x)=\frac{2}{1+e^{-2x}}-1\)
- :doxy:`FastSigmoidNeuron`: a sigmoidal function which is faster to evaluate than the previous two activation functions. It also has “bigger tails” (i.e., the gradient does not vanish as quickly). This activation function is highly recommended and is defined in Shark as \(f(x)=\frac{x}{1+|x|}\)
- :doxy:`RectifierNeuron`: an activation function that has become popular more recently [KrizhevskyEtAl2012]. The neuron’s activation \(f(x)=\max(0,x)\) is kept at 0 for negative activation levels and is linear for all positive values. A network with these neurons is effectively a piecewise linear function.
- :doxy:`LinearNeuron`: not a good choice for hidden neurons, but useful for output neurons when the output is not bounded. This activation function \(f(x)=x\) is the typical choice for regression tasks.
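To illustrate how these types are used (the declarations below are only a sketch and not part of the tutorial program), the activation functions of the hidden and output layer are selected via the two template parameters of :doxy:`FFNet`:

//hypothetical declarations illustrating the choice of neuron types
FFNet<FastSigmoidNeuron,LinearNeuron> regressionNetwork; //recommended sigmoidal hidden units, unbounded linear outputs
FFNet<RectifierNeuron,LogisticNeuron> rectifierNetwork;  //rectifier hidden units, outputs in [0,1]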
For our example, we will use logistic hidden neurons and a linear output neuron. We choose the neuron types using two template parameters, one for the hidden neurons and one for the output neurons. For the topology, we will choose a network with a single hidden layer of four neurons and without direct connections between input and output neuron(s). We also want a bias neuron (i.e., bias or offset parameters). All this can be achieved with :doxy:`FFNet::setStructure`:
unsigned numInput=2;
unsigned numHidden=4;
unsigned numOutput=1;
FFNet<LogisticNeuron,LinearNeuron> network;
network.setStructure(numInput,numHidden,numOutput,FFNetStructures::Normal,true);
The last two parameters are optional; here they are set to their default values and could have been omitted.
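Since these are stated to be the default values, the shorter call below should be equivalent (shown only as an illustration):

network.setStructure(numInput,numHidden,numOutput); //relies on the defaults described above: FFNetStructures::Normal and a bias neuron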
Training the Network¶
After we have defined the problem and the topology, we can finally train the network. The most frequently used error function for training neural networks is arguably the :doxy:`SquaredLoss`, but Shark offers alternatives. Since the xor problem is a classification task, we can use the :doxy:`CrossEntropy` error to maximize the class probability [Bishop1995]. The cross entropy assumes that the c-th input is the logarithm of the unnormalized probability \(p(y=c|x)\), i.e. the probability that the input belongs to class \(c\). It applies an exponential normalisation to transform the inputs into properly normalised probabilities, and it does so in a numerically stable way.
In this case, the c-th output neuron of the network encodes the probability of class c. For a binary problem, we can omit one output neuron; it is then assumed that the output of the imaginary second neuron is just the negative of the first, and the loss function takes care of the normalisation. After training, the most likely class label of an input can be obtained by picking the class of the neuron with the highest activation value. In the case of only one output neuron, the sign decides: negative activation means class 0, positive activation means class 1.
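To make this explicit, the normalisation performed by the cross entropy can be written out under the assumptions stated above, with \(f_c(x)\) denoting the c-th network output:

\[ p(y=c \mid x) = \frac{e^{f_c(x)}}{\sum_{c'} e^{f_{c'}(x)}}, \qquad E(x,y) = -\log p(y \mid x). \]

For a single output neuron, where the imaginary second output is taken to be \(-f(x)\), this reduces to

\[ p(y=1 \mid x) = \frac{e^{f(x)}}{e^{f(x)}+e^{-f(x)}} = \frac{1}{1+e^{-2f(x)}}, \]

which makes the sign rule apparent: the probability exceeds one half exactly when \(f(x)>0\).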
To optimize this error function, the improved resilient backpropagation algorithm is used ([IgelHüsken2003], a faster and more robust variant of the seminal Rprop algorithm [Riedmiller1994]):
//get problem data
LabeledData<RealVector,unsigned int> dataset = xorProblem();
//create error function
CrossEntropy loss; // surrogate loss for training
ErrorFunction error(dataset,&network,&loss);
//initialize the network randomly and set up Rprop
initRandomUniform(network,-0.1,0.1);
IRpropPlus optimizer;
optimizer.init(error);
unsigned numberOfSteps = 1000;
for(unsigned step = 0; step != numberOfSteps; ++step)
	optimizer.step(error);
If you don’t know how to use and evaluate the trained model, you will find the information in the getting started section.
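As a rough sketch of such an evaluation (the lines below are an assumption and not part of this tutorial's program), one could copy the best parameters found by the optimizer into the network and count the misclassified patterns using the sign rule described above:

//hypothetical evaluation sketch
network.setParameterVector(optimizer.solution().point); //copy the best parameters found into the network
std::size_t errors = 0;
for(std::size_t i = 0; i != dataset.numberOfElements(); ++i){
	//the sign of the single output decides the class, as explained above
	unsigned int predicted = network(dataset.element(i).input)(0) > 0 ? 1 : 0;
	if(predicted != dataset.element(i).label) ++errors;
}
cout<<"misclassified patterns: "<<errors<<std::endl;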
Calculating the Probabilities¶
As outlined earlier, the network does not return the actual probabilities after training. However, sometimes we are interested in the probabilities, in which case we need to convert the network output. For this purpose, we can use the :doxy:`Softmax` model. It takes the input and applies just the right transformation to obtain the probabilities. Additionally, it handles the case of a single output just as well as if we had trained the model with two output neurons. All we need to do is concatenate our ffnet with it and print out the probabilities:
cout<<"probabilities:"<<std::endl;
Softmax probability(1);
for(std::size_t i = 0; i != 4; ++i){
	cout<< (network>>probability)(dataset.element(i).input)<<std::endl;
}
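The concatenation via operator>> is simply model composition, so the same probabilities can also be obtained by applying the Softmax model directly to the network output; this equivalent formulation is only a sketch and not part of the original program:

//equivalent sketch: apply the Softmax model directly to the network output
for(std::size_t i = 0; i != 4; ++i){
	cout<< probability(network(dataset.element(i).input))<<std::endl;
}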
Other network types¶
Shark offers many other types of neural networks, including radial basis function networks (:doxy:`RBFLayer`) and recurrent neural networks (:doxy:`RNNet`), as well as support vector and regularization networks.
Full example program¶
The full example program is :doxy:`FFNNBasicTutorial.cpp`. Multi class classification with cross entropy is shown in :doxy:`FFNNMultiClassCrossEntropy.cpp`.
References¶
[Bishop1995] C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[IgelHüsken2003] C. Igel and M. Hüsken. Empirical Evaluation of the Improved Rprop Learning Algorithm. Neurocomputing 50(C), pp. 105-123, 2003.
[KrizhevskyEtAl2012] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In: NIPS 2012, pp. 1097-1105, 2012.
[ReedMarks1998] R.D. Reed and R.J. Marks. Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. MIT Press, 1998.
[Riedmiller1994] M. Riedmiller. Advanced supervised learning in multilayer perceptrons - from backpropagation to adaptive learning techniques. International Journal of Computer Standards and Interfaces 16(3), pp. 265-278, 1994.