AI based on a neural network
This blog post shows how a neural network for an artificial intelligence is constructed. Using examples, it also explains how such an AI can be trained with the help of training data, backpropagation and evolution, and which pitfalls can occur in the process.
Neuron
A neuron models a nerve cell. Since we are talking about artificial intelligence, the neuron must be abstracted into a mathematical model. The following figure shows an example of how such an artificial neuron can be constructed.
Input and weighting
The input of a neuron consists of many values “x” and correspondingly many weights “w”. The values x are either direct inputs, for a neuron on the first layer, or the outputs of other neurons, for a neuron on a later layer. The first value is usually defined independently of the inputs and is called the bias. The bias ensures that the neuron does not depend solely on the variable inputs and thus provides stability.
Transfer function
The transfer function prepares the inputs of the neuron for the activation function. Usually the sum of all inputs, each multiplied by its weight, is calculated and passed on to the activation function.
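Written out, with the bias treated as a fixed input x₀ = 1 with its own weight w₀ (a common convention; the exact form may differ from the figure), the transfer function is:

```math
net = \sum_{i=0}^{n} w_i \, x_i = w_0 + w_1 x_1 + \dots + w_n x_n
```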
Activation function
There are many different activation functions. A frequently used class are the sigmoid functions. These functions are characterized by being bounded on both sides and differentiable. The differentiability becomes very important later for training the network.
The function in the figure above maps each input to a value between -1 and 1.
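As a minimal sketch, such a neuron can be written in a few lines of Python. tanh is used here as a stand-in for the sigmoid-like function in the figure, since it also maps every input to a value between -1 and 1; the inputs and weights are arbitrary example values:

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum plus bias (transfer
    function), passed through tanh (activation function)."""
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    return math.tanh(net)  # output is always between -1 and 1

# Example: two inputs with hand-picked weights
print(neuron([0.5, -1.0], weights=[0.8, 0.3], bias=0.1))
```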
Neural network
The following figure shows an example of an artificial neural network. Each circle represents a single artificial neuron. Here the neurons are arranged in 4 layers, but the arrangement can be much more complex. The neurons set apart at the top serve as bias for the next layer. They have no input but always deliver the same fixed value. The first layer is called the “Input Layer”. This layer holds the data fed into the network; each data point gets its own neuron.
One or more hidden layers follow, which are responsible for the deeper recognition of structures. The more complex these layers are, the more complex the structures that can be identified. At the end comes the “Output Layer”. The number of neurons on this layer depends on how many possible results the network should distinguish.
Each neuron on one layer is connected to each neuron on the following layer.
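Because every neuron is connected to every neuron on the following layer, a whole layer can be computed at once as a matrix product. A minimal forward pass in Python with NumPy might look like this; the layer sizes and weights are arbitrary placeholders:

```python
import numpy as np

def forward(x, net):
    """Propagate input x through a list of (weights, bias) pairs.
    weights has shape (n_out, n_in), bias has shape (n_out,)."""
    for weights, bias in net:
        x = np.tanh(weights @ x + bias)  # transfer + activation per layer
    return x

# Example: 4 layers as in the figure (sizes chosen arbitrarily)
rng = np.random.default_rng(0)
sizes = [3, 5, 5, 2]  # input, two hidden layers, output
net = [(rng.normal(size=(o, i)), rng.normal(size=o))
       for i, o in zip(sizes[:-1], sizes[1:])]
print(forward(np.array([1.0, 0.0, -1.0]), net))
```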
Training of the network
What does it actually mean to train a neural network? At the beginning, all weights are set randomly. The goal of the training is to optimize them so that, for a given input, the desired output appears on the output layer, i.e. the error of the net is minimal.
For the training itself, however, extensive training data must first be created. Training data is data for which both the input and the desired output are known.
Training with the help of backpropagation
Backpropagation means feeding the error back. All incoming weights of each neuron are optimized, from the last layer to the first. At first sight, calculating the error of the inner layers and the input layer turns out to be a problem. Calculating the error for the output layer, on the other hand, is trivial, since the target values are known from the training data. The weight changes are calculated using the following example formula.
The weight change from a neuron (j) on the preceding layer to the current neuron (i) is the product of the learning rate, the error of the current neuron multiplied by the derivative of the activation function, and the activation value of the considered neuron on the preceding layer. The learning rate should be kept very close to zero, but must not be zero, otherwise no changes occur at all. For the error of the current neuron a case distinction is needed. If the neuron is on the output layer, the error is the difference between the target activation value from the training data and the actual activation value of the neuron. If the neuron is on a hidden layer, its error is the sum over all neurons on the next layer of their error multiplied by the weight connecting them to the current neuron. The sum of the existing weight and the weight change forms the new weight.
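Since the formula from the figure is not reproduced here, this is a common form of the delta rule just described, with learning rate η, error term δ, target value t, activation o and net input net:

```math
\Delta w_{ij} = \eta \, \delta_i \, o_j
\qquad
w_{ij}^{\text{new}} = w_{ij} + \Delta w_{ij}
```

```math
\delta_i =
\begin{cases}
f'(\text{net}_i)\,(t_i - o_i) & \text{output layer} \\
f'(\text{net}_i) \sum_k \delta_k \, w_{ki} & \text{hidden layer}
\end{cases}
```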
Training via Evolution
Training through evolution is not really training in the classical sense, because there is not a single net whose weights are adjusted. Instead, the training data creates a survival scenario in which the net has to prove itself. Ideally, it is clearly defined when a net has failed in a scenario. In addition, a “fitness” value is assigned, which makes nets comparable. The goal is to maximize this fitness.
It starts with a single, randomly generated net that clones itself at certain fixed points. The idea behind this is that such a net has already made it to this point and therefore has an acceptable basic configuration. The cloned net is only slightly and randomly changed and then starts over with the training data. A frequently used variant turns this into round-based training: several random nets start at the same time, and only the best nets of each round are used as the basis for generating new nets for the next round, as in the sketch below.
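A minimal sketch of the round-based variant, assuming a fitness function is given (here only a placeholder) and reusing the (weights, bias) representation from the forward-pass example; population size, number of survivors and mutation strength are arbitrary assumptions:

```python
import random
import numpy as np

rng = np.random.default_rng(1)
SIZES = [6, 8, 8, 3]  # example layer sizes, chosen arbitrarily

def random_net():
    """A net as a list of (weights, bias) pairs, as in the forward pass above."""
    return [(rng.normal(size=(o, i)), rng.normal(size=o))
            for i, o in zip(SIZES[:-1], SIZES[1:])]

def mutate(net, strength=0.05):
    """Clone a net and randomly change every weight slightly."""
    return [(w + rng.normal(scale=strength, size=w.shape),
             b + rng.normal(scale=strength, size=b.shape))
            for w, b in net]

def fitness(net):
    return 0.0  # placeholder: score the net in the survival scenario

population = [random_net() for _ in range(20)]
for _ in range(100):  # 100 rounds
    population.sort(key=fitness, reverse=True)  # best nets first
    survivors = population[:5]                  # only the best survive
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(15)]
```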
Structure of the example network
For this network we need six neurons on the input layer. As input we define all x and y values; all are zero except for the pair for which the switch positions are to be determined. Two hidden layers with a freely chosen number of neurons follow. In practice, some experimentation is needed to find out how many neurons and layers are useful; for six input fields, however, particularly high numbers turn out not to be beneficial. The goal is to get the best results with as few neurons as possible, since this saves a lot of computing power. The output layer consists of three neurons, one for each switch.
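In the NumPy representation from the forward-pass sketch above, this network could be set up as follows; the two hidden-layer sizes of 8 are an assumption and exactly the part that needs experimentation:

```python
import numpy as np

rng = np.random.default_rng(2)
SIZES = [6, 8, 8, 3]  # x1..x4, y1, y2 -> two hidden layers -> s1, s2, s3
net = [(rng.normal(scale=0.5, size=(o, i)), rng.normal(scale=0.5, size=o))
       for i, o in zip(SIZES[:-1], SIZES[1:])]
```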
Creating the training data
After the net has been constructed, the training data must be created by hand. This is the most laborious part, because for each input the target output has to be defined manually. In this example it is relatively manageable, since there are at most 4 * 2 = 8 different paths. For example, a training record could contain the input (x1, x2, x3, x4, y1, y2) = (1, 0, 0, 0, 0, 1) and the target output (s1, s2, s3) = (-0.5, 0, -0.5).
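As a data structure, the record given above could look like this; the remaining seven records depend on the track layout and are omitted here:

```python
import numpy as np

training_data = [
    # (input: x1..x4, y1, y2) -> (target: s1, s2, s3)
    (np.array([1, 0, 0, 0, 0, 1]), np.array([-0.5, 0, -0.5])),
    # ... seven more records, one per remaining path
]
```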
Training via backpropagation
For this task it makes sense to use training via backpropagation rather than evolution, since complete training data with known target outputs is available. A sketch of such a training loop follows below.
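This is a minimal training loop for the net, reusing the training_data list from the previous section and applying the delta rule from the backpropagation section; the hidden-layer sizes, learning rate and number of passes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
SIZES = [6, 8, 8, 3]
ETA = 0.01  # learning rate: small, but not zero

net = [[rng.normal(scale=0.5, size=(o, i)), rng.normal(scale=0.5, size=o)]
       for i, o in zip(SIZES[:-1], SIZES[1:])]

def train_step(x, target):
    # forward pass, remembering every layer's activation
    activations = [x]
    for w, b in net:
        activations.append(np.tanh(w @ activations[-1] + b))
    # backward pass: delta rule; for tanh, f'(net) = 1 - o**2
    delta = (target - activations[-1]) * (1 - activations[-1] ** 2)
    for layer in reversed(range(len(net))):
        w, b = net[layer]
        next_delta = (w.T @ delta) * (1 - activations[layer] ** 2)
        net[layer][0] = w + ETA * np.outer(delta, activations[layer])
        net[layer][1] = b + ETA * delta
        delta = next_delta  # delta for the layer below (unused at the inputs)

for _ in range(10_000):
    for x, target in training_data:  # records from the previous section
        train_step(x, target)
```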
Use of the network outside the training data
Due to the deliberately simple construction and the completeness of the training data, this net cannot be used for a different task with four start positions, two target positions and three switches. The training would be useless, since the connections can be completely different. A net that recognizes cat photos, on the other hand, can very well be used outside its training data. However, this is because there are almost infinitely many potential training records and the net is trained with only a fraction of them. What would be transferable is to create only six of the eight training records and check whether the net decides correctly for the remaining two cases.
Conclusion
Artificial intelligence based on neural networks is able to recognize connections that humans would find difficult to discover. However, care must be taken not to overtrain these networks, to prevent the network from learning the training data by heart instead of recognizing the underlying structures.