AI based on a neural network
This blog post shows how a neural network for an artificial intelligence is constructed. Using examples, it also explains how such an AI can be trained with the help of training data, backpropagation and evolution, and which pitfalls can occur in the process.
Neuron
A neuron models a nerve cell. Since we are talking about artificial intelligence, the neuron must be abstracted into a mathematical model. The following figure shows an example of how such an artificial neuron can be constructed.
Input and weighting
The input of a neuron consists of many values “x” and correspondingly many weights “w”. The values x are either direct inputs, for a neuron on the first layer, or the outputs of other neurons, for a neuron on a later layer. The first value is usually defined independently of the inputs and is called the bias. The bias ensures that the neuron does not depend solely on the variable inputs and thus provides stability.
Transfer function
The transfer function prepares the inputs of the neuron for the activation function. Usually the sum of all inputs, each multiplied by its weight, is calculated and passed on to the activation function.
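Written out, with the bias treated as a fixed input x₀ = 1 with its own weight w₀ (a common convention; the exact form may differ from the figure), the transfer function is:

```math
net = \sum_{i=0}^{n} w_i \, x_i = w_0 + w_1 x_1 + \dots + w_n x_n
```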
Activation function
There are many different activation functions. A frequently used class are the sigmoid functions. These functions are characterized by being bounded on both sides and differentiable. The differentiability becomes very important later for training the network.
The function in the figure above maps each input to a value between -1 and 1.
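As a minimal sketch, such a neuron can be written in a few lines of Python. tanh is used here as a stand-in for the sigmoid-like function in the figure, since it also maps every input to a value between -1 and 1; the inputs and weights are arbitrary example values:

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum plus bias (transfer
    function), passed through tanh (activation function)."""
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    return math.tanh(net)  # output is always between -1 and 1

# Example: two inputs with hand-picked weights
print(neuron([0.5, -1.0], weights=[0.8, 0.3], bias=0.1))
```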
Neural network
The following figure shows an example of an artificial neural network. Each circle represents a single artificial neuron. Here the neurons are arranged in 4 layers, but the arrangement can be much more complex. The neurons set apart at the top serve as bias for the next layer. They have no input but always deliver the same fixed value. The first layer is called the “Input Layer”. This layer holds the data fed into the network; each data point gets its own neuron.
One or more hidden layers follow, which are responsible for the deeper recognition of structures. The more complex these layers are, the more complex the structures that can be identified. At the end comes the “Output Layer”. The number of neurons on this layer depends on how many possible results the network should distinguish.
Each neuron on one layer is connected to each neuron on the following layer.
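Because every neuron is connected to every neuron on the following layer, a whole layer can be computed at once as a matrix product. A minimal forward pass in Python with NumPy might look like this; the layer sizes and weights are arbitrary placeholders:

```python
import numpy as np

def forward(x, net):
    """Propagate input x through a list of (weights, bias) pairs.
    weights has shape (n_out, n_in), bias has shape (n_out,)."""
    for weights, bias in net:
        x = np.tanh(weights @ x + bias)  # transfer + activation per layer
    return x

# Example: 4 layers as in the figure (sizes chosen arbitrarily)
rng = np.random.default_rng(0)
sizes = [3, 5, 5, 2]  # input, two hidden layers, output
net = [(rng.normal(size=(o, i)), rng.normal(size=o))
       for i, o in zip(sizes[:-1], sizes[1:])]
print(forward(np.array([1.0, 0.0, -1.0]), net))
```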
Training of the network
What does it actually mean to train a neural network? At the beginning, all weights are set randomly. The goal of the training is to optimize them so that, for a given input, the desired output appears on the output layer, i.e. the error of the net is minimal.
For the training itself, however, extensive training data must first be created. Training data is data for which both the input and the desired output are known.
Training with the help of backpropagation
Backpropagation means feeding the error back. All incoming weights of each neuron are optimized, from the last layer to the first. At first sight, calculating the error of the inner layers and the input layer turns out to be a problem. Calculating the error for the output layer, on the other hand, is trivial, since the target values are known from the training data. The weight changes are calculated using the following example formula.
The weight change from a neuron (j) on the preceding layer to the current neuron (i) is the product of the learning rate, the error of the current neuron multiplied by the derivative of the activation function, and the activation value of the considered neuron on the preceding layer. The learning rate should be kept very close to zero, but must not be zero, otherwise no changes occur at all. For the error of the current neuron a case distinction is needed. If the neuron is on the output layer, the error is the difference between the target activation value from the training data and the actual activation value of the neuron. If the neuron is on a hidden layer, its error is the sum over all neurons on the next layer of their error multiplied by the weight connecting them to the current neuron. The sum of the existing weight and the weight change forms the new weight.
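Since the formula from the figure is not reproduced here, this is a common form of the delta rule just described, with learning rate η, error term δ, target value t, activation o and net input net:

```math
\Delta w_{ij} = \eta \, \delta_i \, o_j
\qquad
w_{ij}^{\text{new}} = w_{ij} + \Delta w_{ij}
```

```math
\delta_i =
\begin{cases}
f'(\text{net}_i)\,(t_i - o_i) & \text{output layer} \\
f'(\text{net}_i) \sum_k \delta_k \, w_{ki} & \text{hidden layer}
\end{cases}
```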
Training via Evolution
Training through evolution is not really training in the classical sense, because there is not a single net whose weights are adjusted. Instead, the training data creates a survival scenario in which the net has to prove itself. Ideally, it is clearly defined when a net has failed in a scenario. In addition, a “fitness” value is assigned, which makes nets comparable. The goal is to maximize this fitness.
It starts with a single, randomly generated net that clones itself at certain fixed points. The idea behind this is that such a net has already made it to this point and therefore has an acceptable basic configuration. The cloned net is only slightly and randomly changed and then starts over with the training data. A frequently used variant turns this into round-based training: several random nets start at the same time, and only the best nets of each round are used as the basis for generating new nets for the next round, as in the sketch below.
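A minimal sketch of the round-based variant, assuming a fitness function is given (here only a placeholder) and reusing the (weights, bias) representation from the forward-pass example; population size, number of survivors and mutation strength are arbitrary assumptions:

```python
import random
import numpy as np

rng = np.random.default_rng(1)
SIZES = [6, 8, 8, 3]  # example layer sizes, chosen arbitrarily

def random_net():
    """A net as a list of (weights, bias) pairs, as in the forward pass above."""
    return [(rng.normal(size=(o, i)), rng.normal(size=o))
            for i, o in zip(SIZES[:-1], SIZES[1:])]

def mutate(net, strength=0.05):
    """Clone a net and randomly change every weight slightly."""
    return [(w + rng.normal(scale=strength, size=w.shape),
             b + rng.normal(scale=strength, size=b.shape))
            for w, b in net]

def fitness(net):
    return 0.0  # placeholder: score the net in the survival scenario

population = [random_net() for _ in range(20)]
for _ in range(100):  # 100 rounds
    population.sort(key=fitness, reverse=True)  # best nets first
    survivors = population[:5]                  # only the best survive
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(15)]
```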
Structure of the example network
For this network we need six neurons on the input layer. As input we define all x and y values; all are zero except for the pair for which the switch positions are to be determined. Two hidden layers with a freely chosen number of neurons follow. In practice, some experimentation is needed to find out how many neurons and layers are useful; for six input fields, however, particularly high numbers turn out not to be beneficial. The goal is to get the best results with as few neurons as possible, since this saves a lot of computing power. The output layer consists of three neurons, one for each switch.
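In the NumPy representation from the forward-pass sketch above, this network could be set up as follows; the two hidden-layer sizes of 8 are an assumption and exactly the part that needs experimentation:

```python
import numpy as np

rng = np.random.default_rng(2)
SIZES = [6, 8, 8, 3]  # x1..x4, y1, y2 -> two hidden layers -> s1, s2, s3
net = [(rng.normal(scale=0.5, size=(o, i)), rng.normal(scale=0.5, size=o))
       for i, o in zip(SIZES[:-1], SIZES[1:])]
```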
Creating the training data
After the net has been constructed, the training data must be created by hand. This is the most laborious part, because for each input the target output has to be defined manually. In this example it is relatively manageable, since there are at most 4 * 2 = 8 different paths. For example, a training record could contain the input (x1, x2, x3, x4, y1, y2) = (1, 0, 0, 0, 0, 1) and the target output (s1, s2, s3) = (-0.5, 0, -0.5).
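As a data structure, the record given above could look like this; the remaining seven records depend on the track layout and are omitted here:

```python
import numpy as np

training_data = [
    # (input: x1..x4, y1, y2) -> (target: s1, s2, s3)
    (np.array([1, 0, 0, 0, 0, 1]), np.array([-0.5, 0, -0.5])),
    # ... seven more records, one per remaining path
]
```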
Training via backpropagation
For this task it makes sense to use training via backpropagation rather than evolution, since complete training data with known target outputs is available. A sketch of such a training loop follows below.
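This is a minimal training loop for the net, reusing the training_data list from the previous section and applying the delta rule from the backpropagation section; the hidden-layer sizes, learning rate and number of passes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
SIZES = [6, 8, 8, 3]
ETA = 0.01  # learning rate: small, but not zero

net = [[rng.normal(scale=0.5, size=(o, i)), rng.normal(scale=0.5, size=o)]
       for i, o in zip(SIZES[:-1], SIZES[1:])]

def train_step(x, target):
    # forward pass, remembering every layer's activation
    activations = [x]
    for w, b in net:
        activations.append(np.tanh(w @ activations[-1] + b))
    # backward pass: delta rule; for tanh, f'(net) = 1 - o**2
    delta = (target - activations[-1]) * (1 - activations[-1] ** 2)
    for layer in reversed(range(len(net))):
        w, b = net[layer]
        next_delta = (w.T @ delta) * (1 - activations[layer] ** 2)
        net[layer][0] = w + ETA * np.outer(delta, activations[layer])
        net[layer][1] = b + ETA * delta
        delta = next_delta  # delta for the layer below (unused at the inputs)

for _ in range(10_000):
    for x, target in training_data:  # records from the previous section
        train_step(x, target)
```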
Use of the network outside the training data
Due to the deliberately simple construction and the completeness of the training data, this net cannot be used for a different task with four start positions, two target positions and three switches. The training would be useless, since the connections can be completely different. A net that recognizes cat photos, on the other hand, can very well be used outside its training data. However, this is because there are almost infinitely many potential training records and the net is trained with only a fraction of them. What would be transferable is to create only six of the eight training records and check whether the net decides correctly for the remaining two cases.
Conclusion
Artificial intelligence based on neural networks is able to recognize connections that humans would find difficult to discover. However, care must be taken not to overtrain these networks, to prevent the network from learning the training data by heart instead of recognizing the underlying structures.