Artificial neural networks are currently considered the state of the art method in the field of AI. They are software constructs that form the foundation of virtually all modern AI-powered systems.
The first neural network was described in 1943 by Warren McCulloch and Walter Pitts. Unfortunately for them, computers of the time were not powerful enough to build and test a prototype. Later, in 1954, Belmont Farley and Wesley Clark of the Massachusetts Institute of Technology succeeded in running the first simple neural network. Although very primitive and slow, it marked the start of the deep learning era.
The name ‘neural network’ comes from the fact that they were inspired by the brain’s neural network. The dominating theory of how the brain functions is that our receptors (eyes, nose, ears, other senses) trigger electric impulses that pass through the brain’s neural network, and that similar stimuli trigger similar electric waves. For example, when you look at a picture of a mountain and when you smell the fresh air in a forest, you feel something similar. This is because both the sight of the mountain and the fresh air trigger somewhat similar electric signals in your brain. Another interesting property of the brain is that it can learn. That link between the mountain and the fresh air was created over the years; other links are created the same way, and together they make up nature’s most advanced computation machine.
This pattern recognition ability is very useful in modern society, where we are bombarded with huge amounts of data; data scientists refer to this as the “Data Explosion”. It is already obvious that we are unable to handle such amounts of data manually. This is the problem artificial intelligence tries to tackle.
How artificial neural networks (ANNs) work
Imagine an ANN as a black box with input and output. Similar to the biological neural network, the input can be an image, a sound, text, etc., but converted to a sequence of numbers. This sequence describes the input in a unique way, distinguishable from the rest of the inputs. Then we have the output: again a sequence of numbers which, in similar fashion, uniquely describes a particular output. In the end we can view this as a function, or a mapping from one set of items to another, where each item is described by its unique array of numbers. Mathematically speaking, artificial neural networks are just somewhat complex functions.
The internal structure of the network is a graph: nodes with links between them. Each node is called an artificial neuron and has a simple structure: one or many inbound links and one or many outbound links. The whole network consists of layers of such nodes connected in a certain way; usually all nodes from one layer are fully connected to the nodes of the next layer. Additionally, the links have weights assigned to them.
The main function of the artificial neuron is to receive data (numbers) from its input links, perform some calculations on them, and then pass the result to the next neurons via the outbound links. The internal calculations in the neuron are quite simple. Each inbound link provides a number, which is the result of the calculations of the previous neuron; together these numbers form an array. They are summed and passed to the so-called activation function. Activation functions are usually defined once for the whole layer, although this is not mandatory and you can define the function for each neuron individually. The function itself takes a single argument: the sum over the input links. The name activation function comes from the fact that this function determines the number the neuron will pass to the next layer of neurons. In the biological neuron this would be the strength of the signal the neuron emits to the ones connected to it.
The input is calculated by summing all the outputs from the previous neurons connected to the current one, but each of those outputs is first multiplied by the weight of its link. In the end we get a single number, passed as the argument to the activation function, which calculates the output of the current neuron.
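The calculation described above can be sketched in a few lines of Python. The inputs, weights and threshold activation below are made-up values purely for illustration:

```python
# Minimal sketch of a single artificial neuron.
# Each input is multiplied by its link weight, the products are summed,
# and the sum is passed through an activation function.

def step_activation(x):
    # A simple threshold activation: "fires" 1 if the weighted sum is positive.
    return 1 if x > 0 else 0

def neuron_output(inputs, weights):
    # Weighted sum over all inbound links, then the activation function.
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return step_activation(weighted_sum)

# Outputs from three hypothetical predecessor neurons and their link weights:
print(neuron_output([0.5, -1.0, 0.25], [0.8, 0.4, 1.2]))  # -> 1
```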
There are many types of activation functions, and the appropriate one is chosen based on the use case.
Here are some of the more popular activation functions:
Sigmoid: monotonically increasing; squashes its input into the range from 0 to 1
Rectifier (ReLU): outputs the input directly if it is positive; otherwise it outputs zero
Tanh (hyperbolic tangent): a rescaling of the sigmoid, such that its outputs range from -1 to 1
There are plenty of such functions, and choosing the right one is not an easy task; it is usually done by experimenting. There are even instances where the activation is the identity function (it outputs the number it receives) and all the predictions are done entirely by the link weights.
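The three popular activation functions listed above, plus the identity, can be written down directly using only the standard library:

```python
import math

def sigmoid(x):
    # Squashes any real number into the open range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Passes positive inputs through unchanged; clips negatives to zero.
    return max(0.0, x)

def tanh(x):
    # A rescaled sigmoid; outputs range from -1 to 1.
    return math.tanh(x)

def identity(x):
    # The "empty" activation mentioned above: returns its input as-is.
    return x
```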
The links and the layers
As we said, the input for each neuron is the sum of the numbers calculated by all of its predecessors, and the output of each neuron is multiplied by the link weight before being passed as input to the next in line. This simple feature allows us to make certain neurons more important to the output than others. Links are also directed: in the most used networks, Feed Forward Networks, they point in one direction, from left to right. Other architectures, like the Recurrent Network, contain cycles.
We have one input layer and one output layer; all layers between them are called hidden layers. When defining a layer we specify its size, connections and activation function, and this is applied to all neurons inside it. Determining the number of hidden layers is not easy, and the optimal count is usually found by experimenting; the same goes for the layer sizes. The most popular layer is the Dense layer, or fully connected layer, where each node is connected to every node of the next layer.
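A fully connected (Dense) layer follows directly from this description. The sketch below uses made-up weights for a tiny hypothetical network: two inputs, one hidden layer of three neurons, and a single output neuron:

```python
def relu(x):
    # ReLU activation: clips negatives to zero.
    return max(0.0, x)

def dense_layer(inputs, weights, activation):
    # weights[j][i] is the weight of the link from input i to neuron j.
    # Each neuron sums its weighted inputs and applies the activation.
    return [activation(sum(w * x for w, x in zip(neuron_weights, inputs)))
            for neuron_weights in weights]

# Hypothetical weights: 2 inputs -> hidden layer of 3 -> 1 output.
hidden_w = [[0.5, -0.2], [0.1, 0.9], [-0.7, 0.3]]
output_w = [[1.0, -1.0, 0.5]]

hidden = dense_layer([1.0, 2.0], hidden_w, relu)
output = dense_layer(hidden, output_w, relu)
print(output)  # -> [0.0]
```

In practice a library such as Keras hides this loop behind a layer object; the point of the sketch is only that a "layer" is nothing more than this weighted-sum-plus-activation step applied to every neuron in it.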
Training of artificial neural networks
In order to train ANNs we need examples, and we need lots of them. Examples are essentially pairs of input and desired output. Remember, we need to encode our data into sequences of numbers that represent each input and output item in a unique way. These examples are also referred to as training data. Next we use a training algorithm called backpropagation: we give the network an input from our training data, ask it to predict the output, and compare its prediction to the desired output for that example. Based on how far off the prediction was from the target, we update the weights slightly, starting from the output layer and moving toward the input layer (right to left), hence the name backpropagation. We do this for each example in our training data. After all of the examples have been passed through the network, we say that we have completed one training iteration, also referred to as an epoch.
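The loop described above can be sketched with the smallest possible network: a single sigmoid neuron learning the logical OR function. The learning rate, epoch count and squared-error loss are arbitrary illustrative choices:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Training data for logical OR: (input pair, desired output).
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights = [0.0, 0.0]
bias = 0.0
rate = 0.5  # learning rate: how much to nudge the weights per example

for epoch in range(2000):            # one epoch = one pass over all examples
    for inputs, target in examples:
        # Forward pass: weighted sum plus bias, through the activation.
        out = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        # Backward pass: gradient of the squared error through the sigmoid.
        delta = (out - target) * out * (1 - out)
        weights = [w - rate * delta * x for w, x in zip(weights, inputs)]
        bias -= rate * delta

# After training, the neuron predicts OR correctly (thresholding at 0.5):
predictions = [round(sigmoid(sum(w * x for w, x in zip(weights, i)) + bias))
               for i, _ in examples]
print(predictions)  # -> [0, 1, 1, 1]
```

A real network repeats exactly this update for every weight in every layer, propagating the error gradient backward layer by layer.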
The process of training, also referred to as learning, is actually the process of updating the weights of the links between the nodes so that the network can recognize the patterns defined by our training data. After the training is completed, the result is called a model. One artificial neural network can produce different models based on the training data, the number of iterations and other more specific parameters. The model is essentially a state of the ANN with the weights that resulted from the training. Some models can’t be trained further; others can resume training, depending on the artificial neural network used.
There are also different types of training; the one we described above, with example pairs, is called supervised learning.
In the unsupervised approach we feed in the input data but don’t provide desired outputs. This approach is very common in Natural Language Processing (NLP) tasks.
There is also a hybrid approach called reinforcement learning, in which a penalty-reward system is created: the network is asked to make predictions and, based on its output, receives a penalty or a reward, then updates its weights to minimize the penalties and maximize the rewards. This is the approach used in car autopilots, for example, where stopping gradually at a red light yields a reward and hitting a pedestrian yields a penalty.
It is also used in video games for implementing computer opponents.
Types of artificial neural networks
Feed Forward Neural Network
The most popular network. It has an input layer, an output layer and optional hidden layers. There are no links going from right to left, only from left to right (forward), hence the name of the architecture.
Recurrent Neural Network (RNN)
The Recurrent Neural Network works on the principle of saving the output of a layer and feeding it back to the input to help predict the outcome of the layer. Here we have links that go forward and links that go backward.
This is useful for sequential data, where the previous context of the example might change the desired output. Take the context of words, for example: the word “bank” has a completely different meaning in the phrases “Bank of America” and “along the river bank”.
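The recurrent idea can be sketched as a single step function whose output is fed back in as state. For brevity the weights below are made-up scalars rather than the weight matrices a real RNN would use:

```python
import math

def rnn_step(x, h_prev, w_in, w_rec):
    # The new hidden state depends on the current input AND the previous
    # state, so earlier items in the sequence influence later outputs.
    return math.tanh(w_in * x + w_rec * h_prev)

# Process a short sequence, carrying the hidden state forward each step.
h = 0.0
for x in [1.0, 0.5, -1.0]:
    h = rnn_step(x, h, w_in=0.8, w_rec=0.5)
```

Feeding the same input sequence in a different order would produce a different final state, which is exactly the context-sensitivity the “river bank” example relies on.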
Convolutional Neural Network (CNN)
Mostly used in image and video recognition. The image is split into pixels; each pixel is converted to numbers based on its color and transparency (the RGBA color model), and these are used as input for the network. Typically these networks have more hidden layers, as they need to capture many features. You can see them in action everywhere: face recognition, Google image search, or classifying handwritten letters and digits.
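The core operation of a convolutional layer can be sketched as sliding a small filter (kernel) over a grid of pixel values and summing the element-wise products. The image and kernel below are made up; the kernel acts as a simple vertical-edge detector:

```python
def convolve(image, kernel):
    # Slide the kernel over every position where it fully fits,
    # summing the products of overlapping image and kernel values.
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# A tiny grayscale "image": dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9],
         [0, 0, 0, 9, 9],
         [0, 0, 0, 9, 9],
         [0, 0, 0, 9, 9]]

# Hypothetical vertical-edge kernel: responds where brightness
# changes from left to right, stays at zero in flat regions.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

print(convolve(image, kernel))  # -> [[0, 27, 27], [0, 27, 27]]
```

A real CNN learns many such kernels from data instead of hand-coding them, and stacks these convolution layers to capture progressively larger features.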
Modular Neural Network
These are basically complex artificial neural networks composed of other neural networks: a collection of different networks working independently, each contributing toward the output.
Each constituent network has its own set of inputs, unique compared to the other networks, and performs a sub-task. These networks do not interact with or signal each other while accomplishing their tasks. The advantage of a modular neural network is that it breaks a large computational process down into smaller components, decreasing the complexity.
Other types of ANN exist; these are just a few of them.
With the increasing amounts of data, artificial neural networks will become even more relevant. Their performance and accuracy are tightly coupled with computational power and training data, both of which are expected to grow with time. They are a crucial part of AI and are responsible for most state of the art models: models that power many modern technologies and, in some instances, surpass human abilities.