The Intuition Behind Artificial Neural Networks

阿新 • Published: 2018-12-28

Explaining ANNs by analogy to the human brain

The Neuron

Brains are the best learning tools we know of, and they are built from neural networks. A neural network is a network of neurons, so how do neurons actually work?

A neuron consists of a body, dendrites (receivers) and an axon (transmitter). A neuron sends signals along its axon, and these stimulate the dendrites of the receiving neuron. The junction where the signal is passed across is called the synapse.

So how do we create a neuron in a machine? Consider a neuron as a node with several input values and one output signal. The input values may themselves be the outputs of other neurons. Together the input values constitute the input layer; when they come from other neurons, they form a hidden layer.

The input layer is a bit like a collection of senses. Note that the brain is encased in a hard bone skull, a black box, and relies on these senses to collect information about the surroundings.

The connections between the input values and the node are equivalent to synapses.

Input layer:

The independent variables supplied together all come from a single observation. Each variable should be standardised (mean 0, standard deviation 1) or normalised (in the range 0–1) so that the input nodes have similar value ranges, which helps training efficiency.
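As a minimal sketch of the two rescaling options, the functions below are applied to one feature across several observations. The names `standardise` and `normalise` and the sample values are illustrative assumptions, and the sketch assumes the feature is not constant (otherwise the divisor would be zero).

```python
from statistics import mean, pstdev

def standardise(values):
    """Rescale one feature to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation; assumed non-zero
    return [(v - mu) / sigma for v in values]

def normalise(values):
    """Rescale one feature linearly into the range 0-1."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical floor areas, one per observation
areas = [50.0, 80.0, 120.0, 200.0]
print(standardise(areas))
print(normalise(areas))
```

Either option puts features measured in very different units onto comparable scales before they reach the input nodes.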

Output values:

Continuous (price)
Binary (yes/no)
Categorical variable (actually several probabilities, one for each category)

Whichever type of output is produced, it relates to a single observation’s input.

Each synapse is assigned a weight, and these weights are crucial to the NN. They are how the NN learns: it adjusts the weights using tools like gradient descent and back-propagation.

So What Happens Inside the Neuron?

Firstly, all of the input values get added together, according to their weights (simply multiply by the weight and sum). The activation function is applied to this weighted sum to determine the output value.
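The two steps inside the neuron can be sketched in a few lines. The function name `neuron_output`, the sample inputs and weights, and the choice of a step activation at zero are all illustrative assumptions.

```python
def neuron_output(inputs, weights, activation):
    """Multiply each input by its weight, sum, then apply the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)

def step(z):
    """A simple threshold activation with the threshold assumed to be 0."""
    return 1.0 if z > 0 else 0.0

inputs = [1.0, 2.0, 3.0]
weights = [0.5, -0.25, 0.5]
# Weighted sum: 0.5 - 0.5 + 1.5 = 1.5, which is above the threshold
print(neuron_output(inputs, weights, step))
```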

The Activation Function

Options:

Threshold function: This is 1 if the weighted sum is bigger than the threshold, and 0 otherwise
Sigmoid function: This is continuous, like a smoothed threshold function. Its values lie in the range (0,1), which makes it well suited to predicting probabilities
Rectifier function: This is 0 when the sum < 0 and increases linearly thereafter. Softplus is a smooth alternative.
Hyperbolic tangent: Similar to the sigmoid function but in the range (-1,1)
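For reference, the options above can be written out as plain functions. This is a sketch using only the standard `math` module; the default threshold of 0 is an assumption, and the functions are not hardened against overflow for very large inputs.

```python
import math

def threshold(z, theta=0.0):
    """1 if the weighted sum exceeds the threshold theta, else 0."""
    return 1.0 if z > theta else 0.0

def sigmoid(z):
    """Smooth S-shaped curve with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def rectifier(z):
    """0 for negative sums, then increases linearly."""
    return max(0.0, z)

def softplus(z):
    """Smooth approximation of the rectifier: log(1 + e^z)."""
    return math.log1p(math.exp(z))

def hyperbolic_tangent(z):
    """Like the sigmoid, but with values in (-1, 1)."""
    return math.tanh(z)

for z in (-2.0, 0.0, 2.0):
    print(z, threshold(z), sigmoid(z), rectifier(z), softplus(z), hyperbolic_tangent(z))
```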

Of course, a neural network is made up of a whole network of nodes arranged in (hidden) layers, and each node contains its own activation function.

How Do Neural Networks Work

Let’s take an example where we want to predict property prices, and let’s assume that we have a NN that is already trained up using the following input parameters:

Area
Bedrooms
Distance to the city
Age

These parameters comprise the input layer. The output layer is the predicted price. In a simple case, these inputs would be weighted and summed to give a single output value.

This is a very trivial representation of the problem and you can see it may be very inflexible, as well as sensitive to the input data.

This is where the hidden layers in an ANN become very important. Neurons in the hidden layer can react to different features of the input data and ignore others, simply by adjusting the weights.

Given a case with exactly one hidden layer, we can connect each input value in the input layer to each node of the hidden layer and weight their connections. Now, some weights may be zero because they may not be important to the neuron that they are connected to.

In the example, area and distance to the city may be correlated, but the number of bedrooms may not be correlated to these. The neuron then has the weight for number of bedrooms set to 0. Similarly, area, number of bedrooms and age may be correlated but distance may not be correlated with these, affecting the weights to the next neuron.

Try to work out a possible rationale for neuron H5.

Individually, each neuron cannot predict the price output, but in combination they become very powerful.
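A minimal sketch of this idea follows: a forward pass through one hidden layer in which some weights are zero, so that each hidden neuron reacts to only some of the inputs. All weights and input values are made up for illustration (they are not trained values), the rectifier is an assumed choice of activation, and the output is on an arbitrary scale rather than a real price.

```python
def rectifier(z):
    return max(0.0, z)

def layer(inputs, weight_rows, activation):
    """Compute one layer: each row of weights belongs to one neuron."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)))
        for row in weight_rows
    ]

# Hypothetical scaled inputs: area, bedrooms, distance to the city, age
x = [0.8, 0.5, 0.2, 0.1]

# Hypothetical hidden-layer weights; a 0 means the neuron ignores that input
hidden_weights = [
    [0.6, 0.0, -0.4, 0.0],   # reacts to area and distance only
    [0.3, 0.5, 0.0, -0.2],   # reacts to area, bedrooms and age
    [0.0, 0.0, 0.0, -0.9],   # reacts to age only
]
output_weights = [[1.0, 0.5, 0.7]]  # one output neuron

hidden = layer(x, hidden_weights, rectifier)
price = layer(hidden, output_weights, lambda z: z)[0]
print(hidden, price)
```

In this sketch the third hidden neuron outputs 0 for a nearly new property, so it contributes nothing to the final value; the prediction comes from the combination of the other two.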

How Do Neural Networks Learn

Now, here’s the important bit…

When programming the solution to a classification problem (for example) there are two options:

hard-coded — this is based on rules and all actions and features are taken care of explicitly and programmed accordingly;
alternatively, as in ANNs, provide a means to figure out how to convert inputs to outputs.

The goal of machine learning is to create a method (in the case of ANNs, this is a network) that learns on its own without deterministic rules. For a Neural Network, the idea is that it will learn and adjust the weights by itself.

Notation: ŷ is the value predicted by the ANN, and y is the actual value.

Then, for each input data point, each value gets supplied to the perceptron and the activation function is applied. This produces the output value ŷ.

The next step is to compare the actual and output values and then calculate the value of the cost function. One of the most common cost functions is half of the squared distance between the two values: C = ½(ŷ − y)². The cost function measures the error in the prediction, so we want to minimise it.
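The half-squared-error cost described here is short enough to write out directly. The function name `cost` and the sample values are illustrative assumptions.

```python
def cost(y_hat, y):
    """Half of the squared distance between predicted and actual values."""
    return 0.5 * (y_hat - y) ** 2

# A prediction of 3.0 against an actual value of 1.0 is off by 2.0
print(cost(3.0, 1.0))
# A perfect prediction costs nothing
print(cost(1.0, 1.0))
```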

The result of the cost function is fed backwards through the perceptron and the weights are updated. The method for this is discussed later in this article.

So far we have only worked on one single data point with several features. If our dataset is made up of only a single point, we keep iteratively feeding in this datapoint and updating the weights until the cost function is below some threshold.

Now, supposing we have a whole dataset, we go through the whole dataset on each iteration, instead of just one data point. This is termed an ‘epoch’. After we obtain the output value for each input point, we compare to the actual values. The cost function for the whole dataset is the sum of the cost functions for each data point. We then update the weights and run further epochs until the cost function is minimised.
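As a sketch of the dataset-level cost, the snippet below sums the per-point cost over every observation, which is what gets evaluated once per epoch. For brevity it assumes a single neuron with no activation function; the names `predict` and `total_cost` and the data are hypothetical.

```python
def predict(weights, features):
    """A single linear neuron: weighted sum of the features (an assumption)."""
    return sum(w * x for w, x in zip(weights, features))

def total_cost(weights, dataset):
    """Sum of the half-squared errors over all data points."""
    return sum(
        0.5 * (predict(weights, features) - y) ** 2
        for features, y in dataset
    )

# Hypothetical dataset of (features, actual value) pairs
dataset = [([1.0, 2.0], 5.0), ([2.0, 0.0], 2.0)]
weights = [1.0, 1.0]
# Predictions are 3.0 and 2.0, so the costs are 2.0 and 0.0
print(total_cost(weights, dataset))
```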

This whole process is called back-propagation. (See also cross-validated back-propagation.)

Gradient Descent

So far: in order for an Artificial Neural Network to learn, back-propagation of the error through the network must occur and the weights must be adjusted. So how do we adjust the weights?

One way could be to try out a large range of weights and plot the error accordingly. This is simple and intuitive if there is only one weight to be optimised. However, there are as many dimensions as there are weights (for a single neuron, one per input value). A brute force approach would require testing k^m weight combinations (where m is the number of inputs and k is the number of weight options to test for each). This quickly becomes a very expensive task which could take a very long time.
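To make the k^m growth concrete, here is a small brute-force search that tries every combination of candidate weights. The function name `grid_search`, the toy cost function and the candidate values are illustrative assumptions.

```python
from itertools import product

def grid_search(cost_fn, candidates, m):
    """Try every combination of k candidate values for m weights."""
    best = min(product(candidates, repeat=m), key=cost_fn)
    evaluations = len(candidates) ** m  # k^m combinations
    return best, evaluations

def toy_cost(w):
    """A toy cost with its minimum at weights (1, -2)."""
    return (w[0] - 1) ** 2 + (w[1] + 2) ** 2

best, evaluations = grid_search(toy_cost, [-2, -1, 0, 1, 2], m=2)
print(best, evaluations)

# With 1000 candidate values for each of only 4 weights:
print(1000 ** 4)
```

Five candidates for two weights is only 25 evaluations, but the count grows exponentially with the number of weights.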

Gradient descent is a solution that means we don’t have to test every possible combination. The general idea is to:

Pick an initial point
Test the gradient at this point (perhaps by testing a neighbour)
Pick the next point in the direction of that downward gradient
Continue until the gradient is within some threshold of zero-gradient

Of course, there may be complications as well as means to optimise the method. Furthermore, the method works in multiple dimensions.
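The four steps can be sketched for a single weight as follows. The gradient is estimated by testing two neighbouring points (a central difference); the learning rate, step size, tolerance and toy cost function are all assumptions chosen for illustration.

```python
def gradient_descent(cost_fn, start, learning_rate=0.1, h=1e-6,
                     tolerance=1e-6, max_steps=10_000):
    w = start                                    # pick an initial point
    for _ in range(max_steps):
        # Estimate the gradient by testing neighbours on either side
        gradient = (cost_fn(w + h) - cost_fn(w - h)) / (2 * h)
        if abs(gradient) < tolerance:            # close enough to zero gradient
            break
        w -= learning_rate * gradient            # step in the downhill direction
    return w

# Toy cost with its minimum at w = 3
w_min = gradient_descent(lambda w: (w - 3.0) ** 2, start=0.0)
print(w_min)
```

In several dimensions the same loop applies, with one partial derivative per weight.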

Stochastic Gradient Descent

Gradient descent is a useful method, but it is only guaranteed to find the global minimum when the cost function is convex (the simple case, with a single global minimum). If the cost function is not convex, then several minima may occur and the algorithm can get ‘stuck’ in a local minimum.

In GD, we look at all of the rows in the input data together. For stochastic GD, however, weight adjustment happens after each input, rather than after all of them. For both, there are multiple epochs. Stochastic GD helps to stop the minimisation from getting trapped in a local minimum. It causes the weights to fluctuate a lot more, but as a result it has a greater chance of finding the global minimum rather than a local minimum. Each update is also faster, as it uses less data per calculation.
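The difference in when the weights are updated can be sketched with a single weight and a single input. The data, learning rate and function names are hypothetical, and the neuron is assumed to be linear with the half-squared-error cost. This toy cost is convex, so the sketch shows only the update schedule, not the escape from local minima.

```python
import random

def batch_epoch(w, data, learning_rate):
    """GD: accumulate the gradient over all rows, then update once."""
    gradient = sum((w * x - y) * x for x, y in data)
    return w - learning_rate * gradient

def stochastic_epoch(w, data, learning_rate, rng):
    """Stochastic GD: update the weight after every single row."""
    rows = data[:]
    rng.shuffle(rows)  # visit the rows in a random order each epoch
    for x, y in rows:
        w -= learning_rate * (w * x - y) * x
    return w

# Hypothetical data generated by y = 2x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
rng = random.Random(0)
w_batch = w_sgd = 0.0
for _ in range(50):  # 50 epochs for both methods
    w_batch = batch_epoch(w_batch, data, learning_rate=0.05)
    w_sgd = stochastic_epoch(w_sgd, data, learning_rate=0.05, rng=rng)
print(w_batch, w_sgd)
```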

See also the mini-batch gradient descent method.

Summary

1. Randomly initialise the weights to small numbers, close to 0.
2. Input the first observation in the input layer, where each feature in one input corresponds to one input node.
3. Forward propagation: from left to right in our diagrams, the neurons are activated such that the impact of each neuron’s activation is scaled by the weights. These activations are propagated until the predicted result is obtained.
4. Compare the predicted and actual values.
5. Back-propagation: from right to left in our diagrams, update the weights according to how responsible they are for the calculated error. The learning rate decides by how much we update the weights.
6. Repeat steps 2–5 for each observation, updating the weights either:
   after each observation (online, or stochastic, learning), or
   after a batch of observations (batch learning).
7. Repeat for several epochs (where one epoch is completed when the whole training set has passed through the ANN).
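The steps above can be sketched for the simplest possible case: a single sigmoid neuron rather than a full network, so that back-propagation reduces to one application of the chain rule. The bias term, the OR-gate data, the learning rate and the epoch count are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(dataset, epochs=2000, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    n_inputs = len(dataset[0][0])
    # Randomly initialise the weights to small numbers close to 0
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    bias = 0.0  # extra constant input, an assumption not in the summary
    history = []
    for _ in range(epochs):                      # repeat for several epochs
        epoch_cost = 0.0
        for features, y in dataset:              # input one observation
            # Forward propagation: weighted sum, then activation
            z = sum(w * x for w, x in zip(weights, features)) + bias
            y_hat = sigmoid(z)
            # Compare the predicted and actual values
            error = y_hat - y
            epoch_cost += 0.5 * error ** 2
            # Back-propagation: chain rule through the sigmoid
            delta = error * y_hat * (1.0 - y_hat)
            weights = [w - learning_rate * delta * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * delta        # update after each observation
        history.append(epoch_cost)
    return weights, bias, history

# Hypothetical training set: the logical OR of two inputs
or_gate = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
weights, bias, history = train(or_gate)
predictions = [
    round(sigmoid(sum(w * x for w, x in zip(weights, f)) + bias))
    for f, _ in or_gate
]
print(history[0], history[-1], predictions)
```

Here the weights are updated after every observation; accumulating `delta` over several observations before updating would give the batch variant.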
