Encoding concepts, categories and classes for neural networks

In a previous post, we explained how neural networks can predict a continuous value (like a house price) from several features. One of the questions we received is how neural networks can encode concepts, categories or classes. For instance, how can a neural network convert an array of pixels into a true/false answer about whether the underlying picture contains a cat?

First, here are some observations:

  • A binary classification problem is a problem with a “Yes/No” answer. Some examples include: Does this picture contain a cat? Is this e-mail spam? Is this application a virus? Is this a question?
  • A multi-class classification problem is a problem with several possible categories as the answer, like: what type of vehicle is this (car/bus/truck/motorcycle)?
  • Any multi-class classification problem can be converted into a series of binary classifications (like: Is this a car, yes or no? Is this a bus, yes or no? etc.)

The core idea of classification in neural networks is to convert concepts, categories and classes into probabilities of belonging to these concepts, categories or classes.

This means that a cat is 100% cat, a dog is 100% dog, a car is 100% car, and so on. Each independent concept is its own dimension in the conceptual space. So, for instance, we can say:

  • A cat is: 100% “cat”, 0% “dog”, 0% “bus”, 0% “car”. [1; 0; 0; 0]
  • A car is: 0% “cat”, 0% “dog”, 0% “bus”, 100% “car”. [0; 0; 0; 1]
  • A yes is 100% “yes”, 0% “no” -> [1; 0]

The vectorized representation of a category is then a 1 in the dimension representing this category and a 0 everywhere else. This scheme is called one-hot encoding.
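As a quick illustration, here is a minimal sketch of one-hot encoding in NumPy (the category list and the one_hot helper are invented for this example, not part of any particular library):

```python
import numpy as np

CATEGORIES = ["cat", "dog", "bus", "car"]  # example label set for illustration

def one_hot(label, categories=CATEGORIES):
    """Return a vector with a 1 at the label's index and 0 everywhere else."""
    vec = np.zeros(len(categories))
    vec[categories.index(label)] = 1.0
    return vec

print(one_hot("cat"))  # [1. 0. 0. 0.]
print(one_hot("car"))  # [0. 0. 0. 1.]
```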

With this idea in mind, we now have a new way to encode categories into classical multi-dimensional vectors. However, these vectors have some special properties, since they need to represent probabilities, or confidence levels, of belonging to these categories:

  • The vector length defines the number of categories the neural network can support.
  • Each dimension should be bounded between 0 and 1.
  • The sum over the vector should always be 1.
  • The selected category is the one with the highest value/confidence level (arg max).

So, for instance, if we get [0.6, 0.1, 0.2, 0.1] as the output over [cat, dog, bus, car], we can say that the neural network classified this image as a cat with a confidence level of 60%.
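In code, reading off the predicted class is just an arg max over the output vector. A small sketch reusing the example values above (the category names are assumptions for illustration):

```python
import numpy as np

categories = ["cat", "dog", "bus", "car"]
probs = np.array([0.6, 0.1, 0.2, 0.1])  # hypothetical network output

predicted = categories[int(np.argmax(probs))]  # index of the highest confidence
print(predicted, probs.max())  # cat 0.6
```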

Softmax layer

When designing neural networks for classification problems, a special activation function called “softmax” is used at the last layer in order to maintain the properties of these probability vectors.

Softmax converts any vector into a probability vector (one that sums to 1).

What softmax does is take any vector of numbers, exponentiate every element, and then divide each element by the sum of the exponentials.

[2; 1; 0.1] -> exponentials [7.39; 2.72; 1.11] -> the sum is 11.21 -> final vector is [0.66; 0.24; 0.10] (which is a probability vector).
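A direct translation of that recipe into NumPy reproduces the example (a minimal, naive sketch; production implementations usually subtract the maximum score before exponentiating for numerical stability):

```python
import numpy as np

def softmax(scores):
    """Exponentiate every score, then divide by the sum of the exponentials."""
    exps = np.exp(scores)
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))
# -> approximately [0.66, 0.24, 0.10], and the components sum to 1
```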

We can easily verify the following properties:

  • Every component is between 0 and 1, since an exponential can’t be negative and dividing each element by the sum can’t give a result above 1.
  • The sum of the outputs is always equal to 1.
  • The order is maintained: a higher initial score will lead to a higher probability.
  • What matters is the scores relative to each other. For instance, the two vectors [10; 10; 10] and [200; 200; 200] are both converted to the same probability vector [1/3; 1/3; 1/3] (see the quick check below).
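These last properties are easy to check numerically with the same naive softmax as above (shown here purely as a sanity check):

```python
import numpy as np

def softmax(scores):
    exps = np.exp(scores)
    return exps / exps.sum()

# Identical scores -> uniform probabilities, regardless of their absolute value.
print(softmax(np.array([10.0, 10.0, 10.0])))     # [0.333... 0.333... 0.333...]
print(softmax(np.array([200.0, 200.0, 200.0])))  # [0.333... 0.333... 0.333...]
```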

The final step is to provide the loss function that drives the back-propagation, and therefore the learning. Since the target outputs are one-hot encoded vectors, as presented earlier, the most suitable loss function is the log-loss (cross-entropy) function:

The log-loss function definition: loss = −Σ_j y_j · log(ŷ_j), where y_j is the one-hot encoded true label and ŷ_j is the predicted probability for class j.

It is derived from information theory and Shannon entropy. Since the actual outputs y are either 0 or 1, only the term for the correct class survives, so the loss accumulates the lack of confidence of the NN over the known categories.
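With one-hot targets this is only a few lines of NumPy. A minimal sketch (the log_loss name and the epsilon clipping are implementation details added for this example, not part of the definition):

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot target and a predicted probability vector."""
    y_pred = np.clip(y_pred, eps, 1.0)      # guard against log(0)
    return -np.sum(y_true * np.log(y_pred))

y_cat = np.array([1.0, 0.0, 0.0, 0.0])      # "this image is a cat", one-hot encoded
print(log_loss(y_cat, np.array([0.6, 0.1, 0.2, 0.1])))  # ~0.51, i.e. -log(0.6)
```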

  • If the NN is very confident -> the probability of the correct class will be close to 1 -> the log will be close to 0 -> no loss -> no back-propagation -> no learning!
  • If the NN is not very confident -> the probability of the correct class will be close to 0 -> the log will be close to -infinity and the loss will be close to infinity -> Big loss -> Big opportunity to learn using back-propagation. (The quick check below puts numbers on both cases.)
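Putting rough numbers on the two cases above (a quick check with the standard library, nothing framework-specific):

```python
import math

# Very confident and correct: probability of the true class close to 1 -> tiny loss.
print(-math.log(0.99))   # ~0.01

# Not confident about the true class: probability close to 0 -> huge loss.
print(-math.log(0.01))   # ~4.61
```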

Calculating exponentials and logarithms is computationally expensive. As we can see from the previous two parts, the softmax layer exponentiates the logit scores in order to get probability vectors, and then the loss function takes the logarithm of those probabilities to calculate the cross-entropy loss.

If we combine these two stages into one layer, the logarithms and exponentials largely cancel each other out, and we can get the same final result with far fewer computational resources (and better numerical stability). That’s why many neural network frameworks and libraries provide a combined “softmax-log-loss” function, which is more efficient than applying the two functions separately.
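A minimal sketch of the combined step (the softmax_log_loss name is invented for this example; in real frameworks it corresponds to functions such as PyTorch’s CrossEntropyLoss or TensorFlow’s softmax_cross_entropy_with_logits, which work directly on the raw scores):

```python
import numpy as np

def softmax_log_loss(scores, true_class):
    """Cross-entropy computed directly from raw scores (logits).

    log(softmax(scores)[k]) simplifies to scores[k] - log(sum(exp(scores))),
    so the explicit softmax and the explicit log never have to be materialized.
    """
    shifted = scores - scores.max()                # subtract the max for numerical stability
    log_sum_exp = np.log(np.sum(np.exp(shifted)))
    return -(shifted[true_class] - log_sum_exp)

scores = np.array([2.0, 1.0, 0.1])   # raw scores from the last linear layer
print(softmax_log_loss(scores, 0))   # ~0.42, i.e. -log(0.66)
```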