Scalable Rule-Based Representation Learning for Interpretable Classification

阿新 • Published: 2021-11-17


Wang Z., Zhang W., Liu N. and Wang J. Scalable rule-based representation learning for interpretable classification. In Advances in Neural Information Processing Systems (NIPS), 2021.

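Overview

Traditional machine learning methods such as decision trees are highly structured, which makes them highly interpretable. Compared with deep learning methods, they are harder to scale to large problems, and one important reason is that their discrete parameters and structure cannot be optimized with gradients. This paper is an attempt to optimize such models with gradients.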

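Main content

The paper considers the discrete model in figure (a), which takes continuous variables \(C_i\) and discrete variables \(B_i\) as input:

1. A Binarization Layer discretizes the continuous variables \(C_i\) and concatenates the result with \(B_i\) to form the input \(\bm{u}^{(0)}\).
2. Each Logical Layer takes \(\bm{u}^{(l-1)}\) as input and outputs \(\bm{u}^{(l)}\), which consists of a conjunction part \(\bm{r}\) and a disjunction part \(\bm{s}\):

\[r_i^{(l)} = \bigwedge_{W_{ij}^{(l, 0)} = 1} u_j^{(l-1)}, \\ s_i^{(l)} = \bigvee_{W_{ij}^{(l, 1)} = 1} u_j^{(l-1)}. \]

Here \(W^{(l, 0)}\) is the adjacency matrix between \(\bm{r}\) and \(\bm{u}\), and \(W^{(l, 1)}\) is the adjacency matrix between \(\bm{s}\) and \(\bm{u}\). Note that the inputs, outputs, and weights of a Logical Layer are all binary.
3. Finally, a linear layer performs the classification. Note that the weights of the linear layer are continuous.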
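To make the Logical Layer concrete, here is a minimal sketch of the discrete forward pass in plain Python. The function name `logical_layer` and the toy matrices are illustrative assumptions, not taken from the paper's code; the output is simply \(\bm{r}\) followed by \(\bm{s}\).

```python
from typing import List


def logical_layer(u: List[int], w_and: List[List[int]], w_or: List[List[int]]) -> List[int]:
    """One discrete logical layer: conjunction nodes r followed by disjunction nodes s."""
    # r_i is the AND of every input u_j selected by w_and[i][j] == 1
    r = [int(all(u_j for u_j, w in zip(u, row) if w == 1)) for row in w_and]
    # s_i is the OR of every input u_j selected by w_or[i][j] == 1
    s = [int(any(u_j for u_j, w in zip(u, row) if w == 1)) for row in w_or]
    return r + s


# Toy example (hypothetical values): 3 binary inputs, 2 AND nodes, 2 OR nodes
u0 = [1, 0, 1]
w_and = [[1, 0, 1], [1, 1, 0]]  # plays the role of W^{(l,0)}
w_or = [[0, 1, 0], [0, 1, 1]]   # plays the role of W^{(l,1)}
u1 = logical_layer(u0, w_and, w_or)
print(u1)  # [1, 0, 0, 1]
```

With this convention, a node that selects no inputs evaluates to 1 for a conjunction and 0 for a disjunction, which matches the usual empty AND / empty OR.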

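Since the logical layers are discrete, they clearly cannot be updated directly with gradients. A natural idea is to substitute a continuous version \(\hat{\mathcal{F}}(X; \theta)\), update the continuous parameters \(\theta\), and then obtain the discrete version as follows: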

\[\mathcal{F}(X; q(\theta)), \quad q(x) = \mathbb{I}_{x > 0.5}. \]

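Applying this method directly is clearly ineffective: the training process has nothing to do with the discretization, so there is no guarantee that the discretized model still works. There is also a second problem: how to map the discrete model above to a continuous version.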

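The following is an interesting solution. Suppose \(W_{i,j} \in [0, 1]\); then

\[\mathrm{Conj} (\bm{u}, W_i) = \prod_{j=1}^n \bigg\{1 - W_{i,j}(1 - u_j) \bigg\}, \\ \mathrm{Disj} (\bm{u}, W_i) = 1 - \prod_{j=1}^n \bigg\{1 - W_{i,j}u_j \bigg\} \]

are continuous versions of the conjunction and disjunction operations.
To see this, consider the case of binary weights and inputs: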

\[\begin{array}{ll} & r_i = 1 \\ \Leftrightarrow & \bigwedge_j [u_j \vee (1 - W_{i,j})] = 1\\ \Leftrightarrow & \prod_j \bigg\{1 - W_{i,j}(1 - u_j) \bigg\} = 1. \end{array} \]

The other cases follow by similar reasoning, which is quite neat.
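The equivalence can also be checked numerically. The sketch below implements the two product formulas and compares them against boolean AND and OR for every binary input and weight vector of length 3. The helper names (`conj`, `disj`, `bool_and`, `bool_or`) are illustrative choices, not the paper's.

```python
from itertools import product
from math import prod


def conj(u, w):
    """Continuous conjunction: prod_j (1 - w_j * (1 - u_j))."""
    return prod(1.0 - w_j * (1.0 - u_j) for u_j, w_j in zip(u, w))


def disj(u, w):
    """Continuous disjunction: 1 - prod_j (1 - w_j * u_j)."""
    return 1.0 - prod(1.0 - w_j * u_j for u_j, w_j in zip(u, w))


def bool_and(u, w):
    """Discrete AND over the inputs selected by w_j == 1."""
    return int(all(u_j for u_j, w_j in zip(u, w) if w_j == 1))


def bool_or(u, w):
    """Discrete OR over the inputs selected by w_j == 1."""
    return int(any(u_j for u_j, w_j in zip(u, w) if w_j == 1))


# Exhaustive check: on binary inputs and weights the two versions coincide
matches = all(
    conj(u, w) == bool_and(u, w) and disj(u, w) == bool_or(u, w)
    for u in product([0, 1], repeat=3)
    for w in product([0, 1], repeat=3)
)
print(matches)  # True

# With fractional weights the output becomes a soft truth value
soft = conj([1, 0], [0.9, 0.5])
print(soft)  # 0.5
```

In the last example the first input is 1, so its weight has no effect, while the second input is 0 and pulls the result down in proportion to its weight.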

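In practice, however, the expressions above suffer from vanishing gradients (because of the product, whose factors all lie in [0, 1]), so in actual use the authors add a projection operator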

\[Conj_+ = \mathbb{P}(Conj (\bm{u}, W_i)), \]

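where (the whole design is there to avoid vanishing gradients; how did they come up with it, and what leads one to think in this direction?)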

\[\mathbb{P}(v) = \frac{-1}{-1 + \log (v)}. \]
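One way to see why this projection helps: for \(v = \prod_j f_j\), the derivative of \(v\) with respect to one factor is the product of all the other factors, which shrinks exponentially with their number. Since \(\mathbb{P}(v) = 1 / (1 - \sum_j \log f_j)\), the chain rule gives \(\partial \mathbb{P} / \partial f_k = \mathbb{P}(v)^2 / f_k\), where the product no longer appears. The sketch below compares the two numerically; the choice of 50 factors all equal to 0.8 is purely illustrative.

```python
import math


def project(v: float) -> float:
    """P(v) = -1 / (-1 + log v), defined for v in (0, 1]."""
    return -1.0 / (-1.0 + math.log(v))


n, f = 50, 0.8            # illustrative: 50 factors, each equal to 0.8
v = f ** n                # value of the plain product
grad_plain = f ** (n - 1) # d v / d f_k: product of the other 49 factors

p = project(v)
# Chain rule: dP/dv = P^2 / v and dv/df_k = v / f_k, so dP/df_k = P^2 / f_k
grad_proj = p * p / f

print(f"plain product gradient: {grad_plain:.3e}")
print(f"projected gradient:     {grad_proj:.3e}")
```

Under these assumed values the projected gradient is a few hundred times larger than the plain one, and the gap widens as the number of factors grows.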

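With the continuous version settled, the remaining hard part is how to update \(\theta\) so that \(q(\theta)\) is also meaningful.
The authors use the following gradient update rule: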

\[\theta^{t+1} = \theta^t - \eta \frac{\partial \mathcal{L}(\bar{Y})}{\partial \bar{Y}} \cdot \frac{\partial \hat{Y}}{\partial \theta^t}, \]

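where \(\hat{Y} = \hat{\mathcal{F}}(X; \theta^t)\) and \(\bar{Y} = \mathcal{F}(X; \bar{\theta})\), with \(\bar{\theta} = q(\theta^t)\) the discretized parameters.
The authors illustrate the idea with a grafting analogy: the derivative of the loss with respect to the prediction is taken from the discrete model, while the inner derivative is taken from the continuous one.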
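The update rule can be tried out on a toy problem. The sketch below is not the paper's setup: it assumes a model that is a single continuous conjunction node (no linear layer), a squared-error loss, hand-written derivatives, and clipping of the weights to [0, 1]. All names (`grafted_step`, `q`, `conj`) and the starting values are illustrative.

```python
from itertools import product
from math import prod


def conj(u, w):
    """Continuous conjunction; reduces to boolean AND for binary u and w."""
    return prod(1.0 - w_j * (1.0 - u_j) for u_j, w_j in zip(u, w))


def q(theta):
    """Binarize the continuous weights: q(x) = 1 if x > 0.5 else 0."""
    return [1.0 if t > 0.5 else 0.0 for t in theta]


def grafted_step(theta, u, y, lr):
    """One update: dL/dY taken at the discrete output, dY/dtheta from the continuous model."""
    y_bar = conj(u, q(theta))        # prediction of the discrete model
    dloss_dybar = 2.0 * (y_bar - y)  # assumed loss L(Y) = (Y - y)^2, evaluated at y_bar
    new_theta = []
    for k, (t_k, u_k) in enumerate(zip(theta, u)):
        # Derivative of the continuous conjunction with respect to theta_k
        rest = prod(1.0 - t_j * (1.0 - u_j)
                    for j, (t_j, u_j) in enumerate(zip(theta, u)) if j != k)
        dyhat_dtheta_k = -(1.0 - u_k) * rest
        t_new = t_k - lr * dloss_dybar * dyhat_dtheta_k
        new_theta.append(min(1.0, max(0.0, t_new)))  # assumption: keep weights in [0, 1]
    return new_theta


# Target rule: y = u_0 AND u_1; u_2 is irrelevant
data = list(product([0, 1], repeat=3))
theta = [0.4, 0.4, 0.6]  # initially q(theta) = [0, 0, 1], i.e. the wrong rule "u_2"
for _ in range(5):
    for u in data:
        theta = grafted_step(theta, u, float(u[0] and u[1]), lr=0.1)

print(q(theta))  # [1.0, 1.0, 0.0]
```

Because the loss derivative comes from the discrete output, the parameters only move on samples that the discretized rule gets wrong, and the updates stop once \(q(\theta)\) encodes the target rule.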

What surprises me is that these changes actually work. Quite remarkable.
