
Machine-Learning–Based Column Selection for Column Generation

Paper-reading notes based on my own understanding; corrections are welcome. This post only outlines the paper; please read the reference for details. The paper falls under the category of machine learning alongside optimization algorithms.

01 Column Generation

Column generation (CG) is widely used in combinatorial optimization and is an effective algorithm for large-scale optimization problems. It decomposes a large-scale linear program into a master problem (MP) and a pricing problem (PP). The algorithm first restricts the MP to a small set of columns, yielding the restricted master problem (RMP). Solving the RMP gives a dual solution, which is passed to the PP; solving the PP then produces columns that are added back to the RMP. The RMP and PP are solved alternately until no column with a negative reduced cost can be found, at which point the optimal solution of the RMP is also optimal for the MP, as illustrated in the figure below and sketched in code right after it:
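To make the loop concrete, here is a minimal Python sketch of the iteration just described; `solve_rmp` and `solve_pricing` are hypothetical stand-ins for an LP solver and a problem-specific pricing routine (e.g., a labeling algorithm), and `reduced_cost` is assumed to be an attribute of each generated column:

```python
# Minimal column generation loop (illustrative sketch).
def column_generation(initial_columns, solve_rmp, solve_pricing, tol=1e-6):
    columns = list(initial_columns)
    while True:
        # Solve the restricted master problem over the current columns.
        rmp_solution, duals = solve_rmp(columns)
        # Price out new columns using the dual values.
        generated = solve_pricing(duals)
        # Keep only columns with a negative reduced cost.
        improving = [p for p in generated if p.reduced_cost < -tol]
        if not improving:
            # No improving column left: the RMP optimum solves the MP.
            return rmp_solution
        columns.extend(improving)
```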

I have written several detailed articles on how column generation works; readers who are not yet familiar with it may want to start there.

02 Column Selection

Many techniques can be used to speed up the convergence of CG. One of them is to add several columns with a negative reduced cost at each iteration, which reduces the number of iterations and hence the overall running time. This is especially worthwhile when solving the subproblem once yields several columns at essentially the same cost as yielding a single one (e.g., with labeling algorithms for shortest-path pricing problems).

Which columns to add at each iteration is then worth studying, because different choices lead to very different convergence speeds. On the one hand, we want the added columns to decrease the objective value as much as possible (for a minimization problem); on the other hand, we want to add as few columns as possible, since too many columns make the RMP harder to solve. Therefore, at each iteration we build a model that selects a promising subset of columns to add to the RMP:

  • Let \(\ell\) be the CG iteration number
  • \(\Omega_{\ell}\) the set of columns present in the RMP at the start of iteration \(\ell\)
  • \(\mathcal{G}_{\ell}\) the generated columns at this iteration
  • For each column \(p \in \mathcal{G}_{\ell}\), we define a decision variable \(y_p\) that takes value one if column \(p\)
    is selected and zero otherwise

To keep the number of selected columns small, each selected column incurs a sufficiently small penalty \(\epsilon\). The resulting column selection model (a MILP) is as follows:
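In outline, writing the master constraints abstractly with costs \(c_p\), coefficients \(a_{jp}\), and right-hand sides \(b_j\) (notation assumed here for illustration), the selection MILP takes roughly the following form; the linking constraint (8) assumes, as in the set-partitioning masters studied in the paper, that \(\lambda_p \le 1\) in any RMP solution:

\[
\begin{aligned}
\min\;& \sum_{p \in \Omega_{\ell} \cup \mathcal{G}_{\ell}} c_p \lambda_p \;+\; \epsilon \sum_{p \in \mathcal{G}_{\ell}} y_p \\
\text{s.t.}\;& \sum_{p \in \Omega_{\ell} \cup \mathcal{G}_{\ell}} a_{jp} \lambda_p = b_j \quad \forall j, \\
& \lambda_p \le y_p \quad \forall p \in \mathcal{G}_{\ell}, && (8)\\
& y_p \in \{0,1\} \quad \forall p \in \mathcal{G}_{\ell}, && (9)\\
& \lambda_p \ge 0 \quad \forall p \in \Omega_{\ell} \cup \mathcal{G}_{\ell}.
\end{aligned}
\]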

Notice that without the \(y_p\) variables and constraints (8) and (9), this model is exactly the RMP of the next iteration.

Assuming \(\epsilon\) is small enough, these constraints minimize the number of columns added to the RMP, namely the columns with \(y_p=1\). The columns added to the RMP at iteration \(\ell\) are therefore those \(p \in \mathcal{G}_{\ell}\) with \(y_p = 1\) in the MILP solution; that is, \(\Omega_{\ell+1} = \Omega_{\ell} \cup \{p \in \mathcal{G}_{\ell} : y_p = 1\}\).

The overall workflow is shown in the figure below:

03 Graph Neural Networks

Solving the MILP at each iteration tells us which columns help speed up the algorithm, but the MILP itself takes time to solve, so the net effect may not be an acceleration. A more practical approach is therefore to learn a machine-learning model that imitates the MILP and directly outputs the selected columns at each iteration.

Before that, a brief introduction to graph neural networks (GNNs). GNNs are connectionist models that capture dependencies in a graph through message passing between its nodes. Unlike standard neural networks, a GNN can represent information from a node's neighborhood at arbitrary depth.

Given a graph \(G=(V,E)\), where \(V\) is the vertex set and \(E\) the edge set, each node \(v \in V\) carries a feature vector \(x_v\). The goal is to iteratively aggregate information from neighboring nodes to update each node's state. Let:

  • \(h^{(k)}_v\) be the representation vector of node \(v \in V\) (not to be confused with \(x_v\)) at iteration \(k=0,1,\ldots,K\)
  • Let \(\mathcal{N}(v)\) be the set of neighbor (adjacent) nodes of \(v \in V\)

As illustrated in the figure below, node \(v_1\) aggregates information from its neighbors \(v_2, v_3, v_4\) to update itself:

At iteration \(k > 0\), an aggregation function, denoted \(aggr\), is first applied at each node \(v \in V\) to compute an aggregated information vector \(a^{(k)}_v\):

\[a^{(k)}_v = aggr\left(\left\{\phi^{(k)}\left(h^{(k-1)}_u\right) : u \in \mathcal{N}(v)\right\}\right)\]

where \(h^{(0)}_v = x_v\) initially, and \(\phi^{(k)}\) is a learned function. \(aggr\) must be invariant to the order of the nodes; examples include the sum, mean, and min/max functions.

Next, another function, denoted \(comb\), combines the aggregated information with the node's current state to obtain the updated representation vectors:

\[h^{(k)}_v = \psi^{(k)}\left(comb\left(h^{(k-1)}_v, a^{(k)}_v\right)\right)\]

where \(\psi^{(k)}\) is another learned function. Over successive iterations, each node collects information from increasingly distant neighbors. After the final iteration \(K\), the representation \(h^{(K)}_v\) of node \(v \in V\) can be used to predict its label \(l_v\) through a final transformation function, denoted \(out\):

\[l_v = out\left(h^{(K)}_v\right)\]
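A minimal NumPy sketch of one such message-passing iteration, assuming sum aggregation and concatenation for \(comb\); `phi` and `psi` stand in for the learned functions \(\phi^{(k)}\) and \(\psi^{(k)}\):

```python
import numpy as np

def gnn_iteration(h, neighbors, phi, psi):
    """One message-passing step. h maps each node to its current
    representation vector; neighbors maps each node to its adjacent nodes."""
    h_new = {}
    for v in h:
        # Aggregate the transformed neighbor states (order-invariant sum).
        a_v = np.sum([phi(h[u]) for u in neighbors[v]], axis=0)
        # Combine with the node's own state (concatenation), then update.
        h_new[v] = psi(np.concatenate([h[v], a_v]))
    return h_new
```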

04 A Bipartite Graph for Column Selection

An obvious way to use the GNN above for column selection is to let each node represent a column and connect two columns with an edge whenever they contribute to a common constraint. However, this creates a very large number of edges, and information about the dual values is hard to represent in such a model.

Instead, the authors use a bipartite graph with two node types: column nodes \(V\) and constraint nodes \(C\). An edge \((v, c)\) exists between a node \(v \in V\) and a node \(c \in C\) if column \(v\) contributes to constraint \(c\). The benefit is that feature vectors, such as dual-solution information, can be attached to the constraint nodes \(c\), as shown in panel (a) of the figure below:

Because there are two node types, each iteration consists of two phases: phase 1 updates the constraint nodes \(c \in C\) (panel (b) above), and phase 2 updates the column nodes \(v \in V\) (panel (c) above). Finally, the column-node representations \(h^{(K)}_v, v \in V\), are used to predict the node labels \(y_v \in \{0, 1\}\). The algorithm proceeds as follows:

As described in the previous section, we start by initializing the representation vectors of both the column and constraint nodes by the feature vectors \(x_v\) and \(x_c\), respectively (steps 1 and 2). For each iteration \(k\), we perform the two phases: updating the constraint representations (steps 4 and 5), then the column ones (steps 6 and 7). The sum function is used for the aggr function and the vector concatenation for the comb function.

The functions \(\phi^{(k)}_C, \psi^{(k)}_C, \phi^{(k)}_V\), and \(\psi^{(k)}_V\) are two-layer feedforward neural networks with rectified linear unit (ReLU) activation functions, and \(out\) is a three-layer feedforward neural network with a sigmoid function for producing the final probabilities (step 9).
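A PyTorch-style sketch of one two-phase iteration, assuming sum aggregation via the binary column-constraint incidence matrix and concatenation for \(comb\); the class name, dimensions, and module layout are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class BipartiteGNNLayer(nn.Module):
    """One iteration: update constraint nodes from column nodes (phase 1),
    then column nodes from constraint nodes (phase 2)."""

    def __init__(self, dim):
        super().__init__()

        def two_layer(d_in):
            return nn.Sequential(nn.Linear(d_in, dim), nn.ReLU(),
                                 nn.Linear(dim, dim), nn.ReLU())

        self.phi_c, self.phi_v = two_layer(dim), two_layer(dim)
        # psi consumes the concatenation [current state, aggregated info].
        self.psi_c, self.psi_v = two_layer(2 * dim), two_layer(2 * dim)

    def forward(self, h_v, h_c, adj):
        # adj: (n_columns, m_constraints) binary incidence matrix.
        # Phase 1: each constraint sums messages from its columns.
        a_c = adj.t() @ self.phi_c(h_v)
        h_c = self.psi_c(torch.cat([h_c, a_c], dim=1))
        # Phase 2: each column sums messages from its constraints.
        a_v = adj @ self.phi_v(h_c)
        h_v = self.psi_v(torch.cat([h_v, a_v], dim=1))
        return h_v, h_c
```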

A weighted binary cross entropy loss is used to evaluate the performance of the model, where the weights are used to deal with the imbalance between the two classes. Indeed, about 90% of the columns belong to the unselected class, that is, their label \(y_v = 0\).
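In PyTorch terms, such a weighted loss can be written as follows; the 9:1 weight ratio simply mirrors the class split mentioned above and is an illustrative choice:

```python
import torch.nn.functional as F

def weighted_bce(probs, labels, w_pos=9.0, w_neg=1.0):
    """probs are the sigmoid outputs; labels are 0/1 floats.
    Per-sample weights compensate for the ~90/10 class imbalance."""
    weights = w_pos * labels + w_neg * (1.0 - labels)
    return F.binary_cross_entropy(probs, labels, weight=weights)
```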

05 Data Collection

The data are collected by solving a number of instances with the MILP described earlier to obtain the column labels. Each CG iteration yields one bipartite graph, and the following information is stored (see the sketch after the list):

  • The sets of column and constraint nodes;
  • A sparse matrix \(E \in \mathbb{R}^{n\times m}\) storing the edges;
  • A column feature matrix \(X^V \in \mathbb{R}^{n\times d}\), where \(n\) is the number of columns and \(d\) the number of column features;
  • A constraint feature matrix \(X^C \in \mathbb{R}^{m\times p}\), where \(m\) is the number of constraints and \(p\) the number of constraint features;
  • The label vector \(y\) of the newly generated columns in \(\mathcal{G}_{\ell}\).
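For concreteness, one way to store such a training sample in Python (field names are illustrative; `scipy.sparse` is a natural choice for the edge matrix \(E\)):

```python
from dataclasses import dataclass

import numpy as np
import scipy.sparse as sp

@dataclass
class CGSample:
    """The bipartite graph recorded at one CG iteration."""
    edges: sp.coo_matrix        # E: (n_columns, m_constraints) incidence
    x_columns: np.ndarray       # X^V: (n, d) column features
    x_constraints: np.ndarray   # X^C: (m, p) constraint features
    labels: np.ndarray          # y: 1 if the MILP selected the column

def build_edges(coefficients):
    """Link column v to constraint c whenever the column's coefficient
    in that constraint is nonzero, i.e., the column contributes to it."""
    return sp.coo_matrix(coefficients != 0, dtype=np.int8)
```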

06 Case Study I: Vehicle and Crew Scheduling Problem

The definition of this problem is omitted here; readers can look it up on their own.

6.1 MILP Performance

The convergence curves of CG with and without the MILP (in the latter case, all generated columns with a negative reduced cost are added to the next RMP) are plotted below:

Using the MILP actually slows down convergence. This is mainly due to the rejected columns, which still have a negative reduced cost after the RMP reoptimization and keep being generated in subsequent iterations even though they do not improve the objective value (degeneracy).

To address this, a workable remedy is to add some extra columns after running the MILP. The MILP-selected columns are first added to the RMP, which is then reoptimized to obtain new duals; among the unselected columns, those whose reduced cost is still negative under the new duals become candidates for addition. If there are many of them, not all are added: they are sorted by reduced cost and only a fraction is kept (50% in this paper), as illustrated in the figure below and sketched in code right after:
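A small sketch of this post-selection step; `reduced_cost` is a hypothetical helper that reprices a column under the new duals, and `fraction=0.5` mirrors the 50% used in the paper:

```python
def extra_columns(unselected, new_duals, reduced_cost, fraction=0.5):
    """After reoptimizing the RMP with the MILP-selected columns, add a
    share of the rejected columns that still price out negative."""
    still_negative = [p for p in unselected
                      if reduced_cost(p, new_duals) < 0]
    # Most promising first: sort by reduced cost in ascending order.
    still_negative.sort(key=lambda p: reduced_cost(p, new_duals))
    keep = int(len(still_negative) * fraction)
    return still_negative[:keep]
```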

With these extra columns added, preliminary tests give the results below. (The computational time of the algorithm with column selection does not include the time spent solving the MILP at every iteration: the authors only want to measure the effect of the selection on column generation, and the MILP is ultimately to be replaced by the much faster GNN model anyway.)

As can be seen, the MILP selection saves a substantial amount of computing time, reducing the total time by about 34%.

6.2 Comparison

The following strategies are then compared:

  • No selection (NO-S): This is the standard CG algorithm with no selection involved, with the use of the acceleration strategies described in Section 2.
  • MILP selection (MILP-S): The MILP is used to select the columns at each iteration, with 50% additional columns to avoid convergence issues. Because the MILP is considered to be the expert we want to learn from and we are looking to replace it with a fast approximation, the total computational time does not include the time spent solving the MILP.
  • GNN selection (GNN-S): The learned model is used to select the columns. At every CG iteration, the column features are extracted, the predictions are obtained, and the selected columns are added to the RMP.
  • Sorting selection (Sort-S): The generated columns are sorted by reduced cost in ascending order, and a subset of the columns with the lowest reduced cost are selected. The number of columns selected is on average the same as with the GNN selection.
  • Random selection (Rand-S): A subset of the columns is selected randomly. The number of columns selected is on average the same as with the GNN selection.

The comparison results are shown below; the time-reduction column compares the GNN-S to the NO-S algorithm. On average, GNN-S reduces the computing time by 26%.

07 Case Study II: Vehicle Routing Problem with Time Windows

The VRPTW needs no further introduction here; the comparison results are as follows:

The last column corresponds to the time reduction when comparing GNN-S with NO-S. One can see that the column selection with the GNN model gives positive results, yielding average reductions ranging from 20% to 29%. These reductions could have been larger if the number of CG iterations performed had not increased.

References

  • [1] Mouad Morabit, Guy Desaulniers, Andrea Lodi (2021). Machine-Learning–Based Column Selection for Column Generation. Transportation Science.