Deep Learning Methods for Unsupervised Depth and Motion Estimation (Part 2)
阿新 • Published: 2020-12-14
Tags: visual odometry, depth estimation, deep learning, autonomous driving, computer vision
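In self-supervised depth estimation, the model typically takes two images as input (for video, two adjacent frames), frame1 and frame2. It first estimates the change in camera pose between the moments the two images were captured, then uses that pose (together with the predicted depth) to warp frame1 into the viewpoint of frame2, producing a synthetic image, synthetic frame1.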
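The more accurate the estimated pose, the more similar synthetic frame1 is to frame2. SSIM (structural similarity) is commonly used to measure the similarity between these two images.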
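The warping step can be sketched for a single pixel as follows. This is only an illustration under stated assumptions, not the article's implementation: it assumes a pinhole camera with hypothetical intrinsics (fx, fy, cx, cy), a known depth for the pixel, and a pose given as a rotation matrix and translation vector that map points from frame1's camera coordinates to frame2's. All function and variable names are made up for the example.

```python
def reproject_pixel(u, v, depth, intrinsics, rotation, translation):
    """Map pixel (u, v) of frame1, with known depth, into frame2's image plane."""
    fx, fy, cx, cy = intrinsics
    # Back-project the pixel to a 3D point in frame1's camera coordinates.
    p1 = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Apply the relative pose: p2 = R * p1 + t.
    p2 = [sum(rotation[i][j] * p1[j] for j in range(3)) + translation[i]
          for i in range(3)]
    # Project the transformed point with the same intrinsics.
    return fx * p2[0] / p2[2] + cx, fy * p2[1] / p2[2] + cy


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
intrinsics = (500.0, 500.0, 320.0, 240.0)  # hypothetical fx, fy, cx, cy

# With no camera motion, the pixel maps onto itself.
same = reproject_pixel(100.0, 50.0, 10.0, intrinsics, identity, [0.0, 0.0, 0.0])
# Shifting the point sideways by 0.2 at depth 10 moves it by
# fx * 0.2 / 10 = 10 pixels along u.
moved = reproject_pixel(100.0, 50.0, 10.0, intrinsics, identity, [0.2, 0.0, 0.0])
```

A full pipeline would do this for every pixel and resample the image with bilinear interpolation; an error in the pose (or depth) shifts the reprojected pixels and lowers the similarity to frame2.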
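For a more detailed treatment of monocular depth estimation, see the articles 《動態場景下的單目深度估計》 ("Monocular Depth Estimation in Dynamic Scenes") and "Instance-wise Depth and Motion Learning from Monocular Videos".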
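SSIM measures the similarity of two images of the same size. It compares the two images' luminance (l), contrast (c), and structure (s) separately, then combines the three terms as a weighted product. In the original SSIM paper the three terms are defined as:

$$
l(x,y)=\frac{2\mu_x\mu_y+C_1}{\mu_x^2+\mu_y^2+C_1},\qquad
c(x,y)=\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2},\qquad
s(x,y)=\frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3}
$$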
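Here μx is the mean of x, σx is its standard deviation (so σx² is the variance), and σxy is the covariance of x and y. The constants C1, C2, and C3 keep the expressions stable when a denominator would otherwise be zero or close to zero, so as long as the constants are positive, an implementation never divides by zero.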
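As a quick check of the stabilizing constants, the three terms can be computed for two small signals. This is a minimal sketch, not a reference implementation: it uses global statistics over the whole signal rather than local windows, assumes pixel values in [0, 1] (hence the defaults 0.01² and 0.03²), and takes C3 = C2 / 2. The function name is illustrative.

```python
def ssim_components(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Return the luminance, contrast and structure terms for two signals."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    sigma_x, sigma_y = var_x ** 0.5, var_y ** 0.5
    c3 = c2 / 2  # assumed here: the common choice C3 = C2 / 2
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sigma_x * sigma_y + c2) / (var_x + var_y + c2)
    structure = (cov_xy + c3) / (sigma_x * sigma_y + c3)
    return luminance, contrast, structure


# Two all-black patches: every mean and variance is zero, yet each
# denominator reduces to its constant, so nothing divides by zero.
black = [0.0, 0.0, 0.0, 0.0]
l_term, c_term, s_term = ssim_components(black, black)
```

Without the constants, all three terms would evaluate to 0/0 for this input.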
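The general form of SSIM is:

$$
\mathrm{SSIM}(x,y)=l(x,y)^{\alpha}\cdot c(x,y)^{\beta}\cdot s(x,y)^{\gamma}
$$

In practice α, β, and γ are all set to 1 and C3 = C2/2, which gives the simplified SSIM formula:

$$
\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}
$$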
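The simplified formula translates almost directly into code. The sketch below is an illustration only: it computes one SSIM value from the global statistics of two equal-length signals (practical implementations compute it over local windows), assumes pixel values in [0, 1], and uses an illustrative function name.

```python
def ssim_simplified(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM of two equal-length signals, using global statistics."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


patch = [0.2, 0.4, 0.6, 0.8]
flipped = [0.8, 0.6, 0.4, 0.2]  # same mean and variance, opposite structure

score_same = ssim_simplified(patch, patch)       # identical signals
score_flipped = ssim_simplified(patch, flipped)  # anti-correlated signals
```

Identical signals score 1 (up to floating-point rounding), while the reversed signal has a negative covariance and therefore a negative score.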
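Here $C_1=(k_1 L)^2$ and $C_2=(k_2 L)^2$ are the constants that keep the division stable, $L$ is the dynamic range of the pixel values, and $k_1=0.01$, $k_2=0.03$. SSIM ranges from -1 to +1 (that is, SSIM ∈ (-1, 1]), and it equals 1 when the two images are identical.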
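To make the role of the dynamic range L concrete, the constants can be evaluated for two common pixel ranges. This is a small illustrative helper (the name `ssim_constants` is hypothetical); it shows that the defaults `c1=0.01**2` and `c2=0.03**2` in the TensorFlow code of this article correspond to images scaled to [0, 1], that is, L = 1.

```python
def ssim_constants(dynamic_range, k1=0.01, k2=0.03):
    """Return (C1, C2) = ((k1 * L) ** 2, (k2 * L) ** 2) for dynamic range L."""
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    return c1, c2


c1_unit, c2_unit = ssim_constants(1.0)    # images scaled to [0, 1]
c1_8bit, c2_8bit = ssim_constants(255.0)  # 8-bit images in [0, 255]
# c1_8bit is about 6.5025 and c2_8bit about 58.5225.
```

Passing 8-bit images to a function whose constants assume L = 1 would make the stabilizing terms negligibly small, so the constants should always match the pixel range.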
TensorFlow implementation:
```python
import tensorflow as tf


def _avg_pool3x3(x):
    """Average-pools `x` over 3x3 windows with stride 1 and no padding."""
    return tf.nn.avg_pool(x, [1, 3, 3, 1], [1, 1, 1, 1], 'VALID')


def weighted_ssim(x, y, weight, c1=0.01**2, c2=0.03**2, weight_epsilon=0.01):
    """Computes a weighted structured image similarity measure.

    See https://en.wikipedia.org/wiki/Structural_similarity#Algorithm. The only
    difference here is that not all pixels are weighted equally when calculating
    the moments - they are weighted by a weight function.

    Args:
      x: A tf.Tensor representing a batch of images, of shape [B, H, W, C].
      y: A tf.Tensor representing a batch of images, of shape [B, H, W, C].
      weight: A tf.Tensor of shape [B, H, W], representing the weight of each
        pixel in both images when we come to calculate moments (means and
        correlations).
      c1: A floating point number, regularizes division by zero of the means.
      c2: A floating point number, regularizes division by zero of the second
        moments.
      weight_epsilon: A floating point number, used to regularize division by
        the weight.

    Returns:
      A tuple of two tf.Tensors. First, of shape [B, H-2, W-2, C], is scalar
      similarity loss per pixel per channel, and the second, of shape
      [B, H-2, W-2, 1], is the average pooled `weight`. It is needed so that we
      know how much to weigh each pixel in the first tensor. For example, if
      `weight` was very small in some area of the images, the first tensor will
      still assign a loss to these pixels, but we shouldn't take the result too
      seriously.
    """
    if c1 == float('inf') and c2 == float('inf'):
        raise ValueError('Both c1 and c2 are infinite, SSIM loss is zero. This is '
                         'likely unintended.')
    # Add a channel axis so the weight broadcasts against [B, H, W, C] images.
    weight = tf.expand_dims(weight, -1)
    average_pooled_weight = _avg_pool3x3(weight)
    weight_plus_epsilon = weight + weight_epsilon
    inverse_average_pooled_weight = 1.0 / (average_pooled_weight + weight_epsilon)

    def weighted_avg_pool3x3(z):
        # Weighted mean of z over each 3x3 window, with weights (weight + epsilon).
        weighted_avg = _avg_pool3x3(z * weight_plus_epsilon)
        return weighted_avg * inverse_average_pooled_weight

    # Weighted local means, variances and covariance.
    mu_x = weighted_avg_pool3x3(x)
    mu_y = weighted_avg_pool3x3(y)
    sigma_x = weighted_avg_pool3x3(x**2) - mu_x**2
    sigma_y = weighted_avg_pool3x3(y**2) - mu_y**2
    sigma_xy = weighted_avg_pool3x3(x * y) - mu_x * mu_y
    if c1 == float('inf'):
        # An infinite c1 drops the luminance term.
        ssim_n = (2 * sigma_xy + c2)
        ssim_d = (sigma_x + sigma_y + c2)
    elif c2 == float('inf'):
        # An infinite c2 drops the contrast-structure term.
        ssim_n = 2 * mu_x * mu_y + c1
        ssim_d = mu_x**2 + mu_y**2 + c1
    else:
        ssim_n = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
        ssim_d = (mu_x**2 + mu_y**2 + c1) * (sigma_x + sigma_y + c2)
    result = ssim_n / ssim_d
    # Map SSIM to a loss in [0, 1]: identical patches give 0.
    return tf.clip_by_value((1 - result) / 2, 0, 1), average_pooled_weight
```
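Two details of this function are easy to miss, and the pure-Python sketch below isolates them for a single window. It is an illustration with made-up helper names, not part of the TensorFlow code: `weighted_window_mean` mirrors what `weighted_avg_pool3x3` computes for one window (a mean with weights `weight + weight_epsilon`), and `ssim_to_loss` mirrors the final `(1 - result) / 2` mapping with clipping.

```python
def weighted_window_mean(values, weights, weight_epsilon=0.01):
    """Weighted mean over one window, as weighted_avg_pool3x3 computes it."""
    n = len(values)
    # Average-pool values scaled by (weight + epsilon) ...
    pooled = sum(v * (w + weight_epsilon) for v, w in zip(values, weights)) / n
    # ... then divide by the average-pooled weight plus epsilon.
    pooled_weight = sum(weights) / n
    return pooled / (pooled_weight + weight_epsilon)


def ssim_to_loss(ssim_value):
    """Map an SSIM score in [-1, 1] to a dissimilarity loss in [0, 1]."""
    loss = (1.0 - ssim_value) / 2.0
    # Clamp to [0, 1], mirroring the tf.clip_by_value call.
    return min(max(loss, 0.0), 1.0)


# With all weights zero, the epsilon makes this an ordinary mean.
mean_unweighted = weighted_window_mean([0.0, 1.0], [0.0, 0.0])
# With only the first pixel weighted, the mean is pulled toward its value (0.0).
mean_masked = weighted_window_mean([0.0, 1.0], [1.0, 0.0])

loss_identical = ssim_to_loss(1.0)   # 0.0
loss_unrelated = ssim_to_loss(0.0)   # 0.5
loss_opposite = ssim_to_loss(-1.0)   # 1.0
```

Because the epsilon is added both to each pixel's weight and to the pooled weight, the normalization stays consistent, and windows whose weights are all zero fall back to an unweighted mean instead of dividing by zero.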