Deep Learning Methods for Unsupervised Depth and Motion Estimation (Part 2)

Tags: visual odometry, depth estimation, deep learning, autonomous driving, computer vision

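In self-supervised depth estimation, the model typically takes two images as input, frame1 and frame2 (for video, two adjacent frames). It first estimates pose, the change in camera pose between the two shots, and then uses pose to warp frame1 into the viewpoint of frame2, producing a synthesized image, synthetic frame1.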
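One common way to write this warp maps each homogeneous pixel p_1 in frame1 to its location p_2 in frame2. This is a sketch under assumptions not spelled out above: a pinhole camera with intrinsic matrix K and a predicted per-pixel depth D_1 for frame1. The notation is also an assumption.

```latex
p_2 \sim K \left( D_1(p_1)\, R\, K^{-1} p_1 + t \right)
```

Here R and t are the rotation and translation that make up pose, and \sim denotes equality up to scale.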

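The more accurate the estimated pose, the more closely synthetic frame1 resembles frame2. SSIM (structural similarity) is commonly used to measure how similar these two images are.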

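For a more detailed treatment of monocular depth estimation, see the articles "Monocular Depth Estimation in Dynamic Scenes" (《動態場景下的單目深度估計》), "Instance-wise Depth and Motion Learning from Monocular Videos", and "MonoDepth2: Monocular Depth Estimation" (《MonoDepth2_單目深度估計》).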

SSIM

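SSIM measures the similarity of two images of the same size. It compares the luminance (l), contrast (c), and structure (s) of the two images separately, then combines the three terms as a weighted product. In the paper, the three terms are defined by the following formulas: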
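For reference, the standard SSIM definitions of these three terms are given here. This is a reconstruction that assumes the article follows the usual formulation, written with the symbols used in this article:

```latex
l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad
c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \qquad
s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}
```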

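Here μx is the mean, σx is the standard deviation (so σx^2 is the variance), and σxy is the covariance. The constants C1, C2, and C3 avoid instability when a denominator approaches 0, so as long as they are positive the implementation never divides by zero.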

The general form of SSIM is:
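Assuming the usual formulation, in which positive exponents \alpha, \beta, and \gamma weight the three terms, the general form can be written as:

```latex
\mathrm{SSIM}(x, y) = l(x, y)^{\alpha} \cdot c(x, y)^{\beta} \cdot s(x, y)^{\gamma}
```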

Typically \alpha, \beta, and \gamma are set to 1 and C3 = 0.5*C2, which gives the simplified SSIM formula:
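With those choices the contrast and structure terms collapse into a single fraction, because 2\sigma_x\sigma_y + C_2 = 2(\sigma_x\sigma_y + C_2/2) cancels against the denominator of the structure term. A worked version, assuming the standard definitions of l, c, and s:

```latex
c(x, y)\, s(x, y)
  = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}
    \cdot \frac{\sigma_{xy} + C_2/2}{\sigma_x\sigma_y + C_2/2}
  = \frac{2\sigma_{xy} + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}

\mathrm{SSIM}(x, y)
  = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}
         {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
```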

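C1 = (k_1*L)^2 and C2 = (k_2*L)^2 are constants that keep the division stable, where L is the dynamic range of the pixel values, k_1 = 0.01, and k_2 = 0.03.

SSIM ranges from -1 to +1 (that is, SSIM ∈ (-1, 1]). When the two images are identical, SSIM equals 1.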
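To make the range concrete, here is a minimal pure-Python sketch of the simplified formula, computed once over all pixels rather than per local window. The function name `global_ssim`, the flattened-list input, and the 8-bit dynamic range are illustrative assumptions, not part of the implementation discussed in this article.

```python
def global_ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Simplified SSIM over two flattened grayscale images of equal length."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and have the same length")
    n = len(x)
    # Stabilizing constants: C1 = (k1 * L)^2 and C2 = (k2 * L)^2.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# A tiny 2x2 image, flattened, and its intensity-inverted copy.
image = [0.0, 64.0, 128.0, 255.0]
inverted_image = [255.0 - p for p in image]

same = global_ssim(image, image)               # identical images: 1
inverted = global_ssim(image, inverted_image)  # anti-correlated: negative
```

For this input the inverted pair scores roughly -0.96, which shows how negative covariance pushes SSIM toward, but never down to, -1.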

TensorFlow implementation

import tensorflow as tf


def weighted_ssim(x, y, weight, c1=0.01**2, c2=0.03**2, weight_epsilon=0.01):
  """Computes a weighted structured image similarity measure.

  See https://en.wikipedia.org/wiki/Structural_similarity#Algorithm. The only
  difference here is that not all pixels are weighted equally when calculating
  the moments - they are weighted by a weight function.

  Args:
    x: A tf.Tensor representing a batch of images, of shape [B, H, W, C].
    y: A tf.Tensor representing a batch of images, of shape [B, H, W, C].
    weight: A tf.Tensor of shape [B, H, W], representing the weight of each
      pixel in both images when we come to calculate moments (means and
      correlations).
    c1: A floating point number, regularizes division by zero of the means.
    c2: A floating point number, regularizes division by zero of the second
      moments.
    weight_epsilon: A floating point number, used to regularize division by the
      weight.

  Returns:
    A tuple of two tf.Tensors. First, of shape [B, H-2, W-2, C], is scalar
    similarity loss per pixel per channel, and the second, of shape
    [B, H-2, W-2, 1], is the average pooled `weight`. It is needed so that we
    know how much to weigh each pixel in the first tensor. For example, if
    `weight` was very small in some area of the images, the first tensor will
    still assign a loss to these pixels, but we shouldn't take the result too
    seriously.
  """
  if c1 == float('inf') and c2 == float('inf'):
    raise ValueError('Both c1 and c2 are infinite, SSIM loss is zero. This is '
                     'likely unintended.')
  def _avg_pool3x3(x):
    # 3x3 average pooling, stride 1, no padding: H and W each shrink by 2.
    return tf.nn.avg_pool(x, [1, 3, 3, 1], [1, 1, 1, 1], 'VALID')

  weight = tf.expand_dims(weight, -1)
  average_pooled_weight = _avg_pool3x3(weight)
  weight_plus_epsilon = weight + weight_epsilon
  inverse_average_pooled_weight = 1.0 / (average_pooled_weight + weight_epsilon)

  def weighted_avg_pool3x3(z):
    weighted_avg = _avg_pool3x3(z * weight_plus_epsilon)
    return weighted_avg * inverse_average_pooled_weight

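  # Local weighted means.
  mu_x = weighted_avg_pool3x3(x)
  mu_y = weighted_avg_pool3x3(y)
  # Despite the names, sigma_x and sigma_y hold local variances
  # (E[x^2] - E[x]^2), and sigma_xy holds the local covariance.
  sigma_x = weighted_avg_pool3x3(x**2) - mu_x**2
  sigma_y = weighted_avg_pool3x3(y**2) - mu_y**2
  sigma_xy = weighted_avg_pool3x3(x * y) - mu_x * mu_y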
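  if c1 == float('inf'):
    # Luminance term tends to 1, leaving only the contrast-structure term.
    ssim_n = (2 * sigma_xy + c2)
    ssim_d = (sigma_x + sigma_y + c2)
  elif c2 == float('inf'):
    # Contrast-structure term tends to 1, leaving only the luminance term.
    ssim_n = 2 * mu_x * mu_y + c1
    ssim_d = mu_x**2 + mu_y**2 + c1
  else:
    ssim_n = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    ssim_d = (mu_x**2 + mu_y**2 + c1) * (sigma_x + sigma_y + c2)
  result = ssim_n / ssim_d
  # Map SSIM from [-1, 1] to a dissimilarity in [0, 1]; identical patches give 0.
  return tf.clip_by_value((1 - result) / 2, 0, 1), average_pooled_weight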
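The weighted moments are the part of this function that differs from plain SSIM, so a small dependency-free sketch may help. It mirrors the `weighted_avg_pool3x3` arithmetic on a single-channel 2D list. The pure-Python helpers and the toy 3x3 input are assumptions made for illustration and are not part of the TensorFlow code.

```python
def avg_pool3x3(img):
    """3x3 mean filter, stride 1, no padding: HxW becomes (H-2)x(W-2)."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[i + di][j + dj] for di in range(3) for dj in range(3)) / 9.0
            for j in range(w - 2)
        ]
        for i in range(h - 2)
    ]


def weighted_avg_pool3x3(z, weight, weight_epsilon=0.01):
    """Weighted local mean: pool(z * (w + eps)) / (pool(w) + eps)."""
    weighted = [
        [zv * (wv + weight_epsilon) for zv, wv in zip(z_row, w_row)]
        for z_row, w_row in zip(z, weight)
    ]
    pooled_z = avg_pool3x3(weighted)
    pooled_w = avg_pool3x3(weight)
    return [
        [pz / (pw + weight_epsilon) for pz, pw in zip(pz_row, pw_row)]
        for pz_row, pw_row in zip(pooled_z, pooled_w)
    ]


# A flat patch with one outlier pixel in the center.
patch = [[10.0, 10.0, 10.0], [10.0, 100.0, 10.0], [10.0, 10.0, 10.0]]
uniform_weight = [[1.0] * 3 for _ in range(3)]
masked_weight = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]

uniform_mean = weighted_avg_pool3x3(patch, uniform_weight)[0][0]
masked_mean = weighted_avg_pool3x3(patch, masked_weight)[0][0]
```

With uniform weights the local mean is the ordinary average, 20. Setting the outlier's weight to zero pulls the mean down to about 10.1. It does not reach exactly 10 because `weight_epsilon` keeps every pixel's effective weight slightly above zero, which is also what prevents a division by zero where all weights vanish.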
