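Learning the EAST Text Detection Network (Part 2)
EAST was proposed by Megvii in the 2017 paper "EAST: An Efficient and Accurate Scene Text Detector". It can detect text at arbitrary orientations and does well on both speed and accuracy.
EAST is a distinctive paper. As before, I will study it in four parts: network design, ground-truth generation, the loss function, and Locality-Aware NMS (post-processing).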
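1. Network design
The network structure from the EAST paper is shown in the figure below. PVANet extracts the features, features from different layers are upsampled and merged, and the network then predicts the final score and box. The paper proposes two box representations, RBOX and QUAD. If the boxes are annotated as RBOX, the model predicts a 1-channel score_map and a 4-channel box_map (plus a 1-channel angle map for the rotation); if the boxes are annotated as QUAD, the model predicts a 1-channel score_map and an 8-channel box_map.
In practice I mostly use EAST with a ResNet backbone and RBOX-style box annotations. The concrete network structure is shown in the figure below, and the data flow through the network during training can be summarized as follows: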
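- f1 is upsampled and concatenated with f2, then passed through 1x1 and 3x3 convolutions to produce h2 (1x128x32x32). In the same way, h2 is upsampled, concatenated with f3 and convolved to produce h3 (1x64x64x64). Finally, h3 is upsampled, concatenated with f4 and convolved to produce h4 (1x32x128x128).
- geo_map = self.sigmoid2(geo_map) * 512 (the input image size is 512, so this converts the output to pixel values); angle_map = (angle_map - 0.5) * math.pi / 2 (assuming angle_map has already been passed through a sigmoid and lies in (0, 1), this maps it to [-π/4, π/4]).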
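The output scaling above can be checked with a few lines of plain Python. This is only a sketch: `sigmoid` and `scale_outputs` are illustrative names, not functions from the network code, and it assumes both heads end in a sigmoid.

```python
import math


def sigmoid(z):
    """Standard logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def scale_outputs(geo_logit, angle_logit, input_size=512):
    """Map raw logits to a pixel distance and an angle in radians."""
    distance = sigmoid(geo_logit) * input_size           # range (0, input_size)
    angle = (sigmoid(angle_logit) - 0.5) * math.pi / 2   # range (-pi/4, pi/4)
    return distance, angle


print(scale_outputs(0.0, 0.0))   # (256.0, 0.0)
```

A logit of 0 sits in the middle of both ranges, which is why the distance comes out as half the input size and the angle as 0.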
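2. Ground-truth generation
2.1 What the ground truth means
As mentioned above, boxes can be annotated in two forms, RBOX and QUAD, and the ground truth differs between them.
RBOX
The RBOX ground truth consists of score_map, geo_map and angle_map. In score_map, pixels inside a text box are 1 and all other pixels are 0, as shown in (b) of the figure below. In geo_map, every pixel in a text region holds 4 values: the distances from that pixel to the top, bottom, left and right edges of the text box. In the schematic below, the dark blue / yellow / red / green colors in (d) show the distances from a pixel to the top, bottom, left and right edges. (e) is angle_map, which stores the rotation angle of the text box. Note that these distances are only computed for pixels inside a text region; for all other pixels the 5 values are set to 0. The result is a WxHx4 geo_map and a WxHx1 angle_map, where W and H are the width and height of the original image. (Also note that the text boxes used here are shrunk versions of the actual text boxes.)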
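To make the RBOX targets concrete, here is a minimal sketch for the simplest case: one axis-aligned box with angle 0, no shrinking and no rotation. `rbox_targets` is a hypothetical helper written for this illustration. It stores the four distances and the angle together, in the top, right, bottom, left, angle channel order that generate_rbox() uses later in this post.

```python
def rbox_targets(width, height, box):
    """box = (x_min, y_min, x_max, y_max), inclusive, axis-aligned, angle 0."""
    x_min, y_min, x_max, y_max = box
    score_map = [[0] * width for _ in range(height)]
    geo_map = [[(0, 0, 0, 0, 0.0)] * width for _ in range(height)]
    for y in range(y_min, y_max + 1):
        for x in range(x_min, x_max + 1):
            score_map[y][x] = 1
            # distances to the top, right, bottom and left edges, then the angle
            geo_map[y][x] = (y - y_min, x_max - x, y_max - y, x - x_min, 0.0)
    return score_map, geo_map


score, geo = rbox_targets(6, 4, (1, 1, 4, 2))
print(score[1])     # [0, 1, 1, 1, 1, 0]
print(geo[1][2])    # (0, 2, 1, 1, 0.0)
```

Pixels outside the box keep all five values at 0, matching the description above.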
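QUAD
The QUAD ground truth consists of score_map and geo_map. Its score_map is the same as for RBOX. The box annotation already gives the coordinates of the four corner points of the text box, so no extra processing is needed. In geo_map, every pixel in a text region holds 8 values that encode the four corner points (in the paper, as the coordinate offsets from the pixel to each of the four corners).
2.2 Understanding the ground-truth code
Quite a bit of the code that generates geo_map and angle_map is not easy to follow, so it is worth explaining.
polygon_area() function
It is mainly used to check whether the four box coordinates are ordered clockwise; if they are ordered counter-clockwise, they are converted to clockwise order. It relies on the shoelace theorem, which computes the area of an arbitrary polygon from its vertex coordinates. With the formula used here, and in image coordinates where the y axis points down, the result is negative for clockwise order and positive for counter-clockwise order. (Shoelace theorem: https://zhuanlan.zhihu.com/p/110025234)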
import numpy as np


def polygon_area(poly):
    '''
    compute area of a polygon
    :param poly:
    :return:
    '''
    edge = [
        (poly[1][0] - poly[0][0]) * (poly[1][1] + poly[0][1]),
        (poly[2][0] - poly[1][0]) * (poly[2][1] + poly[1][1]),
        (poly[3][0] - poly[2][0]) * (poly[3][1] + poly[2][1]),
        (poly[0][0] - poly[3][0]) * (poly[0][1] + poly[3][1])
    ]
    return np.sum(edge)/2.


def check_and_validate_polys(polys, tags, size):
    '''
    check so that the text poly is in the same direction,
    and also filter some invalid polygons
    :param polys:
    :param tags:
    :return:
    '''
    (h, w) = size
    if polys.shape[0] == 0:
        return polys
    # clip the coordinates to the image
    polys[:, :, 0] = np.clip(polys[:, :, 0], 0, w-1)
    polys[:, :, 1] = np.clip(polys[:, :, 1], 0, h-1)

    validated_polys = []
    validated_tags = []
    for poly, tag in zip(polys, tags):
        p_area = polygon_area(poly)
        if abs(p_area) < 1:
            # degenerate polygon, skip it
            print('invalid poly')
            continue
        if p_area > 0:
            # counter-clockwise: reverse the vertex order
            print('poly in wrong direction')
            poly = poly[(0, 3, 2, 1), :]
        validated_polys.append(poly)
        validated_tags.append(tag)
    return np.array(validated_polys), np.array(validated_tags)
An application of the same idea, checking the vertex order of a polygon:
# The shoelace theorem computes the area of an arbitrary polygon from its vertex
# coordinates; the result is negative for clockwise order and positive for
# counter-clockwise order.
def validate_clockwise_points(points):  # raises when the points are in clockwise order
    """
    Validates that the 4 points that delimit a polygon are in counter-clockwise order.
    """
    if len(points) != 8:
        raise Exception("Points list not valid." + str(len(points)))

    point = [
        [int(points[0]), int(points[1])],
        [int(points[2]), int(points[3])],
        [int(points[4]), int(points[5])],
        [int(points[6]), int(points[7])]
    ]
    edge = [
        (point[1][0] - point[0][0]) * (point[1][1] + point[0][1]),
        (point[2][0] - point[1][0]) * (point[2][1] + point[1][1]),
        (point[3][0] - point[2][0]) * (point[3][1] + point[2][1]),
        (point[0][0] - point[3][0]) * (point[0][1] + point[3][1])
    ]

    summatory = edge[0] + edge[1] + edge[2] + edge[3]
    if summatory < 0:
        raise Exception("Points are not counter_clockwise.")
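The sign convention is easy to get backwards, so a tiny worked example may help. The sketch below uses the same edge formula as polygon_area(), but in plain Python and for any number of vertices; `signed_area` is a name chosen for this illustration.

```python
def signed_area(poly):
    """Sum of (x2 - x1) * (y2 + y1) / 2 over all edges, as in polygon_area()."""
    total = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        total += (x2 - x1) * (y2 + y1)
    return total / 2.0


# Top-left, top-right, bottom-right, bottom-left: clockwise on screen (y points down).
clockwise = [(0, 0), (4, 0), (4, 2), (0, 2)]
print(signed_area(clockwise))          # -8.0
print(signed_area(clockwise[::-1]))    # 8.0
```

The absolute value is the area of the 4x2 rectangle, and only the sign changes with the vertex order.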
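point_dist_to_line() function
np.cross computes the cross product of two vectors, whose magnitude is the area of the parallelogram they span. Dividing that area by the length of the base gives the height, which is the distance from p3 to the line p1p2.
import numpy as np


def point_dist_to_line(p1, p2, p3):
    # compute the distance from p3 to p1-p2
    return np.linalg.norm(np.cross(p2 - p1, p1 - p3)) / np.linalg.norm(p2 - p1)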
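The same area-over-base computation can be written out by hand for 2-D points, which makes each step explicit. This is a sketch in plain Python; `point_dist_to_line_2d` is a hypothetical name, not part of the original code.

```python
import math


def point_dist_to_line_2d(p1, p2, p3):
    """Distance from p3 to the infinite line through p1 and p2 (2-D points)."""
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]     # base vector p1 -> p2
    vx, vy = p1[0] - p3[0], p1[1] - p3[1]     # vector p3 -> p1
    cross = ux * vy - uy * vx                 # signed parallelogram area
    return abs(cross) / math.hypot(ux, uy)    # area / base = height


print(point_dist_to_line_2d((0, 0), (4, 0), (1, 3)))   # 3.0
```

Here the base lies on the x axis, so the distance is simply the y coordinate of the third point.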
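generate_rbox() function
This is the most complex function, and the code that computes the minimum rectangle enclosing the box is the hardest part to follow. Roughly, it starts from each vertex and builds the candidate parallelograms that enclose the box, compares their areas, keeps the one with the smallest area, and converts it to a rectangle, as shown in the figure below:
The code of generate_rbox is as follows: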
import cv2
import numpy as np
from shapely.geometry import Polygon

# FLAGS, shrink_poly, fit_line, line_cross_point, rectangle_from_parallelogram
# and sort_rectangle are not shown in this post.


def generate_rbox(im_size, polys, tags):
    h, w = im_size
    poly_mask = np.zeros((h, w), dtype=np.uint8)
    score_map = np.zeros((h, w), dtype=np.uint8)
    geo_map = np.zeros((h, w, 5), dtype=np.float32)
    # mask used during training, to ignore some hard areas
    training_mask = np.ones((h, w), dtype=np.uint8)
    for poly_idx, poly_tag in enumerate(zip(polys, tags)):
        poly = poly_tag[0]
        tag = poly_tag[1]

        r = [None, None, None, None]
        for i in range(4):
            r[i] = min(np.linalg.norm(poly[i] - poly[(i + 1) % 4]),
                       np.linalg.norm(poly[i] - poly[(i - 1) % 4]))
        # score map
        shrinked_poly = shrink_poly(poly.copy(), r).astype(np.int32)[np.newaxis, :, :]
        cv2.fillPoly(score_map, shrinked_poly, 1)
        cv2.fillPoly(poly_mask, shrinked_poly, poly_idx + 1)
        # if the poly is too small, then ignore it during training
        poly_h = min(np.linalg.norm(poly[0] - poly[3]), np.linalg.norm(poly[1] - poly[2]))
        poly_w = min(np.linalg.norm(poly[0] - poly[1]), np.linalg.norm(poly[2] - poly[3]))
        if min(poly_h, poly_w) < FLAGS.min_text_size:
            cv2.fillPoly(training_mask, poly.astype(np.int32)[np.newaxis, :, :], 0)
        if tag:
            cv2.fillPoly(training_mask, poly.astype(np.int32)[np.newaxis, :, :], 0)

        xy_in_poly = np.argwhere(poly_mask == (poly_idx + 1))
        # if geometry == 'RBOX':
        # generate a parallelogram for any combination of two vertices
        fitted_parallelograms = []
        for i in range(4):
            p0 = poly[i]
            p1 = poly[(i + 1) % 4]
            p2 = poly[(i + 2) % 4]
            p3 = poly[(i + 3) % 4]
            edge = fit_line([p0[0], p1[0]], [p0[1], p1[1]])           # line p0p1
            backward_edge = fit_line([p0[0], p3[0]], [p0[1], p3[1]])  # line p0p3
            forward_edge = fit_line([p1[0], p2[0]], [p1[1], p2[1]])   # line p1p2
            # is p2 farther from line p0p1 than p3 is?
            if point_dist_to_line(p0, p1, p2) > point_dist_to_line(p0, p1, p3):
                # parallel line through p2
                if edge[1] == 0:
                    edge_opposite = [1, 0, -p2[0]]
                else:
                    edge_opposite = [edge[0], -1, p2[1] - edge[0] * p2[0]]
            else:
                # parallel line through p3
                if edge[1] == 0:
                    edge_opposite = [1, 0, -p3[0]]
                else:
                    edge_opposite = [edge[0], -1, p3[1] - edge[0] * p3[0]]
            # move forward edge
            new_p0 = p0
            new_p1 = p1
            new_p2 = p2
            new_p3 = p3
            # intersection of forward_edge and edge_opposite
            new_p2 = line_cross_point(forward_edge, edge_opposite)
            if point_dist_to_line(p1, new_p2, p0) > point_dist_to_line(p1, new_p2, p3):
                # across p0: line through p0 parallel to forward_edge
                if forward_edge[1] == 0:
                    forward_opposite = [1, 0, -p0[0]]
                else:
                    forward_opposite = [forward_edge[0], -1, p0[1] - forward_edge[0] * p0[0]]
            else:
                # across p3: line through p3 parallel to forward_edge
                if forward_edge[1] == 0:
                    forward_opposite = [1, 0, -p3[0]]
                else:
                    forward_opposite = [forward_edge[0], -1, p3[1] - forward_edge[0] * p3[0]]
            # intersections of forward_opposite with edge and with edge_opposite
            new_p0 = line_cross_point(forward_opposite, edge)
            new_p3 = line_cross_point(forward_opposite, edge_opposite)
            fitted_parallelograms.append([new_p0, new_p1, new_p2, new_p3, new_p0])
            # or move backward edge
            new_p0 = p0
            new_p1 = p1
            new_p2 = p2
            new_p3 = p3
            new_p3 = line_cross_point(backward_edge, edge_opposite)
            if point_dist_to_line(p0, p3, p1) > point_dist_to_line(p0, p3, p2):
                # across p1
                if backward_edge[1] == 0:
                    backward_opposite = [1, 0, -p1[0]]
                else:
                    backward_opposite = [backward_edge[0], -1, p1[1] - backward_edge[0] * p1[0]]
            else:
                # across p2
                if backward_edge[1] == 0:
                    backward_opposite = [1, 0, -p2[0]]
                else:
                    backward_opposite = [backward_edge[0], -1, p2[1] - backward_edge[0] * p2[0]]
            new_p1 = line_cross_point(backward_opposite, edge)
            new_p2 = line_cross_point(backward_opposite, edge_opposite)
            fitted_parallelograms.append([new_p0, new_p1, new_p2, new_p3, new_p0])
        # keep the parallelogram with the smallest area
        areas = [Polygon(t).area for t in fitted_parallelograms]
        parallelogram = np.array(fitted_parallelograms[np.argmin(areas)][:-1], dtype=np.float32)
        # sort the polygon
        parallelogram_coord_sum = np.sum(parallelogram, axis=1)
        min_coord_idx = np.argmin(parallelogram_coord_sum)
        parallelogram = parallelogram[
            [min_coord_idx, (min_coord_idx + 1) % 4, (min_coord_idx + 2) % 4, (min_coord_idx + 3) % 4]]

        rectange = rectangle_from_parallelogram(parallelogram)
        rectange, rotate_angle = sort_rectangle(rectange)

        p0_rect, p1_rect, p2_rect, p3_rect = rectange
        for y, x in xy_in_poly:
            point = np.array([x, y], dtype=np.float32)
            # top
            geo_map[y, x, 0] = point_dist_to_line(p0_rect, p1_rect, point)
            # right
            geo_map[y, x, 1] = point_dist_to_line(p1_rect, p2_rect, point)
            # down
            geo_map[y, x, 2] = point_dist_to_line(p2_rect, p3_rect, point)
            # left
            geo_map[y, x, 3] = point_dist_to_line(p3_rect, p0_rect, point)
            # angle
            geo_map[y, x, 4] = rotate_angle
    return score_map, geo_map, training_mask
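Much of generate_rbox() is line bookkeeping: lines are stored as three coefficients, a parallel line is drawn through a vertex, and two lines are intersected. The sketch below shows those two operations, assuming a line is stored as (a, b, c) with a*x + b*y + c = 0, which matches the [k, -1, b] and [1, 0, -x] forms in the code above. `parallel_through` and `cross_point` are hypothetical stand-ins written for this post; they are not the original fit_line or line_cross_point helpers.

```python
def parallel_through(line, point):
    """Line parallel to `line` that passes through `point`."""
    a, b, _ = line
    return (a, b, -(a * point[0] + b * point[1]))


def cross_point(l1, l2):
    """Intersection of two lines via Cramer's rule; None if they are parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return ((b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det)


edge = (2.0, -1.0, 1.0)                          # y = 2x + 1
opposite = parallel_through(edge, (3.0, 0.0))    # same slope, through (3, 0)
print(opposite)                                  # (2.0, -1.0, -6.0)
print(cross_point(opposite, (1.0, 0.0, -5.0)))   # meets x = 5 at (5.0, 4.0)
```

The `opposite` line here is the same thing the code builds as [edge[0], -1, p2[1] - edge[0] * p2[0]], and the vertical line (1, 0, -5) corresponds to the edge[1] == 0 special case.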
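3. Loss function
The loss has two parts: the classification loss on score_map and the regression loss on geo_map and angle_map. In the paper the total loss is L = L_s + λ_g · L_g, where L_s is the score-map loss, L_g is the geometry loss, and λ_g weights the two (set to 1 in the paper).
Classification loss
In score_map, pixels in text regions are 1 and background pixels are 0, so this is a binary classification problem. Because the two classes are imbalanced, the paper uses a class-balanced cross-entropy loss: L_s = -β · Y* · log(Ŷ) - (1 - β) · (1 - Y*) · log(1 - Ŷ), where Ŷ is the predicted score map, Y* is the ground truth, and β = 1 - (Σ y*) / |Y*| is the balancing factor.
Many implementations replace the class-balanced loss with a dice loss. The dice loss is implemented as follows: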
import torch


def dice_coefficient(y_true_cls, y_pred_cls, training_mask):
    '''
    dice loss
    :param y_true_cls:
    :param y_pred_cls:
    :param training_mask:
    :return:
    '''
    eps = 1e-5
    intersection = torch.sum(y_true_cls * y_pred_cls * training_mask)
    union = torch.sum(y_true_cls * training_mask) + torch.sum(y_pred_cls * training_mask) + eps
    loss = 1. - (2 * intersection / union)
    return loss
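To get a feel for the numbers, the same formula can be evaluated on a toy score map flattened to a list. This is a plain-Python sketch of the dice loss above, not a replacement for the tensor version; `dice_loss` is an illustrative name.

```python
def dice_loss(y_true, y_pred, mask, eps=1e-5):
    """1 - 2 * |A ∩ B| / (|A| + |B|), computed over masked pixels."""
    inter = sum(t * p * m for t, p, m in zip(y_true, y_pred, mask))
    union = (sum(t * m for t, m in zip(y_true, mask))
             + sum(p * m for p, m in zip(y_pred, mask)) + eps)
    return 1.0 - 2.0 * inter / union


y_true = [1, 1, 0, 0]
mask = [1, 1, 1, 1]
print(dice_loss(y_true, [1.0, 1.0, 0.0, 0.0], mask))   # close to 0: perfect prediction
print(dice_loss(y_true, [0.0, 0.0, 1.0, 1.0], mask))   # 1.0: no overlap at all
```

Masked-out pixels (mask value 0) drop out of both sums, which is how training_mask keeps hard regions from contributing to the loss.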
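Regression loss
The RBOX loss consists of the loss on the box position (geo_map) and the loss on the box angle (angle_map). The box position uses a rather distinctive IoU loss, based on the intersection over union of the ground-truth box R* and the predicted box R̂: L_AABB = -log IoU(R̂, R*) = -log(|R̂ ∩ R*| / |R̂ ∪ R*|).
The box angle uses a cosine loss on the angle difference: L_θ = 1 - cos(θ̂ - θ*), where θ̂ is the predicted angle and θ* is the ground-truth angle.
The total RBOX loss is L_g = L_AABB + λ_θ · L_θ. The paper sets λ_θ to 10, while the implementation in this post uses 20.
The full loss function is implemented as follows: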
import torch
import torch.nn as nn


def dice_coefficient(y_true_cls, y_pred_cls, training_mask):
    '''
    dice loss
    :param y_true_cls:
    :param y_pred_cls:
    :param training_mask:
    :return:
    '''
    eps = 1e-5
    intersection = torch.sum(y_true_cls * y_pred_cls * training_mask)
    union = torch.sum(y_true_cls * training_mask) + torch.sum(y_pred_cls * training_mask) + eps
    loss = 1. - (2 * intersection / union)
    return loss


class LossFunc(nn.Module):
    def __init__(self):
        super(LossFunc, self).__init__()
        return

    def forward(self, y_true_cls, y_pred_cls, y_true_geo, y_pred_geo, training_mask):
        classification_loss = dice_coefficient(y_true_cls, y_pred_cls, training_mask)
        # scale classification loss to match the iou loss part
        classification_loss *= 0.01

        # d1 -> top, d2->right, d3->bottom, d4->left
        # d1_gt, d2_gt, d3_gt, d4_gt, theta_gt = tf.split(value=y_true_geo, num_or_size_splits=5, axis=3)
        d1_gt, d2_gt, d3_gt, d4_gt, theta_gt = torch.split(y_true_geo, 1, 1)
        # d1_pred, d2_pred, d3_pred, d4_pred, theta_pred = tf.split(value=y_pred_geo, num_or_size_splits=5, axis=3)
        d1_pred, d2_pred, d3_pred, d4_pred, theta_pred = torch.split(y_pred_geo, 1, 1)
        area_gt = (d1_gt + d3_gt) * (d2_gt + d4_gt)
        area_pred = (d1_pred + d3_pred) * (d2_pred + d4_pred)
        # despite the names, w_union / h_union are the width and height of the intersection
        w_union = torch.min(d2_gt, d2_pred) + torch.min(d4_gt, d4_pred)
        h_union = torch.min(d1_gt, d1_pred) + torch.min(d3_gt, d3_pred)
        area_intersect = w_union * h_union
        area_union = area_gt + area_pred - area_intersect
        L_AABB = -torch.log((area_intersect + 1.0)/(area_union + 1.0))
        L_theta = 1 - torch.cos(theta_pred - theta_gt)
        L_g = L_AABB + 20 * L_theta

        return torch.mean(L_g * y_true_cls * training_mask) + classification_loss
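The geometry part of the loss is easier to follow for a single pixel. The sketch below repeats the same arithmetic in plain Python: the distances are (top, right, bottom, left), the +1.0 smoothing and the weight of 20 are taken from the code above, and `rbox_loss` is a name chosen for this illustration.

```python
import math


def rbox_loss(gt, pred, lambda_theta=20.0):
    """gt / pred = (top, right, bottom, left, angle) for one text pixel."""
    t_g, r_g, b_g, l_g, a_g = gt
    t_p, r_p, b_p, l_p, a_p = pred
    area_gt = (t_g + b_g) * (r_g + l_g)
    area_pred = (t_p + b_p) * (r_p + l_p)
    # both boxes contain the pixel, so the overlap is bounded by the smaller distances
    w_inter = min(r_g, r_p) + min(l_g, l_p)
    h_inter = min(t_g, t_p) + min(b_g, b_p)
    inter = w_inter * h_inter
    union = area_gt + area_pred - inter
    l_aabb = -math.log((inter + 1.0) / (union + 1.0))
    l_theta = 1.0 - math.cos(a_p - a_g)
    return l_aabb + lambda_theta * l_theta


gt = (2.0, 6.0, 2.0, 6.0, 0.0)
print(rbox_loss(gt, gt))                           # 0.0 for a perfect prediction
print(rbox_loss(gt, (2.0, 3.0, 2.0, 3.0, 0.0)))    # -log(25/49), about 0.673
```

In the second call the predicted box has the right height but half the width, so the intersection is 24, the union is 48, and the loss is -log(25/49) after smoothing.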
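4. Locality-Aware NMS (post-processing)
At test time, the final detection boxes have to be derived from score_map and geo_map. The procedure is:
- Select the regions of score_map whose predicted score is greater than score_map_thresh as candidate text regions.
- Using the filtered score_map and geo_map, convert the text boxes from the RBOX form (A, A, B, B, angle), i.e. four side distances plus an angle, to the QUAD form.
- Order all boxes by y coordinate and apply weighted_merge to boxes that are adjacent in y (the merge is weighted by score).
- Sort by score and run NMS to filter out redundant text boxes.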
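The conversion from RBOX to QUAD is implemented in the function restore_rectangle_rbox(). The idea is: for each pixel in a text region, first use a rotation matrix to compute the rotated corner coordinates, then translate them to that pixel, as shown in the figure below:
The code of restore_rectangle_rbox() is as follows: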
import numpy as np


def restore_rectangle_rbox(origin, geometry):
    # origin: coordinates of all text-region points, in (x, y) form
    # geometry: for each point in origin, the four side distances and the angle [A, A, B, B, angle]
    d = geometry[:, :4]      # four side distances [A, A, B, B]
    angle = geometry[:, 4]   # angle
    # for angle >= 0
    origin_0 = origin[angle >= 0]
    d_0 = d[angle >= 0]
    angle_0 = angle[angle >= 0]
    if origin_0.shape[0] > 0:
        p = np.array([np.zeros(d_0.shape[0]), -d_0[:, 0] - d_0[:, 2],
                      d_0[:, 1] + d_0[:, 3], -d_0[:, 0] - d_0[:, 2],
                      d_0[:, 1] + d_0[:, 3], np.zeros(d_0.shape[0]),
                      np.zeros(d_0.shape[0]), np.zeros(d_0.shape[0]),
                      d_0[:, 3], -d_0[:, 2]])
        p = p.transpose((1, 0)).reshape((-1, 5, 2))  # N*5*2

        rotate_matrix_x = np.array([np.cos(angle_0), np.sin(angle_0)]).transpose((1, 0))
        rotate_matrix_x = np.repeat(rotate_matrix_x, 5, axis=1).reshape(-1, 2, 5).transpose((0, 2, 1))  # N*5*2

        rotate_matrix_y = np.array([-np.sin(angle_0), np.cos(angle_0)]).transpose((1, 0))
        rotate_matrix_y = np.repeat(rotate_matrix_y, 5, axis=1).reshape(-1, 2, 5).transpose((0, 2, 1))

        p_rotate_x = np.sum(rotate_matrix_x * p, axis=2)[:, :, np.newaxis]  # N*5*1
        p_rotate_y = np.sum(rotate_matrix_y * p, axis=2)[:, :, np.newaxis]  # N*5*1

        p_rotate = np.concatenate([p_rotate_x, p_rotate_y], axis=2)  # N*5*2

        p3_in_origin = origin_0 - p_rotate[:, 4, :]
        new_p0 = p_rotate[:, 0, :] + p3_in_origin  # N*2
        new_p1 = p_rotate[:, 1, :] + p3_in_origin
        new_p2 = p_rotate[:, 2, :] + p3_in_origin
        new_p3 = p_rotate[:, 3, :] + p3_in_origin

        new_p_0 = np.concatenate([new_p0[:, np.newaxis, :], new_p1[:, np.newaxis, :],
                                  new_p2[:, np.newaxis, :], new_p3[:, np.newaxis, :]], axis=1)  # N*4*2
    else:
        new_p_0 = np.zeros((0, 4, 2))
    # for angle < 0
    origin_1 = origin[angle < 0]
    d_1 = d[angle < 0]
    angle_1 = angle[angle < 0]
    if origin_1.shape[0] > 0:
        p = np.array([-d_1[:, 1] - d_1[:, 3], -d_1[:, 0] - d_1[:, 2],
                      np.zeros(d_1.shape[0]), -d_1[:, 0] - d_1[:, 2],
                      np.zeros(d_1.shape[0]), np.zeros(d_1.shape[0]),
                      -d_1[:, 1] - d_1[:, 3], np.zeros(d_1.shape[0]),
                      -d_1[:, 1], -d_1[:, 2]])
        p = p.transpose((1, 0)).reshape((-1, 5, 2))  # N*5*2

        rotate_matrix_x = np.array([np.cos(-angle_1), -np.sin(-angle_1)]).transpose((1, 0))
        rotate_matrix_x = np.repeat(rotate_matrix_x, 5, axis=1).reshape(-1, 2, 5).transpose((0, 2, 1))  # N*5*2

        rotate_matrix_y = np.array([np.sin(-angle_1), np.cos(-angle_1)]).transpose((1, 0))
        rotate_matrix_y = np.repeat(rotate_matrix_y, 5, axis=1).reshape(-1, 2, 5).transpose((0, 2, 1))

        p_rotate_x = np.sum(rotate_matrix_x * p, axis=2)[:, :, np.newaxis]  # N*5*1
        p_rotate_y = np.sum(rotate_matrix_y * p, axis=2)[:, :, np.newaxis]  # N*5*1

        p_rotate = np.concatenate([p_rotate_x, p_rotate_y], axis=2)  # N*5*2

        p3_in_origin = origin_1 - p_rotate[:, 4, :]
        new_p0 = p_rotate[:, 0, :] + p3_in_origin  # N*2
        new_p1 = p_rotate[:, 1, :] + p3_in_origin
        new_p2 = p_rotate[:, 2, :] + p3_in_origin
        new_p3 = p_rotate[:, 3, :] + p3_in_origin

        new_p_1 = np.concatenate([new_p0[:, np.newaxis, :], new_p1[:, np.newaxis, :],
                                  new_p2[:, np.newaxis, :], new_p3[:, np.newaxis, :]], axis=1)  # N*4*2
    else:
        new_p_1 = np.zeros((0, 4, 2))
    return np.concatenate([new_p_0, new_p_1])
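The vectorized code is hard to read, so here is the same logic for a single pixel in plain Python. It is a sketch under my reading of the function above: the rectangle is laid out in a local frame anchored at one bottom corner, rotated, and then shifted so that the local copy of the pixel lands on the real pixel. `restore_quad` is a hypothetical name, and the distance order (top, right, bottom, left) follows the geo_map channels.

```python
import math


def restore_quad(x, y, top, right, bottom, left, angle):
    """Return the corners p0..p3 of the box that pixel (x, y) points to."""
    w, h = left + right, top + bottom
    if angle >= 0:
        # local frame anchored at the bottom-left corner p3 = (0, 0)
        corners = [(0.0, -h), (w, -h), (w, 0.0), (0.0, 0.0)]
        pixel = (left, -bottom)
    else:
        # local frame anchored at the bottom-right corner p2 = (0, 0)
        corners = [(-w, -h), (0.0, -h), (0.0, 0.0), (-w, 0.0)]
        pixel = (-right, -bottom)
    c, s = math.cos(angle), math.sin(angle)

    def rotate(p):
        # same rotation as rotate_matrix_x / rotate_matrix_y above
        return (c * p[0] + s * p[1], -s * p[0] + c * p[1])

    px, py = rotate(pixel)
    dx, dy = x - px, y - py    # shift that puts the rotated pixel at (x, y)
    return [(rx + dx, ry + dy) for rx, ry in map(rotate, corners)]


print(restore_quad(10, 20, top=2, right=3, bottom=4, left=5, angle=0.0))
# [(5.0, 18.0), (13.0, 18.0), (13.0, 24.0), (5.0, 24.0)]
```

With angle 0 the result is the axis-aligned box whose edges are 2, 3, 4 and 5 pixels away from (10, 20). For a nonzero angle the same distances hold, measured perpendicular to the rotated edges.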
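Locality-aware NMS means that, before NMS, boxes whose y coordinates are very close are first merged, and NMS is then run on the result. The merge uses the weighted_merge method, which is worth paying attention to. Example Python code: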
import numpy as np
from shapely.geometry import Polygon


def intersection(g, p):
    # IoU of two quadrilaterals given as 8 coordinates (+ score)
    g = Polygon(g[:8].reshape((4, 2)))
    p = Polygon(p[:8].reshape((4, 2)))
    if not g.is_valid or not p.is_valid:
        return 0
    inter = g.intersection(p).area
    union = g.area + p.area - inter
    if union == 0:
        return 0
    else:
        return inter/union


def weighted_merge(g, p):
    # score-weighted average of the coordinates; the scores are summed
    g[:8] = (g[8] * g[:8] + p[8] * p[:8])/(g[8] + p[8])
    g[8] = (g[8] + p[8])
    return g


def standard_nms(S, thres):
    order = np.argsort(S[:, 8])[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        ovr = np.array([intersection(S[i], S[t]) for t in order[1:]])

        inds = np.where(ovr <= thres)[0]
        order = order[inds+1]

    return S[keep]


def nms_locality(polys, thres=0.3):
    '''
    locality aware nms of EAST
    :param polys: a N*9 numpy array. first 8 coordinates, then prob
    :return: boxes after nms
    '''
    S = []
    p = None
    for g in polys:
        if p is not None and intersection(g, p) > thres:
            p = weighted_merge(g, p)
        else:
            if p is not None:
                S.append(p)
            p = g
    if p is not None:
        S.append(p)

    if len(S) == 0:
        return np.array([])
    return standard_nms(np.array(S), thres)


if __name__ == '__main__':
    # 343,350,448,135,474,143,369,359
    print(Polygon(np.array([[343, 350], [448, 135], [474, 143], [369, 359]])).area)
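The merging pass is the part that differs from ordinary NMS, and it can be tried without shapely. The sketch below is a simplification, not the code above: it swaps the quadrilateral IoU for an axis-aligned IoU on boxes of the form [x1, y1, x2, y2, score], and the function names are made up for this illustration. The merge rule and the single left-to-right pass follow weighted_merge() and nms_locality().

```python
def iou_aabb(a, b):
    """IoU of two axis-aligned boxes [x1, y1, x2, y2, score]."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_merge_aabb(g, p):
    """Score-weighted average of the coordinates; the scores are summed."""
    total = g[4] + p[4]
    coords = [(g[4] * gc + p[4] * pc) / total for gc, pc in zip(g[:4], p[:4])]
    return coords + [total]


def merge_pass(boxes, thres=0.3):
    """Merge each box into the previous one while they overlap enough."""
    merged, prev = [], None
    for g in boxes:
        if prev is not None and iou_aabb(g, prev) > thres:
            prev = weighted_merge_aabb(g, prev)
        else:
            if prev is not None:
                merged.append(prev)
            prev = g
    if prev is not None:
        merged.append(prev)
    return merged


boxes = [[0.0, 0.0, 10.0, 4.0, 1.0],
         [4.0, 0.0, 14.0, 4.0, 3.0],
         [40.0, 0.0, 50.0, 4.0, 2.0]]
print(merge_pass(boxes))
# [[3.0, 0.0, 13.0, 4.0, 4.0], [40.0, 0.0, 50.0, 4.0, 2.0]]
```

The first two boxes overlap with IoU 24/56, so they collapse into one box that sits closer to the higher-scoring one, and the third box is left alone. Because only neighbouring boxes are compared, this pass is linear in the number of boxes, and standard NMS then runs on a much shorter list.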
References:
https://www.cnblogs.com/lillylin/p/9954981.html
https://zhuanlan.zhihu.com/p/71182747
https://blog.csdn.net/sxlsxl119/article/details/103934957