[Machine Vision] Extracting Text from Real-World Scenes
阿新 • Published: 2019-01-22
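Text Extraction from Phone-Camera Scenes
Introduction
This was a task my teacher suddenly handed me. Turning it down would have been awkward, so I did it. Working through it gave me a new appreciation for morphological processing (I really had no idea it was this powerful), and it also deepened my understanding and memory of the Sobel operator.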
The processing is split into two steps:
- Obtaining and rectifying the ROI
- Extracting the text
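ROI Extraction
The rough idea: the original image contains a rectangular table frame, so if we can get the ROI inside that frame and then extract the text within the ROI, processing should be much easier. The main problems are extracting edges in specific directions, fitting a rectangle, and transforming the key points.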
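For the directional edges, I apply the Sobel operator after Gaussian smoothing and median filtering, extract the x-direction and y-direction edges separately, and then add the two with 1:1 weights. The kernel size needs tuning along the way. The result is binarized, cleaned of noise points with morphological dilation and erosion, and then passed to contour detection.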
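One detail of the per-direction Sobel step is easy to miss: the response is signed, so a dark-to-bright edge and a bright-to-dark edge produce opposite signs. The sketch below is a pure-Python illustration on a toy image, not the OpenCV implementation; the helper `sobel_x` and the sample image are made up for the example. It shows what survives when the response is stored in an unsigned 8-bit range directly versus when the absolute value is taken first.

```python
# Horizontal Sobel kernel (responds to vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def sobel_x(img):
    """Signed horizontal Sobel response for the interior pixels of a 2D list."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(
                SOBEL_X[i][j] * img[r + i - 1][c + j - 1]
                for i in range(3) for j in range(3)
            )
    return out


# Toy 5x7 image: dark background with a bright vertical stripe in columns 2..4.
img = [[0, 0, 255, 255, 255, 0, 0] for _ in range(5)]

row = sobel_x(img)[2]                          # signed response, middle row
clipped = [min(max(v, 0), 255) for v in row]   # stored directly as unsigned 8-bit
magnitude = [min(abs(v), 255) for v in row]    # absolute value first, then saturate

print(row)        # [0, 1020, 1020, 0, -1020, -1020, 0]
print(clipped)    # [0, 255, 255, 0, 0, 0, 0]        -> right edge of the stripe is lost
print(magnitude)  # [0, 255, 255, 0, 255, 255, 0]    -> both edges kept
```

In OpenCV terms this is the usual reason for computing `cv.Sobel` with a signed depth such as `cv.CV_16S` and only then calling `cv.convertScaleAbs`.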
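For rectangle fitting, I compute the area of each contour and keep only those whose area exceeds a threshold. Each remaining contour goes through polygon approximation to get a polygonal boundary. The candidate polygons look roughly like rectangles but still contain stray points, so a simple algorithm picks out the rectangle's four vertices, which gives the fitted rectangle.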
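The vertex-picking step can be sketched in a few lines. This is a minimal pure-Python version of the idea, assuming image coordinates (x to the right, y downward) and a rectangle that is not rotated too far from upright; the function name `rect_corners` and the sample polygon are hypothetical. The top-left and bottom-right corners are the points with the smallest and largest x + y, and the other two corners are the polygon points closest (in L1 distance) to where they would sit in an axis-aligned rectangle.

```python
def rect_corners(points):
    """Pick four corners (top-left, top-right, bottom-right, bottom-left)
    from the vertices of a roughly rectangular polygon."""
    # Top-left has the smallest x + y, bottom-right the largest.
    tl = min(points, key=lambda p: p[0] + p[1])
    br = max(points, key=lambda p: p[0] + p[1])

    # Where the other two corners would be if the rectangle were axis-aligned.
    ideal_tr = (br[0], tl[1])
    ideal_bl = (tl[0], br[1])

    def l1(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    # Take the polygon vertex nearest to each ideal position.
    tr = min(points, key=lambda p: l1(p, ideal_tr))
    bl = min(points, key=lambda p: l1(p, ideal_bl))
    return [tl, tr, br, bl]


# Hypothetical polygon: a slightly skewed quadrilateral plus two stray vertices.
polygon = [(12, 10), (60, 9), (198, 14), (203, 150), (100, 152), (8, 147)]
print(rect_corners(polygon))
# [(12, 10), (198, 14), (203, 150), (8, 147)]
```

The stray vertices (60, 9) and (100, 152) are ignored because they are far from every ideal corner position.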
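For the key-point transform, the detected rectangle may appear flipped, rotated, or skewed by perspective, so it needs to be "straightened". I build a transform matrix from the four vertices and apply a perspective transform (linear in homogeneous coordinates) that maps the rectangle onto a front-on view.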
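To make "building a transform matrix" concrete: four point correspondences give eight linear equations in the eight unknown entries of a 3x3 matrix whose bottom-right entry is fixed to 1. The sketch below solves that system in pure Python. It illustrates the math only and is not OpenCV's code; the names `solve`, `perspective_matrix`, `warp_point`, and the sample corners are made up for the example.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def perspective_matrix(src, dst):
    """3x3 matrix H (with H[2][2] = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def warp_point(H, point):
    """Apply H to a 2D point, including the perspective division."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical detected corners (TL, TR, BR, BL) and a 1440x960 front-on target.
src = [(12, 10), (198, 14), (203, 150), (8, 147)]
dst = [(0, 0), (1440, 0), (1440, 960), (0, 960)]
H = perspective_matrix(src, dst)
for p in src:
    print(p, "->", tuple(round(c, 3) for c in warp_point(H, p)))
```

The order of `src` has to match the order of `dst` corner for corner; mixing them up produces a flipped or twisted output image.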
Running the steps above on every contour yields the ROIs. Note that the parameters have to be tuned for different inputs.
import copy

import cv2 as cv
import numpy as np


# Output size: w = 240*6, h = 160*6
def getROI(frame):
    # Assumes a window named "Trackbar" with trackbars "thre", "max_e" and "min_e" already exists.
    while True:
        out_imgs = []
        src = copy.copy(frame)
        thre = cv.getTrackbarPos("thre", "Trackbar")
        max_e = cv.getTrackbarPos("max_e", "Trackbar")
        min_e = cv.getTrackbarPos("min_e", "Trackbar")

        # Smooth, then extract x and y edges separately with Sobel.
        gray = cv.cvtColor(src, cv.COLOR_BGR2GRAY)
        gaussian = cv.GaussianBlur(gray, (3, 3), 0, 0, cv.BORDER_DEFAULT)
        median = cv.medianBlur(gaussian, 5)
        # CV_16S keeps negative gradients; with CV_8U they would be clipped to 0.
        x = cv.Sobel(median, cv.CV_16S, 1, 0, ksize=3)
        y = cv.Sobel(median, cv.CV_16S, 0, 1, ksize=3)
        absX = cv.convertScaleAbs(x)
        absY = cv.convertScaleAbs(y)
        sobel = cv.addWeighted(absX, 0.5, absY, 0.5, 0)  # 1:1 weights

        # Binarize, then dilate/erode to remove noise points.
        _, binary = cv.threshold(sobel, thre, 255, cv.THRESH_BINARY)
        s = gray.shape
        element1 = cv.getStructuringElement(cv.MORPH_RECT, (1 * 2 + 1, 2 * 2 + 1))
        element2 = cv.getStructuringElement(cv.MORPH_RECT, (min_e * 2 + 1, max_e * 2 + 1))
        dilate = cv.dilate(binary, element1, iterations=1)
        erode = cv.erode(dilate, element2, iterations=1)
        dilate = cv.dilate(erode, element1, iterations=2)
        binary = dilate

        # [-2] picks the contour list for both the 3-value (OpenCV 3)
        # and the 2-value (OpenCV 4) return of findContours.
        contours = cv.findContours(binary, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)[-2]
        for contour in contours:
            area = cv.contourArea(contour)
            if area > 50000:
                appCurve = cv.approxPolyDP(contour, 10, True)
                hulls = cv.convexHull(appCurve)
                min_x_y = 9999999
                max_x_y = 0
                # Order: top-left, top-right, bottom-right, bottom-left.
                rect_point = [None, (0, 0), None, (0, 0)]
                # Top-left has the smallest x + y, bottom-right the largest.
                for hull in hulls:
                    point = (int(hull[0][0]), int(hull[0][1]))
                    x_y = point[0] + point[1]
                    if x_y > max_x_y:
                        max_x_y = x_y
                        rect_point[2] = point
                    if x_y < min_x_y:
                        min_x_y = x_y
                        rect_point[0] = point
                # Ideal top-right (p1) and bottom-left (p2) positions;
                # pick the hull points closest to them (L1 distance).
                p1 = (rect_point[2][0], rect_point[0][1])
                p2 = (rect_point[0][0], rect_point[2][1])
                for hull in hulls:
                    point = (int(hull[0][0]), int(hull[0][1]))
                    distance11 = abs(p1[0] - point[0]) + abs(p1[1] - point[1])
                    distance12 = abs(p1[0] - rect_point[1][0]) + abs(p1[1] - rect_point[1][1])
                    if distance11 < distance12:
                        rect_point[1] = point
                    distance21 = abs(p2[0] - point[0]) + abs(p2[1] - point[1])
                    distance22 = abs(p2[0] - rect_point[3][0]) + abs(p2[1] - rect_point[3][1])
                    if distance21 < distance22:
                        rect_point[3] = point

                # Map the four vertices onto a 1440x960 front-on view.
                M = cv.getPerspectiveTransform(
                    np.array(rect_point, dtype=np.float32),
                    np.array([[0, 0], [1440, 0], [1440, 960], [0, 960]], dtype=np.float32))
                out = cv.warpPerspective(src, M, (1440, 960))
                for p in rect_point:
                    cv.circle(src, p, 20, (0, 0, 255), 2)
                # cv.imshow("out", out)
                out_imgs.append(out)

        # Show downscaled previews; interpolation must be passed by keyword,
        # because the third positional argument of cv.resize is dst.
        binary = cv.resize(binary, (s[1] // 3, s[0] // 3), interpolation=cv.INTER_LINEAR)
        cv.imshow("binary", binary)
        src = cv.resize(src, (s[1] // 3, s[0] // 3), interpolation=cv.INTER_LINEAR)
        cv.imshow("frame", src)
        key = cv.waitKey(0)
        if key == 27:  # Esc: save the ROIs (the "image/" directory must exist) and quit
            for i in range(len(out_imgs)):
                cv.imwrite("image/" + str(i) + ".jpg", out_imgs[i])
            break
    cv.destroyAllWindows()
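Text Extraction
The key to text extraction is finding the bounding boxes (bbox). The idea is to get edge features with the Canny operator, apply morphological dilation so that fragmented edges and noise merge into solid blobs, find the contours, and filter them by area. Each contour that passes the area threshold gets an upright bounding rectangle, and a threshold on the rectangle's height discards the small rectangles fitted to noise points. That completes the extraction of the text regions.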
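The two filters (contour area, then bounding-box height) can be illustrated without OpenCV. The sketch below is pure Python and only mirrors the logic: `polygon_area`, `bounding_rect`, `text_boxes`, the sample contours, and the `min_height=10` default are assumptions for the example. Only the area threshold of 4 comes from the code in this post, where the height threshold is tuned with a trackbar.

```python
def polygon_area(points):
    """Area enclosed by a closed polygon (shoelace formula)."""
    n = len(points)
    twice = sum(points[i][0] * points[(i + 1) % n][1]
                - points[(i + 1) % n][0] * points[i][1]
                for i in range(n))
    return abs(twice) / 2.0


def bounding_rect(points):
    """Upright bounding box as (x, y, w, h), counting both end pixels."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


def text_boxes(contours, min_area=4, min_height=10):
    """Keep boxes of contours that are large enough and tall enough."""
    boxes = []
    for contour in contours:
        if polygon_area(contour) > min_area:
            box = bounding_rect(contour)
            if box[3] > min_height:
                boxes.append(box)
    return boxes


contours = [
    [(10, 10), (30, 10), (30, 40), (10, 40)],   # character-sized blob
    [(50, 50), (51, 50), (51, 51), (50, 51)],   # speck: rejected by area
    [(0, 60), (100, 60), (100, 63), (0, 63)],   # thin table line: rejected by height
]
print(text_boxes(contours))
# [(10, 10, 21, 31)]
```

The height test is what removes long, thin structures such as leftover table lines, which pass the area test easily.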
import copy

import cv2 as cv


def process(ROI):
    # Assumes a window named "Trackbar" with the trackbars read below already exists.
    while True:
        thre1 = cv.getTrackbarPos("thre1", "Trackbar")
        thre2 = cv.getTrackbarPos("thre2", "Trackbar")
        max_e = cv.getTrackbarPos("max_e", "Trackbar")
        min_e = cv.getTrackbarPos("min_e", "Trackbar")
        height = cv.getTrackbarPos("height", "Trackbar")

        # Smooth, then detect edges with Canny.
        roi = copy.copy(ROI)
        gray = cv.cvtColor(roi, cv.COLOR_BGR2GRAY)
        gaussian = cv.GaussianBlur(gray, (3, 3), 0, 0, cv.BORDER_DEFAULT)
        median = cv.medianBlur(gaussian, 3)
        edges = cv.Canny(median, thre1, thre2)

        # Dilate so that the edges of one character merge into a single blob.
        element = cv.getStructuringElement(cv.MORPH_RECT, (min_e * 2 + 1, max_e * 2 + 1))
        dilate = cv.dilate(edges, element, iterations=1)

        # [-2] picks the contour list for both OpenCV 3 and OpenCV 4.
        contours = cv.findContours(dilate, cv.RETR_LIST, cv.CHAIN_APPROX_SIMPLE)[-2]
        # cv.drawContours(roi, contours, -1, (0, 255, 255), 2)
        for contour in contours:
            area = cv.contourArea(contour)
            if area > 4:
                rect = cv.boundingRect(contour)  # (x, y, w, h)
                if rect[3] > height:  # drop boxes that are too short to be text
                    cv.rectangle(roi, (rect[0], rect[1]),
                                 (rect[0] + rect[2], rect[1] + rect[3]), (0, 255, 255), 2)

        cv.imshow("roi", roi)
        cv.imshow("dilate", dilate)
        cv.imshow("edges", edges)
        key = cv.waitKey(10)
        if key == 27:  # Esc quits
            break
    cv.destroyAllWindows()
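Results
Original image
bbox extraction:
My teacher said the remaining images can't be posted for privacy reasons, so these two will have to do.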
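Known Issues
- Heavy parameter tuning (for different lighting and other conditions)
- For gray text, only the edges are extracted, which makes the characters hard to recognize
- ROI extraction on images at different scales needs parameter tuning (this can be folded into the first issue)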