
Getting Started with Python: Cracking CAPTCHAs with PIL

Environment

1. The script is located at: /Users/frankslg/PycharmProjects/cjb/ver/ver_code1.py
2. The images are stored at: /Users/frankslg/PycharmProjects/cjb/img/*.jpeg
3. The working directory in the interactive console: os.getcwd()
Out[3]: '/Users/frankslg/PycharmProjects/cjb'
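Because the script and the images live in sibling directories, the image directory can be derived from the script's own location instead of the current working directory, which differs between an interactive console and a direct run. The following is a minimal sketch using the paths listed above; the helper name `img_dir_for` is hypothetical.

```python
from pathlib import PurePosixPath


def img_dir_for(script_path):
    """Return the sibling 'img' directory of the folder containing the script.

    Hypothetical helper: assumes the layout <project>/ver/<script>
    and <project>/img.
    """
    script = PurePosixPath(script_path)
    # parent -> <project>/ver, parent.parent -> <project>
    return script.parent.parent / "img"


script = "/Users/frankslg/PycharmProjects/cjb/ver/ver_code1.py"
img_dir = img_dir_for(script)
print(img_dir)

# For comparison: trimming four characters from the working directory only
# yields the right path when the working directory is .../cjb/ver.
cwd_console = "/Users/frankslg/PycharmProjects/cjb"
sliced = cwd_console[:-4] + "/img"
print(sliced)
```

In a real script, `Path(__file__).resolve()` from `pathlib` would supply the script path.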

Code Implementation

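# ver_code1.py

import os

import pytesseract
from PIL import Image


def convert(pic_path, pic):
    # Convert the image to grayscale (mode 'L') first, so that each pixel
    # is a single brightness value that can be compared with a threshold.
    imgrey = pic.convert('L')

    # Remove noise by thresholding. 170 was found by trial and error to
    # work best for these images. Grayscale values below the threshold
    # (dark pixels likely to belong to the characters) are mapped to 0 and
    # all others to 1. The table holds one entry for each of the 256
    # possible values, and point() uses it to build a new two-level image.
    threshold = 170
    table = []
    for i in range(256):
        if i < threshold:
            table.append(0)
        else:
            table.append(1)

    # Build the two-level (mode '1') image from the table and save it.
    out = imgrey.point(table, '1')
    out_path = pic_path + '/' + 'cjb' + str(threshold) + '.jpeg'
    out.save(out_path, 'jpeg')

    # Reopen the processed image and run OCR on it. If the noise was not
    # removed cleanly, the result may contain extra, missing, or wrong
    # characters.
    img3 = Image.open(out_path, 'r')
    vcode = pytesseract.image_to_string(img3)
    print(vcode)  # only for checking the result while testing
    return vcode  # the recognized string, returned to the caller


if __name__ == '__main__':
    # Locate the img directory relative to this file (ver/ver_code1.py)
    # rather than slicing os.getcwd(), which depends on where the script
    # is run from.
    base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    pic_path = os.path.join(base_dir, 'img')
    # Index 0 selects the first file in the image directory; change it
    # to suit your needs.
    pic = pic_path + '/' + os.listdir(pic_path)[0]
    pic_open = Image.open(pic, 'r')
    convert(pic_path, pic_open)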
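OCR output often carries trailing whitespace or stray punctuation, so it can help to normalize the string before using it as a CAPTCHA answer. The sketch below is an assumption and not part of the original script: `clean_ocr_text` is a hypothetical helper, and the sample strings are made up.

```python
import string

ALLOWED = set(string.ascii_letters + string.digits)


def clean_ocr_text(raw, expected_len=None):
    """Keep only ASCII letters and digits from raw OCR output.

    Returns None when expected_len is given and the cleaned text has a
    different length, so the caller can retry with another threshold.
    """
    cleaned = "".join(ch for ch in raw if ch in ALLOWED)
    if expected_len is not None and len(cleaned) != expected_len:
        return None
    return cleaned


print(clean_ocr_text(" WD HA\n"))
print(clean_ocr_text("WD-HA.\n", 4))
print(clean_ocr_text("WDH\n", 4))
```

Returning None for a wrong length is one possible design; it lets the caller distinguish "nothing usable" from a short but valid string.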

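Results

Original image:
[image]
Grayscale image:
[image]
Image after noise removal:
[image]
Note: whether the denoised image shows black text on a white background or white text on a black background depends on whether the thresholding code assigns 1 or 0 to pixel values at or above the threshold.
Output of the code: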
WDHA
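
The polarity note above comes down to which value the lookup table assigns on each side of the threshold. A minimal sketch, where `build_table` and its `invert` flag are hypothetical names introduced for illustration:

```python
def build_table(threshold=170, invert=False):
    """Build a 256-entry lookup table for two-level thresholding.

    With invert=False, values below the threshold map to 0 and the rest
    to 1. With invert=True the two output values are swapped.
    """
    low, high = (1, 0) if invert else (0, 1)
    return [low if value < threshold else high for value in range(256)]


normal = build_table(170)
inverted = build_table(170, invert=True)
# Entries just below and exactly at the threshold, for each polarity.
print(normal[169], normal[170])
print(inverted[169], inverted[170])
```

Either table can be passed to `imgrey.point(table, '1')` in place of a hand-built one.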

References

In[18]: help(Image.open(pic,'r').convert)

Help on method convert in module PIL.Image:

convert(mode=None, matrix=None, dither=None, palette=0, colors=256) method of PIL.JpegImagePlugin.JpegImageFile instance
Returns a converted copy of this image. For the "P" mode, this
method translates pixels through the palette. If mode is
omitted, a mode is chosen so that all information in the image
and the palette can be represented without a palette.

The current version supports all possible conversions between
"L", "RGB" and "CMYK." The **matrix** argument only supports "L"
and "RGB".

When translating a color image to black and white (mode "L"),
the library uses the ITU-R 601-2 luma transform::

    L = R * 299/1000 + G * 587/1000 + B * 114/1000

The default method of converting a greyscale ("L") or "RGB"
image into a bilevel (mode "1") image uses Floyd-Steinberg
dither to approximate the original image luminosity levels. If
dither is NONE, all non-zero values are set to 255 (white). To
use other thresholds, use the :py:meth:`~PIL.Image.Image.point`
method.

:param mode: The requested mode. See: :ref:`concept-modes`.
:param matrix: An optional conversion matrix.  If given, this
   should be 4- or 12-tuple containing floating point values.
:param dither: Dithering method, used when converting from
   mode "RGB" to "P" or from "RGB" or "L" to "1".
   Available methods are NONE or FLOYDSTEINBERG (default).
:param palette: Palette to use when converting from mode "RGB"
   to "P".  Available palettes are WEB or ADAPTIVE.
:param colors: Number of colors to use for the ADAPTIVE palette.
   Defaults to 256.
:rtype: :py:class:`~PIL.Image.Image`
:returns: An :py:class:`~PIL.Image.Image` object.
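
As a worked example of the luma formula quoted above, the sketch below computes the grayscale value of a few RGB colors in plain Python. It uses integer arithmetic with truncation, which is an assumption; Pillow's internal rounding may differ by one level for some inputs.

```python
def luma(r, g, b):
    """Approximate the ITU-R 601-2 luma of an 8-bit RGB pixel."""
    return (r * 299 + g * 587 + b * 114) // 1000


samples = [
    ("white", (255, 255, 255)),
    ("red", (255, 0, 0)),
    ("green", (0, 255, 0)),
    ("blue", (0, 0, 255)),
]
for name, rgb in samples:
    value = luma(*rgb)
    # Compare against the threshold of 170 used for noise removal.
    side = "below 170" if value < 170 else "at or above 170"
    print(name, value, side)
```

Under this approximation even pure green falls below 170, so fairly bright colored pixels can still land on the dark side of the threshold.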

In[10]: help(im.point)

Help on method point in module PIL.Image:

point(lut, mode=None) method of PIL.JpegImagePlugin.JpegImageFile instance
Maps this image through a lookup table or function.

:param lut: A lookup table, containing 256 (or 65536 if
   self.mode=="I" and mode == "L") values per band in the
   image.  A function can be used instead, it should take a
   single argument. The function is called once for each
   possible pixel value, and the resulting table is applied to
   all bands of the image.
:param mode: Output mode (default is same as input).  In the
   current version, this can only be used if the source image
   has mode "L" or "P", and the output has mode "1" or the
   source image mode is "I" and the output mode is "L".
:returns: An :py:class:`~PIL.Image.Image` object.
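
The help text says that a function passed as `lut` is called once per possible pixel value and that the resulting table is then applied to the image. The following pure-Python sketch imitates that behaviour for a single 8-bit band; `expand_lut` and `apply_lut` are hypothetical names, not Pillow functions, and the pixel values are made up.

```python
def expand_lut(fn, size=256):
    """Call fn once for every possible pixel value and collect the results."""
    return [fn(value) for value in range(size)]


def apply_lut(pixels, lut):
    """Map each pixel through the table by index lookup."""
    return [lut[p] for p in pixels]


calls = []


def threshold_fn(value):
    calls.append(value)  # record each call to show how often fn runs
    return 0 if value < 170 else 1


lut = expand_lut(threshold_fn)
row = [12, 169, 170, 255, 90, 200]  # made-up sample pixel values
result = apply_lut(row, lut)
print(len(calls))
print(result)
```

The function runs 256 times regardless of image size, which is why a per-value function is as cheap as a precomputed table.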

In[16]: help(pytesseract.image_to_string)

Help on function image_to_string in module pytesseract.pytesseract:

image_to_string(image, lang=None, boxes=False, config=None)
Runs tesseract on the specified image. First, the image is written to disk,
and then the tesseract command is run on the image. Tesseract's result is
read, and the temporary files are erased.

also supports boxes and config.

if boxes=True
    "batch.nochop makebox" gets added to the tesseract call
if config is set, the config gets appended to the command.
    ex: config="-psm 6"
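
Since the `config` string is appended to the tesseract command, options suited to short single-line CAPTCHAs can be assembled as plain text. The sketch below rests on several assumptions: `build_config` is a hypothetical helper, `--psm` is the spelling used by Tesseract 4 and later (the help text above shows the older `-psm`), and `tessedit_char_whitelist` may be ignored by some Tesseract versions.

```python
import string


def build_config(psm=7, whitelist=None, legacy_flag=False):
    """Assemble a tesseract config string.

    psm: page segmentation mode (7 treats the image as a single text line).
    whitelist: optional characters tesseract should restrict itself to.
    legacy_flag: use the single-dash "-psm" spelling of Tesseract 3.x.
    """
    flag = "-psm" if legacy_flag else "--psm"
    parts = [f"{flag} {psm}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


config = build_config(psm=7, whitelist=string.ascii_uppercase + string.digits)
print(config)
```

The resulting string would be passed as `pytesseract.image_to_string(img3, config=config)`.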