Python Crawler: Font-Based Anti-Scraping

阿新 • Published: 2022-04-12

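Target URL: http://glidedsky.com/level/web/crawler-font-puzzle-1

Inspecting the page with the Chrome developer tools shows that the numbers rendered on the page differ from the numbers in the page source. That confirms this challenge uses font-based anti-scraping, so let's get straight to it: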

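Getting the font file:

1. Look at the style of a number node: its font-family value is glided_sky. Find the font file that the page source loads under that name and save a copy locally.

2. The font file is embedded in the page as a base64-encoded string, so a single request returns both the encoded font and the numbers inside the nodes. (Use whatever extraction method suits you; this article uses the pyquery module.)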

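import base64
import requests
from pyquery import PyQuery as pyq

# Fill in the headers and cookies from your own browser session
headers = {'User-Agent': 'Mozilla/5.0'}
cookies = {}

url = 'http://glidedsky.com/level/web/crawler-font-puzzle-1?page=1'
response = requests.get(url, headers=headers, cookies=cookies, verify=False)
doc = pyq(response.text)
# The font is embedded in a <style> tag as a base64 data URI; skip style tags without one
styles = [pyq(i).text() for i in doc('style')]
base_info = ''.join(s.split('base64,')[1].split(')')[0] for s in styles if 'base64,' in s)
# The numbers as they appear in the page source live in the .col-md-1 nodes
num_list = [pyq(i).text() for i in doc('.col-md-1')]
print(f' num_list {num_list}')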
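
The base64 payload can also be pulled out with a regular expression instead of chained split calls. The following is only a sketch: the sample CSS and the helper name extract_font_bytes are made up for illustration, and the real @font-face rule on the page may be laid out differently.

```python
import base64
import re

# Matches the payload that follows "base64," in a data URI.
# Assumption: the font is embedded as url(data:...;base64,<payload>).
DATA_URI = re.compile(r"base64,([A-Za-z0-9+/=]+)")

def extract_font_bytes(css_text):
    """Return the decoded bytes of the first base64 data URI, or None if there is none."""
    match = DATA_URI.search(css_text)
    if match is None:
        return None
    return base64.b64decode(match.group(1))

# Made-up sample: the payload is plain text, not a real font file
sample_payload = base64.b64encode(b"fake font bytes").decode("ascii")
sample_css = (
    '@font-face { font-family: "glided_sky"; '
    f'src: url(data:font/truetype;charset=utf-8;base64,{sample_payload}) format("truetype"); }}'
)

font_bytes = extract_font_bytes(sample_css)
```

Returning None when no data URI is found lets the caller decide how to handle a page that did not load as expected, rather than failing with an IndexError.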

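3. Save the base64 payload locally as a .ttf file, then map the digits in the page source to the values that are actually displayed on the page.

Open the saved font file in a font editor and manually confirm which glyph name corresponds to which displayed digit.

The implementation is as follows: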

# Font conversion
from fontTools.ttLib import TTFont

def font_switch(base_info, number_info):
    # Decode the base64 payload and save it as the newly downloaded font
    font_bytes = base64.b64decode(base_info)
    with open('new_page.ttf', 'wb') as f:
        f.write(font_bytes)
    font = TTFont('main.ttf')    # reference font, saved locally beforehand
    # font.saveXML('main.xml')   # dump the ttf to XML, which makes the internal structure easy to inspect
    uni_list1 = font.getGlyphOrder()[1:]   # glyph names in order, skipping the first entry
    print(f' uni_list1  {uni_list1}')

    # Glyph name -> displayed digit for main.ttf, confirmed by hand in a font editor
    base_map = {
        'seven': 6,
        'six': 8,
        'four': 0,
        'eight': 5,
        'two': 1,
        'five': 4,
        'one': 9,
        'zero': 7,
        'nine': 2,
        'three': 3,
    }

    # Font freshly downloaded from the page
    font2 = TTFont('new_page.ttf')
    uni_list2 = font2.getGlyphOrder()[1:]   # glyph names in order, skipping the first entry
    new_dict = {}
    for uni2 in uni_list2:
        print(f'uni2 : {uni2}')
        obj2 = font2['glyf'][uni2]   # glyph object for uni2 in new_page.ttf
        for uni1 in uni_list1:
            obj1 = font['glyf'][uni1]
            if obj1 == obj2:   # same glyph data, so both glyphs draw the same digit
                new_dict[uni2] = base_map[uni1]
    # Glyph name -> real digit for the downloaded font
    print(f' new_dict  {new_dict}')

    # Convert the numbers that were passed in.
    # Index n holds the glyph name used to render digit n from the page source.
    glyph_names = [
        'zero',
        'one',
        'two',
        'three',
        'four',
        'five',
        'six',
        'seven',
        'eight',
        'nine',
    ]
    new_number = [int(''.join(str(new_dict[glyph_names[int(n)]]) for n in num)) for num in number_info]
    return sum(new_number)
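
The matching step in the function compares every glyph of the downloaded font with every glyph of the reference font. The same idea can be sketched with a lookup table keyed by outline. This is an illustration only: plain tuples stand in for real fontTools glyph objects, the names build_digit_map, reference_outlines and page_outlines are hypothetical, and it assumes (as the function above does) that the page font reuses the reference outlines under shuffled names.

```python
def build_digit_map(reference_outlines, reference_digits, page_outlines):
    """Map each glyph name of the page font to the digit it really draws.

    reference_outlines: glyph name -> outline, for the reference font
    reference_digits:   glyph name -> digit, confirmed by hand for the reference font
    page_outlines:      glyph name -> outline, for the font downloaded with the page
    """
    # Index the reference font by outline so each page glyph needs a single lookup
    digit_by_outline = {
        outline: reference_digits[name]
        for name, outline in reference_outlines.items()
    }
    # Glyphs whose outline is not in the reference font are left out of the result
    return {
        name: digit_by_outline[outline]
        for name, outline in page_outlines.items()
        if outline in digit_by_outline
    }

# Stand-in data: each outline is a tuple of (x, y) points invented for the example
reference_outlines = {"one": ((0, 0), (1, 5)), "two": ((0, 0), (3, 3), (0, 6))}
reference_digits = {"one": 9, "two": 1}
# The page font draws the same shapes under swapped names
page_outlines = {"one": ((0, 0), (3, 3), (0, 6)), "two": ((0, 0), (1, 5))}

page_digits = build_digit_map(reference_outlines, reference_digits, page_outlines)
```

With real fonts the outline key would have to be derived from the glyph data, which this sketch does not attempt.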

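Pass the scraped values into this function to get the correct numbers shown on the page.

Done! Compute the values for the remaining 999 pages in the same way and add everything up to get the correct answer.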
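
To make the digit translation concrete, here is a small self-contained version of that last step. It reuses the hand-built glyph mapping from the function above as sample data; the function name decode_number is hypothetical.

```python
# Glyph name used to render each source digit: index 0 -> "zero", 1 -> "one", ...
GLYPH_NAMES = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]

def decode_number(raw, glyph_to_digit):
    """Translate a number string from the page source into the number shown on screen."""
    return int("".join(str(glyph_to_digit[GLYPH_NAMES[int(ch)]]) for ch in raw))

# Sample mapping, copied from the hand-built dictionary for the reference font
sample_map = {
    "seven": 6, "six": 8, "four": 0, "eight": 5, "two": 1,
    "five": 4, "one": 9, "zero": 7, "nine": 2, "three": 3,
}

# "123" is drawn with the glyphs one, two, three, which display as 9, 1, 3
decoded = [decode_number(n, sample_map) for n in ["123", "85"]]
page_total = sum(decoded)
```

Each source digit is first turned into a glyph name and only then into the displayed digit, which is why two lookups are needed per character.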
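
The final summation can be sketched as a loop over page numbers. Treat this as a sketch under assumptions: page_sum stands for any function that downloads one page and returns the sum of its decoded numbers (for example a wrapper around the request code and font_switch), and the default range of pages 1 to 1000 is inferred from the 999 remaining pages mentioned above.

```python
def total_over_pages(page_sum, first_page=1, last_page=1000):
    """Add up page_sum(page) for every page from first_page to last_page, inclusive."""
    return sum(page_sum(page) for page in range(first_page, last_page + 1))

# Stand-in for a real fetch-and-decode function, so the sketch runs offline
def fake_page_sum(page):
    return page * 10

demo_total = total_over_pages(fake_page_sum, first_page=1, last_page=3)
```

Passing the per-page function in as an argument keeps the loop independent of how a page is fetched, which also makes it easy to add delays or retries inside that function.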
