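Scraping Kuaishou Short Videos with Appium + mitmproxy and Storing the Results in MongoDB

I. Goal

Use Appium to simulate swiping through the videos on Kuaishou's Discover page, capture the video information through a mitmproxy proxy, and store it in a MongoDB database.

II. Steps

  • Analyze the API with the Fiddler packet-capture tool

    • First capture the app's traffic with Fiddler and locate the API that returns the video data; the URL of the video-info API contains "rest/n/feed/hot". Then paste the returned JSON into json.cn to inspect it, pick out the fields you want, and write a mitmproxy script that stores them in MongoDB.
  • Write the mitmproxy script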

    # coding:utf-8
    import pymongo
    import json

    # MongoDB connection ('hostname' is a placeholder: replace it with your MongoDB host)
    client = pymongo.MongoClient(host='hostname', port=27017)
    db = client['kuaishou']
    collection = db['video_info']


    def response(flow):
        # mitmproxy calls this hook for every response that passes through the proxy
        if 'rest/n/feed/hot' in flow.request.url:
            info_dict = json.loads(flow.response.text)
            infos = info_dict.get('feeds') or []  # tolerate a response without 'feeds'
            for info in infos:
                video_info = {}
                video_info['user_id'] = info['user_id']  # user id
                video_info['user_name'] = info['user_name'].strip()  # user name
                video_info['title'] = info['caption']  # title
                video_info['video_url'] = info['main_mv_urls'][0]['url']  # video URL
                video_info['duration'] = int(info['duration'] / 1000)  # video duration
                video_info['view_count'] = info['view_count']  # views
                video_info['share_count'] = info['share_count']  # shares
                video_info['comment_count'] = info['comment_count']  # comments
                video_info['like_count'] = info['like_count']  # likes
                video_info['unlike_count'] = info['unlike_count']  # dislikes
                video_info['share_info'] = info['share_info']  # share info
                # Save to the database: replace the document if it exists, insert it otherwise
                collection.replace_one({'video_url': video_info['video_url']}, video_info, True)
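The field mapping in the hook above can be pulled out into a plain function, which makes it possible to check the parsing against a saved response without running mitmproxy or MongoDB. The following is a minimal sketch, assuming feed items carry the same keys the script reads; `parse_feed` is a hypothetical helper name, and the sample item is invented for illustration rather than taken from a real Kuaishou response.

```python
def parse_feed(info):
    """Map one raw feed item to the document that would be stored in MongoDB."""
    urls = info.get('main_mv_urls') or []
    if not urls:
        return None  # no playable URL, so there is nothing to store
    return {
        'user_id': info.get('user_id'),
        'user_name': (info.get('user_name') or '').strip(),
        'title': info.get('caption'),
        'video_url': urls[0]['url'],
        'duration': int(info.get('duration', 0) / 1000),  # same conversion as the script above
        'view_count': info.get('view_count'),
        'share_count': info.get('share_count'),
        'comment_count': info.get('comment_count'),
        'like_count': info.get('like_count'),
        'unlike_count': info.get('unlike_count'),
        'share_info': info.get('share_info'),
    }


# Invented sample item, shaped like the fields the script above reads.
sample = {
    'user_id': 1,
    'user_name': ' demo ',
    'caption': 'hello',
    'main_mv_urls': [{'url': 'http://example.com/a.mp4'}],
    'duration': 12345,
    'view_count': 10,
}
record = parse_feed(sample)
print(record['user_name'], record['duration'])
```

Using `.get` instead of direct indexing means a feed item that lacks one of the counters is stored with `None` for that field instead of raising `KeyError` inside the proxy hook. Whether real responses ever omit these keys is not something the original post states.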
  • Write the crawler script

    # coding:utf-8
    import time
    from appium.webdriver import Remote
    from selenium.webdriver.support.ui import WebDriverWait as WAIT
    
    # desired_capabilities
    cap = {
      "platformName": "Android",
      "platformVersion": "5.1.1",
      "deviceName": "127.0.0.1:62001",
      "appPackage": "com.smile.gifmaker",
      "appActivity": "com.yxcorp.gifshow.HomeActivity",
      "noReset": True,
      "unicodeKeyboard": True,
      "keyboardReset": True
    }
    
    def get_size(driver):
        '''Return the width and height of the app window'''
        size = driver.get_window_size()
        return size['width'],size['height']
    
    
    driver = Remote('http://127.0.0.1:4723/wd/hub',desired_capabilities=cap) # connect the Appium client to the server
    
    
    # Dismiss the pop-up: tap "我知道了" ("Got it") in the teen-mode dialog
    try:
        i_know = WAIT(driver, 400).until(lambda x:x.find_element_by_android_uiautomator('new UiSelector().className(\"android.widget.TextView\").textContains(\"我知道了\").resourceId(\"com.smile.gifmaker:id/positive\")'))
        i_know.click()
    except Exception:
        # The dialog did not appear (or the wait timed out); carry on
        pass
    
    
    time.sleep(2)
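    size = get_size(driver) # get the size of the Kuaishou window
    
    # Start and end points of the swipe: along the horizontal centre, from 80% of the height up to 20%
    x = int(size[0]*0.5)
    y_start = int(size[1]*0.8)
    y_end = int(size[1]*0.2)
    
    # Simulate 20 swipes
    for i in range(20):
        driver.swipe(x,y_start,x,y_end,200) # swipe duration: 200 ms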
        time.sleep(1)
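The swipe coordinates above are plain arithmetic on the window size, so they can be checked without a device. The following is a minimal sketch; `swipe_points` is a hypothetical helper name, and the 720x1280 window size is only an assumed example.

```python
def swipe_points(width, height, start_ratio=0.8, end_ratio=0.2):
    """Return (x, y_start, y_end) for a vertical swipe along the horizontal centre."""
    x = int(width * 0.5)
    y_start = int(height * start_ratio)
    y_end = int(height * end_ratio)
    return x, y_start, y_end


# Assumed window size, for illustration only.
x, y_start, y_end = swipe_points(720, 1280)
print(x, y_start, y_end)
```

Because `y_start` is larger than `y_end`, the gesture moves the finger from the lower part of the screen upward, which scrolls the feed on to the next videos.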
  • Write a script to download the videos

    # coding:utf-8
    import requests
    import pymongo
    import os
    import time
    import re
    
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
    client = pymongo.MongoClient(host='hostname',port=27017) # 'hostname' is a placeholder for your MongoDB host
    db = client['kuaishou'] # database name
    collection = db['video_info'] # collection name
    
    # Directory where the videos are saved
    if not os.path.exists('./videos'):
        os.mkdir('./videos')
    
    video_infos = collection.find({}) # returns a cursor that can be iterated
    
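    for video_info in video_infos:
        video_url = video_info['video_url']
        # The file name is taken from the clientCacheKey parameter of the URL
        match = re.search(r'clientCacheKey=(.*?\.mp4)',video_url)
        if match is None:
            continue # no file name in the URL, skip this video
        video_name = match.group(1)
        data = requests.get(video_url,headers=headers).content
        with open('./videos/'+video_name,'wb') as f:
            f.write(data)
        time.sleep(1) # pause between downloads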
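The download loop above derives each file name from the `clientCacheKey` parameter of the video URL. For URLs that lack that parameter, one option is to fall back to a hash of the URL so the video still gets a stable name. The following is a minimal sketch; `video_filename` is a hypothetical helper, and the example URLs are invented rather than real Kuaishou addresses.

```python
import hashlib
import re

CACHE_KEY_RE = re.compile(r'clientCacheKey=(.*?\.mp4)')


def video_filename(video_url):
    """Return the clientCacheKey file name, or an MD5-based name if it is absent."""
    match = CACHE_KEY_RE.search(video_url)
    if match:
        return match.group(1)
    digest = hashlib.md5(video_url.encode('utf-8')).hexdigest()
    return digest + '.mp4'


# Invented URLs, for illustration only.
with_key = video_filename('http://example.com/v.mp4?clientCacheKey=abc123.mp4&tt=hd')
without_key = video_filename('http://example.com/v.mp4')
print(with_key, without_key)
```

The hash is used here only to build a repeatable file name, not for any security purpose, so MD5 is sufficient.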