Colab: Google's Online Deep Learning Powerhouse

阿新 • Published: 2020-12-24

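Colab is a free online Python programming service recently launched by Google. With it, there is one less excuse for not learning Python.

The Colab environment comes with the popular deep learning framework TensorFlow already integrated, and it also includes a virtual machine (40 GB disk, 2 × 2.30 GHz CPU, 12.72 GB memory). If you are in China, cannot reach Google's services, and do not want to use a VPN, consider the notebook service offered by Microsoft instead.

Colab works much like Jupyter Notebook.

Colab notebooks are stored in Google Drive, just like Google Docs or Sheets, and can be shared.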

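1. Running terminal commands in Colab

The Colab service that Google provides is bound to an Ubuntu virtual machine (40 GB disk, 2 × 2.30 GHz CPU, 12.72 GB memory). To run a terminal command, simply enter it in a Colab cell with a leading !.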

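Check the virtual machine's disk capacity: !df -lh

The disk is 40 GB.

Check the CPU configuration: !cat /proc/cpuinfo | grep model\ name

It is a dual-core processor.

Check the memory capacity: !cat /proc/meminfo | grep MemTotal

There is 12.72 GB of memory.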
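The same three figures can also be read from Python instead of shell commands. The following is a minimal sketch using only the standard library; the function name `vm_summary` is made up for illustration, and the `/proc/meminfo` lookup assumes a Linux machine such as the Ubuntu virtual machine described above.

```python
import os
import shutil


def vm_summary(path="/"):
    """Return disk size, CPU count, and total memory (where available)."""
    gib = 1024 ** 3
    summary = {
        "disk_total_gib": round(shutil.disk_usage(path).total / gib, 2),
        "cpu_count": os.cpu_count(),
        "mem_total_gib": None,
    }
    # /proc/meminfo exists only on Linux; the MemTotal value is given in kB.
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    kib = int(line.split()[1])
                    summary["mem_total_gib"] = round(kib * 1024 / gib, 2)
                    break
    except OSError:
        pass
    return summary


print(vm_summary())
```

The numbers reported this way may differ slightly from the shell output because of unit rounding (GiB versus GB).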

Installing Python packages

# Install requests, a must-have for web scraping
!pip install requests
# Install lxml, used to parse pages with XPath
!pip install lxml
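Before installing, it can be useful to check whether a package is already present in the session. The sketch below is one possible way to do that with the standard library; `missing_packages` is an illustrative helper name, not part of Colab or pip.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "requests" and "lxml" are the two packages installed above.
print(missing_packages(["requests", "lxml"]))
```

An empty list means both modules are importable and the `!pip install` lines can be skipped.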

Installing git

# Used to sync the collected data to a GitHub repository
!apt install -y git

2. Writing an online crawler in Colab and displaying the results online

Writing a Douban movie crawler online

!pip install lxml
import os
import requests
from lxml import etree

# Download a movie poster
def download_img(db_id, title, img_addr, headers):

    # Create the image folder if it does not exist yet
    os.makedirs("./Top250_movie_images/", exist_ok=True)

    # Fetch the image as binary data
    image_data = requests.get(img_addr, headers=headers).content
    # Build the path and file name for the poster
    image_path = "./Top250_movie_images/" + db_id[0] + "_" + title[0] + '.jpg'
    # Save the poster image
    with open(image_path, "wb") as f:
        f.write(image_data)



# Fetch the data for a URL, print it to the screen, and save it to a file
def get_movies_data(url, headers):

    # Get the page response
    db_response = requests.get(url, headers=headers)

    # Parse the page source into an etree
    db_response_etree = etree.HTML(db_response.content)

    # Extract all movie entries
    db_movie_items = db_response_etree.xpath('//*[@id="content"]/div/div[1]/ol/li/div[@class="item"]')

    # Loop over the list of movie entries
    for db_movie_item in db_movie_items:

        # Each field is located with an XPath expression; every call returns a list
        db_id = db_movie_item.xpath('div[@class="pic"]/em/text()')
        db_title = db_movie_item.xpath('div[@class="info"]/div[@class="hd"]/a/span[1]/text()')
        db_score = db_movie_item.xpath('div[@class="info"]/div[@class="bd"]/div[@class="star"]/span[@class="rating_num"]/text()')
        db_desc = db_movie_item.xpath('div[@class="info"]/div[@class="bd"]/p[@class="quote"]/span[@class="inq"]/text()')
        db_img_addr = db_movie_item.xpath('div[@class="pic"]/a/img/@src')
        print("Rank:", db_id, "Title:", db_title, "Score:", db_score, "Description:", db_desc)
        # "a" appends and creates the file if it does not exist; "b" writes in binary mode
        with open("./douban_movie_top250.txt", "ab") as f:
            tmp_data = "Rank:" + str(db_id) + "Title:" + str(db_title) + "Score:" + str(db_score) + "Description:" + str(db_desc) + "\n"
            f.write(tmp_data.encode("utf-8"))

        db_img_addr = str(db_img_addr[0].replace("\'", ""))
        download_img(db_id, db_title, db_img_addr, headers)


def main():
    # Use a list comprehension to build the list of page URLs to crawl
    urls = ["https://movie.douban.com/top250?start="+str(i*25) for i in range(10)]

    # Set the request headers
    headers = {
        # Set the User-Agent header (dressing the wolf in sheep's clothing)
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
    }

    # Remove the file from the previous run so repeated runs do not duplicate content (optional)
    if os.path.isfile("./douban_movie_top250.txt"):
        os.remove("./douban_movie_top250.txt")

    # Crawl each URL in the list
    for url in urls:
        get_movies_data(url, headers)

if __name__ == '__main__':
    main()
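Every `xpath(...)` call in the crawler returns a list, which is why the printed fields appear in brackets, and a list can be empty when an entry lacks a field (for example, an entry without a quote line, which is an assumption about the page). A small helper can make first-element access safe. The sketch below is hypothetical and not part of the original script; plain lists stand in for XPath results.

```python
def first_or_default(values, default=""):
    """Return the first item of an XPath-style result list, stripped, or `default` if empty."""
    return values[0].strip() if values else default


# Plain lists stand in for XPath results in this sketch.
print(first_or_default(["  Example Title "]))
print(first_or_default([], default="N/A"))
```

Inside the crawler, a call such as `first_or_default(db_desc, "N/A")` would give a plain string to print and write instead of a list.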

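Displaying the images

import os
from IPython.display import display, Image, FileLink
# List the poster files that the crawler saved
names = os.listdir('./Top250_movie_images/')
# Show a link to the text file with the movie data
display(FileLink("./douban_movie_top250.txt"))
# Render each poster inline in the notebook
for name in names:
    display(Image('./Top250_movie_images/' + name))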
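`os.listdir` returns names in arbitrary order, so the posters may not appear in ranking order. Since the crawler names each file `<rank>_<title>.jpg`, one possible fix is to sort on the numeric prefix. This is a sketch; `rank_of` is an illustrative helper name and the file names are samples, not real crawler output.

```python
def rank_of(filename):
    """Extract the leading rank from a name like '12_Title.jpg'."""
    return int(filename.split("_", 1)[0])


names = ["10_B.jpg", "2_C.jpg", "1_A.jpg"]  # sample names, not real crawler output
print(sorted(names, key=rank_of))
```

A plain `sorted(names)` would not be enough here, because string ordering puts "10_..." before "2_...".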

3. Online machine learning: a decision tree example (Titanic passenger survival)

Machine learning decision tree example
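The linked case study is not reproduced here. As a rough illustration of the idea behind a decision tree, the sketch below picks the feature whose split leaves the purest groups, using Gini impurity as one common purity measure. The rows are made-up toy data for illustration only, not the real Titanic data, and the linked example may use a different criterion or a library instead.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_impurity(rows, feature):
    """Weighted Gini impurity after splitting `rows` on a categorical feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row["survived"])
    return sum(len(group) / len(rows) * gini(group) for group in groups.values())


# Made-up toy rows for illustration only; NOT the real Titanic data.
rows = [
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 1},
    {"sex": "male", "pclass": 1, "survived": 0},
    {"sex": "male", "pclass": 3, "survived": 0},
]

# The root of the tree would split on the feature with the lowest impurity.
best = min(["sex", "pclass"], key=lambda feature: split_impurity(rows, feature))
print(best, split_impurity(rows, best))
```

A full decision tree repeats this choice recursively on each resulting group until the groups are pure or a depth limit is reached.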

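4. Learning Python programming online

Recommendation 1: 菜鳥教程 (Runoob)
Learn with a beginner's mindset

Recommendation 2: Liao Xuefeng's official website
廖雪峰 (Liao Xuefeng)

Open a web page and start learning to program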

5. Saving the current Colab file

A Colab file behaves like a Google online document: there is no need to save it manually!

6. Converting the current Colab notebook to a standard Python file and saving it locally

Save as .py
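To show what such a conversion involves, here is a minimal sketch that joins the code cells of a notebook into one script. It assumes the nbformat 4 JSON layout (a top-level `cells` list whose entries have `cell_type` and `source`); `notebook_to_script` is an illustrative name, and commenting out `!` and `%` lines is a design choice of this sketch, since those lines are IPython-only syntax.

```python
def notebook_to_script(notebook):
    """Join the code cells of a notebook dict (nbformat 4 layout) into one script."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # `source` may be stored as a list of lines or as a single string.
        text = "".join(source) if isinstance(source, list) else source
        lines = []
        for line in text.splitlines():
            # "!" and "%" lines are IPython-only syntax, so keep them as comments.
            if line.lstrip().startswith(("!", "%")):
                line = "# " + line
            lines.append(line)
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


# Tiny in-memory notebook for demonstration.
demo = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["!pip install lxml\n", "import os"]},
        {"cell_type": "code", "source": "print(os.getcwd())"},
    ]
}
print(notebook_to_script(demo))
```

To convert a downloaded `.ipynb` file, load it with `json.load` and pass the resulting dict to the function.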

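7. Sharing a Colab program

A Colab notebook can be shared with other people through a link. They can run it online directly and see the results.

(Figure: sharing a Colab program)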

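Tips:

How to get the public IP address of the online environment: Getting the local machine's public IP with Python 3 (crawler method)

How to transfer files to and from the online environment: syncing data through a GitHub repository is a good option!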
