Ray - Fast and Simple Distributed Computing

阿新 • Published: 2020-10-14

Ray

https://ray.io/

https://github.com/ray-project/ray

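(1) The machine learning ecosystem is built on Python, but Python's Global Interpreter Lock (GIL) limits how well a program can use the multiple cores of a single machine.

(2) At the same time, very large models and datasets have to be handled on clusters, which creates a need for distributed machine learning.

Ray meets both needs without bringing in a higher-level application such as Spark: it is a simple distributed computing framework built on the Python ecosystem.

Ray also ships with machine learning applications.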

Ray provides a simple, universal API for building distributed applications.

Ray is packaged with the following libraries for accelerating machine learning workloads:

Tune: Scalable Hyperparameter Tuning

RLlib: Scalable Reinforcement Learning

RaySGD: Distributed Training Wrappers

Ray Serve: Scalable and Programmable Serving

https://docs.ray.io/en/latest/index.html

Ray provides a simple, universal API for building distributed applications.

Ray accomplishes this mission by:

Providing simple primitives for building and running distributed applications.

Enabling end users to parallelize single machine code, with little to zero code changes.

Including a large ecosystem of applications, libraries, and tools on top of the core Ray to enable complex applications.

https://www.ctolib.com/topics-138457.html

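Traditional programming relies on two core concepts: functions and classes. With these building blocks, countless applications can be built.

However, when we migrate an application to a distributed environment, these concepts usually change.

On one hand, tools such as OpenMPI, Python multiprocessing, and ZeroMQ provide low-level primitives for sending and receiving messages. These tools are very powerful, but they offer a different abstraction, so using them means rewriting a single-threaded application from scratch.

On the other hand, there are domain-specific tools such as TensorFlow for model training, Spark for data processing with SQL support, and Flink for stream processing. These tools provide higher-level abstractions such as neural networks, datasets, and streams. But because these differ from the abstractions used in serial programming, using them also means rewriting the application from scratch.

Tools for distributed computing

Ray occupies a unique middle ground. Rather than introducing new concepts, it takes the concepts of functions and classes and turns them into distributed tasks and actors. Ray can parallelize a serial application without major modifications.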

Source (paper)

https://arxiv.org/abs/1703.03924

Real-Time Machine Learning: The Missing Pieces

Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a new set of requirements, none of which are difficult to achieve in isolation, but the combination of which creates a challenge for existing distributed execution frameworks: computation with millisecond latency at high throughput, adaptive construction of arbitrary task graphs, and execution of heterogeneous kernels over diverse sets of resources. We assert that a new distributed execution framework is needed for such ML applications and propose a candidate approach with a proof-of-concept architecture that achieves a 63x performance improvement over a state-of-the-art execution framework for a representative application.

Architecture

https://www.cnblogs.com/fanzhidongyzby/p/7901139.html

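The architecture diagram in the paper does not show the Driver, so I modified and extended it.

Ray's Driver node and Slave nodes start almost the same components, with the following differences:

The Driver usually has a single worker process, the DriverProcess, which is the Python shell started by the user. A Slave can create multiple WorkerProcess instances as needed.

The Driver can only submit tasks; it cannot receive tasks assigned by the global scheduler. A Slave can both submit tasks and receive tasks assigned by the global scheduler.

The Driver can bypass the global scheduler and send Actor call tasks directly to a Slave (whether this design is sound is not discussed here). A Slave can only receive compute tasks assigned by the global scheduler.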

https://zhuanlan.zhihu.com/p/41875076

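The underlying principle is that code is serialized and stored in redis as objects (an object can be understood as an efficient immutable value used for data sharing), which makes asynchronous execution and data exchange possible. A task is run on the local node first; only if that is not possible does the global scheduler dispatch it to another node (correction added).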
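The "local node first, global scheduler second" rule can be made concrete with a toy model in plain Python. This is not Ray's implementation: the Node and GlobalScheduler classes, the CPU counts, and the "pick the node with the most free CPUs" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy node that tracks free CPUs and the tasks it accepted."""
    name: str
    free_cpus: int
    running: list = field(default_factory=list)

    def try_run(self, task, cpus):
        # Accept the task only if enough CPUs are free on this node.
        if self.free_cpus >= cpus:
            self.free_cpus -= cpus
            self.running.append(task)
            return True
        return False


class GlobalScheduler:
    """A toy global scheduler that sees every node in the cluster."""

    def __init__(self, nodes):
        self.nodes = nodes

    def place(self, task, cpus, exclude):
        # Assumed policy: choose the other node with the most free CPUs.
        candidates = [n for n in self.nodes
                      if n is not exclude and n.free_cpus >= cpus]
        if not candidates:
            return None
        target = max(candidates, key=lambda n: n.free_cpus)
        target.try_run(task, cpus)
        return target


def submit(task, cpus, local, global_scheduler):
    # Step 1: try the node where the task was submitted.
    if local.try_run(task, cpus):
        return local.name
    # Step 2: the local node is full, so hand the task to the global scheduler.
    target = global_scheduler.place(task, cpus, exclude=local)
    return target.name if target else None


node_a = Node("node-a", free_cpus=2)
node_b = Node("node-b", free_cpus=4)
scheduler = GlobalScheduler([node_a, node_b])

# Four one-CPU tasks submitted on node-a: two fit locally, two spill over.
placements = [submit(f"task-{i}", 1, node_a, scheduler) for i in range(4)]
print(placements)
```

Keeping the first decision local means that most tasks never need to contact the global scheduler, which is the point of the two-level design described above.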

DEMO CODE

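A single-machine example of distributed tasks.

The remote decorator declares a function as a task.

Calling remote dispatches the task to a worker process, where it is executed.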

import ray
ray.init()

@ray.remote
def f(x):
    return x * x

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))

Adapting a clustering workflow to Ray

https://github.com/fanqingsong/machine_learning_workflow_on_ray

from csv import reader
from sklearn.cluster import KMeans
import joblib
import ray


ray.init()


# Load a CSV file
def load_csv(filename):
    # the context manager closes the file once all rows are read
    with open(filename, "rt") as file:
        dataset = list(reader(file))
    return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Convert string column to integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

def getRawIrisData():
    # Load iris dataset
    filename = 'iris.csv'
    dataset = load_csv(filename)
    print('Loaded data file {0} with {1} rows and {2} columns'.format(filename, len(dataset), len(dataset[0])))
    print(dataset[0])
    # convert string columns to float
    for i in range(4):
        str_column_to_float(dataset, i)
    # convert class column to int
    lookup = str_column_to_int(dataset, 4)
    print(dataset[0])
    print(lookup)

    return dataset

@ray.remote
def getTrainData():
    dataset = getRawIrisData()
    trainData = [ [one[0], one[1], one[2], one[3]] for one in dataset ]

    return trainData

@ray.remote
def getNumClusters():
    return 3

@ray.remote
def train(numClusters, trainData):
    print("numClusters=%d" % numClusters)

    model = KMeans(n_clusters=numClusters)

    model.fit(trainData)

    # save model for prediction
    # (written to the local disk, so predict() must run on the same machine)
    joblib.dump(model, 'model.kmeans')

    return trainData

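@ray.remote
def predict(irisData):
    # load the model saved by train()
    model = joblib.load('model.kmeans')

    # cluster result
    labels = model.predict(irisData)

    print("cluster result")
    print(labels)

    # return the labels so that ray.get() in the pipeline yields them instead of None
    return labels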


def machine_learning_workflow_pipeline():
    trainData = getTrainData.remote()
    numClusters = getNumClusters.remote()
    trainData = train.remote(numClusters, trainData)
    result = predict.remote(trainData)

    result = ray.get(result)
    print("result=", result)



if __name__ == "__main__":
    machine_learning_workflow_pipeline()

Ray crash course

https://github.com/anyscale/academy/blob/master/ray-crash-course/00-Ray-Crash-Course-Overview.ipynb
