cuML: a GPU-Accelerated Machine Learning Library
https://github.com/rapidsai/cudf
https://github.com/cupy/cupy
https://github.com/rapidsai/cuml
cuML is a suite of machine learning algorithms and mathematical primitives that share compatible APIs with the other RAPIDS projects.
cuML lets data scientists, researchers, and software engineers run traditional tabular ML tasks on GPUs without diving into the details of CUDA programming. In most cases, cuML's Python API matches the API of scikit-learn.
For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML benchmark notebooks.
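Because of this API parity, moving between the CPU and GPU implementations is typically just an import swap. A hedged sketch of the pattern (either import may be unavailable in a given environment, hence the fallbacks):

```python
# Pick whichever DBSCAN implementation is available; the downstream
# fit/predict code stays the same because the estimator APIs match.
try:
    from cuml.cluster import DBSCAN          # GPU path (requires RAPIDS)
except ImportError:
    try:
        from sklearn.cluster import DBSCAN   # CPU path, same estimator API
    except ImportError:
        DBSCAN = None                        # neither library installed

print('Using:', DBSCAN)
```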
Official documentation:
rapidsai/cuml
cuML API Reference
The official repo ships quite a few example notebooks, along with a broad list of supported models.
Related articles:
nvidia-rapids︱cuDF, a pandas-like DataFrame library
NVIDIA's Python GPU algorithm ecosystem︱RAPIDS 0.10
nvidia-rapids︱cuML, a machine learning acceleration library
nvidia-rapids︱cuGraph, a NetworkX-like graph analytics library
1 Installation and Background
1.1 Installation
Build reference: https://github.com/rapidsai/cuml/blob/branch-0.13/BUILD.md
```shell
conda env create -n cuml_dev python=3.7 --file=conda/environments/cuml_dev_cuda10.0.yml
```
For the Docker version, see: https://rapids.ai/start.html#prerequisites
```shell
docker pull rapidsai/rapidsai:cuda10.1-runtime-ubuntu16.04-py3.7
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai:cuda10.1-runtime-ubuntu16.04-py3.7
```
1.2 Background
Training is not the whole story: to truly scale data science on GPUs, end-to-end applications need acceleration too. cuML 0.9 brings the next step in GPU-backed tree-model support, including the new Forest Inference Library (FIL). FIL is a lightweight, GPU-accelerated engine that performs inference on tree-based models, including gradient-boosted decision trees and random forests. With a single V100 GPU and two lines of Python, users can load a saved XGBoost or LightGBM model and run inference on new data up to 36x faster than a dual 20-core CPU node. Built on the open-source Treelite package, the next release of FIL will also add support for scikit-learn and cuML random forest models.
Figure 3: Inference speed comparison, XGBoost on CPU vs. the Forest Inference Library (FIL) on GPU
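As a rough illustration of what FIL does conceptually (this is not its actual API or memory layout), tree-based inference boils down to traversing flat arrays of nodes and summing leaf values across trees. A minimal pure-Python sketch for a binary:logistic-style forest:

```python
import math

# Each node: (feature_index, threshold, left_child, right_child, leaf_value).
# feature_index == -1 marks a leaf. A flat, contiguous layout like this is
# (loosely) what makes forest traversal friendly to GPU memory access.
def predict_tree(nodes, x):
    i = 0
    while nodes[i][0] != -1:                  # descend until we hit a leaf
        feat, thresh, left, right, _ = nodes[i]
        i = left if x[feat] <= thresh else right
    return nodes[i][4]

def predict_forest(forest, x):
    """Sum leaf margins across trees and squash, as in binary:logistic boosting."""
    margin = sum(predict_tree(t, x) for t in forest)
    return 1.0 / (1.0 + math.exp(-margin))

# A toy two-tree "model": each tree splits on a single feature.
tree0 = [(0, 0.5, 1, 2, 0.0), (-1, 0, 0, 0, -1.0), (-1, 0, 0, 0, 1.0)]
tree1 = [(1, 0.5, 1, 2, 0.0), (-1, 0, 0, 0, -0.5), (-1, 0, 0, 0, 0.5)]
forest = [tree0, tree1]

print(predict_forest(forest, [0.9, 0.9]))  # both trees vote positive: > 0.5
```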
2 DBSCAN
The DBSCAN algorithm is a clustering algorithm that works really well for datasets that have regions of high density.
The model can take array-like objects, either in host as NumPy arrays or in device (as Numba or cuda_array_interface-compliant), as well as cuDF DataFrames.
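For intuition before running the libraries, the procedure itself is short. A brute-force pure-Python sketch of DBSCAN (illustrative only, nothing like cuML's optimized implementation):

```python
def dbscan(points, eps, min_samples):
    """Label each point with a cluster id; -1 marks noise."""
    def neighbors(i):
        # Brute-force epsilon-neighborhood (includes the point itself).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1                 # not a core point: noise (for now)
            continue
        cluster += 1                       # start a new cluster at core point i
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            js = neighbors(j)
            if len(js) >= min_samples:     # j is a core point too: keep expanding
                queue.extend(js)
    return labels

pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
print(dbscan(pts, eps=0.5, min_samples=3))  # → [0, 0, 0, 1, 1, 1, -1]
```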
```python
import cudf
import matplotlib.pyplot as plt
import numpy as np
from cuml.datasets import make_blobs
from cuml.cluster import DBSCAN as cuDBSCAN
from sklearn.cluster import DBSCAN as skDBSCAN
from sklearn.metrics import adjusted_rand_score
%matplotlib inline

# Parameters
n_samples = 10**4
n_features = 2
eps = 0.15
min_samples = 3
random_state = 23

# Generate data
%%time
device_data, device_labels = make_blobs(n_samples=n_samples,
                                        n_features=n_features,
                                        centers=5,
                                        cluster_std=0.1,
                                        random_state=random_state)
device_data = cudf.DataFrame.from_gpu_matrix(device_data)
device_labels = cudf.Series(device_labels)

# Copy dataset from GPU memory to host memory.
# This is done to later compare CPU and GPU results.
host_data = device_data.to_pandas()
host_labels = device_labels.to_pandas()

# Fit the scikit-learn model
%%time
clustering_sk = skDBSCAN(eps=eps, min_samples=min_samples,
                         algorithm="brute", n_jobs=-1)
clustering_sk.fit(host_data)

# Fit the cuML model
%%time
clustering_cuml = cuDBSCAN(eps=eps, min_samples=min_samples,
                           verbose=True, max_mbytes_per_batch=13e3)
clustering_cuml.fit(device_data, out_dtype="int32")

# Visualization
fig = plt.figure(figsize=(16, 10))
X = np.array(host_data)
labels = clustering_cuml.labels_

# Black is reserved for noise (label -1), so don't count it as a cluster.
unique_labels = labels.unique()
n_clusters_ = len(unique_labels) - (1 if (unique_labels == -1).any() else 0)
colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        col = [0, 0, 0, 1]  # black for noise
    class_member_mask = (labels == k)
    xy = X[class_member_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o',
             markerfacecolor=tuple(col), markersize=5,
             markeredgecolor=tuple(col))

plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
```
Evaluate the results:
```python
%%time
sk_score = adjusted_rand_score(host_labels, clustering_sk.labels_)
cuml_score = adjusted_rand_score(host_labels, clustering_cuml.labels_)
```

```
(0.9998750031236718, 0.9998750031236718)
```
The two scores are identical, i.e. scikit-learn and cuML produce the same clustering.
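What does `adjusted_rand_score` measure? A pure-Python sketch of the adjusted Rand index in its contingency-table form (the scikit-learn call above computes the same quantity):

```python
from collections import Counter
from math import comb

def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand index: pair-counting agreement between two labelings,
    corrected for chance. 1.0 means identical partitions (up to renaming)."""
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))   # contingency-table cells
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)

# Renaming the clusters does not change the score:
print(adjusted_rand([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```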
3 t-SNE on Fashion MNIST
TSNE (t-Distributed Stochastic Neighbor Embedding) is a fantastic dimensionality reduction algorithm used to visualize large complex datasets, including medical scans, neural network weights, gene expressions, and much more.
cuML's TSNE algorithm supports both the faster Barnes-Hut $O(n \log n)$ algorithm and the slower Exact $O(n^2)$ algorithm.
The model can take array-like objects, either host NumPy arrays or cuDF DataFrames, as input.
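The Exact method's quadratic cost comes from the dense pairwise-affinity matrix computed in the input space; Barnes-Hut approximates those $O(n^2)$ interactions in $O(n \log n)$. A fixed-bandwidth pure-Python sketch of the high-dimensional (Gaussian) affinities (real t-SNE additionally tunes a per-point bandwidth to match a target perplexity):

```python
import math

def gaussian_affinities(points, sigma=1.0):
    """Row-normalized Gaussian similarities p_{j|i}. Every pair is visited,
    which is exactly the O(n^2) work the Exact method performs."""
    n = len(points)
    p = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(0.0)          # a point is not its own neighbor
            else:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                row.append(math.exp(-d2 / (2 * sigma ** 2)))
        z = sum(row)
        p.append([v / z for v in row])   # normalize each row to sum to 1
    return p

pts = [(0, 0), (0.1, 0), (5, 5)]
P = gaussian_affinities(pts)
print(P[0])  # the nearby point gets most of the affinity mass
```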
```python
import gzip
import matplotlib.pyplot as plt
import numpy as np
import os
from cuml.manifold import TSNE
%matplotlib inline

# https://github.com/zalandoresearch/fashion-mnist/blob/master/utils/mnist_reader.py
def load_mnist_train(path):
    """Load MNIST data from path"""
    labels_path = os.path.join(path, 'train-labels-idx1-ubyte.gz')
    images_path = os.path.join(path, 'train-images-idx3-ubyte.gz')
    with gzip.open(labels_path, 'rb') as lbpath:
        labels = np.frombuffer(lbpath.read(), dtype=np.uint8, offset=8)
    with gzip.open(images_path, 'rb') as imgpath:
        images = np.frombuffer(imgpath.read(), dtype=np.uint8,
                               offset=16).reshape(len(labels), 784)
    return images, labels

# Load the data
images, labels = load_mnist_train("data/fashion")

plt.figure(figsize=(5, 5))
plt.imshow(images[100].reshape((28, 28)), cmap='gray')
```
```python
# Fit the model
tsne = TSNE(n_components=2, method='barnes_hut', random_state=23)

%time embedding = tsne.fit_transform(images)
print(embedding[:10], embedding.shape)
```

```
CPU times: user 2.41 s, sys: 2.57 s, total: 4.98 s
Wall time: 4.98 s
[[-13.577632    39.87483   ]
 [ 26.136728   -17.68164   ]
 [ 23.164072    22.151243  ]
 [ 28.361032    11.134571  ]
 [ 35.419216     5.6633983 ]
 [ -0.15575314 -11.143476  ]
 [-24.30308     -1.584903  ]
 [ -5.9438944  -27.522072  ]
 [  2.0439444   29.574451  ]
 [ -3.0801039   27.079374  ]] (60000, 2)
```
Visualize the embedding:
```python
# Visualize the embedding
classes = [
    'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
    'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'
]

fig, ax = plt.subplots(1, figsize=(14, 10))
plt.scatter(embedding[:, 1], embedding[:, 0], s=0.3, c=labels, cmap='Spectral')
plt.setp(ax, xticks=[], yticks=[])
cbar = plt.colorbar(boundaries=np.arange(11) - 0.5)
cbar.set_ticks(np.arange(10))
cbar.set_ticklabels(classes)
plt.title('Fashion MNIST Embedded via TSNE');
```
4 XGBoost
```python
import numpy as np; print('numpy Version:', np.__version__)
import pandas as pd; print('pandas Version:', pd.__version__)
import xgboost as xgb; print('XGBoost Version:', xgb.__version__)

# helper function for simulating data
def simulate_data(m, n, k=2, numerical=False):
    if numerical:
        features = np.random.rand(m, n)
    else:
        features = np.random.randint(2, size=(m, n))
    labels = np.random.randint(k, size=m)
    return np.c_[labels, features].astype(np.float32)

# helper function for loading data
def load_data(filename, n_rows):
    if n_rows >= 1e9:
        df = pd.read_csv(filename)
    else:
        df = pd.read_csv(filename, nrows=n_rows)
    return df.values.astype(np.float32)

# settings
LOAD = False
n_rows = int(1e5)
n_columns = int(100)
n_categories = 2

# load or simulate data
%%time
if LOAD:
    dataset = load_data('/tmp', n_rows)
else:
    dataset = simulate_data(n_rows, n_columns, n_categories)
print(dataset.shape)

# train/validation split
# identify shape and indices
n_rows, n_columns = dataset.shape
train_size = 0.80
train_index = int(n_rows * train_size)

# split X, y
X, y = dataset[:, 1:], dataset[:, 0]
del dataset

# split train data
X_train, y_train = X[:train_index, :], y[:train_index]

# split validation data
X_validation, y_validation = X[train_index:, :], y[train_index:]

# check dimensions
print('X_train: ', X_train.shape, X_train.dtype, 'y_train: ', y_train.shape, y_train.dtype)
print('X_validation', X_validation.shape, X_validation.dtype,
      'y_validation: ', y_validation.shape, y_validation.dtype)

# check the proportions
total = X_train.shape[0] + X_validation.shape[0]
print('X_train proportion:', X_train.shape[0] / total)
print('X_validation proportion:', X_validation.shape[0] / total)

# Convert NumPy data to DMatrix format
%%time
dtrain = xgb.DMatrix(X_train, label=y_train)
dvalidation = xgb.DMatrix(X_validation, label=y_validation)

# instantiate params
params = {}

# general params
general_params = {'silent': 1}
params.update(general_params)

# booster params
n_gpus = 1
booster_params = {}
if n_gpus != 0:
    booster_params['tree_method'] = 'gpu_hist'
    booster_params['n_gpus'] = n_gpus
params.update(booster_params)

# learning task params
learning_task_params = {'eval_metric': 'auc', 'objective': 'binary:logistic'}
params.update(learning_task_params)
print(params)

# model training settings
evallist = [(dvalidation, 'validation'), (dtrain, 'train')]
num_round = 10

%%time
bst = xgb.train(params, dtrain, num_round, evallist)
```
Output (the labels were simulated at random, so validation AUC hovers around 0.5 while training AUC climbs as the model memorizes noise):
```
[0] validation-auc:0.504014  train-auc:0.542211
[1] validation-auc:0.506166  train-auc:0.559262
[2] validation-auc:0.501638  train-auc:0.570375
[3] validation-auc:0.50275   train-auc:0.580726
[4] validation-auc:0.503445  train-auc:0.589701
[5] validation-auc:0.503413  train-auc:0.598342
[6] validation-auc:0.504258  train-auc:0.605253
[7] validation-auc:0.503157  train-auc:0.611937
[8] validation-auc:0.502372  train-auc:0.617561
[9] validation-auc:0.501949  train-auc:0.62333
CPU times: user 1.12 s, sys: 195 ms, total: 1.31 s
Wall time: 360 ms
```
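The `eval_metric` used here, AUC, equals the probability that a randomly chosen positive example is ranked above a randomly chosen negative one. A rank-based pure-Python sketch (illustrative, not XGBoost's implementation):

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs where the positive outscores the
    negative; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```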
5 Image Retrieval with KNN
Reference: "Accelerating image search tasks with RAPIDS on GPU instances" (Alibaba Cloud documentation)
The Alibaba Cloud documentation covers this in detail, so only the essentials are repeated here.
Image features are extracted with the open-source TensorFlow and Keras frameworks, using a ResNet50 (no top) model pre-trained on ImageNet.
The model (about 91 MB) is downloaded over the public network and saved by default to the /root/.keras/models/ directory.
Download the data:
```python
import os
import tarfile
import numpy as np
from urllib.request import urlretrieve

def download_and_extract(data_dir):
    """Download and decompress the STL-10 dataset."""
    def _progress(count, block_size, total_size):
        print('\r>>> Downloading %s (total:%.0fM) %.1f%%' % (
            filename, total_size / 1024 / 1024,
            100.0 * count * block_size / total_size), end='')

    url = 'http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz'
    filename = url.split('/')[-1]
    filepath = os.path.join(data_dir, filename)
    decom_dir = os.path.join(data_dir, filename.split('.')[0])
    if not os.path.exists(data_dir):
        os.makedirs(data_dir)
    if os.path.exists(filepath):
        print('>>> {} already exists in the current directory.'.format(filename))
    else:
        urlretrieve(url, filepath, _progress)
        print("\nSuccessfully downloaded.")
    if not os.path.exists(decom_dir):
        # Decompress
        print(">>> Decompressing from {}....".format(filepath))
        tar = tarfile.open(filepath, 'r')
        tar.extractall(data_dir)
        print("Successfully decompressed")
        tar.close()
    else:
        print('>>> Directory "{}" already exists.'.format(decom_dir))

def read_all_images(path_to_data):
    """Read all images from the binary file."""
    with open(path_to_data, 'rb') as f:
        everything = np.fromfile(f, dtype=np.uint8)
    images = np.reshape(everything, (-1, 3, 96, 96))
    images = np.transpose(images, (0, 3, 2, 1))
    return images

# the directory to save data
data_dir = './data'

# download and decompress
download_and_extract(data_dir)

# the path of the unlabeled data
path_unlabeled = os.path.join(data_dir, 'stl10_binary/unlabeled_X.bin')

# get images from the binary file
images = read_all_images(path_unlabeled)
print('>>> images shape: ', images.shape)

# show a random image
import random
import matplotlib.pyplot as plt
%matplotlib inline

def show_image(image):
    """Show a single image."""
    fig = plt.figure(figsize=(3, 3))
    plt.imshow(image)
    plt.show()
    fig.clear()

rand_image_index = random.randint(0, images.shape[0] - 1)
show_image(images[rand_image_index])
```
```python
# Split the data
from sklearn.model_selection import train_test_split

train_images, query_images = train_test_split(images, test_size=0.1, random_state=123)
print('train_images shape: ', train_images.shape)
print('query_images shape: ', query_images.shape)

# Limit TensorFlow's GPU memory usage: with default settings TensorFlow grabs
# nearly all GPU memory, and we need to reserve some for cuML.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # only use device 0

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto()
# method 1: allocate GPU memory based on runtime allocations
# config.gpu_options.allow_growth = True
# method 2: determine the fraction of the overall amount of memory
# that each visible GPU should be allocated
config.gpu_options.per_process_gpu_memory_fraction = 0.3
set_session(tf.Session(config=config))

# Feature extraction
from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input

# download the ResNet50 (no top) weights on the first run and load the model
model = ResNet50(weights='imagenet', include_top=False,
                 input_shape=(96, 96, 3), pooling='max')
model.summary()

%%time
train_features = model.predict(train_images)
print('train features shape: ', train_features.shape)

%%time
query_features = model.predict(query_images)
print('query features shape: ', query_features.shape)
```
Then comes the KNN stage, using both scikit-learn KNN and cuML KNN:
```python
from cuml.neighbors import NearestNeighbors

%%time
knn_cuml = NearestNeighbors()
knn_cuml.fit(train_features)

%%time
distances_cuml, indices_cuml = knn_cuml.kneighbors(query_features, k=3)

from sklearn.neighbors import NearestNeighbors

%%time
knn_sk = NearestNeighbors(n_neighbors=3, metric='sqeuclidean', n_jobs=-1)
knn_sk.fit(train_features)

%%time
distances_sk, indices_sk = knn_sk.kneighbors(query_features, 3)

# compare the distances obtained from the sklearn and cuml models
(np.abs(distances_cuml - distances_sk) < 1).all()

# show the results
def show_images(query, sim_images, sim_dists):
    """Plot a query image next to its retrieved neighbors."""
    simi_num = len(sim_images)
    fig = plt.figure(figsize=(3 * (simi_num + 1), 3))
    axes = fig.subplots(1, simi_num + 1)
    for index, ax in enumerate(axes):
        if index == 0:
            ax.imshow(query)
            ax.set_title('query')
        else:
            ax.imshow(sim_images[index - 1])
            ax.set_title('dist: %.1f' % (sim_dists[index - 1]))
    plt.show()
    fig.clear()

# get random indices
random_show_index = np.random.randint(0, query_images.shape[0], size=5)
random_query = query_images[random_show_index]
random_indices = indices_cuml[random_show_index].astype(np.int)
random_distances = distances_cuml[random_show_index]

# show result images
for query_image, sim_indices, sim_dists in zip(random_query, random_indices, random_distances):
    sim_images = train_images[sim_indices]
    show_images(query_image, sim_images, sim_dists)
```
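Conceptually, the `kneighbors` call is a brute-force distance ranking: compute the squared Euclidean distance from each query to every training row, then keep the k smallest. A dependency-free pure-Python sketch (illustrative only; the libraries above use far faster vectorized or GPU kernels):

```python
def kneighbors(train, queries, k):
    """Return (distances, indices) of the k nearest training rows per query,
    using squared Euclidean distance as in the sklearn call above."""
    all_dists, all_idx = [], []
    for q in queries:
        # distance to every training row, tagged with its index
        d = [(sum((a - b) ** 2 for a, b in zip(q, t)), i)
             for i, t in enumerate(train)]
        d.sort()                                  # nearest first
        all_dists.append([dist for dist, _ in d[:k]])
        all_idx.append([i for _, i in d[:k]])
    return all_dists, all_idx

train = [(0, 0), (1, 0), (0, 1), (10, 10)]
print(kneighbors(train, [(0.1, 0)], k=2))
```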
More to be added as the need arises.