How to Distribute Keras Training with PowerAI DDL
The 1.5.3 release of PowerAI includes updates to IBM’s Distributed Deep Learning (DDL) framework that make it straightforward to distribute TensorFlow Keras training. In this article we will walk through the process of taking an existing TensorFlow Keras model, making the code changes necessary to distribute its training with DDL, and launching the distributed training with ddlrun.
The script we used as the starting point is the Keras mnist_cnn.py example script.
Code Changes
1. Imports
The first step is to convert the keras imports to tensorflow.keras imports. This is accomplished by replacing import keras with from tensorflow.python import keras as keras, and replacing imports of the form from keras.xxxxx import ... with from tensorflow.python.keras.xxxxx import .... We also have to import ddl and numpy as np. Importing ddl is what automatically distributes the gradient computation during training.
import keras                                      | from tensorflow.python import keras as keras
from keras.datasets import mnist                  | from tensorflow.python.keras.datasets import mnist
from keras.models import Sequential               | from tensorflow.python.keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten  | from tensorflow.python.keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D     | from tensorflow.python.keras.layers import Conv2D, MaxPooling2D
from keras import backend as K                    | from tensorflow.python.keras import backend as K
> import ddl
> import numpy as np
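After these replacements the top of the script reads as follows. This is simply the right-hand side of the diff above collected into plain Python, with no additions of our own:

from __future__ import print_function

from tensorflow.python import keras as keras
from tensorflow.python.keras.datasets import mnist
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense, Dropout, Flatten
from tensorflow.python.keras.layers import Conv2D, MaxPooling2D
from tensorflow.python.keras import backend as K

import ddl          # distributes the gradient computation across GPUs
import numpy as np  # used below to split the data between ranks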
2. Split the Training and Test Data
Next we have to split the training and test data so that each GPU works on different data. This split is what actually divides the work among the ddl processes.

x_test_full and y_test_full are added so that a final evaluation can be run on the complete test set at the end. np.array_split(x_train, ddl.size())[ddl.rank()] splits the training data into ddl.size() pieces and selects the piece that corresponds to the current rank, ddl.rank(). The same is done for the test data and for the training and test labels.
> # DDL: Save the full test data before splitting for final accuracy check.
> x_test_full = x_test.astype('float32') / 255
> y_test_full = keras.utils.to_categorical(y_test, num_classes)
>
> # DDL: Split the training & testing data.
> x_train = np.array_split(x_train, ddl.size())[ddl.rank()]
> x_test = np.array_split(x_test, ddl.size())[ddl.rank()]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
> # DDL: Split the training & testing data.
> y_train = np.array_split(y_train, ddl.size())[ddl.rank()]
> y_test = np.array_split(y_test, ddl.size())[ddl.rank()]
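To see exactly what each rank receives, here is a tiny standalone sketch. It uses plain NumPy with the rank and size hard-coded for illustration; in the real script these values come from ddl.size() and ddl.rank():

import numpy as np

size = 4   # stand-in for ddl.size(), the total number of GPUs
rank = 1   # stand-in for ddl.rank(), this process's index

data = np.arange(10)                      # pretend dataset: [0 1 2 ... 9]
shard = np.array_split(data, size)[rank]  # rank 1 gets the second piece

print(shard)  # [3 4 5]; ranks 0..3 receive [0 1 2], [3 4 5], [6 7] and [8 9]

Because every rank computes the same split and then indexes it with its own rank, the shards are disjoint and together cover the whole dataset.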
3. Adjust the Learning Rate
The next change we have to make is to multiply the learning rate by the total number of GPUs. The intuition behind this is as follows. Since we are splitting up the data and performing gradient descent across ddl.size() GPUs, each with a batch size of 128, we end up with an effective global batch size of 128 * ddl.size(). This reduces the number of gradient descent updates that occur each epoch, slowing the convergence rate by a factor of approximately the number of GPUs. To compensate for this we scale the learning rate by ddl.size().
> # DDL: adjust learning rate based on number of GPUs.
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),   | optimizer=keras.optimizers.Adadelta(lr=1.0 * ddl.size()),
              metrics=['accuracy'])
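As a quick sanity check on that argument, here is the arithmetic using the numbers from this script (MNIST's 60,000 training samples, the per-GPU batch size of 128, and the 4-GPU run shown later); the snippet is only illustrative:

n_train = 60000                                # MNIST training samples
batch_size = 128                               # per-GPU batch size from the script
gpus = 4                                       # ddl.size() in the 4-GPU run below

updates_single = n_train // batch_size         # ~468 weight updates per epoch on 1 GPU
updates_ddl = (n_train // gpus) // batch_size  # ~117 updates per epoch per rank under DDL

# Roughly 4x fewer updates per epoch, hence the learning rate is scaled by ddl.size():
lr = 1.0 * gpus                                # Adadelta's default lr of 1.0 times the GPU count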
4. Add DDL Callbacks
DDL requires that two callbacks be added to the list of Keras callbacks. To ensure that the metrics used for early stopping and other hyperparameter tuning stay in sync throughout training, ddl.DDLCallback() must be the first callback in the list. To ensure that all global variables in the model are correctly initialized, ddl.DDLGlobalVariablesCallback() must be the last callback in the list.
> callbacks = list()
>
> # DDL: Add the DDL callback.
> callbacks.append(ddl.DDLCallback())
> callbacks.append(ddl.DDLGlobalVariablesCallback())
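The mnist example has no other callbacks, so the list contains only the two DDL callbacks. If the script did use additional callbacks, the DDL callbacks would bracket them. The EarlyStopping callback below is not part of the original script and is only there to illustrate the ordering:

callbacks = list()

# DDL: must be first so the metrics used for early stopping etc. stay in sync across ranks.
callbacks.append(ddl.DDLCallback())

# Ordinary Keras callbacks go in between (illustrative only, not in the original script).
callbacks.append(keras.callbacks.EarlyStopping(monitor='val_loss', patience=3))

# DDL: must be last so global variables are initialized consistently on every rank.
callbacks.append(ddl.DDLGlobalVariablesCallback())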
5. Restrict Printing to Rank 0
There are usually some operations that we only want to perform on a single node, printing for example. To restrict such operations to rank 0 we can guard them with if ddl.rank() == 0. Here we also use x_test_full and y_test_full to evaluate the model on the full test set for the final accuracy check displayed at the end.
> # DDL: Only use verbose = 1 on rank 0.
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,                          | verbose=1 if ddl.rank() == 0 else 0,
          validation_data=(x_test, y_test))   | validation_data=(x_test, y_test),
>           callbacks=callbacks)

> # DDL: Only do final accuracy check on rank 0.
> if ddl.rank() == 0:
score = model.evaluate(x_test, y_test, verbose=0)   |     score = model.evaluate(x_test_full, y_test_full, verbose=0)
print('Test loss:', score[0])                       |     print('Test loss:', score[0])
print('Test accuracy:', score[1])                   |     print('Test accuracy:', score[1])
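The same rank 0 guard works for any side effect that should only happen once per job. For example, saving the trained model (model.save is standard Keras; the file name below is just a placeholder and is not something the original script does):

# DDL: only rank 0 writes the model so the ranks don't overwrite each other's file.
if ddl.rank() == 0:
    model.save('mnist-tf-keras-ddl.h5')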
Running the Script
To run the script across any number of nodes, we can use the following commands:
$ source /opt/DL/ddl-tensorflow/bin/ddl-tensorflow-activate
$ /opt/DL/ddl-tensorflow/bin/ddl-tensorflow-install-samples ~/samples
$ ddlrun -H host1,host2,host3,host4,... python ~/samples/examples/keras/mnist-tf-keras.py
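The -H option takes the comma-separated list of hosts to launch on. In the sample run below, ddlrun is invoked without a host list and all four ranks land on the GPUs of the local machine:

$ ddlrun python ~/samples/examples/keras/mnist-tf-keras.py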
On 4 GPUs the output looks like:
$ ddlrun python ~/samples/examples/keras/mnist-tf-keras.py
+ mpirun -x PATH -x LD_LIBRARY_PATH -x PYTHONPATH -tcp -disable_gpu_hooks --rankfile /tmp/DDLRUN/ddlrun.Rd3PDdkJYvRb/RANKFILE -x 'DDL_OPTIONS=-mode p:4x1x1x1 ' -n 4 python ~/samples/examples/keras/mnist-tf-keras.py
DDL: DDL_GROUP_SIZE=10000000.
2018-08-28 19:37:57.689450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0035:03:00.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2018-08-28 19:37:57.689548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 2
2018-08-28 19:37:57.689856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0004:05:00.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2018-08-28 19:37:57.689948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 1
2018-08-28 19:37:57.691164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0004:04:00.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2018-08-28 19:37:57.691221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-08-28 19:37:57.726137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0035:04:00.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2018-08-28 19:37:57.726350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 3
2018-08-28 19:37:58.078092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:37:58.078179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-08-28 19:37:58.078203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-08-28 19:37:58.078863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14847 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0004:04:00.0, compute capability: 7.0)
2018-08-28 19:37:58.080687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:37:58.080722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1
2018-08-28 19:37:58.080738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1: N
2018-08-28 19:37:58.081261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14849 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0004:05:00.0, compute capability: 7.0)
2018-08-28 19:37:58.150432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:37:58.150481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 2
2018-08-28 19:37:58.150495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2: N
2018-08-28 19:37:58.151084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14846 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0035:03:00.0, compute capability: 7.0)
2018-08-28 19:37:58.405935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:37:58.406037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 3
2018-08-28 19:37:58.406065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3: N
2018-08-28 19:37:58.406917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14846 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0035:04:00.0, compute capability: 7.0)
I 19:37:58.441 122001 122471 DDL:41 ] [MPI:0 ] ==== IBM Corp. DDL 1.1.0 + (MPI 3.1) ====
2018-08-28 19:37:59.918249: I ddl_MDR_ops.cc:826] [MPI:2 ] name=Init local_gpuid=2 local_rank=2 local_size=4
2018-08-28 19:37:59.918246: I ddl_MDR_ops.cc:826] [MPI:3 ] name=Init local_gpuid=3 local_rank=3 local_size=4
2018-08-28 19:37:59.918266: I ddl_MDR_ops.cc:826] [MPI:1 ] name=Init local_gpuid=1 local_rank=1 local_size=4
2018-08-28 19:37:59.918266: I ddl_MDR_ops.cc:826] [MPI:0 ] name=Init local_gpuid=0 local_rank=0 local_size=4
DDL: rank: 0, size: 4, gpuid: 0, hosts: 1
DDL: rank: 1, size: 4, gpuid: 1, hosts: 1
DDL: rank: 2, size: 4, gpuid: 2, hosts: 1
DDL: rank: 3, size: 4, gpuid: 3, hosts: 1
x_train shape: (15000, 28, 28, 1)
15000 train samples
2500 test samples
x_train shape: (15000, 28, 28, 1)
15000 train samples
2500 test samples
x_train shape: (15000, 28, 28, 1)
15000 train samples
2500 test samples
x_train shape: (15000, 28, 28, 1)
15000 train samples
2500 test samples
2018-08-28 19:38:00.963727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 2
2018-08-28 19:38:00.963824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:38:00.963838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 2
2018-08-28 19:38:00.963851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2: N
2018-08-28 19:38:00.964433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14846 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0035:03:00.0, compute capability: 7.0)
Train on 15000 samples, validate on 2500 samples
2018-08-28 19:38:01.026421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-08-28 19:38:01.026512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:38:01.026535: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-08-28 19:38:01.026565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-08-28 19:38:01.027149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14847 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0004:04:00.0, compute capability: 7.0)
2018-08-28 19:38:01.245015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 1
2018-08-28 19:38:01.245136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:38:01.245160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1
2018-08-28 19:38:01.245180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1: N
2018-08-28 19:38:01.245765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14849 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0004:05:00.0, compute capability: 7.0)
2018-08-28 19:38:02.113830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 3
2018-08-28 19:38:02.113934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:38:02.113949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 3
2018-08-28 19:38:02.113963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3: N
2018-08-28 19:38:02.114639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14846 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0035:04:00.0, compute capability: 7.0)
Epoch 1/12
2018-08-28 19:38:04.161660: I ddl_MDR_ops.cc:357] [MPI:0 ] name=training/Adadelta/AllReduceN _global_buf_size=1199882 _N=8
I 19:38:04.529 122001 122637 DDL:703 ] [MPI:0 ] selected algo: NCCLB - NCCLB
15000/15000 [==============================] - 4s 282us/step - loss: 0.4831 - acc: 0.8497 - val_loss: 0.1190 - val_acc: 0.9596
Epoch 2/12
15000/15000 [==============================] - 2s 119us/step - loss: 0.1169 - acc: 0.9679 - val_loss: 0.0846 - val_acc: 0.9700
Epoch 3/12
15000/15000 [==============================] - 2s 133us/step - loss: 0.0805 - acc: 0.9760 - val_loss: 0.0731 - val_acc: 0.9728
Epoch 4/12
15000/15000 [==============================] - 2s 118us/step - loss: 0.0693 - acc: 0.9797 - val_loss: 0.0571 - val_acc: 0.9792
Epoch 5/12
15000/15000 [==============================] - 2s 122us/step - loss: 0.0514 - acc: 0.9843 - val_loss: 0.0443 - val_acc: 0.9832
Epoch 6/12
15000/15000 [==============================] - 2s 120us/step - loss: 0.0473 - acc: 0.9868 - val_loss: 0.0539 - val_acc: 0.9804
Epoch 7/12
15000/15000 [==============================] - 2s 120us/step - loss: 0.0408 - acc: 0.9869 - val_loss: 0.0510 - val_acc: 0.9844
Epoch 8/12
15000/15000 [==============================] - 2s 121us/step - loss: 0.0398 - acc: 0.9877 - val_loss: 0.0579 - val_acc: 0.9836
Epoch 9/12
15000/15000 [==============================] - 2s 122us/step - loss: 0.0373 - acc: 0.9893 - val_loss: 0.0485 - val_acc: 0.9840
Epoch 10/12
15000/15000 [==============================] - 2s 104us/step - loss: 0.0289 - acc: 0.9915 - val_loss: 0.0566 - val_acc: 0.9824
Epoch 11/12
15000/15000 [==============================] - 2s 111us/step - loss: 0.0291 - acc: 0.9907 - val_loss: 0.0565 - val_acc: 0.9816
Epoch 12/12
15000/15000 [==============================] - 2s 106us/step - loss: 0.0279 - acc: 0.9915 - val_loss: 0.0419 - val_acc: 0.9856
2018-08-28 19:38:26.596350: I ddl_MDR_ops.cc:270] [MPI:2 ] calling ddl_finalize
2018-08-28 19:38:26.598412: I ddl_MDR_ops.cc:270] [MPI:3 ] calling ddl_finalize
2018-08-28 19:38:26.655121: I ddl_MDR_ops.cc:270] [MPI:1 ] calling ddl_finalize
2018-08-28 19:38:27.320345: I ddl_MDR_ops.cc:270] [MPI:0 ] calling ddl_finalize
Test loss: 0.0279394076111
Test accuracy: 0.992
Complete Diff
'''Trains a simple convnet on the MNIST dataset.
Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.
'''
from __future__ import print_function
import keras                                      | from tensorflow.python import keras as keras
from keras.datasets import mnist                  | from tensorflow.python.keras.datasets import mnist
from keras.models import Sequential               | from tensorflow.python.keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten  | from tensorflow.python.keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D     | from tensorflow.python.keras.layers import Conv2D, MaxPooling2D
from keras import backend as K                    | from tensorflow.python.keras import backend as K
> import ddl
> import numpy as np

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

> # DDL: Save the full test data before splitting for final accuracy check.
> x_test_full = x_test.astype('float32') / 255
> y_test_full = keras.utils.to_categorical(y_test, num_classes)
>
> # DDL: Split the training & testing data.
> x_train = np.array_split(x_train, ddl.size())[ddl.rank()]
> x_test = np.array_split(x_test, ddl.size())[ddl.rank()]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

> # DDL: Split the training & testing data.
> y_train = np.array_split(y_train, ddl.size())[ddl.rank()]
> y_test = np.array_split(y_test, ddl.size())[ddl.rank()]
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

> # DDL: adjust learning rate based on number of GPUs.
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),   | optimizer=keras.optimizers.Adadelta(lr=1.0 * ddl.size()),
              metrics=['accuracy'])

> callbacks = list()
>
> # DDL: Add the DDL callback.
> callbacks.append(ddl.DDLCallback())
> callbacks.append(ddl.DDLGlobalVariablesCallback())
>
> # DDL: Only use verbose = 1 on rank 0.
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,                          | verbose=1 if ddl.rank() == 0 else 0,
          validation_data=(x_test, y_test))   | validation_data=(x_test, y_test),
>           callbacks=callbacks)

> # DDL: Only do final accuracy check on rank 0.
> if ddl.rank() == 0:
score = model.evaluate(x_test, y_test, verbose=0)   |     score = model.evaluate(x_test_full, y_test_full, verbose=0)
print('Test loss:', score[0])                       |     print('Test loss:', score[0])
print('Test accuracy:', score[1])                   |     print('Test accuracy:', score[1])