
How to Distribute Keras Training with PowerAI DDL

The 1.5.3 release of PowerAI includes updates to IBM’s Distributed Deep Learning (DDL) framework that facilitate the distribution of TensorFlow Keras training. In this article we walk through the process of taking an existing TensorFlow Keras model, making the code changes necessary to distribute its training using DDL, and using ddlrun to execute the distributed script.

The script we use as the starting point is the Keras mnist_cnn.py example script.

Code Changes

1. Imports

The first step is to convert any keras imports to tensorflow.keras imports. This is accomplished by replacing import keras with from tensorflow.python import keras as keras, and replacing imports of the form from keras.xxxxx import ... with imports of the form from tensorflow.python.keras.xxxxx import .... We also have to import ddl and numpy as np. Importing ddl automatically distributes the gradient computation during training.

import keras                                                                  | from tensorflow.python import keras as keras
from keras.datasets import mnist                                              | from tensorflow.python.keras.datasets import mnist
from keras.models import Sequential                                           | from tensorflow.python.keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten                              | from tensorflow.python.keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D                                 | from tensorflow.python.keras.layers import Conv2D, MaxPooling2D
from keras import backend as K                                                | from tensorflow.python.keras import backend as K
                                                                              > import ddl
                                                                              > import numpy as np
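Importing ddl also makes ddl.rank() and ddl.size(), used in the next step, available to the script. A minimal sketch, assuming the PowerAI ddl module is on the PYTHONPATH:

import ddl

# After the import, each process knows its position in the distributed job.
print('This process is rank', ddl.rank(), 'of', ddl.size())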

2. Split the Training and Test Data

Next we have to split the training and test data so that each GPU works on different data. This split is what actually divides the work among the DDL processes.

  • x_test_full and y_test_full are added to be able to do a final model evaluation at the end.
  • np.array_split(x_train, ddl.size())[ddl.rank()] is used to split the training data into ddl.size() pieces and select the piece that corresponds to the current rank, ddl.rank(). The same is done for all training and test data and labels; a small standalone sketch of the split follows the diff below.
                                                                              > # DDL: Save the full test data before splitting for final accuracy check.
                                                                              > x_test_full = x_test.astype('float32') / 255
                                                                              > y_test_full = keras.utils.to_categorical(y_test, num_classes)
                                                                              >
                                                                              > # DDL: Split the training & testing data.
                                                                              > x_train = np.array_split(x_train, ddl.size())[ddl.rank()]
                                                                              > x_test = np.array_split(x_test, ddl.size())[ddl.rank()]
x_train = x_train.astype('float32')                                             x_train = x_train.astype('float32')
x_test = x_test.astype('float32')                                               x_test = x_test.astype('float32')
x_train /= 255                                                                  x_train /= 255
x_test /= 255                                                                   x_test /= 255
print('x_train shape:', x_train.shape)                                          print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')                                        print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')                                          print(x_test.shape[0], 'test samples')

                                                                              > # DDL: Split the training & testing data.
                                                                              > y_train = np.array_split(y_train, ddl.size())[ddl.rank()]
                                                                              > y_test = np.array_split(y_test, ddl.size())[ddl.rank()]
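To make the split concrete, here is a small self-contained sketch. The size and rank values are hypothetical stand-ins for ddl.size() and ddl.rank() on a 4-GPU run:

import numpy as np

# Hypothetical stand-ins for ddl.size() and ddl.rank() on a 4-GPU job.
size, rank = 4, 1

# Dummy array with the same shape as the MNIST training images.
x_train = np.zeros((60000, 28, 28, 1), dtype='uint8')
shard = np.array_split(x_train, size)[rank]   # this rank's piece of the data

print(shard.shape)                            # (15000, 28, 28, 1), matching the log output below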

3. Adjust the Learning Rate

The next change is to multiply the learning rate by the total number of GPUs. The intuition is as follows. Since we split the data and perform gradient descent across ddl.size() GPUs, each with a batch size of 128, the effective global batch size is 128 * ddl.size(). This reduces the number of gradient descent updates per epoch, slowing the convergence rate by a factor of approximately the number of GPUs. To compensate, we scale the learning rate by ddl.size(); a short worked example follows the diff below.

                                                                              > # DDL: adjust learning rate based on number of GPUs.
model.compile(loss=keras.losses.categorical_crossentropy,                       model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),                          |               optimizer=keras.optimizers.Adadelta(lr=1.0 * ddl.size()),
              metrics=['accuracy'])                                                           metrics=['accuracy'])
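To make the arithmetic concrete, here is a worked example with hypothetical values, assuming 4 GPUs and the script's batch size of 128:

gpus = 4                                           # ddl.size()
batch_size = 128

samples_per_gpu = 60000 // gpus                    # 15000 MNIST training samples per rank
updates_per_epoch = samples_per_gpu // batch_size  # ~117 global updates, vs. ~468 on a single GPU
scaled_lr = 1.0 * gpus                             # Adadelta's default lr of 1.0, scaled to 4.0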

4. Add DDL Callbacks

DDL requires that two callbacks be added to the list of Keras callbacks. To ensure that metrics used for early stopping and other hyperparameter tuning remain in sync throughout training, we add ddl.DDLCallback() as the first callback in the list. To ensure that all global variables in the model are correctly initialized, we add ddl.DDLGlobalVariablesCallback() as the last callback in the list.

                                                                              > callbacks = list()
                                                                              >
                                                                              > # DDL: Add the DDL callback.
                                                                              > callbacks.append(ddl.DDLCallback())
                                                                              > callbacks.append(ddl.DDLGlobalVariablesCallback())
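If the training script has callbacks of its own, they go between the two DDL callbacks. A minimal sketch, where the EarlyStopping callback is an illustrative addition and not part of the original script:

callbacks = list()

# DDL: DDLCallback must come first so metrics stay in sync across ranks.
callbacks.append(ddl.DDLCallback())

# Hypothetical user callback: our own callbacks sit between the two DDL callbacks.
callbacks.append(keras.callbacks.EarlyStopping(monitor='val_loss', patience=3))

# DDL: DDLGlobalVariablesCallback must come last so global variables initialize correctly.
callbacks.append(ddl.DDLGlobalVariablesCallback())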

5. Restrict Printing to Rank 0

There are usually some operations that we only want to perform on a single process, printing for example. To restrict such operations to rank 0 we can wrap them in if ddl.rank() == 0. Here we also use x_test_full and y_test_full to evaluate the model on the full test set for the final accuracy check displayed at the end.

                                                                              > # DDL: Only use verbose = 1 on rank 0.
model.fit(x_train, y_train,                                                     model.fit(x_train, y_train,
          batch_size=batch_size,                                                          batch_size=batch_size,
          epochs=epochs,                                                                  epochs=epochs,
          verbose=1,                                                          |           verbose=1 if ddl.rank() == 0 else 0,
          validation_data=(x_test, y_test))                                   |           validation_data=(x_test, y_test),
                                                                              >           callbacks=callbacks)
                                                                              > # DDL: Only do final accuracy check on rank 0.
                                                                              > if ddl.rank() == 0:
score = model.evaluate(x_test, y_test, verbose=0)                             |     score = model.evaluate(x_test_full, y_test_full, verbose=0)
print('Test loss:', score[0])                                                 |     print('Test loss:', score[0])
print('Test accuracy:', score[1])                                             |     print('Test accuracy:', score[1])
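The same guard works for any operation that should only happen once per job, such as writing the trained model to disk. A minimal sketch, where the file name is illustrative and not part of the original script:

# DDL: Only rank 0 writes the trained model, so four processes don't write the same file.
if ddl.rank() == 0:
    model.save('mnist-tf-keras.h5')   # hypothetical output path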

Running the Script

To run the script across any number of nodes, we can use the following commands:

$ source /opt/DL/ddl-tensorflow/bin/ddl-tensorflow-activate
$ /opt/DL/ddl-tensorflow/bin/ddl-tensorflow-install-samples ~/samples
$ ddlrun -H host1,host2,host3,host4,... python ~/samples/examples/keras/mnist-tf-keras.py

On 4 GPUs the output looks like:

$ ddlrun python ~/samples/examples/keras/mnist-tf-keras.py
+ mpirun -x PATH -x LD_LIBRARY_PATH -x PYTHONPATH -tcp -disable_gpu_hooks --rankfile /tmp/DDLRUN/ddlrun.Rd3PDdkJYvRb/RANKFILE -x 'DDL_OPTIONS=-mode p:4x1x1x1 ' -n 4 python ~/samples/examples/keras/mnist-tf-keras.py
DDL: DDL_GROUP_SIZE=10000000.
2018-08-28 19:37:57.689450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0035:03:00.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2018-08-28 19:37:57.689548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 2
2018-08-28 19:37:57.689856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0004:05:00.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2018-08-28 19:37:57.689948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 1
2018-08-28 19:37:57.691164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0004:04:00.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2018-08-28 19:37:57.691221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-08-28 19:37:57.726137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0035:04:00.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2018-08-28 19:37:57.726350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 3
2018-08-28 19:37:58.078092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:37:58.078179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2018-08-28 19:37:58.078203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2018-08-28 19:37:58.078863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14847 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0004:04:00.0, compute capability: 7.0)
2018-08-28 19:37:58.080687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:37:58.080722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      1
2018-08-28 19:37:58.080738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1:   N
2018-08-28 19:37:58.081261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14849 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0004:05:00.0, compute capability: 7.0)
2018-08-28 19:37:58.150432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:37:58.150481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      2
2018-08-28 19:37:58.150495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2:   N
2018-08-28 19:37:58.151084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14846 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0035:03:00.0, compute capability: 7.0)
2018-08-28 19:37:58.405935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:37:58.406037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      3
2018-08-28 19:37:58.406065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3:   N
2018-08-28 19:37:58.406917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14846 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0035:04:00.0, compute capability: 7.0)
I 19:37:58.441 122001 122471 DDL:41  ] [MPI:0   ] ==== IBM Corp. DDL 1.1.0 + (MPI 3.1) ====
2018-08-28 19:37:59.918249: I ddl_MDR_ops.cc:826] [MPI:2   ]  name=Init local_gpuid=2 local_rank=2 local_size=4
2018-08-28 19:37:59.918246: I ddl_MDR_ops.cc:826] [MPI:3   ]  name=Init local_gpuid=3 local_rank=3 local_size=4
2018-08-28 19:37:59.918266: I ddl_MDR_ops.cc:826] [MPI:1   ]  name=Init local_gpuid=1 local_rank=1 local_size=4
2018-08-28 19:37:59.918266: I ddl_MDR_ops.cc:826] [MPI:0   ]  name=Init local_gpuid=0 local_rank=0 local_size=4
DDL: rank: 0, size: 4, gpuid: 0, hosts: 1
DDL: rank: 1, size: 4, gpuid: 1, hosts: 1
DDL: rank: 2, size: 4, gpuid: 2, hosts: 1
DDL: rank: 3, size: 4, gpuid: 3, hosts: 1
x_train shape: (15000, 28, 28, 1)
15000 train samples
2500 test samples
x_train shape: (15000, 28, 28, 1)
15000 train samples
2500 test samples
x_train shape: (15000, 28, 28, 1)
15000 train samples
2500 test samples
x_train shape: (15000, 28, 28, 1)
15000 train samples
2500 test samples
2018-08-28 19:38:00.963727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 2
2018-08-28 19:38:00.963824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:38:00.963838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      2
2018-08-28 19:38:00.963851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2:   N
2018-08-28 19:38:00.964433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14846 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0035:03:00.0, compute capability: 7.0)
Train on 15000 samples, validate on 2500 samples
2018-08-28 19:38:01.026421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-08-28 19:38:01.026512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:38:01.026535: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2018-08-28 19:38:01.026565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2018-08-28 19:38:01.027149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14847 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0004:04:00.0, compute capability: 7.0)
2018-08-28 19:38:01.245015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 1
2018-08-28 19:38:01.245136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:38:01.245160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      1
2018-08-28 19:38:01.245180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1:   N
2018-08-28 19:38:01.245765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14849 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0004:05:00.0, compute capability: 7.0)
2018-08-28 19:38:02.113830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 3
2018-08-28 19:38:02.113934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-28 19:38:02.113949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      3
2018-08-28 19:38:02.113963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3:   N
2018-08-28 19:38:02.114639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14846 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0035:04:00.0, compute capability: 7.0)
Epoch 1/12
2018-08-28 19:38:04.161660: I ddl_MDR_ops.cc:357] [MPI:0   ]  name=training/Adadelta/AllReduceN _global_buf_size=1199882 _N=8
I 19:38:04.529 122001 122637 DDL:703 ] [MPI:0   ] selected algo: NCCLB   - NCCLB
15000/15000 [==============================] - 4s 282us/step - loss: 0.4831 - acc: 0.8497 - val_loss: 0.1190 - val_acc: 0.9596
Epoch 2/12
15000/15000 [==============================] - 2s 119us/step - loss: 0.1169 - acc: 0.9679 - val_loss: 0.0846 - val_acc: 0.9700
Epoch 3/12
15000/15000 [==============================] - 2s 133us/step - loss: 0.0805 - acc: 0.9760 - val_loss: 0.0731 - val_acc: 0.9728
Epoch 4/12
15000/15000 [==============================] - 2s 118us/step - loss: 0.0693 - acc: 0.9797 - val_loss: 0.0571 - val_acc: 0.9792
Epoch 5/12
15000/15000 [==============================] - 2s 122us/step - loss: 0.0514 - acc: 0.9843 - val_loss: 0.0443 - val_acc: 0.9832
Epoch 6/12
15000/15000 [==============================] - 2s 120us/step - loss: 0.0473 - acc: 0.9868 - val_loss: 0.0539 - val_acc: 0.9804
Epoch 7/12
15000/15000 [==============================] - 2s 120us/step - loss: 0.0408 - acc: 0.9869 - val_loss: 0.0510 - val_acc: 0.9844
Epoch 8/12
15000/15000 [==============================] - 2s 121us/step - loss: 0.0398 - acc: 0.9877 - val_loss: 0.0579 - val_acc: 0.9836
Epoch 9/12
15000/15000 [==============================] - 2s 122us/step - loss: 0.0373 - acc: 0.9893 - val_loss: 0.0485 - val_acc: 0.9840
Epoch 10/12
15000/15000 [==============================] - 2s 104us/step - loss: 0.0289 - acc: 0.9915 - val_loss: 0.0566 - val_acc: 0.9824
Epoch 11/12
15000/15000 [==============================] - 2s 111us/step - loss: 0.0291 - acc: 0.9907 - val_loss: 0.0565 - val_acc: 0.9816
Epoch 12/12
15000/15000 [==============================] - 2s 106us/step - loss: 0.0279 - acc: 0.9915 - val_loss: 0.0419 - val_acc: 0.9856
2018-08-28 19:38:26.596350: I ddl_MDR_ops.cc:270] [MPI:2   ] calling ddl_finalize

2018-08-28 19:38:26.598412: I ddl_MDR_ops.cc:270] [MPI:3   ] calling ddl_finalize

2018-08-28 19:38:26.655121: I ddl_MDR_ops.cc:270] [MPI:1   ] calling ddl_finalize

2018-08-28 19:38:27.320345: I ddl_MDR_ops.cc:270] [MPI:0   ] calling ddl_finalize

Test loss: 0.0279394076111
Test accuracy: 0.992

Complete Diff

'''Trains a simple convnet on the MNIST dataset.                                '''Trains a simple convnet on the MNIST dataset.

Gets to 99.25% test accuracy after 12 epochs                                    Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).                          (there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.                                        16 seconds per epoch on a GRID K520 GPU.
'''                                                                             '''

from __future__ import print_function                                           from __future__ import print_function
import keras                                                                  | from tensorflow.python import keras as keras
from keras.datasets import mnist                                              | from tensorflow.python.keras.datasets import mnist
from keras.models import Sequential                                           | from tensorflow.python.keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten                              | from tensorflow.python.keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D                                 | from tensorflow.python.keras.layers import Conv2D, MaxPooling2D
from keras import backend as K                                                | from tensorflow.python.keras import backend as K
                                                                              > import ddl
                                                                              > import numpy as np

batch_size = 128                                                                batch_size = 128
num_classes = 10                                                                num_classes = 10
epochs = 12                                                                     epochs = 12

# input image dimensions                                                        # input image dimensions
img_rows, img_cols = 28, 28                                                     img_rows, img_cols = 28, 28

# the data, split between train and test sets                                   # the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()                        (x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':                                   if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)              x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)                 x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)                                           input_shape = (1, img_rows, img_cols)
else:                                                                           else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)              x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)                 x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)                                           input_shape = (img_rows, img_cols, 1)

                                                                              > # DDL: Save the full test data before splitting for final accuracy check.
                                                                              > x_test_full = x_test.astype('float32') / 255
                                                                              > y_test_full = keras.utils.to_categorical(y_test, num_classes)
                                                                              >
                                                                              > # DDL: Split the training & testing data.
                                                                              > x_train = np.array_split(x_train, ddl.size())[ddl.rank()]
                                                                              > x_test = np.array_split(x_test, ddl.size())[ddl.rank()]
x_train = x_train.astype('float32')                                             x_train = x_train.astype('float32')
x_test = x_test.astype('float32')                                               x_test = x_test.astype('float32')
x_train /= 255                                                                  x_train /= 255
x_test /= 255                                                                   x_test /= 255
print('x_train shape:', x_train.shape)                                          print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')                                        print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')                                          print(x_test.shape[0], 'test samples')

                                                                              > # DDL: Split the training & testing data.
                                                                              > y_train = np.array_split(y_train, ddl.size())[ddl.rank()]
                                                                              > y_test = np.array_split(y_test, ddl.size())[ddl.rank()]
# convert class vectors to binary class matrices                                # convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)                      y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)                        y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()                                                            model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),                                        model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',                                                              activation='relu',
                 input_shape=input_shape))                                                       input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))                                model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))                                       model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))                                                        model.add(Dropout(0.25))
model.add(Flatten())                                                            model.add(Flatten())
model.add(Dense(128, activation='relu'))                                        model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))                                                         model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))                             model.add(Dense(num_classes, activation='softmax'))

                                                                              > # DDL: adjust learning rate based on number of GPUs.
model.compile(loss=keras.losses.categorical_crossentropy,                       model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),                          |               optimizer=keras.optimizers.Adadelta(lr=1.0 * ddl.size()),
              metrics=['accuracy'])                                                           metrics=['accuracy'])

                                                                              > callbacks = list()
                                                                              >
                                                                              > # DDL: Add the DDL callback.
                                                                              > callbacks.append(ddl.DDLCallback())
                                                                              > callbacks.append(ddl.DDLGlobalVariablesCallback())
                                                                              >
                                                                              > # DDL: Only use verbose = 1 on rank 0.
model.fit(x_train, y_train,                                                     model.fit(x_train, y_train,
          batch_size=batch_size,                                                          batch_size=batch_size,
          epochs=epochs,                                                                  epochs=epochs,
          verbose=1,                                                          |           verbose=1 if ddl.rank() == 0 else 0,
          validation_data=(x_test, y_test))                                   |           validation_data=(x_test, y_test),
                                                                              >           callbacks=callbacks)
                                                                              > # DDL: Only do final accuracy check on rank 0.
                                                                              > if ddl.rank() == 0:
score = model.evaluate(x_test, y_test, verbose=0)                             |     score = model.evaluate(x_test_full, y_test_full, verbose=0)
print('Test loss:', score[0])                                                 |     print('Test loss:', score[0])
print('Test accuracy:', score[1])                                             |     print('Test accuracy:', score[1])
