Using TensorBoard with Keras, selecting a specific GPU and limiting its memory, and forcing CPU-only execution
1. Forcing CPU-only execution:
import os
# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # hide all GPUs; set this before TensorFlow/Keras is imported
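Selecting a specific GPU uses the same environment variables as the CPU-only setting. The sketch below wraps both in one small helper. `select_devices` is a hypothetical name, not part of Keras or TensorFlow, and like the lines above it only takes effect if it runs before TensorFlow initializes CUDA (the safe choice is before importing TensorFlow or Keras).

```python
import os


def select_devices(gpu_ids):
    """Hypothetical helper: expose only the given GPUs to CUDA.

    An empty list hides every GPU, which forces CPU-only execution.
    Call it before importing TensorFlow/Keras.
    """
    # Number GPUs in PCI bus order (the order nvidia-smi shows)
    # instead of CUDA's default FASTEST_FIRST order.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


if __name__ == "__main__":
    print(repr(select_devices([])))   # '' -> CPU only
    print(repr(select_devices([1])))  # '1' -> only the second GPU, which TF then sees as GPU:0
```

Visible GPUs are renumbered from 0 inside the process, so after `select_devices([1])` the code should refer to that card as the first GPU.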
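Note: setting os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" may change the default GPU numbering compared with leaving this line out. CUDA's default order is FASTEST_FIRST, while PCI_BUS_ID numbers the GPUs by PCI bus ID, which is the same order nvidia-smi uses.
2. Limiting GPU memory usage, and general TensorBoard usage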
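Addendum: allocating a fixed amount of GPU memory, or letting the allocation grow adaptively
I have just started learning TensorFlow. Unlike Theano, by default it grabs almost all of the GPU's memory as soon as a Session is opened. If several people share the same machine and GPU, it is basically first come, first served, and programs started later will crash. The documentation describes two ways to control GPU memory.
The first is to preallocate a fixed fraction: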
import tensorflow
tf_config = tensorflow.ConfigProto()
tf_config.gpu_options.per_process_gpu_memory_fraction = 0.5  # allocate 50%
session = tensorflow.Session(config=tf_config)
The other is adaptive allocation, which takes only as much memory as it needs:
tf_config = tensorflow.ConfigProto()
tf_config.gpu_options.allow_growth = True
session = tensorflow.Session(config=tf_config)
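per_process_gpu_memory_fraction is a fraction of the card's total memory, so choosing a value comes down to simple arithmetic. For example, the get_session function in the example below defaults to 0.3, which is about 1.8 GB of a 6 GB card. Here is a minimal sketch that assumes you already know the card's total memory (for example, from nvidia-smi). Both helpers are hypothetical.

```python
def memory_fraction(target_gb, total_gb):
    """Hypothetical helper: value to pass as per_process_gpu_memory_fraction."""
    if total_gb <= 0 or not 0 < target_gb <= total_gb:
        raise ValueError("need 0 < target_gb <= total_gb")
    return target_gb / total_gb


def reserved_gb(fraction, total_gb):
    """Approximate memory preallocated for a given fraction."""
    return fraction * total_gb


if __name__ == "__main__":
    print(round(memory_fraction(2, 6), 3))  # 0.333
    print(round(reserved_gb(0.3, 6), 2))    # 1.8
```

With allow_growth=True you do not need a fraction at all: memory is allocated on demand. TensorFlow does not hand that memory back to the GPU while the process is running, though.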
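Keras 2.0 incorporates several contributors' suggestions, so it is now very convenient to call TensorBoard from Keras to track and observe the training process.
Straight to an example. (Note: calling TensorBoard seems to slow training down considerably. As an alternative, you can keep the History object returned by model.fit and write a few lines of code to display it yourself.)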
# coding: utf-8
import os

import numpy as np
import tensorflow as tf
import keras.callbacks
import keras.backend.tensorflow_backend as KTF
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD
from keras.utils import np_utils

######################################
# Set the GPU memory usage as a fraction
######################################
def get_session(gpu_fraction=0.3):
    """Allocate a specific fraction of GPU memory.

    Assume that you have 6GB of GPU memory and want to allocate ~2GB.
    """
    num_threads = os.environ.get('OMP_NUM_THREADS')
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_fraction)
    if num_threads:
        # Environment variables are strings; the config field expects an int
        return tf.Session(config=tf.ConfigProto(
            gpu_options=gpu_options,
            intra_op_parallelism_threads=int(num_threads)))
    else:
        return tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

KTF.set_session(get_session(0.6))  # use 60% of total GPU memory
os.system("nvidia-smi")  # run nvidia-smi in a subshell to check memory usage
input("Press Enter to continue...")
######################

batch_size = 128
nb_classes = 10
nb_epoch = 10
nb_data = 28 * 28
log_filepath = '/tmp/keras_log'

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# reshape: flatten each 28x28 image into one row of 784 values
print(X_train.shape)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1] * X_train.shape[2])
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1] * X_test.shape[2])

# rescale pixel values to [0, 1]
X_train = X_train.astype(np.float32)
X_train /= 255
X_test = X_test.astype(np.float32)
X_test /= 255

# convert class vectors to binary class matrices (one-hot vectors)
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(nb_data,), kernel_initializer='normal', name='dense1'))  # a sample is a row of 28*28
model.add(Activation('relu', name='relu1'))
model.add(Dropout(0.2, name='dropout1'))
model.add(Dense(512, kernel_initializer='normal', name='dense2'))
model.add(Activation('relu', name='relu2'))
model.add(Dropout(0.2, name='dropout2'))
model.add(Dense(10, kernel_initializer='normal', name='dense3'))
model.add(Activation('softmax', name='softmax1'))
model.summary()

model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.001), metrics=['accuracy'])

# Set where the logs are saved, write the network weights as images for TensorBoard,
# and once per epoch compute histograms of the weights and of each layer's output distribution
tb_cb = keras.callbacks.TensorBoard(log_dir=log_filepath, write_images=True, histogram_freq=1)
cbks = [tb_cb]

history = model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch,
                    verbose=1, callbacks=cbks, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
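The note before the example suggests skipping TensorBoard and using the History object that model.fit returns. Its history attribute is a dict that maps each metric name to a list of per-epoch values (for example 'loss' and 'val_loss'; the exact keys for accuracy depend on the Keras version). Below is a minimal text-only sketch. The helper names are hypothetical, and the numbers in the usage lines are made up.

```python
def summarize_history(hist):
    """Format a History.history-style dict as one line per epoch."""
    keys = sorted(hist)
    n_epochs = len(hist[keys[0]])
    lines = []
    for epoch in range(n_epochs):
        parts = ["{}={:.4f}".format(k, hist[k][epoch]) for k in keys]
        lines.append("epoch {}: {}".format(epoch + 1, ", ".join(parts)))
    return lines


def best_epoch(hist, key="val_loss"):
    """1-based epoch with the lowest value of `key`."""
    values = hist[key]
    return min(range(len(values)), key=values.__getitem__) + 1


if __name__ == "__main__":
    # Made-up numbers; with the example above you would pass history.history
    toy = {"loss": [0.9, 0.5, 0.4], "val_loss": [0.8, 0.45, 0.47]}
    for line in summarize_history(toy):
        print(line)
    print("best epoch:", best_epoch(toy))
```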
You can give each layer a name yourself, or let Keras name it automatically. The automatic naming rule lives in the Layer class:
name = kwargs.get('name')
if not name:
    prefix = self.__class__.__name__
    name = _to_snake_case(prefix) + '_' + str(K.get_uid(prefix))
self.name = name
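The source of the TensorBoard class in Keras's callbacks module shows that, when histogram_freq is non-zero, Keras by default sends all weights and biases of every layer, plus the distribution/histogram of each layer's output, to TensorBoard. This makes it easy to watch the network in the browser. The source is: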
def set_model(self, model):
    self.model = model
    self.sess = K.get_session()
    if self.histogram_freq and self.merged is None:
        for layer in self.model.layers:
            for weight in layer.weights:
                tf.summary.histogram(weight.name, weight)
                if self.write_images:
                    w_img = tf.squeeze(weight)
                    shape = w_img.get_shape()
                    if len(shape) > 1 and shape[0] > shape[1]:
                        w_img = tf.transpose(w_img)
                    if len(shape) == 1:
                        w_img = tf.expand_dims(w_img, 0)
                    w_img = tf.expand_dims(tf.expand_dims(w_img, 0), -1)
                    tf.summary.image(weight.name, w_img)
            if hasattr(layer, 'output'):
                tf.summary.histogram('{}_out'.format(layer.name),
                                     layer.output)
    self.merged = tf.summary.merge_all()
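When write_images is on, the code above reshapes each weight into the 4-D [batch, height, width, channels] layout that tf.summary.image expects. The plain-Python sketch below traces only the shapes, so it needs no TensorFlow, and lets you predict how each weight will be drawn. `summary_image_shape` is a hypothetical name.

```python
def summary_image_shape(weight_shape):
    """Trace the shape changes applied in set_model when write_images is set."""
    # tf.squeeze: drop every dimension of size 1
    shape = [d for d in weight_shape if d != 1]
    squeezed_rank = len(shape)
    # tf.transpose (default perm reverses all axes) when rows > columns
    if squeezed_rank > 1 and shape[0] > shape[1]:
        shape = shape[::-1]
    # a 1-D bias vector becomes a single-row matrix
    if squeezed_rank == 1:
        shape = [1] + shape
    # add the batch (front) and channel (back) dimensions
    return tuple([1] + shape + [1])


if __name__ == "__main__":
    print(summary_image_shape((784, 512)))  # dense1 kernel -> (1, 512, 784, 1)
    print(summary_image_shape((512,)))      # a bias -> (1, 1, 512, 1)
    print(summary_image_shape((512, 10)))   # dense3 kernel -> (1, 10, 512, 1)
```

By the same trace, a 4-D convolution kernel such as (3, 3, 1, 32) comes out 5-D as (1, 3, 3, 32, 1), which tf.summary.image does not accept. This version of the code therefore appears to suit Dense-style weights best.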
The TensorBoard callback also has arguments for watching only selected layers. In this version they apply to embedding layers:
embeddings_freq: frequency (in epochs) at which selected embedding layers will be saved.
embeddings_layer_names: a list of names of layers to keep an eye on. If None or an empty list, all the embedding layers will be watched.
Now run the example from the beginning, then run the following in a terminal:
tensorboard --logdir=/tmp/keras_log
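Open the address printed in the terminal (by default http://localhost:6006) in a browser to enter TensorBoard. There you can browse the graph, the distributions, the histograms, and the loss, acc, etc. in the Scalars tab.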
The following passage is excerpted from another source:
TensorBoard will automatically include all runs logged within the sub-directories of the specified log_dir. For example, if you log another run into a different sub-directory, the TensorBoard visualization shows both runs together.
If you want to record every training run in its own directory, you can use the unique_log_dir function.
Once again, note that it's not required to record every training run in its own directory. Using the default "logs" directory will work just fine; you'll just only be able to visualize the most recent run using TensorBoard.
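unique_log_dir comes from the excerpted source. As far as I know, the Python Keras API used in this post has no function by that name. A hypothetical stdlib-only stand-in that creates one timestamped sub-directory per run could look like this:

```python
import os
from datetime import datetime


def unique_run_dir(root="logs", now=None):
    """Hypothetical helper: create and return root/run_<timestamp>[_n]."""
    now = now or datetime.now()
    base = os.path.join(root, now.strftime("run_%Y_%m_%d_%H_%M_%S"))
    path, n = base, 1
    # Two runs started in the same second get distinct numeric suffixes
    while os.path.exists(path):
        path = "{}_{}".format(base, n)
        n += 1
    os.makedirs(path)
    return path
```

For example, passing `log_dir=unique_run_dir('/tmp/keras_log')` to keras.callbacks.TensorBoard and then running `tensorboard --logdir=/tmp/keras_log` should list each run as a separate entry.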
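Note that by default the Scalars tab in TensorBoard shows only the loss on the training and validation sets. To record and display other metrics, add them to the metrics argument of model.compile, for example: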
model.compile(
    loss='mean_squared_error',
    optimizer='sgd',
    metrics=['mae', 'acc']  # visualize mae and acc
)
3. Using TensorBoard when training with Keras's train_on_batch
import numpy as np
import tensorflow as tf
from keras.callbacks import TensorBoard
from keras.layers import Input, Dense
from keras.models import Model

def write_log(callback, names, logs, batch_no):
    # Write one scalar summary per (name, value) pair at step batch_no
    for name, value in zip(names, logs):
        summary = tf.Summary()
        summary_value = summary.value.add()
        summary_value.simple_value = value
        summary_value.tag = name
        callback.writer.add_summary(summary, batch_no)
    callback.writer.flush()

net_in = Input(shape=(3,))
net_out = Dense(1)(net_in)
model = Model(net_in, net_out)
model.compile(loss='mse', optimizer='sgd', metrics=['mae'])

log_path = './graph'
callback = TensorBoard(log_path)
callback.set_model(model)  # set_model also creates callback.writer

train_names = ['train_loss', 'train_mae']
val_names = ['val_loss', 'val_mae']
for batch_no in range(100):
    X_train, Y_train = np.random.rand(32, 3), np.random.rand(32, 1)
    logs = model.train_on_batch(X_train, Y_train)
    write_log(callback, train_names, logs, batch_no)

    if batch_no % 10 == 0:
        X_val, Y_val = np.random.rand(32, 3), np.random.rand(32, 1)
        # Evaluate only; train_on_batch here would update the weights on validation data
        logs = model.test_on_batch(X_val, Y_val)
        # Using batch_no keeps the x-axis aligned with the training curves;
        # batch_no // 10 would number the validation points 0, 1, 2, ...
        write_log(callback, val_names, logs, batch_no)
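write_log pairs the names with the values that train_on_batch/test_on_batch return. Those values come back in the order of model.metrics_names: the loss first, then each compiled metric. Because zip stops silently at the shorter list, a metric added to compile without a matching name is dropped with no error. The framework-free sketch below adds a length check and records what would be written. RecordingWriter and add_scalar are hypothetical stand-ins for callback.writer, not Keras APIs.

```python
class RecordingWriter:
    """Hypothetical stand-in for callback.writer: keeps (tag, value, step) tuples."""

    def __init__(self):
        self.records = []

    def add_scalar(self, tag, value, step):
        self.records.append((tag, float(value), step))


def write_log(writer, names, logs, step):
    # Fail loudly instead of letting zip() drop unmatched values
    if len(names) != len(logs):
        raise ValueError("got %d names for %d values" % (len(names), len(logs)))
    for name, value in zip(names, logs):
        writer.add_scalar(name, value, step)


if __name__ == "__main__":
    writer = RecordingWriter()
    for batch_no in range(100):
        # Placeholder values instead of real train_on_batch/test_on_batch output
        write_log(writer, ["train_loss", "train_mae"], [0.0, 0.0], batch_no)
        if batch_no % 10 == 0:
            write_log(writer, ["val_loss", "val_mae"], [0.0, 0.0], batch_no)
    # validation is logged at steps 0, 10, ..., 90
    print(sorted({s for tag, _, s in writer.records if tag == "val_loss"}))
```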
4. Recording the loss of every batch in TensorBoard