A bug when using hooks with MirroredStrategy in tf.estimator.Estimator
阿新 • Published: 2018-12-23
When I was using MirroredStrategy in my tf.estimator.Estimator:
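```python
import tensorflow as tf  # TensorFlow 1.x; tf.contrib does not exist in 2.x

# Mirror the model across two GPUs for both training and evaluation.
distribution = tf.contrib.distribute.MirroredStrategy(
    ["/device:GPU:0", "/device:GPU:1"])
config = tf.estimator.RunConfig(
    train_distribute=distribution,
    eval_distribute=distribution)
estimator = tf.estimator.Estimator(
    model_fn=build_model_fn_optimizer(),  # defined elsewhere
    config=config)  # the listing is cut off after "config="; passing the RunConfig above is assumed
```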
and added hooks for training:
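```python
# Inside model_fn. every_n_iter=100 is an arbitrary value added here, since the hook requires a logging interval.
logging_hook = tf.train.LoggingTensorHook({'logits': logits}, every_n_iter=100)
return tf.estimator.EstimatorSpec(
    mode, loss=loss_fn(), train_op=train_op, training_hooks=[logging_hook])
```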
TensorFlow reported this error:
```
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1179, in _train_model
    return self._train_model_distributed(input_fn, hooks, saving_listeners)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1309, in _train_model_distributed
    grouped_estimator_spec.training_hooks)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1305, in get_hooks_from_the_first_device
    for per_device_hook in per_device_hooks
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1305, in <listcomp>
    for per_device_hook in per_device_hooks
AttributeError: 'Estimator' object has no attribute '_distribution'
```
Having found no answers on Google, I had to look into the code of ‘estimator.py’ in TensorFlow. Fortunately, the defect is obvious:
```python
scaffold = _combine_distributed_scaffold(
    grouped_estimator_spec.scaffold, self._train_distribution)

# TODO(yuefengz): add a test for unwrapping per_device_hooks.
def get_hooks_from_the_first_device(per_device_hooks):
  return [
      self._distribution.unwrap(per_device_hook)[0]
      for per_device_hook in per_device_hooks
  ]

training_hooks = get_hooks_from_the_first_device(
    grouped_estimator_spec.training_hooks)
```
The Estimator class has no private attribute named ‘_distribution’; it only has ‘_train_distribution’ and ‘_eval_distribution’. So the fix is simply to change ‘self._distribution.unwrap(per_device_hook)[0]’ to ‘self._train_distribution.unwrap(per_device_hook)[0]’.
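The failure and the fix can be reproduced without TensorFlow in a minimal sketch. FakeStrategy, BuggyEstimator and FixedEstimator below are hypothetical stand-ins invented for illustration, not TensorFlow classes. The sketch assumes that unwrap simply returns one value per device, and it models hooks as plain strings.

```python
class FakeStrategy:
    """Hypothetical stand-in for a distribution strategy (not TensorFlow's class)."""

    def unwrap(self, per_device_value):
        # Assumption: a "per-device value" is just a tuple with one entry per device.
        return list(per_device_value)


class BuggyEstimator:
    """Mimics the attribute layout of the Estimator: there is no `_distribution`."""

    def __init__(self, strategy):
        self._train_distribution = strategy
        self._eval_distribution = strategy

    def get_hooks_from_the_first_device(self, per_device_hooks):
        # Same mistake as in estimator.py: `_distribution` is never assigned.
        return [self._distribution.unwrap(h)[0] for h in per_device_hooks]


class FixedEstimator(BuggyEstimator):
    def get_hooks_from_the_first_device(self, per_device_hooks):
        # The fix: read the attribute that actually exists.
        return [self._train_distribution.unwrap(h)[0] for h in per_device_hooks]


# One logging hook replicated on two devices, represented as plain strings.
per_device_hooks = [("logging_hook@GPU:0", "logging_hook@GPU:1")]

try:
    BuggyEstimator(FakeStrategy()).get_hooks_from_the_first_device(per_device_hooks)
    buggy_error = None
except AttributeError as err:
    buggy_error = str(err)

fixed_hooks = FixedEstimator(FakeStrategy()).get_hooks_from_the_first_device(
    per_device_hooks)

print(buggy_error)   # the AttributeError message naming '_distribution'
print(fixed_hooks)   # only the hook from the first device is kept
```

The buggy variant only fails when training hooks are actually present and unwrapped, which matches the observation that the error appeared once I added hooks to the EstimatorSpec.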
I have submitted a pull request to TensorFlow to fix this bug in the 1.11 branch.