Caffe: the global management singleton class
The Caffe singleton
Caffe uses the singleton pattern to hold per-thread state: variables and library handles.
Within any thread, that thread's Caffe object is accessed through the Get() function.
static boost::thread_specific_ptr<Caffe> thread_instance_;
Caffe& Caffe::Get() {
if (!thread_instance_.get()) {
thread_instance_.reset(new Caffe());
}
return *(thread_instance_.get());
}
Depending on how it is compiled, the program runs in one of two modes:
1. CPU_ONLY mode
2. CPU + GPU mode
The first mode computes on the CPU only and therefore involves no CUDA; the second involves GPU programming. The class definition and implementation consequently differ between the two modes.
Data members
The Caffe class has four or six data members:
class Caffe {
public:
enum Brew { CPU, GPU };
protected:
#ifndef CPU_ONLY
cublasHandle_t cublas_handle_;
curandGenerator_t curand_generator_;
#endif
shared_ptr<RNG> random_generator_;
Brew mode_;
int solver_count_;
bool root_solver_;
};
In CPU + GPU mode two additional members are defined:
* cublasHandle_t
: a handle to the cuBLAS library context. The context is created with cublasCreate() and destroyed with cublasDestroy().
* curandGenerator_t
: a handle used to generate random numbers on the GPU.
The members common to both modes:
* Brew mode_
: the execution mode, CPU or GPU.
* shared_ptr<RNG> random_generator_
: used to produce random numbers.
* int solver_count_
: the number of solvers (used in multi-GPU training).
* bool root_solver_
: whether this is the root (master) solver.
Getters and Setters
Caffe provides the following functions for reading and writing its data members.
inline static RNG& rng_stream() {
if (!Get().random_generator_) {
Get().random_generator_.reset(new RNG());
}
return *(Get().random_generator_);
}
#ifndef CPU_ONLY
inline static cublasHandle_t cublas_handle() { return Get().cublas_handle_; }
inline static curandGenerator_t curand_generator() {
return Get().curand_generator_;
}
#endif
inline static Brew mode() { return Get().mode_; }
inline static void set_mode(Brew mode) { Get().mode_ = mode; }
static void set_random_seed(const unsigned int seed);
static void SetDevice(const int device_id);
static void DeviceQuery();
static bool CheckDevice(const int device_id);
static int FindDevice(const int start_id = 0);
inline static int solver_count() { return Get().solver_count_; }
inline static void set_solver_count(int val) { Get().solver_count_ = val; }
inline static bool root_solver() { return Get().root_solver_; }
inline static void set_root_solver(bool val) { Get().root_solver_ = val; }
As before, the implementations of these functions depend on the build mode.
Random number generation
Random numbers are one of the main resources the Caffe object manages. The Caffe class exposes a unified interface for generating random numbers, and that interface is backed by several different generation mechanisms.
Random numbers are usually produced by a pair of cooperating classes: an engine and a distribution. The raw values an engine produces are rarely usable directly; a distribution transforms them into values with the desired statistical properties. Instantiating an engine typically requires a seed, e.g. random_engine e(seed);
The seed generator Caffe::cluster_seedgen
The seed is one of the key inputs to random number generation. The function cluster_seedgen
derives a seed from the Linux entropy pool (/dev/urandom), falling back to the PID and a timestamp when the entropy source is unavailable.
int64_t cluster_seedgen(void) {
int64_t s, seed, pid;
FILE* f = fopen("/dev/urandom", "rb");
if (f && fread(&seed, 1, sizeof(seed), f) == sizeof(seed)) {
fclose(f);
return seed;
}
LOG(INFO) << "System entropy source not available, "
"using fallback algorithm to generate seed instead.";
if (f)
fclose(f);
pid = getpid();
s = time(NULL);
seed = std::abs(((s * 181) * ((pid - 83) * 359)) % 104729);
return seed;
}
The random number engine: the nested RNG class
The Caffe class defines an RNG class that serves as the random number generator; an RNG is initialized from a seed.
The RNG class is defined as follows:
class RNG {
public:
RNG();
explicit RNG(unsigned int seed);
explicit RNG(const RNG&);
RNG& operator=(const RNG&);
void* generator();
private:
class Generator;
shared_ptr<Generator> generator_;
};
Class Generator
is a nested class of RNG and is the actual random number engine. It is defined as:
typedef boost::mt19937 rng_t;
class Caffe::RNG::Generator {
public:
Generator() : rng_(new caffe::rng_t(cluster_seedgen())) {}
explicit Generator(unsigned int seed) : rng_(new caffe::rng_t(seed)) {}
caffe::rng_t* rng() { return rng_.get(); }
private:
shared_ptr<caffe::rng_t> rng_;
};
Generator's constructors build a boost::mt19937 object, seeded either by cluster_seedgen() or by an explicitly supplied seed.
The implementation of RNG differs depending on whether CPU_ONLY is defined.
1. In CPU_ONLY mode:
Caffe::RNG::RNG() : generator_(new Generator()) { }
Caffe::RNG::RNG(unsigned int seed) : generator_(new Generator(seed)) { }
void* Caffe::RNG::generator() {
return static_cast<void*>(generator_->rng());
}
Caffe::RNG& Caffe::RNG::operator=(const RNG& other) {
generator_ = other.generator_;
return *this;
}
2. In CPU + GPU mode:
Caffe::RNG::RNG() : generator_(new Generator()) { }
Caffe::RNG::RNG(unsigned int seed) : generator_(new Generator(seed)) { }
Caffe::RNG& Caffe::RNG::operator=(const RNG& other) {
generator_.reset(other.generator_.get());
return *this;
}
void* Caffe::RNG::generator() {
return static_cast<void*>(generator_->rng());
}
The two differ only in how assignment is implemented:
- generator_ = other.generator_;
- generator_.reset(other.generator_.get());
Note that the first form shares ownership between the two shared_ptr instances, while the second creates an independent owner of the same raw pointer, so both owners will eventually try to delete it.
Caffe's constructor and destructor
Naturally, the constructor and destructor differ between the two modes:
- In CPU_ONLY mode
the constructor involves no CUDA details, and random numbers are provided by the boost RNG.
Caffe::Caffe()
: random_generator_(), mode_(Caffe::CPU),
solver_count_(1), root_solver_(true) { }
Caffe::~Caffe() { }
- In CPU + GPU mode
the constructor additionally initializes the cuRAND and cuBLAS handles; GPU-side random numbers are produced by cuRAND.
Caffe::Caffe()
: cublas_handle_(NULL), curand_generator_(NULL), random_generator_(),
mode_(Caffe::CPU), solver_count_(1), root_solver_(true) {
if (cublasCreate(&cublas_handle_) != CUBLAS_STATUS_SUCCESS) {
LOG(ERROR) << "Cannot create Cublas handle. Cublas won't be available.";
}
if (curandCreateGenerator(&curand_generator_, CURAND_RNG_PSEUDO_DEFAULT)
!= CURAND_STATUS_SUCCESS ||
curandSetPseudoRandomGeneratorSeed(curand_generator_, cluster_seedgen())
!= CURAND_STATUS_SUCCESS) {
LOG(ERROR) << "Cannot create Curand generator. Curand won't be available.";
}
}
Caffe::~Caffe() {
if (cublas_handle_) CUBLAS_CHECK(cublasDestroy(cublas_handle_));
if (curand_generator_) {
CURAND_CHECK(curandDestroyGenerator(curand_generator_));
}
}
Aside curandGenerator_t: a handle used to generate random numbers on the GPU. Call
curandCreateGenerator(curandGenerator_t* generator, curandRngType_t rng_type)
to select the generation algorithm, then
curandSetPseudoRandomGeneratorSeed(curandGenerator_t generator, unsigned long long seed)
to set the initial value (the seed), then
curandGenerateUniform(curandGenerator_t generator, float* outputPtr, size_t num)
to produce the requested number of random values. Finally, call curandDestroyGenerator to release the generator.
Aside cuBLAS context: the cuBLAS library context is initialized by calling cublasCreate(), which yields a handle that must be passed to every subsequent cuBLAS call. When a program no longer uses the cuBLAS context, it must release the context's resources by calling cublasDestroy(). In short: acquire the library resources before using cuBLAS, and release them when finished. The benefit of this design is that it gives precise control over each library context when using multiple threads and multiple GPUs. For example,
cudaSetDevice()
associates different host threads with different GPU devices. Each thread can then create a cuBLAS context for its associated device, and since every context has its own handle, computation can be dispatched to different GPUs simply by using different handles. If a single thread needs to run computations on several GPUs, it can first select a GPU with cudaSetDevice()
and then create a cuBLAS context for it with cublasCreate().
Aside
cublasCreate(cublasHandle_t *handle)
This function initializes a cuBLAS context bound to the current GPU device. To use multiple GPUs, create one context per device (multi-GPU); it is also possible to create several differently configured contexts on the same device (e.g. multiple threads, single GPU). Aside cublasDestroy(cublasHandle_t handle)
This function is the counterpart of cublasCreate() and releases a cuBLAS context once it is no longer needed. cublasDestroy implicitly calls cudaDeviceSynchronize to synchronize the device, so a context should live as long as possible, keeping the number of cublasCreate/cublasDestroy calls to a minimum.
Accessing the data members
The Caffe class manages a single thread's state. That state is accessed through simple member functions: the getters and setters listed earlier.
Setting the random seed
static void set_random_seed(const unsigned int seed);
- CPU_ONLY mode
void Caffe::set_random_seed(const unsigned int seed) {
// RNG seed
Get().random_generator_.reset(new RNG(seed));
}
- CPU + GPU mode
void Caffe::set_random_seed(const unsigned int seed) {
// Curand seed
static bool g_curand_availability_logged = false;
if (Get().curand_generator_) {
CURAND_CHECK(curandSetPseudoRandomGeneratorSeed(curand_generator(),
seed));
CURAND_CHECK(curandSetGeneratorOffset(curand_generator(), 0));
} else {
if (!g_curand_availability_logged) {
LOG(ERROR) <<
"Curand not available. Skipping setting the curand seed.";
g_curand_availability_logged = true;
}
}
// RNG seed
Get().random_generator_.reset(new RNG(seed));
}
Accessing GPU devices
In CPU_ONLY mode no GPU devices exist, so these functions are stubs.
- CPU_ONLY mode
void Caffe::SetDevice(const int device_id) {
NO_GPU;
}
void Caffe::DeviceQuery() {
NO_GPU;
}
bool Caffe::CheckDevice(const int device_id) {
NO_GPU;
return false;
}
int Caffe::FindDevice(const int start_id) {
NO_GPU;
return -1;
}
- CPU + GPU mode
In the regular (GPU) build, devices are accessed through the CUDA runtime API.
void Caffe::SetDevice(const int device_id) {
int current_device;
CUDA_CHECK(cudaGetDevice(&current_device));
if (current_device == device_id) {
return;
}
// The call to cudaSetDevice must come before any calls to Get, which
// may perform initialization using the GPU.
CUDA_CHECK(cudaSetDevice(device_id));
if (Get().cublas_handle_) {
CUBLAS_CHECK(cublasDestroy(Get().cublas_handle_));
}
if (Get().curand_generator_) {
CURAND_CHECK(curandDestroyGenerator(Get().curand_generator_));
}
CUBLAS_CHECK(cublasCreate(&Get().cublas_handle_));
CURAND_CHECK(curandCreateGenerator(&Get().curand_generator_,
CURAND_RNG_PSEUDO_DEFAULT));
CURAND_CHECK(curandSetPseudoRandomGeneratorSeed(Get().curand_generator_,
cluster_seedgen()));
}
void Caffe::DeviceQuery() {
cudaDeviceProp prop;
int device;
if (cudaSuccess != cudaGetDevice(&device)) {
printf("No cuda device present.\n");
return;
}
CUDA_CHECK(cudaGetDeviceProperties(&prop, device));
LOG(INFO) << "Device id: " << device;
LOG(INFO) << "Major revision number: " << prop.major;
LOG(INFO) << "Minor revision number: " << prop.minor;
LOG(INFO) << "Name: " << prop.name;
LOG(INFO) << "Total global memory: " << prop.totalGlobalMem;
LOG(INFO) << "Total shared memory per block: " << prop.sharedMemPerBlock;
LOG(INFO) << "Total registers per block: " << prop.regsPerBlock;
LOG(INFO) << "Warp size: " << prop.warpSize;
LOG(INFO) << "Maximum memory pitch: " << prop.memPitch;
LOG(INFO) << "Maximum threads per block: " << prop.maxThreadsPerBlock;
LOG(INFO) << "Maximum dimension of block: "
<< prop.maxThreadsDim[0] << ", " << prop.maxThreadsDim[1] << ", "
<< prop.maxThreadsDim[2];
LOG(INFO) << "Maximum dimension of grid: "
<< prop.maxGridSize[0] << ", " << prop.maxGridSize[1] << ", "
<< prop.maxGridSize[2];
LOG(INFO) << "Clock rate: " << prop.clockRate;
LOG(INFO) << "Total constant memory: " << prop.totalConstMem;
LOG(INFO) << "Texture alignment: " << prop.textureAlignment;
LOG(INFO) << "Concurrent copy and execution: "
<< (prop.deviceOverlap ? "Yes" : "No");
LOG(INFO) << "Number of multiprocessors: " << prop.multiProcessorCount;
LOG(INFO) << "Kernel execution timeout: "
<< (prop.kernelExecTimeoutEnabled ? "Yes" : "No");
return;
}
bool Caffe::CheckDevice(const int device_id) {
bool r = ((cudaSuccess == cudaSetDevice(device_id)) &&
(cudaSuccess == cudaFree(0)));
cudaGetLastError();
return r;
}
int Caffe::FindDevice(const int start_id) {
int count = 0;
CUDA_CHECK(cudaGetDeviceCount(&count));
for (int i = start_id; i < count; i++) {
if (CheckDevice(i)) return i;
}
return -1;
}
cublasGetErrorString and curandGetErrorString
Caffe::cublasGetErrorString
const char* cublasGetErrorString(cublasStatus_t error) {
switch (error) {
case CUBLAS_STATUS_SUCCESS:
return "CUBLAS_STATUS_SUCCESS";
case CUBLAS_STATUS_NOT_INITIALIZED:
return "CUBLAS_STATUS_NOT_INITIALIZED";
case CUBLAS_STATUS_ALLOC_FAILED:
return "CUBLAS_STATUS_ALLOC_FAILED";
case CUBLAS_STATUS_INVALID_VALUE:
return "CUBLAS_STATUS_INVALID_VALUE";
case CUBLAS_STATUS_ARCH_MISMATCH:
return "CUBLAS_STATUS_ARCH_MISMATCH";
case CUBLAS_STATUS_MAPPING_ERROR:
return "CUBLAS_STATUS_MAPPING_ERROR";
case CUBLAS_STATUS_EXECUTION_FAILED:
return "CUBLAS_STATUS_EXECUTION_FAILED";
case CUBLAS_STATUS_INTERNAL_ERROR:
return "CUBLAS_STATUS_INTERNAL_ERROR";
#if CUDA_VERSION >= 6000
case CUBLAS_STATUS_NOT_SUPPORTED:
return "CUBLAS_STATUS_NOT_SUPPORTED";
#endif
#if CUDA_VERSION >= 6050
case CUBLAS_STATUS_LICENSE_ERROR:
return "CUBLAS_STATUS_LICENSE_ERROR";
#endif
}
return "Unknown cublas status";
}
Caffe::curandGetErrorString
const char* curandGetErrorString(curandStatus_t error) {
switch (error) {
case CURAND_STATUS_SUCCESS:
return "CURAND_STATUS_SUCCESS";
case CURAND_STATUS_VERSION_MISMATCH:
return "CURAND_STATUS_VERSION_MISMATCH";
case CURAND_STATUS_NOT_INITIALIZED:
return "CURAND_STATUS_NOT_INITIALIZED";
case CURAND_STATUS_ALLOCATION_FAILED:
return "CURAND_STATUS_ALLOCATION_FAILED";
case CURAND_STATUS_TYPE_ERROR:
return "CURAND_STATUS_TYPE_ERROR";
case CURAND_STATUS_OUT_OF_RANGE:
return "CURAND_STATUS_OUT_OF_RANGE";
case CURAND_STATUS_LENGTH_NOT_MULTIPLE:
return "CURAND_STATUS_LENGTH_NOT_MULTIPLE";
case CURAND_STATUS_DOUBLE_PRECISION_REQUIRED:
return "CURAND_STATUS_DOUBLE_PRECISION_REQUIRED";
case CURAND_STATUS_LAUNCH_FAILURE:
return "CURAND_STATUS_LAUNCH_FAILURE";
case CURAND_STATUS_PREEXISTING_FAILURE:
return "CURAND_STATUS_PREEXISTING_FAILURE";
case CURAND_STATUS_INITIALIZATION_FAILED:
return "CURAND_STATUS_INITIALIZATION_FAILED";
case CURAND_STATUS_ARCH_MISMATCH:
return "CURAND_STATUS_ARCH_MISMATCH";
case CURAND_STATUS_INTERNAL_ERROR:
return "CURAND_STATUS_INTERNAL_ERROR";
}
return "Unknown curand status";
}