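1. Introduction
TensorFlow's device memory management module implements a best-fit with coalescing algorithm (hereafter the BFC algorithm).
The BFC algorithm is a very simple version of Doug Lea's malloc (dlmalloc).
It provides the basic functions of memory allocation, deallocation, and fragmentation management.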
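2. Basic idea of the BFC algorithm
1. Data structures
The whole memory space is represented by a doubly linked list of Chunks sorted by ascending base address. The direct predecessor and direct successor of each chunk must occupy memory that is contiguous with it. The Chunk struct records the actual size, the requested size, whether the chunk is in use, the base address, the direct predecessor, the direct successor, and the Bin index.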
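2. Allocation
The user requests a memory block (malloc). The allocator finds a suitable block through the chunk doubly linked list. If that block is at least twice the requested size, it is cut into two blocks; this is the split operation.
One of the two blocks is returned to the user and marked as in use.
Split creates a new chunk, so the doubly linked list must be updated to maintain the predecessor and successor relationships.
For example, suppose the user requests 512 bytes and a free 1024-byte chunk, chunk2, is available. Since 1024 / 512 = 2, chunk2 is split into two pieces, chunk2_1 and chunk2_2. chunk2_1 is returned to the user and marked as in use.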
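The split rule above can be sketched in a few lines of C++. This is a simplified illustration, not TensorFlow's code: the SimpleChunk struct and the AllocateFrom function are hypothetical names, and a std::list stands in for the address-ordered doubly linked chunk list.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical, simplified chunk: just enough state to show a split.
struct SimpleChunk {
  std::size_t offset;  // start of the chunk within the managed region
  std::size_t size;    // size in bytes
  bool in_use;
};

// Marks *it as allocated for `bytes`. If the chunk is at least twice as
// large as the request, the tail is cut off first and inserted as a new
// free chunk directly after it, so the list stays ordered by address.
inline void AllocateFrom(std::list<SimpleChunk>& chunks,
                         std::list<SimpleChunk>::iterator it,
                         std::size_t bytes) {
  assert(!it->in_use && it->size >= bytes);
  if (it->size >= bytes * 2) {
    SimpleChunk rest{it->offset + bytes, it->size - bytes, false};
    it->size = bytes;
    chunks.insert(std::next(it), rest);
  }
  it->in_use = true;
}
```

Calling AllocateFrom with a request of 512 on a free 1024-byte chunk reproduces the chunk2_1 / chunk2_2 example: the first half is marked in use and a free 512-byte chunk follows it.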
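3. Deallocation
The user frees a memory block (free). The block is first marked as free. Its predecessor and successor are then located using the information stored in the chunk. If either neighbor is free, the freed block is merged with it into one larger chunk (this is the merge operation, which combines the current block with its free neighbors). Finally, the doubly linked list is updated to maintain the predecessor and successor relationships. This is how memory fragments are reclaimed.
For example, suppose the user frees chunk3. Because its predecessor chunk2 is also free, chunk2 and chunk3 are merged into a new chunk2' whose size is the sum of the sizes of chunk2 and chunk3.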
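The free-and-merge step can be sketched the same way. Again this is only an illustration under simplifying assumptions: FreeListChunk and FreeAndCoalesce are hypothetical names, and a std::list plays the role of the address-ordered chunk list.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical, simplified chunk used only for this sketch.
struct FreeListChunk {
  std::size_t offset;
  std::size_t size;
  bool in_use;
};

// Frees *it and merges it with any free direct neighbor.
// Returns an iterator to the chunk that now covers the freed memory.
inline std::list<FreeListChunk>::iterator FreeAndCoalesce(
    std::list<FreeListChunk>& chunks,
    std::list<FreeListChunk>::iterator it) {
  assert(it->in_use);
  it->in_use = false;

  // Absorb the successor if it is free.
  auto next = std::next(it);
  if (next != chunks.end() && !next->in_use) {
    it->size += next->size;
    chunks.erase(next);
  }

  // Let a free predecessor absorb this chunk.
  if (it != chunks.begin()) {
    auto prev = std::prev(it);
    if (!prev->in_use) {
      prev->size += it->size;
      chunks.erase(it);
      it = prev;
    }
  }
  return it;
}
```

Freeing a 512-byte chunk whose predecessor is a free 512-byte chunk yields a single free 1024-byte chunk, which corresponds to the chunk2' example above.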
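3. bins
1. The bins data structure
The BFC algorithm splits memory lazily. At the start, the whole memory is a single chunk. As the user makes more and more requests, that large chunk is split again and again, producing more and more small chunks. Once there are many chunks, walking the doubly linked list to find a suitable block becomes very expensive. To manage free blocks efficiently, the BFC algorithm introduces an abstract data structure called a bin.
Each bin has a size attribute. A bin is a set of free chunks whose chunk size >= bin size. The chunks in the set are kept in ascending order of chunk size, conceptually as a singly linked list (the TensorFlow code shown later uses a std::set). The BFC algorithm maintains a collection of bins, called bins, which consists of the bins themselves and the chunks that belong to each bin. All free chunks in memory are managed by bins.
In the figure, each column represents one bin, and the number in the cell at the top of the column is the bin size. Every bin size is 256 * 2^n. Under each bin hangs a series of free chunks; each chunk's size is greater than or equal to the bin size of its bin, and the chunks are linked in ascending order of chunk size.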
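To make the bin layout concrete, here is a minimal C++ sketch of the size-to-bin mapping. It is not TensorFlow's code. It assumes 21 bins (the kNumBins value in the source shown later) and that a size maps to bin floor(log2(size / 256)); the names kSketchNumBins, FloorLog2, BinIndexForSize, and BinSizeForIndex are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int kSketchNumBins = 21;  // assumption: same count as kNumBins

// floor(log2(n)) for n > 0, i.e. the zero-based index of the highest set bit.
inline int FloorLog2(std::uint64_t n) {
  int r = -1;
  while (n > 0) {
    ++r;
    n >>= 1;
  }
  return r;
}

// Index of the bin whose range [256 * 2^i, 256 * 2^(i+1)) contains `bytes`.
// Sizes below 256 go to bin 0; very large sizes are clamped to the last bin.
inline int BinIndexForSize(std::size_t bytes) {
  const std::uint64_t v = std::max<std::size_t>(bytes, 256) / 256;
  return std::min(kSketchNumBins - 1, FloorLog2(v));
}

// Lower bound of bin `index`: 256, 512, 1024, ...
inline std::size_t BinSizeForIndex(int index) {
  assert(index >= 0 && index < kSketchNumBins);
  return static_cast<std::size_t>(256) << index;
}
```

With this mapping, sizes 256 through 511 land in bin 0, 512 through 1023 in bin 1, and 1024 through 2047 in bin 2.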
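2. Operations on bins
The BFC algorithm defines three operations on the bins collection: search, insert, and delete.
search
Given a chunk size, find the smallest free chunk in bins whose size is greater than or equal to that chunk size. The search works as follows. If the bins are organized as an array, the search can start at the bin with index = floor(log2(chunk size / 256)). In the best case that bin is non-empty and the chunk at the head of its list already fits, so it is returned directly; this case takes constant time. In the worst case every bin in the array is visited. For typical memory sizes the bins array is very short: for example, 24 bins are enough to cover a 4 GiB space (256 * 2^24 = 4 GiB), so even the worst case returns quickly. Overall, search is very efficient, and for a fixed memory size the lookup time is of constant order.
insert
Insert a free chunk into the chunk list of a bin while keeping the list in ascending order. The chunk simply goes into the bin with index = floor(log2(chunk size / 256)).
delete
Remove a free chunk from bins.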
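The three operations can be sketched as a small class. This is a simplified illustration under stated assumptions: the FreeBins class and its methods are hypothetical, each bin stores bare chunk sizes in a std::multiset rather than chunk handles, and the bin index is taken to be floor(log2(size / 256)) clamped to 21 bins.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

class FreeBins {
 public:
  static constexpr int kNumBins = 21;

  // insert: file the free chunk under the bin that covers its size.
  void Insert(std::size_t chunk_size) {
    bins_[IndexFor(chunk_size)].insert(chunk_size);
  }

  // delete: drop one free chunk of exactly this size, if one is tracked.
  bool Remove(std::size_t chunk_size) {
    auto& bin = bins_[IndexFor(chunk_size)];
    auto it = bin.find(chunk_size);
    if (it == bin.end()) return false;
    bin.erase(it);
    return true;
  }

  // search: size of the smallest free chunk >= bytes, or 0 if none fits.
  std::size_t Search(std::size_t bytes) const {
    for (int b = IndexFor(bytes); b < kNumBins; ++b) {
      auto it = bins_[b].lower_bound(bytes);
      if (it != bins_[b].end()) return *it;
    }
    return 0;
  }

 private:
  // floor(log2(bytes / 256)), clamped to [0, kNumBins - 1].
  static int IndexFor(std::size_t bytes) {
    assert(bytes > 0);
    int index = 0;
    for (std::size_t v = bytes / 256; v > 1 && index < kNumBins - 1; v >>= 1) {
      ++index;
    }
    return index;
  }

  std::array<std::multiset<std::size_t>, kNumBins> bins_;
};
```

Because bins are visited in ascending order and each bin is itself sorted, the first hit is the best fit. A real allocator would store chunk handles and break size ties by address, as the ChunkComparator in the source code does.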
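4. Summary
Memory is managed in blocks, and space is allocated and freed block by block.
The split operation breaks large memory blocks into the smaller blocks that users need.
The merge operation combines small memory blocks and so reclaims memory fragments.
The bin abstract data structure makes the management of free blocks efficient.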
5. Code analysis
1. Code location
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/common_runtime
2. Data structures
Chunk
static const int kInvalidChunkHandle = -1;
...
struct Chunk {
  size_t size = 0;  // Full size of buffer.

  // We sometimes give chunks that are larger than needed to reduce
  // fragmentation. requested_size keeps track of what the client
  // actually wanted so we can understand whether our splitting
  // strategy is efficient.
  size_t requested_size = 0;

  // allocation_id is set to -1 when the chunk is not in use. It is assigned a
  // value greater than zero before the chunk is returned from
  // AllocateRaw, and this value is unique among values assigned by
  // the parent allocator.
  int64 allocation_id = -1;
  void* ptr = nullptr;  // pointer to granted subbuffer.

  // If not kInvalidChunkHandle, the memory referred to by 'prev' is directly
  // preceding the memory used by this chunk. E.g., It should start
  // at 'ptr - prev->size'
  ChunkHandle prev = kInvalidChunkHandle;

  // If not kInvalidChunkHandle, the memory referred to by 'next' is directly
  // following the memory used by this chunk. E.g., It should be at
  // 'ptr + size'
  ChunkHandle next = kInvalidChunkHandle;

  // What bin are we in?
  BinNum bin_num = kInvalidBinNum;

  bool in_use() const { return allocation_id != -1; }
};
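Note that prev and next are ChunkHandle values rather than Chunk pointers. One plausible reason, sketched below, is that handles stay valid when the chunk storage grows and moves. The sketch assumes a handle is simply an index into a growable array; the names Handle, PoolChunk, and ChunkPool are hypothetical and do not come from the TensorFlow source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Handle = int;
constexpr Handle kInvalidHandle = -1;

// Hypothetical, trimmed-down chunk record for this sketch.
struct PoolChunk {
  std::size_t size = 0;
  Handle prev = kInvalidHandle;
  Handle next = kInvalidHandle;
};

class ChunkPool {
 public:
  // Returns a handle to a fresh chunk. Growing the vector may move every
  // PoolChunk in memory, which invalidates raw pointers but not handles.
  Handle Allocate() {
    if (!free_handles_.empty()) {
      Handle h = free_handles_.back();
      free_handles_.pop_back();
      chunks_[static_cast<std::size_t>(h)] = PoolChunk{};
      return h;
    }
    chunks_.emplace_back();
    return static_cast<Handle>(chunks_.size() - 1);
  }

  // Makes the handle available for reuse by a later Allocate().
  void Release(Handle h) { free_handles_.push_back(h); }

  // Resolves a handle to the chunk's current address.
  PoolChunk* FromHandle(Handle h) {
    assert(h >= 0 && static_cast<std::size_t>(h) < chunks_.size());
    return &chunks_[static_cast<std::size_t>(h)];
  }

 private:
  std::vector<PoolChunk> chunks_;
  std::vector<Handle> free_handles_;
};
```

Under this assumption, any pointer obtained from FromHandle has to be looked up again after a call to Allocate, which is consistent with the "Update chunk pointer in case it moved" comment in the allocation code.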
Bin
// A Bin is a collection of similar-sized free chunks.
struct Bin {
  // All chunks in this bin have >= bin_size memory.
  size_t bin_size = 0;

  struct ChunkComparator {
    explicit ChunkComparator(BFCAllocator* allocator)
        : allocator_(allocator) {}
    // Sort first by size and then use pointer address as a tie breaker.
    bool operator()(const ChunkHandle ha,
                    const ChunkHandle hb) const NO_THREAD_SAFETY_ANALYSIS {
      const Chunk* a = allocator_->ChunkFromHandle(ha);
      const Chunk* b = allocator_->ChunkFromHandle(hb);
      if (a->size != b->size) {
        return a->size < b->size;
      }
      return a->ptr < b->ptr;
    }

   private:
    BFCAllocator* allocator_;  // The parent allocator
  };

  typedef std::set<ChunkHandle, ChunkComparator> FreeChunkSet;
  // List of free chunks within the bin, sorted by chunk size.
  // Chunk * not owned.
  FreeChunkSet free_chunks;
  Bin(BFCAllocator* allocator, size_t bs)
      : bin_size(bs), free_chunks(ChunkComparator(allocator)) {}
};
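AllocationRegion
An AllocationRegion maps pointers within one contiguous memory region to ChunkHandles.
RegionManager
A RegionManager aggregates one or more AllocationRegions and provides a layer of indirection from pointers to the underlying ChunkHandle. This indirection allows allocation across multiple discontiguous memory regions.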
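One way to picture the pointer-to-handle mapping is a table with one slot per 256-byte unit of the region. The sketch below is an assumption about how such a mapping could work, not a copy of TensorFlow's AllocationRegion; the class name RegionSketch and its layout are hypothetical, and only the method names set_handle and get_handle mirror calls that appear in the code excerpts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class RegionSketch {
 public:
  // Covers `bytes` of contiguous memory starting at `base`.
  RegionSketch(void* base, std::size_t bytes)
      : base_(reinterpret_cast<std::uintptr_t>(base)),
        handles_((bytes + kGranularity - 1) / kGranularity, -1) {}

  // Records that the chunk starting at p is identified by handle h.
  void set_handle(const void* p, int h) { handles_[IndexFor(p)] = h; }

  // Returns the handle recorded for p, or -1 if none was set.
  int get_handle(const void* p) const { return handles_[IndexFor(p)]; }

 private:
  static constexpr std::size_t kGranularity = 256;  // minimum allocation size

  // Slot index for p: its offset from the base in 256-byte units.
  std::size_t IndexFor(const void* p) const {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    assert(addr >= base_);
    const std::size_t index = (addr - base_) / kGranularity;
    assert(index < handles_.size());
    return index;
  }

  std::uintptr_t base_;
  std::vector<int> handles_;
};
```

A manager over several such regions would first pick the region whose address range contains the pointer and then delegate to it.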
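3. Allocation size
Every requested size is rounded up to a multiple of kMinAllocationSize (256 bytes), which keeps all chunk sizes at 256-byte granularity and the memory addresses nicely aligned.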
// kMinAllocationSize = 256
static const size_t kMinAllocationBits = 8;
static const size_t kMinAllocationSize = 1 << kMinAllocationBits;
...
size_t BFCAllocator::RoundedBytes(size_t bytes) {
  size_t rounded_bytes =
      (kMinAllocationSize *
       ((bytes + kMinAllocationSize - 1) / kMinAllocationSize));
  DCHECK_EQ(size_t{0}, rounded_bytes % kMinAllocationSize);
  return rounded_bytes;
}
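As a quick worked example of the rounding, here is a standalone version of the same arithmetic that compiles outside TensorFlow. The function name RoundUpToAllocationSize is hypothetical, and a plain assert stands in for DCHECK_EQ.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMinAllocationBits = 8;
constexpr std::size_t kMinAllocationSize = std::size_t{1} << kMinAllocationBits;

// Rounds `bytes` up to the next multiple of kMinAllocationSize (256).
inline std::size_t RoundUpToAllocationSize(std::size_t bytes) {
  const std::size_t rounded =
      kMinAllocationSize *
      ((bytes + kMinAllocationSize - 1) / kMinAllocationSize);
  assert(rounded % kMinAllocationSize == 0);
  return rounded;
}
```

For instance, a request for 1 byte becomes 256, a request for 257 bytes becomes 512, and a request for 1000 bytes becomes 1024, while exact multiples such as 256 are left unchanged.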
4. Initializing the bins
typedef int BinNum;
static const int kInvalidBinNum = -1;
static const int kNumBins = 21;
...
// 2^8 shifted left by 0, 1, 2 bits:
// (static_cast<size_t>(256) << 0) = 256
// (static_cast<size_t>(256) << 1) = 512
// (static_cast<size_t>(256) << 2) = 1024
size_t BinNumToSize(BinNum index) {
  return static_cast<size_t>(256) << index;
}
...
char bins_space_[sizeof(Bin) * kNumBins];
// Map from bin size to Bin
Bin* BinFromIndex(BinNum index) {
  return reinterpret_cast<Bin*>(&(bins_space_[index * sizeof(Bin)]));
}
...
// We create bins to fit all possible ranges that cover the
// memory_limit_ starting from allocations up to 256 bytes to
// allocations up to (and including) the memory limit.
for (BinNum b = 0; b < kNumBins; b++) {
  size_t bin_size = BinNumToSize(b);
  VLOG(1) << "Creating bin of max chunk size "
          << strings::HumanReadableNumBytes(bin_size);
  new (BinFromIndex(b)) Bin(this, bin_size);
  CHECK_EQ(BinForSize(bin_size), BinFromIndex(b));
  CHECK_EQ(BinForSize(bin_size + 255), BinFromIndex(b));
  CHECK_EQ(BinForSize(bin_size * 2 - 1), BinFromIndex(b));
  if (b + 1 < kNumBins) {
    CHECK_NE(BinForSize(bin_size * 2), BinFromIndex(b));
  }
}
5. Finding the bin
// Compute which bin a given size belongs to
BinNum BinNumForSize(size_t bytes) {
  uint64 v = std::max<size_t>(bytes, 256) >> kMinAllocationBits;
  int b = std::min(kNumBins - 1, Log2FloorNonZero(v));
  return b;
}
// Zero-based index of the highest set bit, e.g. 4 for 0001 0101B
inline int Log2FloorNonZero(uint64 n) {
#if defined(__GNUC__)
  return 63 ^ __builtin_clzll(n);
#elif defined(PLATFORM_WINDOWS)
  unsigned long index;
  _BitScanReverse64(&index, n);
  return index;
#else
  // Start at -1 so the result is floor(log2(n)), matching the branches above.
  int r = -1;
  while (n > 0) {
    r++;
    n >>= 1;
  }
  return r;
#endif
}
6. Finding a chunk
// Take the lock first
mutex_lock l(lock_);
void* ptr = FindChunkPtr(bin_num, rounded_bytes, num_bytes);
if (ptr != nullptr) {
  return ptr;
}

// Inside FindChunkPtr
void* BFCAllocator::FindChunkPtr(BinNum bin_num, size_t rounded_bytes,
                                 size_t num_bytes) {
  // First identify the first bin that could satisfy rounded_bytes.
  for (; bin_num < kNumBins; bin_num++) {
    // Start searching from the first bin for the smallest chunk that fits
    // rounded_bytes.
    Bin* b = BinFromIndex(bin_num);
    for (auto citer = b->free_chunks.begin(); citer != b->free_chunks.end();
         ++citer) {
      // Starting from the bin index obtained earlier, look for a suitable
      // free chunk:
      const BFCAllocator::ChunkHandle h = (*citer);
      BFCAllocator::Chunk* chunk = ChunkFromHandle(h);
      DCHECK(!chunk->in_use());
      if (chunk->size >= rounded_bytes) {
        // We found an existing chunk that fits us that wasn't in use, so remove
        // it from the free bin structure prior to using.
        RemoveFreeChunkIterFromBin(&b->free_chunks, citer);

        // If we can break the size of the chunk into two reasonably
        // large pieces, do so.
        //
        // TODO(vrv): What should be the criteria when deciding when
        // to split?
        // The implementation of SplitChunk is analyzed below.
        if (chunk->size >= rounded_bytes * 2) {
          SplitChunk(h, rounded_bytes);
          chunk = ChunkFromHandle(h);  // Update chunk pointer in case it moved
        }

        // The requested size of the returned chunk is what the user
        // has allocated.
        chunk->requested_size = num_bytes;
        // Assign a unique id and increment the id counter, marking the
        // chunk as being in use.
        chunk->allocation_id = next_allocation_id_++;

        // Update stats.
        ++stats_.num_allocs;
        stats_.bytes_in_use += chunk->size;
        stats_.max_bytes_in_use =
            std::max(stats_.max_bytes_in_use, stats_.bytes_in_use);
        stats_.max_alloc_size =
            std::max<std::size_t>(stats_.max_alloc_size, chunk->size);

        VLOG(4) << "Returning: " << chunk->ptr;
        if (VLOG_IS_ON(4)) {
          LOG(INFO) << "A: " << RenderOccupancy();
        }
        return chunk->ptr;
      }
    }
  }
  return nullptr;
}
7. Splitting a chunk
If a chunk is at least twice the requested size, it is split into two chunks: the first has exactly the requested size, and the second becomes its direct successor.
if (chunk->size >= rounded_bytes * 2) {
  SplitChunk(h, rounded_bytes);
  chunk = ChunkFromHandle(h);  // Update chunk pointer in case it moved
}

void BFCAllocator::SplitChunk(BFCAllocator::ChunkHandle h, size_t num_bytes) {
  // Allocate the new chunk before we do any ChunkFromHandle
  ChunkHandle h_new_chunk = AllocateChunk();

  Chunk* c = ChunkFromHandle(h);
  CHECK(!c->in_use() && (c->bin_num == kInvalidBinNum));

  // Create a new chunk starting num_bytes after c
  BFCAllocator::Chunk* new_chunk = ChunkFromHandle(h_new_chunk);
  new_chunk->ptr = static_cast<void*>(static_cast<char*>(c->ptr) + num_bytes);
  region_manager_.set_handle(new_chunk->ptr, h_new_chunk);

  // Set the new sizes of the chunks.
  new_chunk->size = c->size - num_bytes;
  c->size = num_bytes;

  // The new chunk is not in use.
  new_chunk->allocation_id = -1;

  // Maintain the pointers.
  // c <-> c_neighbor becomes
  // c <-> new_chunk <-> c_neighbor
  BFCAllocator::ChunkHandle h_neighbor = c->next;
  new_chunk->prev = h;
  new_chunk->next = h_neighbor;
  c->next = h_new_chunk;
  if (h_neighbor != kInvalidChunkHandle) {
    Chunk* c_neighbor = ChunkFromHandle(h_neighbor);
    c_neighbor->prev = h_new_chunk;
  }

  // Add the newly free chunk to the free bin.
  InsertFreeChunkIntoBin(h_new_chunk);
}
8. Reclaiming a chunk
Take the lock and obtain the ChunkHandle:
mutex_lock l(lock_);
BFCAllocator::ChunkHandle h = region_manager_.get_handle(ptr);
FreeAndMaybeCoalesce(h);
FreeAndMaybeCoalesce
void BFCAllocator::FreeAndMaybeCoalesce(BFCAllocator::ChunkHandle h) {
  Chunk* c = ChunkFromHandle(h);
  CHECK(c->in_use() && (c->bin_num == kInvalidBinNum));

  // Mark the chunk as no longer in use
  c->allocation_id = -1;

  // Updates the stats.
  stats_.bytes_in_use -= c->size;

  // This chunk is no longer in-use, consider coalescing the chunk
  // with adjacent chunks.
  ChunkHandle chunk_to_reassign = h;

  // If the next chunk is free, coalesce the two
  if (c->next != kInvalidChunkHandle) {
    Chunk* cnext = ChunkFromHandle(c->next);
    if (!cnext->in_use()) {
      // VLOG(8) << "Chunk at " << cnext->ptr << " merging with c " <<
      // c->ptr;

      chunk_to_reassign = h;

      // Deletes c->next
      RemoveFreeChunkFromBin(c->next);
      Merge(h, ChunkFromHandle(h)->next);
    }
  }

  // If the previous chunk is free, coalesce the two
  c = ChunkFromHandle(h);
  if (c->prev != kInvalidChunkHandle) {
    Chunk* cprev = ChunkFromHandle(c->prev);
    if (!cprev->in_use()) {
      // VLOG(8) << "Chunk at " << c->ptr << " merging into c->prev "
      // << cprev->ptr;

      chunk_to_reassign = c->prev;

      // Deletes c
      RemoveFreeChunkFromBin(c->prev);
      Merge(ChunkFromHandle(h)->prev, h);
      c = ChunkFromHandle(h);
    }
  }

  InsertFreeChunkIntoBin(chunk_to_reassign);
}
Merge
// Merges h1 and h2 when Chunk(h1)->next is h2 and Chunk(h2)->prev is c1.
// We merge Chunk(h2) into Chunk(h1).
void BFCAllocator::Merge(BFCAllocator::ChunkHandle h1,
                         BFCAllocator::ChunkHandle h2) {
  Chunk* c1 = ChunkFromHandle(h1);
  Chunk* c2 = ChunkFromHandle(h2);
  // We can only merge chunks that are not in use.
  CHECK(!c1->in_use() && !c2->in_use());

  // c1's prev doesn't change, still points to the same ptr, and is
  // still not in use.

  // Fix up neighbor pointers
  //
  // c1 <-> c2 <-> c3 should become
  // c1 <-> c3

  BFCAllocator::ChunkHandle h3 = c2->next;
  c1->next = h3;
  CHECK(c2->prev == h1);
  if (h3 != kInvalidChunkHandle) {
    BFCAllocator::Chunk* c3 = ChunkFromHandle(h3);
    c3->prev = h1;
  }

  // Set the new size
  c1->size += c2->size;

  DeleteChunk(h2);
}