Byte Alignment in malloc Memory Allocation
阿新 • Published: 2019-01-25
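I have recently been reading several open-source C/C++ libraries. Each of them documents its own memory-allocation optimizations, and they also touch on byte alignment of allocations and on memory paging.
I have long known that allocations are byte-aligned without understanding why, and I rarely paid attention to the performance cost this can carry. In a system built for high concurrency, speed, and maximum resource utilization, however, it often matters, so I took this opportunity to get a rough understanding of it.
Let's start with the definitions in glibc's malloc.c.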
The file contains many macro definitions; we will look only at the most important ones. request2size performs the alignment (padding) of a request. MINSIZE is the smallest chunk that malloc can hand out: 16 bytes on 32-bit systems and 32 bytes on 64-bit systems. MALLOC_ALIGNMENT is the alignment in bytes. It is derived from SIZE_SZ (2 * SIZE_SZ by default), and since size_t is 4 bytes on 32-bit systems and 8 bytes on 64-bit systems, MALLOC_ALIGNMENT is 8 and 16 respectively.

    /*
      -----------------------  Chunk representations -----------------------
    */


    /*
      This struct declaration is misleading (but accurate and necessary).
      It declares a "view" into memory allowing access to necessary
      fields at known offsets from a given base. See explanation below.
    */

    struct malloc_chunk {

      INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
      INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

      struct malloc_chunk* fd;         /* double links -- used only if free. */
      struct malloc_chunk* bk;

      /* Only used for large blocks: pointer to next larger size.  */
      struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
      struct malloc_chunk* bk_nextsize;
    };


    /*
       malloc_chunk details:

        (The following includes lightly edited explanations by Colin Plumb.)

        Chunks of memory are maintained using a `boundary tag' method as
        described in e.g., Knuth or Standish.  (See the paper by Paul
        Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
        survey of such techniques.)  Sizes of free chunks are stored both
        in the front of each chunk and at the end.  This makes
        consolidating fragmented chunks into bigger chunks very fast.  The
        size fields also hold bits representing whether chunks are free or
        in use.

        An allocated chunk looks like this:


        chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |             Size of previous chunk, if allocated            | |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |             Size of chunk, in bytes                       |M|P|
          mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |             User data starts here...                          .
                .                                                               .
                .             (malloc_usable_size() bytes)                      .
                .                                                               |
    nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |             Size of chunk                                     |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


        Where "chunk" is the front of the chunk for the purpose of most of
        the malloc code, but "mem" is the pointer that is returned to the
        user.  "Nextchunk" is the beginning of the next contiguous chunk.

        Chunks always begin on even word boundaries, so the mem portion
        (which is returned to the user) is also on an even word boundary, and
        thus at least double-word aligned.

        Free chunks are stored in circular doubly-linked lists, and look like this:

        chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |             Size of previous chunk                            |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        `head:' |             Size of chunk, in bytes                         |P|
          mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |             Forward pointer to next chunk in list             |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |             Back pointer to previous chunk in list            |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |             Unused space (may be 0 bytes long)                .
                .                                                               .
                .                                                               |
    nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        `foot:' |             Size of chunk, in bytes                           |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        The P (PREV_INUSE) bit, stored in the unused low-order bit of the
        chunk size (which is always a multiple of two words), is an in-use
        bit for the *previous* chunk.  If that bit is *clear*, then the
        word before the current chunk size contains the previous chunk
        size, and can be used to find the front of the previous chunk.
        The very first chunk allocated always has this bit set,
        preventing access to non-existent (or non-owned) memory. If
        prev_inuse is set for any given chunk, then you CANNOT determine
        the size of the previous chunk, and might even get a memory
        addressing fault when trying to do so.

        Note that the `foot' of the current chunk is actually represented
        as the prev_size of the NEXT chunk. This makes it easier to
        deal with alignments etc but can be very confusing when trying
        to extend or adapt this code.

        The two exceptions to all this are

         1. The special chunk `top' doesn't bother using the
            trailing size field since there is no next contiguous chunk
            that would have to index off it. After initialization, `top'
            is forced to always exist.  If it would become less than
            MINSIZE bytes long, it is replenished.

         2. Chunks allocated via mmap, which have the second-lowest-order
            bit M (IS_MMAPPED) set in their size fields.  Because they are
            allocated one-by-one, each must contain its own trailing size field.

    */

    /*
      ---------- Size and alignment checks and conversions ----------
    */

    /* conversion from malloc headers to user pointers, and back */

    #define chunk2mem(p)   ((void*)((char*)(p) + 2*SIZE_SZ))
    #define mem2chunk(mem) ((mchunkptr)((char*)(mem) - 2*SIZE_SZ))

    /* The smallest possible chunk */
    #define MIN_CHUNK_SIZE        (offsetof(struct malloc_chunk, fd_nextsize))

    /* The smallest size we can malloc is an aligned minimal chunk */

    #define MINSIZE  \
      (unsigned long)(((MIN_CHUNK_SIZE+MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK))

    /* Check if m has acceptable alignment */

    #define aligned_OK(m)  (((unsigned long)(m) & MALLOC_ALIGN_MASK) == 0)

    #define misaligned_chunk(p) \
      ((uintptr_t)(MALLOC_ALIGNMENT == 2 * SIZE_SZ ? (p) : chunk2mem (p)) \
       & MALLOC_ALIGN_MASK)


    /*
       Check if a request is so large that it would wrap around zero when
       padded and aligned. To simplify some other code, the bound is made
       low enough so that adding MINSIZE will also not wrap around zero.
    */

    #define REQUEST_OUT_OF_RANGE(req)                                 \
      ((unsigned long) (req) >=                                       \
       (unsigned long) (INTERNAL_SIZE_T) (-2 * MINSIZE))

    /* pad request bytes into a usable size -- internal version */

    #define request2size(req)                                         \
      (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)  ?             \
       MINSIZE :                                                      \
       ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)

    /* Same, except also perform argument check */

    #define checked_request2size(req, sz)                             \
      if (REQUEST_OUT_OF_RANGE (req)) {                               \
          __set_errno (ENOMEM);                                       \
          return 0;                                                   \
        }                                                             \
      (sz) = request2size (req);
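The comment in the excerpt says that the mem pointer returned to the user is at least double-word aligned. A minimal sketch for checking this from user code is shown here. The names is_aligned and count_misaligned_mallocs are hypothetical helpers, not glibc functions, and the expectation that every pointer is a multiple of 2 * sizeof(size_t) is an assumption that matches the glibc defaults described in this article; other allocators may behave differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if ptr is a multiple of `alignment`.
   `alignment` must be a power of two so that (alignment - 1) is a bit mask,
   the same trick aligned_OK() uses with MALLOC_ALIGN_MASK. */
int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical helper: calls malloc() for every request size from 1 to
   max_request and counts the returned pointers that are NOT aligned to
   `alignment`. Failed allocations are skipped. */
size_t count_misaligned_mallocs(size_t max_request, size_t alignment)
{
    size_t misaligned = 0;
    for (size_t req = 1; req <= max_request; req++) {
        void *p = malloc(req);
        if (p == NULL)
            continue;
        if (!is_aligned(p, alignment))
            misaligned++;
        free(p);
    }
    return misaligned;
}
```

On a glibc system with the default configuration, count_misaligned_mallocs(256, 2 * sizeof(size_t)) would be expected to return 0, because chunk2mem only adds 2 * SIZE_SZ to a chunk address that is itself aligned.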
In practice, the alignment parameter (MALLOC_ALIGNMENT) must satisfy two conditions:
1. It must be a power of 2.
2. It must be an integer multiple of sizeof(void *).
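The two conditions can be written as a small predicate. This is only a sketch: is_valid_alignment is a hypothetical name introduced for illustration and does not appear in glibc.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when `alignment` satisfies both conditions. */
int is_valid_alignment(size_t alignment)
{
    /* Condition 1: a power of two has exactly one bit set. */
    int power_of_two = alignment != 0 && (alignment & (alignment - 1)) == 0;
    /* Condition 2: a whole number of pointer-sized words. */
    int pointer_multiple = alignment % sizeof(void *) == 0;
    return power_of_two && pointer_multiple;
}
```

The power-of-two condition is what makes the mask arithmetic in the macros work: rounding up with (x + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK is only correct when MALLOC_ALIGN_MASK is alignment - 1 with all low bits set.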
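request2size adds SIZE_SZ bytes of header overhead to the request, rounds the result up to a multiple of MALLOC_ALIGNMENT, and never returns less than MINSIZE. It follows that on a 64-bit system a request of 1 to 24 bytes consumes 32 bytes, while a request of 25 bytes consumes 48 bytes. On a 32-bit system, a request of 1 to 12 bytes consumes 16 bytes, while a request of 13 bytes consumes 24 bytes.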
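These figures can be reproduced with a small sketch that re-implements the request2size arithmetic as a function. request_to_chunk_size is a hypothetical name. SIZE_SZ is passed as a parameter so that the 32-bit and 64-bit cases can both be evaluated on one machine, and the sketch assumes the defaults MALLOC_ALIGNMENT = 2 * SIZE_SZ and pointers as wide as SIZE_SZ, so that MIN_CHUNK_SIZE is 4 * SIZE_SZ.

```c
#include <stddef.h>

/* Hypothetical re-implementation of request2size with SIZE_SZ as a
   parameter (4 for a 32-bit build, 8 for a 64-bit build). */
size_t request_to_chunk_size(size_t req, size_t size_sz)
{
    /* Assumed default: MALLOC_ALIGNMENT is 2 * SIZE_SZ. */
    size_t align_mask = 2 * size_sz - 1;

    /* Assumed layout: prev_size, size, fd, bk precede fd_nextsize,
       so offsetof(struct malloc_chunk, fd_nextsize) is 4 * SIZE_SZ. */
    size_t min_chunk_size = 4 * size_sz;
    size_t minsize = (min_chunk_size + align_mask) & ~align_mask;

    /* Add the size field, then round up to the alignment. */
    size_t padded = req + size_sz + align_mask;
    return padded < minsize ? minsize : (padded & ~align_mask);
}
```

With these assumptions, request_to_chunk_size(24, 8) gives 32 and request_to_chunk_size(25, 8) gives 48, while request_to_chunk_size(12, 4) gives 16 and request_to_chunk_size(13, 4) gives 24.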
Here is someone else's tutorial on implementing a simple malloc function: http://blog.codinglabs.org/articles/a-malloc-tutorial.html