Python __slots__: making your code more memory-efficient
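By default, every instance of a Python class stores its attributes in its own dictionary (__dict__), which costs memory. This default can be overridden by defining a __slots__ attribute in a new-style class. The __slots__ declaration lists a fixed set of instance variables and reserves exactly enough space in each instance to hold them, so no per-instance dictionary is created and memory is saved.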
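As a minimal sketch of that behaviour (the class names WithDict and WithSlots are made up for illustration), the following compares an ordinary class with one that declares __slots__. It only checks whether a per-instance __dict__ exists and whether an undeclared attribute can be added.

```python
class WithDict(object):
    """Ordinary class: attributes live in a per-instance __dict__."""

    def __init__(self, x):
        self.x = x


class WithSlots(object):
    """Class with __slots__: only the listed attribute names are allowed."""

    __slots__ = ('x',)

    def __init__(self, x):
        self.x = x


plain = WithDict(42)
slotted = WithSlots(42)

# The ordinary instance carries a dictionary, the slotted one does not.
has_dict_plain = hasattr(plain, '__dict__')
has_dict_slotted = hasattr(slotted, '__dict__')

# Assigning an attribute that is not listed in __slots__ fails.
try:
    slotted.y = 1
    new_attr_allowed = True
except AttributeError:
    new_attr_allowed = False

print(has_dict_plain, has_dict_slotted, new_attr_allowed)
```

The trade-off is visible in the last step: the memory saving comes at the cost of no longer being able to attach arbitrary attributes to an instance.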
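So why does a dict waste more memory than a list in Python?
Compared with a list, looking up or inserting a key in a dict is very fast and, on average, does not get slower as the number of keys grows. The price is that a dict takes up a lot of memory, much of it unused.
With a list, searching for a value or inserting an element in the middle takes longer as the number of elements grows, but a list takes up little space and wastes very little memory.
The reference Python interpreter is CPython, where these two data structures correspond to a C hash table and a C array respectively. A hash table needs extra memory to record the key-to-value mapping and to keep spare empty slots, whereas an array only needs an index to compute the position of the next element. A hash table therefore takes more memory than an array, which means a dict takes more memory than a list.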
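A quick way to see this difference, assuming CPython, is to compare the container sizes reported by sys.getsizeof. This is only a rough sketch: getsizeof counts the container itself, not the objects it refers to, and the exact numbers vary between Python versions, so none are quoted here.

```python
import sys

n = 1000

# Same number of entries in both containers.
as_list = list(range(n))
as_dict = dict.fromkeys(range(n))  # keys 0..n-1, every value is None

list_bytes = sys.getsizeof(as_list)
dict_bytes = sys.getsizeof(as_dict)

print("list of %d items: %d bytes" % (n, list_bytes))
print("dict of %d items: %d bytes" % (n, dict_bytes))
print("dict / list ratio: %.1f" % (dict_bytes / float(list_bytes)))
```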
The following snippets are taken from the official CPython source (these are the Python 2.x definitions):
List source:
typedef struct {
    PyObject_VAR_HEAD
    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     * list.sort() temporarily sets allocated to -1 to detect mutations.
     *
     * Items must normally not be NULL, except during construction when
     * the list is not yet visible outside the function that builds it.
     */
    Py_ssize_t allocated;
} PyListObject;
Dict source:
/* PyDict_MINSIZE is the minimum size of a dictionary.  This many slots are
 * allocated directly in the dict object (in the ma_smalltable member).
 * It must be a power of 2, and at least 4.  8 allows dicts with no more
 * than 5 active entries to live in ma_smalltable (and so avoid an
 * additional malloc); instrumentation suggested this suffices for the
 * majority of dicts (consisting mostly of usually-small instance dicts and
 * usually-small dicts created to pass keyword arguments).
 */
#define PyDict_MINSIZE 8

typedef struct {
    /* Cached hash code of me_key.  Note that hash codes are C longs.
     * We have to use Py_ssize_t instead because dict_popitem() abuses
     * me_hash to hold a search finger.
     */
    Py_ssize_t me_hash;
    PyObject *me_key;
    PyObject *me_value;
} PyDictEntry;

/*
To ensure the lookup algorithm terminates, there must be at least one Unused
slot (NULL key) in the table.
The value ma_fill is the number of non-NULL keys (sum of Active and Dummy);
ma_used is the number of non-NULL, non-dummy keys (== the number of non-NULL
values == the number of Active items).
To avoid slowing down lookups on a near-full table, we resize the table when
it's two-thirds full.
*/
typedef struct _dictobject PyDictObject;
struct _dictobject {
    PyObject_HEAD
    Py_ssize_t ma_fill;  /* # Active + # Dummy */
    Py_ssize_t ma_used;  /* # Active */

    /* The table contains ma_mask + 1 slots, and that's a power of 2.
     * We store the mask instead of the size because the mask is more
     * frequently needed.
     */
    Py_ssize_t ma_mask;

    /* ma_table points to ma_smalltable for small tables, else to
     * additional malloc'ed memory.  ma_table is never NULL!  This rule
     * saves repeated runtime null-tests in the workhorse getitem and
     * setitem calls.
     */
    PyDictEntry *ma_table;
    PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash);
    PyDictEntry ma_smalltable[PyDict_MINSIZE];
};
PyObject_HEAD source:
#ifdef Py_TRACE_REFS
/* Define pointers to support a doubly-linked list of all live heap objects. */
#define _PyObject_HEAD_EXTRA            \
    struct _object *_ob_next;           \
    struct _object *_ob_prev;

#define _PyObject_EXTRA_INIT 0, 0,

#else
#define _PyObject_HEAD_EXTRA
#define _PyObject_EXTRA_INIT
#endif

/* PyObject_HEAD defines the initial segment of every PyObject. */
#define PyObject_HEAD                   \
    _PyObject_HEAD_EXTRA                \
    Py_ssize_t ob_refcnt;               \
    struct _typeobject *ob_type;
PyObject_VAR_HEAD source:
/* PyObject_VAR_HEAD defines the initial segment of all variable-size
 * container objects.  These end with a declaration of an array with 1
 * element, but enough space is malloc'ed so that the array actually
 * has room for ob_size elements.  Note that ob_size is an element count,
 * not necessarily a byte count.
 */
#define PyObject_VAR_HEAD               \
    PyObject_HEAD                       \
    Py_ssize_t ob_size; /* Number of items in variable part */
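To make the size difference between these structs concrete, here is a rough sketch that mirrors them with ctypes and asks for their sizes. It assumes a build without Py_TRACE_REFS, models every pointer (including the ma_lookup function pointer) as a plain void pointer, and ignores the garbage-collector header that CPython allocates in front of each object, so it approximates the layout rather than reproducing it exactly.

```python
import ctypes


class PyObjectHead(ctypes.Structure):
    # PyObject_HEAD without the Py_TRACE_REFS extras (assumption).
    _fields_ = [("ob_refcnt", ctypes.c_ssize_t),
                ("ob_type", ctypes.c_void_p)]


class PyListObject(ctypes.Structure):
    # PyObject_VAR_HEAD = PyObject_HEAD + ob_size, then the list fields.
    _fields_ = [("head", PyObjectHead),
                ("ob_size", ctypes.c_ssize_t),
                ("ob_item", ctypes.c_void_p),
                ("allocated", ctypes.c_ssize_t)]


class PyDictEntry(ctypes.Structure):
    _fields_ = [("me_hash", ctypes.c_ssize_t),
                ("me_key", ctypes.c_void_p),
                ("me_value", ctypes.c_void_p)]


PyDict_MINSIZE = 8


class PyDictObject(ctypes.Structure):
    _fields_ = [("head", PyObjectHead),
                ("ma_fill", ctypes.c_ssize_t),
                ("ma_used", ctypes.c_ssize_t),
                ("ma_mask", ctypes.c_ssize_t),
                ("ma_table", ctypes.c_void_p),
                ("ma_lookup", ctypes.c_void_p),  # function pointer, modelled as void*
                ("ma_smalltable", PyDictEntry * PyDict_MINSIZE)]


list_struct_bytes = ctypes.sizeof(PyListObject)
dict_struct_bytes = ctypes.sizeof(PyDictObject)
entry_bytes = ctypes.sizeof(PyDictEntry)

print("PyListObject:", list_struct_bytes, "bytes")
print("PyDictObject:", dict_struct_bytes, "bytes")
print("PyDictEntry: ", entry_bytes, "bytes per slot")
```

Even an empty dict carries the eight-entry ma_smalltable, and each entry holds a hash, a key pointer and a value pointer, while a list needs only one pointer per element. On a typical 64-bit platform this arithmetic gives 40 bytes for the list struct and 248 bytes for the dict struct.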
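Now we know why a dict takes up more memory than a list. Next, how can you make your own classes more memory-efficient?
There are two solutions:
The first is to use __slots__; the other is to use collections.namedtuple.
First, write a class the standard way: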
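#!/usr/bin/env python
# @profile comes from the third-party memory_profiler package.
from memory_profiler import profile

class Foobar(object):
    def __init__(self, x):
        self.x = x

@profile
def main():
    # Keep one million instances alive so their memory use can be measured.
    f = [Foobar(42) for i in range(1000000)]

if __name__ == "__main__":
    main()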
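This defines a class Foobar and instantiates it 1,000,000 times. The @profile decorator (as provided by the memory_profiler package) reports the memory usage.
Result:
This code used 372 MB of memory in total.
Next, the same code rewritten with __slots__: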
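#!/usr/bin/env python
# @profile comes from the third-party memory_profiler package.
from memory_profiler import profile

class Foobar(object):
    # Only 'x' may be set on instances; no per-instance __dict__ is created.
    __slots__ = ('x',)

    def __init__(self, x):
        self.x = x

@profile
def main():
    # Keep one million instances alive so their memory use can be measured.
    f = [Foobar(42) for i in range(1000000)]

if __name__ == "__main__":
    main()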
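Result:
With __slots__ the code used 91 MB of memory, roughly a quarter of what was needed when the attribute values were stored in __dict__ (372 MB / 91 MB ≈ 4.1).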
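If memory_profiler is not available, a similar comparison can be sketched with the standard-library tracemalloc module (Python 3.4 or later). The helper measure and the class names below are hypothetical, the instance count is reduced to keep the run short, and the absolute numbers will differ from the figures above, so only the relative difference is meaningful.

```python
import tracemalloc


class PlainFoobar(object):
    """Attributes are stored in a per-instance __dict__."""

    def __init__(self, x):
        self.x = x


class SlottedFoobar(object):
    """Attributes are stored in fixed slots; no __dict__ is created."""

    __slots__ = ('x',)

    def __init__(self, x):
        self.x = x


def measure(cls, count=100000):
    """Return the bytes still allocated after building `count` instances."""
    tracemalloc.start()
    objs = [cls(42) for _ in range(count)]
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objs  # release the instances before the next measurement
    return current


plain_bytes = measure(PlainFoobar)
slotted_bytes = measure(SlottedFoobar)

print("plain:   %d bytes" % plain_bytes)
print("slotted: %d bytes" % slotted_bytes)
print("ratio:   %.1f" % (plain_bytes / float(slotted_bytes)))
```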
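The namedtuple factory in the collections module achieves the same effect as __slots__. A namedtuple class inherits from tuple, and its __slots__ is set to an empty tuple so that no __dict__ is created.
Here is how it looks with collections: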
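As a minimal sketch (the type name FoobarTuple is my own choice for illustration, and Python 3 is assumed), a namedtuple counterpart of Foobar could look like this. It checks the two properties described above: the generated class is a tuple subclass, and its __slots__ is empty.

```python
from collections import namedtuple

# A tuple subclass with a single named field 'x'.
FoobarTuple = namedtuple('FoobarTuple', ['x'])

item = FoobarTuple(42)

is_tuple = isinstance(item, tuple)        # namedtuple classes inherit from tuple
slots_value = FoobarTuple.__slots__       # empty tuple, so no per-instance __dict__
has_dict = hasattr(item, '__dict__')

print(item.x, is_tuple, slots_value, has_dict)

# Unlike a class with __slots__, a namedtuple is immutable:
try:
    item.x = 1
    mutable = True
except AttributeError:
    mutable = False
```

The immutability is the main design difference from __slots__: a namedtuple suits records that never change after creation, while __slots__ keeps attributes assignable.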
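Compared with an ordinary class, collections.namedtuple also saves a good deal of memory. So when the set of attributes of a class is known to be fixed, __slots__ can be used to optimize memory usage. This technique should not, however, be abused as a way of making classes static or for similar purposes; that is not in the spirit of Python programs.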