Python __slots__: making your code more memory-efficient

阿新 • Published: 2018-12-30

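By default, instances of both old-style and new-style classes in Python carry a dictionary that stores their attribute values. For objects with very few instance attributes this wastes space, and the problem becomes especially noticeable when a large number of instances are created.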

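This default can be overridden by defining a __slots__ attribute in a new-style class. The __slots__ declaration lists the instance variables and reserves just enough space in each instance to hold one value per variable. Because no dictionary is created for each instance, space is saved.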
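To make the effect of the declaration concrete, here is a minimal sketch. The class name Point and its attributes are hypothetical, chosen only for illustration. It shows that a slotted instance has no __dict__ and rejects attributes that were not declared.

```python
class Point(object):
    # Only these two attributes get storage; no per-instance __dict__ is created.
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
has_dict = hasattr(p, '__dict__')

# Assigning an attribute that is not listed in __slots__ fails.
try:
    p.z = 3
    rejected = False
except AttributeError:
    rejected = True

print("has __dict__:", has_dict)
print("undeclared attribute rejected:", rejected)
```

The trade-off is flexibility: a slotted instance cannot grow new attributes at runtime, which is the price paid for the smaller footprint.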

First, why does a dict waste more memory than a list in Python?

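Compared with a list, a dict offers very fast lookup and insertion (constant time on average) that does not slow down as keys are added, but it needs a lot of memory, much of which is wasted.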

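With a list, the time to look up a value or insert at an arbitrary position grows with the number of elements, but a list takes little space and wastes very little memory.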
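The difference in lookup cost can be observed without timing anything, by counting equality comparisons. The sketch below uses a hypothetical helper class, CountingKey, that increments a counter each time __eq__ is called. It is an illustration under that assumption, not a benchmark.

```python
class CountingKey(object):
    """Hypothetical key type that counts equality comparisons."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return self.value == other.value


n = 1000
keys = [CountingKey(i) for i in range(n)]
as_list = list(keys)
as_dict = dict.fromkeys(keys)

# Search for the last element using a fresh, equal key object.
CountingKey.comparisons = 0
found_in_list = CountingKey(n - 1) in as_list
list_comparisons = CountingKey.comparisons

CountingKey.comparisons = 0
found_in_dict = CountingKey(n - 1) in as_dict
dict_comparisons = CountingKey.comparisons

print("list comparisons:", list_comparisons)
print("dict comparisons:", dict_comparisons)
```

The list has to walk its elements one by one, while the dict jumps to a slot chosen by the hash and compares only the few keys it finds there.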

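The standard Python interpreter is CPython, where these two data structures are implemented in C as a hash table and an array respectively. A hash table needs extra memory to record the mapping from keys to values, whereas an array can compute the position of any element directly from its index. A hash table therefore occupies more memory than an array, which is why a dict takes more memory than a list.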
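A quick way to check this claim on your own interpreter is sys.getsizeof. The sketch below compares a list and a dict holding the same number of entries. Note that getsizeof reports only the container itself, not the objects it refers to, and the exact numbers vary by Python version and platform.

```python
import sys

n = 1000
as_list = list(range(n))
as_dict = dict.fromkeys(range(n))

# Size of the container only; the int objects themselves are not counted.
list_size = sys.getsizeof(as_list)
dict_size = sys.getsizeof(as_dict)

print("list of %d items: %d bytes" % (n, list_size))
print("dict of %d items: %d bytes" % (n, dict_size))
```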

The following snippets are excerpts I took from the official Python source (the CPython 2.x C headers):

List source:

    typedef struct {
        PyObject_VAR_HEAD
        /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
        PyObject **ob_item;

        /* ob_item contains space for 'allocated' elements.  The number
         * currently in use is ob_size.
         * Invariants:
         *     0 <= ob_size <= allocated
         *     len(list) == ob_size
         *     ob_item == NULL implies ob_size == allocated == 0
         * list.sort() temporarily sets allocated to -1 to detect mutations.
         *
         * Items must normally not be NULL, except during construction when
         * the list is not yet visible outside the function that builds it.
         */
        Py_ssize_t allocated;
    } PyListObject;

Dict source:

    /* PyDict_MINSIZE is the minimum size of a dictionary.  This many slots are
     * allocated directly in the dict object (in the ma_smalltable member).
     * It must be a power of 2, and at least 4.  8 allows dicts with no more
     * than 5 active entries to live in ma_smalltable (and so avoid an
     * additional malloc); instrumentation suggested this suffices for the
     * majority of dicts (consisting mostly of usually-small instance dicts and
     * usually-small dicts created to pass keyword arguments).
     */
    #define PyDict_MINSIZE 8

    typedef struct {
        /* Cached hash code of me_key.  Note that hash codes are C longs.
         * We have to use Py_ssize_t instead because dict_popitem() abuses
         * me_hash to hold a search finger.
         */
        Py_ssize_t me_hash;
        PyObject *me_key;
        PyObject *me_value;
    } PyDictEntry;

    /*
    To ensure the lookup algorithm terminates, there must be at least one Unused
    slot (NULL key) in the table.
    The value ma_fill is the number of non-NULL keys (sum of Active and Dummy);
    ma_used is the number of non-NULL, non-dummy keys (== the number of non-NULL
    values == the number of Active items).
    To avoid slowing down lookups on a near-full table, we resize the table when
    it's two-thirds full.
    */
    typedef struct _dictobject PyDictObject;
    struct _dictobject {
        PyObject_HEAD
        Py_ssize_t ma_fill;  /* # Active + # Dummy */
        Py_ssize_t ma_used;  /* # Active */

        /* The table contains ma_mask + 1 slots, and that's a power of 2.
         * We store the mask instead of the size because the mask is more
         * frequently needed.
         */
        Py_ssize_t ma_mask;

        /* ma_table points to ma_smalltable for small tables, else to
         * additional malloc'ed memory.  ma_table is never NULL!  This rule
         * saves repeated runtime null-tests in the workhorse getitem and
         * setitem calls.
         */
        PyDictEntry *ma_table;
        PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash);
        PyDictEntry ma_smalltable[PyDict_MINSIZE];
    };

PyObject_HEAD source:

    #ifdef Py_TRACE_REFS
    /* Define pointers to support a doubly-linked list of all live heap objects. */
    #define _PyObject_HEAD_EXTRA            \
        struct _object *_ob_next;           \
        struct _object *_ob_prev;

    #define _PyObject_EXTRA_INIT 0, 0,

    #else
    #define _PyObject_HEAD_EXTRA
    #define _PyObject_EXTRA_INIT
    #endif

    /* PyObject_HEAD defines the initial segment of every PyObject. */
    #define PyObject_HEAD                   \
        _PyObject_HEAD_EXTRA                \
        Py_ssize_t ob_refcnt;               \
        struct _typeobject *ob_type;

PyObject_VAR_HEAD source:

    /* PyObject_VAR_HEAD defines the initial segment of all variable-size
     * container objects.  These end with a declaration of an array with 1
     * element, but enough space is malloc'ed so that the array actually
     * has room for ob_size elements.  Note that ob_size is an element count,
     * not necessarily a byte count.
     */
    #define PyObject_VAR_HEAD               \
        PyObject_HEAD                       \
        Py_ssize_t ob_size; /* Number of items in variable part */
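To see what these declarations add up to, the structs can be mirrored with ctypes and measured. This is a rough sketch under stated assumptions: every C pointer is represented by c_void_p, the Py_TRACE_REFS debug fields are left out, and the Python class names simply copy the C names from the listings. It measures the fixed part of each object only, not the garbage-collector header or the separately allocated element array.

```python
import ctypes

ssize_t = ctypes.c_ssize_t
pointer = ctypes.c_void_p  # stand-in for every C pointer type (assumption)


class PyObjectHead(ctypes.Structure):
    # PyObject_HEAD without the Py_TRACE_REFS debug fields
    _fields_ = [("ob_refcnt", ssize_t), ("ob_type", pointer)]


class PyListObject(ctypes.Structure):
    _fields_ = [
        ("head", PyObjectHead),
        ("ob_size", ssize_t),      # added by PyObject_VAR_HEAD
        ("ob_item", pointer),
        ("allocated", ssize_t),
    ]


class PyDictEntry(ctypes.Structure):
    _fields_ = [("me_hash", ssize_t), ("me_key", pointer), ("me_value", pointer)]


PyDict_MINSIZE = 8


class PyDictObject(ctypes.Structure):
    _fields_ = [
        ("head", PyObjectHead),
        ("ma_fill", ssize_t),
        ("ma_used", ssize_t),
        ("ma_mask", ssize_t),
        ("ma_table", pointer),
        ("ma_lookup", pointer),
        ("ma_smalltable", PyDictEntry * PyDict_MINSIZE),
    ]


list_bytes = ctypes.sizeof(PyListObject)
dict_bytes = ctypes.sizeof(PyDictObject)
print("PyListObject:", list_bytes, "bytes")
print("PyDictObject:", dict_bytes, "bytes")
```

On a typical 64-bit platform this works out to 40 bytes for the list struct and 248 bytes for the dict struct. Most of the dict's size comes from the eight-entry ma_smalltable embedded in every dict, even an empty one.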

Now we know why a dict occupies more memory than a list. Next, let's look at how to make your classes more memory-efficient.

There are two solutions:

The first is to use __slots__; the other is to use collections.namedtuple.

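First, write a class the standard way:

    #!/usr/bin/env python
    # Assumption: @profile comes from the third-party memory_profiler package.
    from memory_profiler import profile


    class Foobar(object):
        def __init__(self, x):
            self.x = x


    @profile
    def main():
        # Keep one million instances alive so their memory shows up in the report.
        f = [Foobar(42) for i in range(1000000)]


    if __name__ == "__main__":
        main()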

This defines a class Foobar and instantiates it 1,000,000 times. The @profile decorator reports the memory usage.

Result:

The code used 372 MB of memory in total.

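Next, the same code implemented with __slots__:

    #!/usr/bin/env python
    # Assumption: @profile comes from the third-party memory_profiler package.
    from memory_profiler import profile


    class Foobar(object):
        # A one-element tuple; equivalent to the single string 'x'.
        __slots__ = ('x',)

        def __init__(self, x):
            self.x = x


    @profile
    def main():
        f = [Foobar(42) for i in range(1000000)]


    if __name__ == "__main__":
        main()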

Result:

With __slots__ the program used 91 MB of memory, about a quarter of the 372 MB needed when attribute values are stored in __dict__.
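If no external profiler is at hand, a similar comparison can be made with the standard library's tracemalloc module (Python 3.4+). The sketch below is an assumption-laden stand-in for the measurement above, not a reproduction of it: the class names WithDict and WithSlots and the helper allocated_bytes are hypothetical, the instance count is reduced to keep it fast, and the absolute numbers will differ from the figures quoted in this article.

```python
import tracemalloc


class WithDict(object):
    def __init__(self, x):
        self.x = x


class WithSlots(object):
    __slots__ = ('x',)

    def __init__(self, x):
        self.x = x


def allocated_bytes(cls, count=100000):
    """Return the bytes allocated while `count` instances of cls are alive."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    instances = [cls(42) for _ in range(count)]  # kept alive until we measure
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


dict_bytes = allocated_bytes(WithDict)
slots_bytes = allocated_bytes(WithSlots)
print("with __dict__ :", dict_bytes, "bytes")
print("with __slots__:", slots_bytes, "bytes")
```

The ratio between the two depends heavily on the interpreter version, since newer CPython releases store instance dictionaries more compactly than the version measured above.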

In fact, namedtuple from the collections module can achieve the same thing as __slots__. A namedtuple class simply inherits from tuple, and its __slots__ is set to an empty tuple so that no __dict__ is created.

Let's see how this is done with collections:
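A minimal sketch of the namedtuple variant, written for illustration rather than taken from the author's listing. It reuses the Foobar name and the single field x from the earlier examples and assumes a reasonably recent Python 3.

```python
from collections import namedtuple

# The generated class inherits from tuple and sets __slots__ = ().
Foobar = namedtuple('Foobar', ['x'])

f = Foobar(42)
slots_value = Foobar.__slots__
has_dict = hasattr(f, '__dict__')

# Fields are read-only, because the instance is a tuple underneath.
try:
    f.x = 7
    is_immutable = False
except AttributeError:
    is_immutable = True

print("x by name:", f.x, "| x by index:", f[0])
print("__slots__:", slots_value, "| has __dict__:", has_dict)
print("immutable:", is_immutable)
```

Unlike a slotted class, a namedtuple instance cannot be modified after creation, so it fits records that are written once and then only read.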

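Compared with an ordinary class, a namedtuple also saves a good deal of memory. So when the set of attributes of a class is known to be fixed, __slots__ can be used to optimize memory usage. This technique should not, however, be abused to make classes static or for similar purposes; that is not in the spirit of Python programs.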
