
data persistence of Python

data persistence

https://docs.python.org/3.7/library/persistence.html

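These modules let Python data held in memory be stored on disk in a persistent form.

They also support restoring that data from disk back into memory.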

The modules described in this chapter support storing Python data in a persistent form on disk.

The pickle and marshal modules can turn many Python data types into a stream of bytes and then recreate the objects from the bytes.

The various DBM-related modules support a family of hash-based file formats that store a mapping of strings to other strings.

Serialization tool -- pickle

https://docs.python.org/3.7/library/pickle.html

pickle converts in-memory Python objects into a binary byte stream,

and restores in-memory objects from such a byte stream.

The pickle module implements binary protocols for serializing and de-serializing a Python object structure.

“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.

Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

Saving

import pickle

# An arbitrary collection of objects supported by pickle.
data = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

with open('data.pickle', 'wb') as f:
    # Pickle the 'data' dictionary using the highest protocol available.
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

Restoring

import pickle

with open('data.pickle', 'rb') as f:
    # The protocol version used is detected automatically, so we do not
    # have to specify it.
    data = pickle.load(f)
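
The quoted description also mentions unpickling from a bytes-like object, not only from a file. A minimal sketch of that in-memory round trip with pickle.dumps() and pickle.loads() follows; the names original, payload and restored are chosen here for illustration only.

```python
import pickle

# Hypothetical sample data; any picklable object would work here.
original = {'a': [1, 2.0, 3, 4 + 6j], 'b': ("character string", b"byte string")}

# dumps() returns the pickled representation as a bytes object.
payload = pickle.dumps(original, protocol=pickle.HIGHEST_PROTOCOL)

# loads() rebuilds an equal, but distinct, object from those bytes.
restored = pickle.loads(payload)

print(type(payload).__name__)
print(restored == original, restored is original)
```

The bytes in payload can then be written wherever the application likes, which is the part pickle leaves to the developer.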

Application-level tool --- shelve

https://docs.python.org/3.7/library/shelve.html

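pickle mainly provides the serialization and deserialization methods;

writing the resulting bytes to disk is left to the developer.

shelve offers a direct interface, so the application only has to care about changes to its data.

Unlike dbm, it can store Python objects of arbitrary type as values, using pickle underneath for serialization.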

A “shelf” is a persistent, dictionary-like object.

The difference with “dbm” databases is that the values (not the keys!) in a shelf can be essentially arbitrary Python objects — anything that the pickle module can handle.

This includes most class instances, recursive data types, and objects containing lots of shared sub-objects. The keys are ordinary strings.

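import shelve

# Placeholder values (hypothetical) so the walkthrough below runs as written.
filename = 'shelf_demo'
key = 'example-key'
data = {'answer': 42}

d = shelve.open(filename)  # open -- file may get suffix added by low-level
                           # library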

d[key] = data              # store data at key (overwrites old data if
                           # using an existing key)
data = d[key]              # retrieve a COPY of data at key (raise KeyError
                           # if no such key)
del d[key]                 # delete data stored at key (raises KeyError
                           # if no such key)

flag = key in d            # true if the key exists
klist = list(d.keys())     # a list of all existing keys (slow!)

# as d was opened WITHOUT writeback=True, beware:
d['xx'] = [0, 1, 2]        # this works as expected, but...
d['xx'].append(3)          # *this doesn't!* -- d['xx'] is STILL [0, 1, 2]!

# having opened d without writeback=True, you need to code carefully:
temp = d['xx']             # extracts the copy
temp.append(5)             # mutates the copy
d['xx'] = temp             # stores the copy right back, to persist it

# or, d=shelve.open(filename,writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.

d.close()                  # close it
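
The writeback caveat in the walkthrough above can be checked directly. The sketch below is one way to do so, assuming a throwaway shelf in a temporary directory (the path and the variable names are made up for this example) and using the shelf as a context manager so that it is closed automatically.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'shelf_demo')  # hypothetical file name

    # Default mode: d['xx'] returns a fresh copy, so the append is lost.
    with shelve.open(path) as d:
        d['xx'] = [0, 1, 2]
        d['xx'].append(3)
        without_writeback = d['xx']

    # writeback=True caches accessed entries and writes them back on close.
    with shelve.open(path, writeback=True) as d:
        d['xx'].append(3)

    # Reopen to read what was actually persisted.
    with shelve.open(path) as d:
        with_writeback = d['xx']

print(without_writeback)  # [0, 1, 2]
print(with_writeback)     # [0, 1, 2, 3]
```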

Low-level storage tool -- dbm

https://docs.python.org/3.7/library/dbm.html

dbm is a generic interface to variants of the DBM database — dbm.gnu or dbm.ndbm. If none of these modules is installed, the slow-but-simple implementation in module dbm.dumb will be used. There is a third party interface to the Oracle Berkeley DB.

https://en.wikipedia.org/wiki/DBM_(computing)

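A key-value database and an early NoSQL database, with the advantage of fast lookups,

because it uses a hash of the key as its index.

According to these notes, the same design is also what makes updates comparatively slow.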

In computing, a DBM is a library and file format providing fast, single-keyed access to data. A key-value database from the original Unix, dbm is an early example of a NoSQL system.[1][2][3]

The original dbm library and file format was a simple database engine, originally written by Ken Thompson and released by AT&T in 1979. The name is a three letter acronym for DataBase Manager, and can also refer to the family of database engines with APIs and features derived from the original dbm.

https://pymotw.com/3/dbm/index.html

The rough relationship is as follows:

shelve --> pickle --> dbm --> dbm database

dbm is a front-end for DBM-style databases that use simple string values as keys to access records containing strings. It uses whichdb() to identify databases, then opens them with the appropriate module. It is used as a back-end for shelve, which stores objects in a DBM database using pickle.
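
The detection step mentioned above is exposed in Python 3 as dbm.whichdb(). As a small sketch, the database below is created explicitly with the pure-Python dbm.dumb backend so the result does not depend on which C libraries are installed; the file name is hypothetical.

```python
import dbm
import dbm.dumb
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'which_demo')  # hypothetical file name

    # Create the database with the pure-Python backend explicitly.
    with dbm.dumb.open(path, 'c') as db:
        db['key'] = 'value'

    # whichdb() inspects the files on disk and names the matching module.
    backend = dbm.whichdb(path)

    # dbm.open() runs the same detection before delegating to that module.
    with dbm.open(path, 'r') as db:
        value = db['key']

print(backend)
print(value)
```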

import dbm

# Open database, creating it if necessary.
with dbm.open('cache', 'c') as db:

    # Record some values
    db[b'hello'] = b'there'
    db['www.python.org'] = 'Python Website'
    db['www.cnn.com'] = 'Cable News Network'

    # Note that the keys are considered bytes now.
    assert db[b'www.python.org'] == b'Python Website'
    # Notice how the value is now in bytes.
    assert db['www.cnn.com'] == b'Cable News Network'

    # Often-used methods of the dict interface work too.
    print(db.get('python.org', b'not present'))

    # Storing a non-string key or value will raise an exception (most
    # likely a TypeError).
    db['www.yahoo.com'] = 4

# db is automatically closed when leaving the with statement.
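
Since dbm only accepts strings or bytes as values, the shelve --> pickle --> dbm chain can be imitated by hand: pickle the object first, then store the resulting bytes. This is only an illustration of the idea, not how shelve is actually implemented line for line; the key 'user:1' and the record are invented for the example.

```python
import dbm
import os
import pickle
import tempfile

record = {'name': 'Hugo', 'tags': ['a', 'b']}  # hypothetical record

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'manual_shelf')  # hypothetical file name

    with dbm.open(path, 'c') as db:
        # A dict is not a valid dbm value, so serialize it to bytes first.
        db['user:1'] = pickle.dumps(record)

    with dbm.open(path, 'r') as db:
        # Values come back as bytes; unpickle them to recover the object.
        loaded = pickle.loads(db['user:1'])

print(loaded)
```

shelve wraps exactly this kind of bookkeeping behind its dictionary-like interface.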

Relational database tool -- sqlite3

https://docs.python.org/3.7/library/sqlite3.html

A lightweight disk-based database that does not need a separate server process.

It can be queried with SQL.

SQLite is a C library that provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language. Some applications can use SQLite for internal data storage. It’s also possible to prototype an application using SQLite and then port the code to a larger database such as PostgreSQL or Oracle.

The sqlite3 module was written by Gerhard Häring. It provides a SQL interface compliant with the DB-API 2.0 specification described by PEP 249.

import sqlite3

persons = [
    ("Hugo", "Boss"),
    ("Calvin", "Klein")
    ]

con = sqlite3.connect(":memory:")

# Create the table
con.execute("create table person(firstname, lastname)")

# Fill the table
con.executemany("insert into person(firstname, lastname) values (?, ?)", persons)

# Print the table contents
for row in con.execute("select firstname, lastname from person"):
    print(row)

print("I just deleted", con.execute("delete from person").rowcount, "rows")

# close is not a shortcut method and it's not called automatically,
# so the connection object should be closed manually
con.close()
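
The example above uses ":memory:", so nothing survives once the connection is closed. For actual persistence the database has to live in a file. The sketch below assumes a temporary file path (people.db is a made-up name) and uses contextlib.closing to close each connection, while "with con" commits the transaction.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

persons = [("Hugo", "Boss"), ("Calvin", "Klein")]

with tempfile.TemporaryDirectory() as tmpdir:
    db_path = os.path.join(tmpdir, 'people.db')  # hypothetical file name

    # closing() guarantees con.close(); "with con" commits or rolls back.
    with closing(sqlite3.connect(db_path)) as con:
        with con:
            con.execute("create table person(firstname, lastname)")
            con.executemany(
                "insert into person(firstname, lastname) values (?, ?)",
                persons)

    # A second connection sees the rows, showing they reached the file.
    with closing(sqlite3.connect(db_path)) as con:
        rows = con.execute(
            "select firstname, lastname from person order by firstname"
        ).fetchall()

print(rows)
```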

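Compiled-code cache tool for .pyc files --- marshal

The marshal format is specific to Python and may change between Python versions, although it is independent of machine architecture.

It exists mainly to read and write .pyc cache files.

It is not meant for exchanging data over RPC; pickle is the module intended for persistence and for exchanging data between machines.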

This module contains functions that can read and write Python values in a binary format. The format is specific to Python, but independent of machine architecture issues (e.g., you can write a Python value to a file on a PC, transport the file to a Sun, and read it back there). Details of the format are undocumented on purpose; it may change between Python versions (although it rarely does).

This is not a general “persistence” module. For general persistence and transfer of Python objects through RPC calls, see the modules pickle and shelve. The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files. Therefore, the Python maintainers reserve the right to modify the marshal format in backward incompatible ways should the need arise. If you’re serializing and de-serializing Python objects, use the pickle module instead – the performance is comparable, version independence is guaranteed, and pickle supports a substantially wider range of objects than marshal.
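
A short sketch of both sides of that trade-off: marshal can round-trip a code object, which is what a .pyc file stores, but it rejects an instance of a user-defined class. The Point class and the source string are hypothetical and exist only for this illustration.

```python
import marshal

# Compile a tiny source string into a code object, much like a module body.
code = compile("result = 6 * 7", "<demo>", "exec")

# marshal can serialize code objects, which pickle cannot.
blob = marshal.dumps(code)
namespace = {}
exec(marshal.loads(blob), namespace)
print(namespace["result"])


class Point:
    """Hypothetical user-defined class, used only for this illustration."""

    def __init__(self, x, y):
        self.x, self.y = x, y


# Instances of user-defined classes are outside marshal's supported types.
try:
    marshal.dumps(Point(1, 2))
    marshal_error = None
except ValueError as exc:
    marshal_error = exc

print(type(marshal_error).__name__)
```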