When dealing with tasks that take a long time to process, streaming data, and so on, serialization and deserialization come in handy. Recently, when applying deep learning to the MNIST dataset on a laptop, this became a very useful operation.
What is serialization?
In the context of data storage, serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection) and reconstructed later (possibly in a different computer environment). The opposite process is called deserialization (also called unmarshalling).
In Python, this can be done easily with the pickle module.
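As a minimal sketch of that round trip, the snippet below serializes an object to bytes in memory and rebuilds it. The `record` dictionary is a made-up example; any picklable object would behave the same way.

```python
import pickle

# Hypothetical sample data, used only for illustration.
record = {"name": "digits", "shape": (28, 28), "labels": [0, 1, 2]}

payload = pickle.dumps(record)    # serialize to a bytes object
restored = pickle.loads(payload)  # deserialize into a new object

print(type(payload).__name__)  # bytes
print(restored == record)      # True: same value
print(restored is record)      # False: a separate copy
```

The restored object is equal to the original but is not the same object, which is what you would expect after writing it out and reading it back in.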
When to use Pickle?
Here are some common uses for this process:
1) saving a program’s state data to disk so that it can carry on where it left off when restarted (persistence)
2) sending Python data over a TCP connection in a multi-core or distributed system (marshalling)
3) storing Python objects in a database
4) converting an arbitrary Python object to a string (bytes in Python 3) so that it can be used as a dictionary key (e.g. for caching & memoization).
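There are some issues with the last one: two identical objects can be pickled and result in different strings, and even the same object pickled twice can have different representations. This is because the pickle can depend on how objects are referenced (for example, whether two elements are the same object, or reference count information in some implementations), not only on their values.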
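The caveat about dictionary keys can be seen in a small sketch. Here two lists compare equal, but one holds the same string object twice while the other holds two distinct string objects with the same value. The variable names are made up for illustration.

```python
import pickle

# Build two distinct string objects with the same value.
s1 = "".join(["a", "b"])
s2 = "".join(["a", "b"])

shared = [s1, s1]    # the same object referenced twice
separate = [s1, s2]  # two different objects, equal in value

same_value = shared == separate
same_bytes = pickle.dumps(shared) == pickle.dumps(separate)

print(same_value)  # True
print(same_bytes)  # False: the shared reference is encoded differently
```

So if pickled bytes are used as a cache key, equal objects may miss the cache. A key built from the values themselves, such as a tuple of hashable fields, is usually more predictable.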
How to use Pickle?
Saving:
import pickle
with open('save.p', 'wb') as f:
    pickle.dump(myStuff, f)
Loading:
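from collections import defaultdict
try:
    with open('save.p', 'rb') as f:
        myStuff = pickle.load(f)
except (OSError, EOFError, pickle.UnpicklingError):
    # no usable saved state yet, so start with an empty container
    myStuff = defaultdict(dict)
Alternatively:
# one-liner; the file is only closed when the file object is garbage-collected
myStuff = pickle.load(open('save.p', 'rb'))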
Please note that the 'rb' argument (binary read mode) is necessary when loading pickled data, because pickle is a binary format. For the same reason, 'wb' is needed when saving.
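Putting saving and loading together, here is a hedged sketch of the "carry on where it left off" pattern. The helper names `load_state` and `save_state`, the temporary directory, and the stored values are all assumptions made for the example, not part of the pickle API.

```python
import os
import pickle
import tempfile
from collections import defaultdict

def load_state(path):
    """Return the pickled state at path, or an empty default if no file exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return defaultdict(dict)

def save_state(state, path):
    """Write state to path using the newest pickle protocol available."""
    with open(path, "wb") as f:
        pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)

# Use a throwaway directory so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "save.p")

state = load_state(path)          # first run: no file yet, empty default
state["epoch_1"]["loss"] = 0.42   # made-up value for illustration
save_state(state, path)

resumed = load_state(path)        # later run: picks up the saved state
print(resumed["epoch_1"]["loss"])
```

Catching only `FileNotFoundError` here is a deliberate choice: a missing file means "start fresh", while a corrupted file raises an error instead of silently discarding the saved state.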
Alternative method
Use dill, which extends pickle to handle more kinds of Python objects (such as lambdas). Link: http://nbviewer.jupyter.org/gist/minrk/5241793
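A rough sketch of the difference follows. It assumes dill may or may not be installed (it is a third-party package), so the dill part is guarded. The `square` lambda is a made-up example.

```python
import pickle

square = lambda x: x * x  # hypothetical object that plain pickle rejects

# The standard pickle module stores functions by name, so a lambda fails.
try:
    pickle.dumps(square)
    std_pickle_ok = True
except (pickle.PicklingError, AttributeError):
    std_pickle_ok = False

print(std_pickle_ok)  # False

# dill is optional here; skip this part if it is not installed.
try:
    import dill
except ImportError:
    dill = None

if dill is not None:
    restored = dill.loads(dill.dumps(square))
    print(restored(4))
```

dill mirrors the familiar `dumps`/`loads` interface, so switching from pickle is mostly a matter of changing the import.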