Jupyter Notebook Tricks for Data Science that Enhance your efficiency

While taking the excellent fast.ai deep learning course, I have picked up a lot of neat tricks that also apply to general software engineering. I am writing this article to summarize them, both to share with you and as a reference for myself.

1. Jupyter Notebook Extensions

The standard Jupyter notebook is nice, but people have built many extensions that bundle extra features and can make your work easier.

Install Jupyter extension package

# Install Jupyter extension package
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
# Install configurator and enable configurator
pip install jupyter_nbextensions_configurator
jupyter nbextensions_configurator enable
# Install theme
pip install jupyterthemes
# Change theme (this is my default)
# Note that you need to configure the settings in one command: if you run two jt commands, the second one replaces the first.
jt -t grade3 -T

You can find more Jupyter themes here. After you install the configurator, a new “Nbextensions” tab appears. Tick these items:

  1. Autopep8
  2. Collapsible Headings
  3. Gist-it

A. Collapsible Headings

You can now collapse sections of your notebook instead of scrolling through endless code. In my experience, I write a lot of messy code when doing exploratory data analysis and plotting charts, and I have to scroll a long way to get where I want to go. With this extension you can expand or collapse a section to keep things clear. I think there is even a Table of Contents extension (I haven’t tried it yet).

B. Gist-it

You will see this little GitHub icon. Just click it and your Gist will be published.

If you have not used Gist before, it is basically a place that lets you share your notebook. This is very useful when you want to share your code, especially when you have a bug and want to show it to someone. Just click the button and everything is done in a few seconds.

By default, it publishes an anonymous Gist. If you want to publish under your GitHub account, you need to generate a token for authentication. The major difference is that you can edit your Gist if you publish it with your own account.

C. Autopep8

You can either use this little button or a keyboard shortcut, up to you!

Styling is important, but it is also boring. If you don’t want to wear out your spacebar, just click the little button and it will fix all the spacing for you! (PEP 8 is the style guide for Python code.)

2. Time your task and profile it!

I used to record a start time before a loop and subtract it from the current time afterwards to get the run time. This is not wrong, but there is an easier way: use the built-in magic commands. They may look unnatural at first, but they are really handy. (Magic commands start with %.)

Let’s start with a simple function. It calculates the largest Fibonacci number that is smaller than n.
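The original listing did not survive here, so the following is only a minimal sketch of what such a function could look like. The name fib1 comes from the text; the body is an assumption.

```python
def fib1(n):
    """Return the largest Fibonacci number strictly smaller than n.

    Assumed implementation: only the name fib1 appears in the article.
    """
    a, b = 0, 1
    while b < n:
        # Advance one step along the sequence: (a, b) -> (b, a + b).
        a, b = b, a + b
    return a


print(fib1(100))
```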

You can use %time to time a single run, or %timeit to run it many times and get the average and standard deviation. This is useful for simple functions like this one, but what about a function that calls other functions?

This is where %prun comes in. I created a dummy function that calls fib1() many times. You can see that the loop itself takes some time, but most of the time is spent in fib1().
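As a rough illustration, the magics are shown as comments below (they only work inside IPython or Jupyter), followed by an approximate plain-Python equivalent built on the standard-library timeit module. The fib1 body and the repeat and loop counts are assumptions chosen for the sketch.

```python
import statistics
import timeit


def fib1(n):
    """Assumed implementation: largest Fibonacci number smaller than n."""
    a, b = 0, 1
    while b < n:
        a, b = b, a + b
    return a


# In a notebook cell you would simply write:
#   %time fib1(1_000_000)      # one run
#   %timeit fib1(1_000_000)    # many runs, reports mean and std. dev.

# Approximate equivalent outside a notebook: 7 runs of 10,000 loops each.
loops = 10_000
runs = timeit.repeat(lambda: fib1(1_000_000), repeat=7, number=loops)
per_call = [total / loops for total in runs]  # seconds per call, per run

mean_s = statistics.mean(per_call)
std_s = statistics.stdev(per_call)
print(f"{mean_s * 1e9:.0f} ns ± {std_s * 1e9:.0f} ns per loop "
      f"(7 runs, {loops:,} loops each)")
```

The numbers depend entirely on your machine, so treat them as relative measurements rather than absolute ones.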
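A sketch of the same idea follows. In a notebook, %prun does the profiling for you; outside a notebook, the standard-library cProfile module gives a comparable report. The dummy function name call_fib1_many_times and the fib1 body are hypothetical, chosen only for illustration.

```python
import cProfile
import io
import pstats


def fib1(n):
    """Assumed implementation: largest Fibonacci number smaller than n."""
    a, b = 0, 1
    while b < n:
        a, b = b, a + b
    return a


def call_fib1_many_times(count, n):
    """Hypothetical dummy function that calls fib1() in a loop."""
    total = 0
    for _ in range(count):
        total += fib1(n)
    return total


# In a notebook cell you would simply write:
#   %prun call_fib1_many_times(10_000, 1_000)

# Plain-Python equivalent with cProfile.
profiler = cProfile.Profile()
profiler.enable()
result = call_fib1_many_times(10_000, 1_000)
profiler.disable()

# Collect the report into a string, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

In the report, the row for fib1 shows how many times it was called and how much time was spent inside it, compared with the loop in the calling function.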

3. Cython

Cython is a package that compiles Python-like code to C. Compiled C extensions of this kind are a major reason why numpy and pandas are fast. Make sure you have Cython installed by running

pip install cython

Compiling the function with Cython as-is roughly doubles the performance without changing any code at all! That’s great, but it isn’t amazing.

See what you can achieve if you change the script a little. If you have some experience with C programming, you probably know that you need to declare a datatype when you define a variable. The script also changes slightly because an operation like the tuple swap below is a Python idiom that C does not have, so we need a temp variable to store the value.

a,b = b,a
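To make the temp-variable rewrite concrete, here is a plain-Python sketch that compares the tuple-assignment loop with a version using an explicit temporary. Both function bodies and the name fib_temp are assumptions for illustration. The typed Cython version would additionally declare C types for the variables, which is not shown here.

```python
def fib1(n):
    """Assumed original: uses Python tuple assignment."""
    a, b = 0, 1
    while b < n:
        a, b = b, a + b
    return a


def fib_temp(n):
    """Hypothetical rewrite: same logic, C-style, with a temp variable."""
    a = 0
    b = 1
    while b < n:
        tmp = a + b  # hold the next value before overwriting a
        a = b
        b = tmp
    return a


# Both versions should agree for every input.
print(all(fib1(n) == fib_temp(n) for n in range(1_000)))
```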

(Thanks to James Martini for pointing out some of the errors in fib3 earlier!)

From 582 ns to 48 ns: more than 10 times faster, and you barely need to change the script. To me this is exciting, because most of the time slow code is fine. What you care about is the code that gets called again and again. With %prun and a little Cython, you get C speed without manually compiling any files.

Apart from magic commands, I find running shell commands in Jupyter very helpful too. (Magic commands start with % and shell commands start with !.)
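For example, the comments below show what a shell command looks like in a notebook cell, followed by a rough plain-Python equivalent using subprocess. The specific commands are only illustrative assumptions.

```python
import subprocess
import sys

# In a notebook cell you would write, for example:
#   !pip --version
#   files = !ls      # IPython captures the output lines in a list-like object

# Rough equivalent outside a notebook: run a command and capture its output.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from a subprocess')"],
    capture_output=True,
    text=True,
    check=True,  # raise an error if the command fails
)
output_lines = completed.stdout.splitlines()
print(output_lines)
```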

TBC