
Writing Beautiful Code with NumPy

NumPy is a powerful Python library that can greatly increase the speed and efficiency of processing large data sets. Several data science and machine learning frameworks work in conjunction with or are built on top of NumPy.

To demonstrate the benefits of using NumPy, we’ll look at some examples of common data science tasks, accomplished first with Python lists and operators, then with NumPy.

Python Lists vs. NumPy ndarrays

Python lists are 0-indexed arrays that can contain data of mixed types. Data held in Python lists can be accessed individually by indexing or by iteration through the list:

my_list = ["one", "two", 3]
# individual indexing - prints 'one'
print(my_list[0])
# iteration - prints each element on its own line
for el in my_list:
    print(el)

“Lists” in NumPy, called ndarrays (short for n-dimensional arrays), are similar to Python lists at the most basic level, but they support vectorized operations that Python lists do not.

Let’s say we have created a collection of integers, my_int_list = [1, 2, 3, 4], and we want to alter each element of the collection by multiplying each integer by 2.

Here’s how you might accomplish this with a Python list:

for i, el in enumerate(my_int_list):
    my_int_list[i] = el * 2

This will execute quickly on our 4-item list, but the loop visits and alters each item in the list one at a time, which gets computationally expensive once we start working with large data sets. Worse still, when data sets are organized into multi-dimensional lists, iterating through them requires nested loops, one per dimension, which makes the code slower and harder to read.
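
As a minimal sketch of the nested-loop case, assume a small two-dimensional list named grid (a hypothetical name and made-up values, not from the original example). Doubling every value takes one loop per dimension:

```python
# Hypothetical 2-D data: a list of rows, each row a list of integers.
grid = [[1, 2, 3],
        [4, 5, 6]]

# One loop per dimension: the outer loop walks the rows,
# the inner loop walks the values within each row.
for i, row in enumerate(grid):
    for j, value in enumerate(row):
        grid[i][j] = value * 2

print(grid)  # [[2, 4, 6], [8, 10, 12]]
```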

Let’s perform the same operation on a NumPy array:

import numpy as np
# create numpy array from original python list
my_numpy_arr = np.array(my_int_list)
# multiply each element by 2
my_numpy_arr * 2

This is a more intuitive way to accomplish our task, resulting in faster execution and more concise code.
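
To check that the two approaches agree, and to get a rough feel for the speed difference, here is a sketch using the standard library's timeit module. The array size, the repeat count, and the variable names are arbitrary assumptions for illustration. Timings vary by machine, so no figures are quoted here.

```python
import timeit

import numpy as np

# Arbitrary illustrative size; not taken from the article.
n = 100_000
py_list = list(range(n))
np_arr = np.arange(n)

# Both approaches should produce the same values.
doubled_list = [el * 2 for el in py_list]
doubled_arr = np_arr * 2
same = np.array_equal(doubled_arr, doubled_list)
print(same)  # True

# Time 10 runs of each approach and print the totals.
loop_time = timeit.timeit(lambda: [el * 2 for el in py_list], number=10)
numpy_time = timeit.timeit(lambda: np_arr * 2, number=10)
print(f"list: {loop_time:.4f}s  numpy: {numpy_time:.4f}s")
```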

Matrix Multiplication

Data science and machine learning rely heavily on matrix arithmetic, but data are not always neatly organized in a way that allows for element-to-element operations.

If you aren’t familiar with matrix multiplication, here’s the order of operations for multiplying a 2x4 matrix by a 4x3 matrix:

[Figure: Procedure for finding row 1 of the result matrix]
[Figure: Procedure for finding row 2 of the result matrix]

There’s nothing mathematically complex about matrix multiplication, but it’s clear that it requires lots of movement through each array.
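
The row-by-row procedure can be sketched with plain nested loops and then checked against NumPy's `@` operator (equivalent to np.matmul for 2-D arrays). The matrix values and the names a, b, and result are made up for illustration; only the 2x4 and 4x3 shapes come from the text above.

```python
import numpy as np

# Illustrative values: a is 2x4, b is 4x3.
a = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
b = [[1, 0, 2],
     [0, 1, 0],
     [3, 0, 1],
     [0, 2, 0]]

rows, inner, cols = len(a), len(b), len(b[0])
result = [[0] * cols for _ in range(rows)]

# Each result cell is the dot product of a row of a and a column of b.
for i in range(rows):
    for j in range(cols):
        for k in range(inner):
            result[i][j] += a[i][k] * b[k][j]

print(result)  # [[10, 10, 5], [26, 22, 17]]

# The same product with NumPy, in one expression.
np_result = np.array(a) @ np.array(b)
print(np.array_equal(np_result, result))  # True
```

The three nested loops make the "movement through each array" explicit: every cell of the 2x3 result reads a full row of one matrix and a full column of the other.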