Please do explore beyond the problems given, and feel free to ask questions at any time.
Note: Solutions to some problems are provided, but try to avoid peeking at them until you've made an attempt! Before we start, load NumPy:
import numpy as np
These exercises explore some of the more advanced features of NumPy. If you prefer to learn the basics from scratch, have a look at this introduction.
If you run out of exercises, there are 90 more here!
x
.
Create another array as follows: y = x[2]
.
What happens when you modify y
, does x
also change?x.shape
, x.ndim
,
x.dtype
, x.strides
. What does each property tell you about the array?np.<TAB>
in IPython / Jupyter, print(dir(np))
in standard Python or
np.lookfor('sum')
. Remember that you can always look at the docstring of any
function in Python (use np.func_name?<ENTER>
in IPython or help(np.func_name)
in standard Python).Take a look at the following. Many of you may know it as a meshgrid:
x, y = np.mgrid[:10, :5]
z = x + y
Now, reproduce z from the above snippet, but this time use NumPy's
ogrid
together with broadcasting.
In our solution, broadcasting is used "behind the scenes". To see what happens
more explicitly, apply np.broadcast_arrays
on the x
and y
given by
ogrid
. This should correspond to the x
and y
produced by mgrid
.
Benchmark the two approaches (one using mgrid
, the other ogrid
), using
IPython's %timeit
function, on a larger grid. Can you explain the difference in
execution time?
Given a list of XYZ-coordinates, p
,
[[1.0, 2.0, 10],
[3.0, 4.0, 20],
[5.0, 6.0, 30],
[7.0, 8.0, 40]]
Normalise each coordinate by dividing with its Z (3rd) element. For example, the first row becomes:
[1/10, 2/10, 10/10]
Hint: extract the last column into a variable
z
, and then change its dimensions so thatp / z
works.
Create a 3x3
ndarray. Use fancy indexing to slice out the diagonal elements.
Generate a 10 x 3 array of random numbers (all between 0 and 1). From each row, pick
the number closest to 0.75. Make use of np.abs
and np.argmax
to find the
column j
which contains the closest element in each row.
Predict and verify the shape of the following slicing operation.
x = np.empty((10, 8, 6))
idx0 = np.zeros((3, 8)).astype(int)
idx1 = np.zeros((3, 1)).astype(int)
idx2 = np.zeros((1, 1)).astype(int)
x[idx0, idx1, idx2]
Strictly speaking, this is not a question on indexing, but it's a fun exercise either way.
Construct an array
x = np.arange(12, dtype=np.int32).reshape((3, 4))
so that x
is
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Now, provide to np.lib.stride_tricks.as_strided
the strides necessary to view a
sliding 2x2
window over this array. The output should be
array([[[[ 0, 1],
[ 4, 5]],
[[ 1, 2],
[ 5, 6]],
[[ 2, 3],
[ 6, 7]]],
[[[ 4, 5],
[ 8, 9]],
[[ 5, 6],
[ 9, 10]],
[[ 6, 7],
[10, 11]]]], dtype=int32)
The code is of the form
z = as_strided(x, shape=(2, 3, 2, 2),
strides=(..., ..., ..., ...))
This sort of stride manipulation is handy for implementing techniques such as region based statistics, convolutions, etc.
Design a data-type for storing the following record:
Use it to represent the following data:
dt = np.dtype(<your code here>)
x = np.array([(100, (0, 0.5)),
(200, (0, 10.3)),
(300, (5.5, 15.1))], dtype=dt)
Have a look at the np.dtype
docstring if you need help.
After constructing x
, try to print all the x
values for which time
is greater than 100 (hint: something of the form y[y > 100]
).
Place the following data in a text file, data.txt
:
% rank lemma (10 letters max) frequency dispersion
21 they 1865844 0.96
42 her 969591 0.91
49 as 829018 0.95
7 to 6332195 0.98
63 take 670745 0.97
14 you 3085642 0.92
35 go 1151045 0.93
56 think 772787 0.91
28 not 1638883 0.98
Now, design a suitable structured data type, then load the data from the text
file using np.loadtxt
(look at the docstring to see how to handle
the '%' comment character). Here's a skeleton to start with:
import numpy as np
txtdata = open('data.txt', 'r')
# Construct the data-type
# In IPython type np.dtype?<ENTER> for examples, e.g.,
#
# np.dtype([('x', np.float), ('y', np.int), ('z', np.uint8)])
dt = np.dtype(...) # Modify this line to give the correct answer
data = np.loadtxt(...) # Load data with loadtxt
Examine the data you got, for example:
rank < 30
Sort the data according to frequency (see np.sort).
Save the result to a compressed Numpy data file, sorted.npz
, using
np.savez
and load it back with out = np.load(...)
.
Do you get back what you put in? If not, use dir(out)
or
TAB-completion on out
to figure out how to access the data
(HINT: out
behaves like a dictionary).
An author of a foreign package (included with the exercizes as
problems/mutable_str.py
) provides a string class that
allocates its own memory:
In [1]: from mutable_str import MutableString
In [2]: s = MutableString('abcde')
In [3]: print(s)
abcde
You'd like to view these mutable (mutable means the ability to modify in place) strings as ndarrays, in order to manipulate the underlying memory.
Add an array_interface dictionary attribute to s, then convert s to an
ndarray. Numerically add "2" to the array (use the in-place operator +=
).
Then print the original string to ensure that its value was modified.
Here's a skeleton outline:
import numpy as np
from mutable_str import MutableString
s = MutableString('abcde')
# --- EDIT THIS SECTION ---
# Create an array interface to this foreign object
s.__array_interface__ = {'data' : (XXX, False), # (ptr, is read_only?)
'shape' : XXX,
'typestr' : '|u1', # typecode unsigned character
}
# --- EDIT THIS SECTION ---
print('String before converting to array:', s)
sa = np.asarray(s)
print('String after converting to array:', sa)
sa += 2
print('String after adding "2" to array:', s)
Hint: Documentation for NumPy's
__array_interface__
may be found in the online docs.
Construct the following two arrays:
x = np.array([[1, 2], [3, 4]], order='C', dtype=np.uint8)
y = np.array([[1, 2], [3, 4]], order='F', dtype=np.uint8)
Compare the bytes they store in memory by using
import ctypes
print(list(ctypes.string_at(x.ctypes.data, 4)))
print(list(ctypes.string_at(y.ctypes.data, 4)))
For any array in general, you can do the above using
print(list(ctypes.string_at(x.ctypes.data, x.size * x.itemsize)))
Note that, even though these arrays store data in different memory order, they are identical from the user's perspective.
print(x)
print(y)
Examine the bytes stored by the following array:
x = np.array([[1, 2], [3, 4]], dtype=np.uint32)
Note that, on most laptops, the byte order will be little Endian, i.e. least significant byte first.
Consider a set of p matrices, each with shape (n, n)
, and a set of p vectors,
each with shape (n, 1)
.
Compute the sum of their p matrix products using a single call to np.tensordot
. The result
has shape (n, 1)
.
Start with the following code as template:
p, n = 10, 20
M = np.ones((p,n,n))
V = np.ones((p,n,1))
S = np.tensordot(M, V, ...)
print(S)
Example run:
X = np.array([[[1, 1],
[1, 1]],
[[2, 2],
[2, 2]]])
X.shape
is (2, 2, 2)
, i.e. we have two arrays of (2, 2)
.
Y = np.array([[[1],
[1]],
[[2],
[2]]])
Y.shape
is (2, 2, 1)
, i.e. we have two vectors of (2, 1)
.
out = np.tensordot(X, Y, ...)
The output is:
array([[10],
[10]])
Additional note: Since v1.8, numpy implements
dot
as a generalised ufunc, which means that it can be applied to stacks of matrices.
Attempt the Fortran-ordering quiz.
%reload_ext load_style
%load_style css/exercises.css