To kit up, launch IPython:
ipython
and import Numpy:
In [1]: import numpy as np
Tune how it prints arrays (easier for the eyes):
In [2]: np.set_printoptions(precision=3)
Create a 5x6 Numpy array containing random numbers in range [0, 1].
Compute the mean of all the numbers in it
(To find the function to do this: np.lookfor("mean of array"))
Compute the minimum value in each row, and maximum in each column
Multiply each element by 10 and convert to an integer with the .astype() method.
What is the difference between a.astype(int) and np.around(a)?
Compare:
np.array([1, 2, 3, 4]) / 2
np.array([1.0, 2, 3, 4]) / 2
np.array([1, 2, 3, 4]) // 2
np.array([1.0, 2, 3, 4]) // 2
Why does it work like it does? How about with:
a = np.array([1, 2, 3, 4], dtype=float)
b = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.int8)
Which of the following operations create a view, and which a copy:
a = np.array([[1, 2, 3], [4, 5, 6]])
a[:,[0,1]]
a[:,0:2]
a[0]
a.T
a[[True, False]]
a.reshape(2*3) # bonus sector
a.T.reshape(2*3) # bonus sector
(Think first, then check.)
Use the function:
def change_it(x):
x[:] = np.array([7, 8, 9])
to change array:
a = np.array([1, 2, 3, 4, 5, 6])
to:
array([1, 7, 8, 9, 5, 6])
Consider:
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int16)
What do the following operations do, and what are the resulting strides:
a
a.T
a[::-1]
Study the .strides, .flags, and str(a.data) attributes of the arrays:
a = np.array([[1, 2], [3, 4]], dtype=np.byte)
b = a.T
Which of the above are C-contiguous (and what does that mean)?
Divide each column of the array
>>> a = np.arange(25).reshape(5, 5)
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
elementwise with the array b = np.array([1.5, 5, 10, 15, 20]).
I.e., the result should be:
array([[ 0/1.5, 1/1.5, ...],
[ 5/5, 6/5, ...],
[10/10, 11/10, ...],
[15/15, 16/15, ...],
[20/20, 21/20, ...]])
Tips
Write a function f(a, b, c) that returns .
Generate a shape (24, 12, 6) array containing the values f(a_i,
b_j, c_k) at points a_i, b_j and c_k forming a grid in the
unit cube [0, 1] x [0, 1] x [0, 1].
Approximate the 3-d integral
over this volume with the mean of the values. The exact result is:
— how close do you get?
Try also using np.mgrid instead of broadcasting. Is there a speed difference? How about ogrid with broadcast_arrays?
Tips
Extract the 1st superdiagonal 1, 7, 14 from the array:
0 1 2 3
5 6 7 8
11 12 13 14
15 16 17 18
19 20 21 22
Then extract the 1st and the 3rd columns.
Generate a 10 x 3 array of random numbers (in range [0,1]). From each row, pick the number closest to 0.75.
Tips
Design a structured data type suitable for the data (in words.txt):
% rank lemma (10 letters max) frequency dispersion
21 they 1865844 0.96
42 her 969591 0.91
49 as 829018 0.95
7 to 6332195 0.98
63 take 670745 0.97
14 you 3085642 0.92
35 go 1151045 0.93
56 think 772787 0.91
28 not 1638883 0.98
Load the data from the text file. Examine the data you got, for example: extract words only, extract the 3rd row, print all words with rank < 30.
Sort the data according to frequency. Save the result to a Numpy data file sorted.npz with np.savez and load back with np.load. Do you get back what you put in?
Save the result to a text file sorted.txt using np.savetxt. Here, you need to provide a fmt argument to savetxt.
Tips
See the documentation of the .sort() method: help(np.ndarray.sort)
For structured arrays, savetxt needs a fmt argument that tells it what to do.
fmt is a string. For example "%s %d %g" tells that the first field is to be formatted as a string, the second as an integer, and the third as a float.
The .wav audio files are binary files: they contain a fixed-size header followed by raw sound data.
Construct a Numpy structured data type describing the .wav file header, and use it to read the header. Print for example the sample rate and number of channels. (A test.wav is provided so you can try things out on that.)
Tips
You can read a binary structure described by some_dtype to a Numpy array with:
with open('test.wav', 'rb') as f:
data = np.fromfile(f, dtype=some_dtype, count=1)
.wav file structure
Byte # | Field | |
---|---|---|
0 | chunk_id | 4-byte string ("RIFF") |
4 | chunk_size | 4-byte uint (little-endian) |
8 | format | 4-byte string ("WAVE") |
12 | fmt_id | 4-byte string ("fmt ") |
16 | fmt_size | 4-byte uint (little-endian) |
20 | audio_fmt | 2-byte uint (little-endian) |
22 | num_channels | 2-byte uint (little-endian) |
24 | sample_rate | 4-byte uint (little-endian) |
28 | byte_rate | 4-byte uint (little-endian) |
32 | block_align | 2-byte uint (little-endian) |
34 | bits_per_sample | 2-byte uint (little-endian) |
36 | data_id | 4-byte string ("data") |
40 | data_size | 4-byte uint (little-endian) |
Reimplement array indexing (for 2-D, without using Numpy)! Write a function data_at_index(indices, data, strides, dtype) that returns the data corresponding to a specified array element, as a string of bytes. I.e.:
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int16)
b = a.T
data_at_index([0, 1], str(a.data), a.strides, a.dtype) == '\x02\x00' == str(a[0,1].data)
data_at_index([0, 1], str(b.data), b.strides, b.dtype) == '\x04\x00' == str(b[0,1].data)
Check first that you understand the meaning of:
Build a sliding 3-item window for the array:
x = np.arange(10, dtype=np.int32)
The aim is to get an array:
array([[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7], [6, 7, 8], [7, 8, 9]], dtype=int32)without making copies (so that it is fast). The trick is a stride trick:
from numpy.lib.stride_tricks import as_strided strides = ... y = as_strided(x, shape=(8, 3), strides=strides)
Use the same trick to compute the 5 x 5 median filter of an image. For each pixel, compute the median of the 5 x 5 block of pixels surrounding it.
The median filter provides a degree of denoising similarly to a gaussian blur, but it preserves sharp edges better.
>>> import scipy
>>> import matplotlib.pyplot as plt
Noisy image
>>> img = scipy.lena() # A standard test image for image processing
>>> img += 0.8 * img.std() * np.random.rand(*img.shape)
>>> plt.imshow(img)
Prepare the sliding window
>>> assert img.flags.c_contiguous # Important!
>>> window_size = 5
>>> shape = ... # Careful, no out-of-bounds access...
>>> strides = ...
>>> img_window = as_strided(...)
Denoise!
>>> img_median = np.median(img_window.reshape(..., window_size*window_size), axis=...)
>>> plt.imshow(img_median)
>>> plt.gray()
>>> plt.imsave('sharpened.png', img_median)
Note
Extra: Visit the factory
We don’t yet have a rolling_window function in Numpy that would make the above easier. We, however, do have a contributed implementation that is discussed here:
Can you extend the version posted by Warren to make N-dimensional windows, or think of any other features such a function would need to have? (If yes, just ask me how to contribute your stuff.)
Generate an approximation to the Menger sponge by creating a 3-D Numpy array filled with 1, and drilling holes to it with slicing.
Tips:
Take a 2-D slice of the sponge diagonally through the center of the cube, with normal vector (1, 1, 1). What sort of a patterns you get in the intersection?
Tips (for one approach):