When parallelization does not help: the starving CPUs problem

Quick summary

Computational scientists should know that most of the time their CPUs are waiting for data to arrive. Knowing where the low-level bottlenecks are, and knowing what can be done to ameliorate them, may save hours of frustration when trying to understand why apparently well-written programs perform poorly.

I'll talk about why current CPUs are starving for data, and about the techniques available for addressing this problem on modern computers.

Contents

1. Motivation

2. The Data Access Issue

3. High Performance Libraries

Materials for the lecture

Slides here. The multiprocessing script for NumPy is here.

Exercises

Fetch the tarball with the guidelines and sources from here.

Solutions

Fetch the solutions from here.

Displaying cache size

With a two-core Intel Core 2 Duo processor:

$ tail /sys/devices/system/cpu/cpu0/cache/*/size
==> /sys/devices/system/cpu/cpu0/cache/index0/size <==
32K  # -> Level 1 cache size
==> /sys/devices/system/cpu/cpu0/cache/index1/size <==
32K  # -> Level 1 cache size
==> /sys/devices/system/cpu/cpu0/cache/index2/size <==
3072K  # -> Level 2 cache size

With a four-core Intel Xeon E5520:

$ tail /sys/devices/system/cpu/cpu0/cache/*/size
==> /sys/devices/system/cpu/cpu0/cache/index0/size <==
32K  # -> Level 1 cache size
==> /sys/devices/system/cpu/cpu0/cache/index1/size <==
32K  # -> Level 1 cache size
==> /sys/devices/system/cpu/cpu0/cache/index2/size <==
256K  # -> Level 2 cache size
==> /sys/devices/system/cpu/cpu0/cache/index3/size <==
8192K  # -> Level 3 cache size
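
The size files alone do not say which cache each index refers to. On Linux, every index directory also contains level and type files, so the two 32K entries above can be told apart (typically one is the L1 data cache and the other the L1 instruction cache). The following is a minimal sketch of reading them from Python. The function name read_cache_info is made up for this example, and it assumes the usual sysfs layout shown above.

```python
from pathlib import Path


def read_cache_info(base="/sys/devices/system/cpu/cpu0/cache"):
    """Return a sorted list of (level, type, size) tuples, one per cache.

    `base` is assumed to follow the Linux sysfs layout: one `index*`
    directory per cache, each holding `level`, `type` and `size` files.
    """
    caches = []
    for index_dir in Path(base).glob("index*"):
        try:
            level = int((index_dir / "level").read_text().strip())
            ctype = (index_dir / "type").read_text().strip()
            size = (index_dir / "size").read_text().strip()
        except (OSError, ValueError):
            # Skip entries that are missing, unreadable or malformed.
            continue
        caches.append((level, ctype, size))
    return sorted(caches)


if __name__ == "__main__":
    # Prints nothing on systems that do not expose this sysfs tree.
    for level, ctype, size in read_cache_info():
        print(f"L{level} {ctype:<12} {size}")
```

Passing a different base (for example the cache directory of cpu1) shows the caches seen by another core. The exact set of entries depends on the processor and kernel.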

Setting up SSH access: server for exercises

The exercises must be run on a machine more powerful than a typical laptop.

Save the SSH connection details to your config file (run this only once, since the command appends to the file):

echo -e 'Host gnu\n\tHostName gnu.fuw.edu.pl\n\tPort 2005\n\tVisualHostkey yes' >>  ~/.ssh/config

Then log in:

ssh <login>@gnu

The machine identifies itself as 'debian', so expect that name rather than 'gnu' once you are logged in.

Your login is the same one you use for git.

To know more