### Exercises for Memory-Efficient Computing

#### Optimizing arithmetic expressions

##### Exercise 1

Use the script ``poly1.py`` to check how much time it takes to evaluate the following polynomial:

`    y = .25*x**3 + .75*x**2 - 1.5*x - 2`

with `x` in the range [-1, 1], and with 10 million points.

• Set the `what` parameter to “numexpr” and take note of the speed-up versus the “numpy” case.
• Why do you think the speed-up is so large?

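Before reaching for the script, it can help to see the two evaluation paths side by side. A minimal timing sketch (assuming numexpr is installed; it uses fewer points than the exercise's 10 million, so absolute timings will differ):

```python
import time
import numpy as np

x = np.linspace(-1, 1, 1_000_000)  # the exercise uses 10 million points

t0 = time.time()
y_np = .25*x**3 + .75*x**2 - 1.5*x - 2   # NumPy: one temporary array per operation
t_np = time.time() - t0

try:
    import numexpr as ne
    t0 = time.time()
    y_ne = ne.evaluate(".25*x**3 + .75*x**2 - 1.5*x - 2")  # one fused pass over x
    t_ne = time.time() - t0
    assert np.allclose(y_np, y_ne)
    print(f"numpy: {t_np:.4f}s  numexpr: {t_ne:.4f}s  speed-up: {t_np/t_ne:.1f}x")
except ImportError:
    print("numexpr not installed; only the NumPy timing is available")
```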
##### Exercise 2

The expression below:

`  y = ((.25*x + .75)*x - 1.5)*x - 2`

represents the same polynomial as the original one, but with interesting consequences for efficiency. Repeat the computation for numpy and numexpr and draw your own conclusions.

• Why do you think numpy performs so much more efficiently with this new expression?
• Why is the numexpr speed-up not as high in comparison?
• Why does numexpr remain faster than numpy?

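To convince yourself that both expressions denote the same polynomial, here is a quick pure-NumPy check (Horner's rule merely reorders the arithmetic):

```python
import numpy as np

x = np.linspace(-1, 1, 1_000_000)
y_naive = .25*x**3 + .75*x**2 - 1.5*x - 2   # uses two power operations
y_horner = ((.25*x + .75)*x - 1.5)*x - 2    # only multiplies and adds

# Same values: the Horner form needs fewer operations and fewer temporaries
print(np.allclose(y_naive, y_horner))
```

The Horner form avoids `x**3` and `x**2`, which are comparatively expensive, and it also creates fewer intermediate arrays.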
##### Exercise 3

The C program ``poly.c`` performs the same computation as above, but in pure C. Compile it like this:

`  gcc -O3 -o poly poly.c -lm`

and execute it.

• Why do you think it is more efficient than the above approaches?

##### Exercise 4

Make sure you are on a multi-processor machine, then repeat the last computation in ``poly1.py``, increasing the number of threads one by one (change the number in the ``for nt in range(1):`` loop).

• How does the efficiency scale?
• Why do you think it scales that way?
• How does the performance compare with the pure C computation?

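A sketch of such a thread scan (assuming numexpr is installed; `set_num_threads` is numexpr's call for controlling this):

```python
import time
import numpy as np

x = np.linspace(-1, 1, 1_000_000)
timings = {}
try:
    import numexpr as ne
    for nt in range(1, 5):              # scan 1..4 threads
        ne.set_num_threads(nt)
        t0 = time.time()
        ne.evaluate("((.25*x + .75)*x - 1.5)*x - 2")
        timings[nt] = time.time() - t0
    print(timings)
except ImportError:
    print("numexpr not installed")
```

Expect diminishing returns once the computation becomes memory-bound rather than CPU-bound.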
##### Exercise 5

On the same multi-processor machine, recompile ``poly.c``, this time with OpenMP support:

`  gcc -O3 -o poly poly.c -lm -fopenmp    # notice the new -fopenmp flag!`

and execute it for several numbers of threads:

`  OMP_NUM_THREADS=desired_number_of_threads ./poly`

Compare its performance with the parallel numexpr.

• How does the efficiency scale?
• What is the asymptotic limit?

##### Exercise 6

With the previous examples, compute the expression:

`  y = x`

That is, do a simple copy of the `x` vector. What performance are you seeing? How does it evolve with different numbers of threads?
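A pure-NumPy way to see what this exercise is probing: a plain copy does no arithmetic at all, so its throughput approximates the machine's memory bandwidth (a rough sketch; numbers vary a lot between machines):

```python
import time
import numpy as np

x = np.linspace(-1, 1, 10_000_000)

t0 = time.time()
y = x.copy()                      # no arithmetic: limited purely by memory traffic
dt = time.time() - t0

nbytes = x.nbytes + y.nbytes      # one read plus one write of 8-byte doubles
print(f"effective bandwidth: ~{nbytes / dt / 2**30:.1f} GiB/s")
```

Because the copy is memory-bound, adding threads should buy little or nothing here.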

#### Evaluating with carray

##### Exercise 7

Look into the sources of ``carray-eval.py`` and run it. For the first expression evaluation, i.e.:

`    ((.25*x + .75)*x - 1.5)*x - 2`

• Why do you think carray evaluates faster than NumPy, even when using the Python VM (virtual machine)?
• How much does the compression slow down the evaluation? What compression ratio is achieved? Is that a lot?

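Part of the answer is that carray evaluates expressions blockwise, so temporaries stay small and cache-resident. The idea can be sketched in pure NumPy (a simplification for illustration, not carray's actual implementation):

```python
import numpy as np

def blocked_eval(x, block=2**16):
    # Evaluate the polynomial chunk by chunk so temporaries fit in cache
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        b = x[i:i + block]
        out[i:i + block] = ((.25*b + .75)*b - 1.5)*b - 2
    return out

x = np.linspace(-1, 1, 1_000_000)
print(np.allclose(blocked_eval(x), ((.25*x + .75)*x - 1.5)*x - 2))
```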
##### Exercise 8

Repeat your reasoning with the second expression:

`    ((.25*x + .75)*x - 1.5)*x - 2 < 0`

• Why do you think the results vary so dramatically?

#### Querying Big Data

##### Exercise 9

Look into the sources of the ``carray-ctable.py`` script and run it.

• How does a carray query compare with a NumPy one?
• What is the compression ratio achieved in the ctable `t`?
• How do the 'simple' and 'complex' queries execute in comparison with the NumPy ones?
• If you are on the big Intel Lab machine, increase `NROWS` by one order of magnitude and re-run the benchmark. What do you see?

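For reference, the NumPy side of such queries on a structured array can be written like this (the field names `f0` and `f1` and the conditions are illustrative, not necessarily those used in ``carray-ctable.py``):

```python
import numpy as np

N = 1_000_000
t = np.empty(N, dtype=[("f0", "i4"), ("f1", "f8")])
t["f0"] = np.arange(N)
t["f1"] = np.linspace(0, 1, N)

# 'simple' query: a single condition on one column
hits_simple = t[t["f0"] < 10]
# 'complex' query: a boolean combination of conditions on two columns
hits_complex = t[(t["f0"] < 10) & (t["f1"] < 0.5)]
print(len(hits_simple), len(hits_complex))
```

A ctable evaluates the same boolean expressions column by column on compressed chunks, which is where both its query speed and its compression ratio come from.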
##### Exercise 10

Enter the IPython console and generate the big `t` ctable (just copy and paste the appropriate statements from ``carray-ctable.py``).

• Try to find the sweet spot for the 'simple' query by selecting different numbers of threads, running:
`      ca.set_nthreads(your_number_of_threads)`
• Repeat for the 'complex' query.
• Why do you think there is such a large difference in the sweet spot?