# Optimizing your code with NumPy, Cython, pythran and numba

I had the pleasure of attending a workshop given by the groupe calcul (CNRS) this week. The topic was: how do you optimize the execution speed of your Python code, under the hypothesis that you already tried to make it fast using NumPy?

The training was held over three days and presented three interesting ways to achieve speedups: Cython, pythran and numba.

Overall, the workshop was great. The goal of this blog post is to summarize some of the key insights that I learnt while using these three tools on an practical application: image filtering.

This post uses the following versions of the libraries:

In :
import cython; print(cython.__version__)
import pythran; print(pythran.__version__)
import numba; print(numba.__version__)

WARNING: Disabling color, you really want to install colorlog.

0.27.3

WARNING: Pythran support disabled for module: omp

0.8.4post0
0.36.2


# Introducing the discrete Laplacian¶

When working with images, the discrete Laplacian operator is often used for edge detection.

Intuitively, if you want to find the edges of an image, you compute the Laplacian and threshold it to see the edges appear. Let's give an example using scikit-image.

First, let's load a standard grayscale image that ships with scikit-image, the astronaut.

In :
import skimage.data
import skimage.color

In :
image = skimage.data.astronaut()
image = skimage.color.rgb2gray(image)
image.shape

Out:
(512, 512)

Now, let's write a function that uses the scikit-image implementation of the discrete Laplace operator and thresholds the result with a fixed threshold.

In :
from skimage.filters import laplace
import numpy as np

In :
def laplace_skimage(image):
"""Applies Laplace operator to 2D image using skimage implementation.
Then tresholds the result and returns boolean image."""
laplacian = laplace(image)
thresh = np.abs(laplacian) > 0.05
return thresh


Let's apply the function and check its result.

In :
edges = laplace_skimage(image)

In :
edges.shape

Out:
(512, 512)

Let's display the obtained edges, along with the original image.

In :
import matplotlib.pyplot as plt
%matplotlib inline

In :
def compare(left, right):
"""Compares two images, left and right."""
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax.imshow(left, cmap='gray')
ax.imshow(right, cmap='gray')

In :
compare(left=image, right=edges) Since the topic of this post is to deal with optimization, we will now suppose that we're not using scikit-image.

As is usually the case when you're working on your own problems, we'll implement the Laplacian algorithm by ourselves using NumPy.

# NumPy implementation¶

So here we go: we write a function that makes use of vectorized NumPy to perform the same operation than the scikit-image implementation.

In :
def laplace_numpy(image):
"""Applies Laplace operator to 2D image using our own NumPy implementation.
Then tresholds the result and returns boolean image."""
laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
thresh = np.abs(laplacian) > 0.05
return thresh


If you're paying attention closely, we're not actually implementing exactly the same function as scikit-image, since we're excluding all border pixels of the image from our calculation.

Let's check the result is qualitatively correct.

In :
laplace_numpy(image).shape

Out:
(510, 510)
In :
compare(image, laplace_numpy(image)) And let's also check that our implementation quantitavely agrees with scikit-image on every interior pixel.

In :
np.allclose(laplace_skimage(image)[1:-1, 1:-1], laplace_numpy(image))

Out:
True

Good, we're now all set for the next step. Let's pretend that we think our implementation is slow. How do we objectively measure that? And then: how do we make it faster?

# Profiling and timing¶

So, we think our NumPy version is slow. How can we put a number on that feeling? The first idea is to time the execution on our test image. We can easily do this with the %time line magic.

In :
%time laplace_numpy(image)

CPU times: user 3.43 ms, sys: 1.32 ms, total: 4.74 ms
Wall time: 5.08 ms

Out:
array([[False,  True,  True, ..., False, False, False],
[False,  True, False, ..., False, False, False],
[ True,  True, False, ..., False, False, False],
...,
[False, False, False, ...,  True, False, False],
[False, False, False, ...,  True, False, False],
[False, False, False, ...,  True,  True, False]], dtype=bool)

Mmmh, around 5 milliseconds per call. If we run this multiple times, the results slightly vary.

Therefore, it is better to use the %timeit line magic, which averages the timing operation and gives us a more accurate measurement.

In :
%timeit laplace_numpy(image)

3.37 ms ± 150 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


How does the scikit-image implementation compare to that?

In :
%timeit laplace_skimage(image)

3.88 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Approximately the same, albeit a little slower.

So, our function is too slow for say, our realtime application. To make it go faster, we first need to have an idea about what takes time in it. Our function works in two steps: first we compute the laplacian and then we threshold the results. Which part of the function is taking the most time?

To answer that question, we use a profiler. Many different profilers exit. We'll look at three below: %prun, %lprun and pprofile.

Using %prun gives us a high level view of what's happening inside our call.

In :
r = %prun -r laplace_numpy(image)
r.print_stats()

          4 function calls in 0.004 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.004    0.004    0.004    0.004 :1(laplace_numpy)
1    0.000    0.000    0.004    0.004 :1()
1    0.000    0.000    0.004    0.004 {built-in method builtins.exec}
1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Out:

This is not very informative. But we understand that most of the time (column tottime) is spent inside our laplace_numpy function.

In :
%load_ext line_profiler

In :
r = %lprun -r -f laplace_numpy laplace_numpy(image)
r.print_stats()

Timer unit: 1e-06 s

Total time: 0.007151 s
File:
Function: laplace_numpy at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
1                                           def laplace_numpy(image):
2                                               """Applies Laplace operator to 2D image using our own NumPy implementation.
3                                               Then tresholds the result and returns boolean image."""
4         1       6403.0   6403.0     89.5      laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
5         1        745.0    745.0     10.4      thresh = np.abs(laplacian) > 0.05
6         1          3.0      3.0      0.0      return thresh



Running %lprun -r -f laplace_numpy laplace_numpy(image) means that we aske %lprun to show how long each line took to execute inside the function laplace_numpy and return the result as an object.

Here, we can answer the earlier question: computing the laplacian is what takes the most time in this function.

Let's see if pprofile confirms that measurement.

In :
import pprofile
def func_to_profile():
prof = pprofile.Profile()
with prof():
laplace_numpy(image)
prof.print_stats()

In :
func_to_profile()

Total duration: 0.00508428s
File:
File duration: 0.00480628s (94.53%)
Line #|      Hits|         Time| Time per hit|      %|Source code
------+----------+-------------+-------------+-------+-----------
1|         1|  1.90735e-05|  1.90735e-05|  0.38%|def laplace_numpy(image):
2|         0|            0|            0|  0.00%|    """Applies Laplace operator to 2D image using our own NumPy implementation.
3|         0|            0|            0|  0.00%|    Then tresholds the result and returns boolean image."""
4|         1|   0.00392222|   0.00392222| 77.14%|    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
5|         1|  0.000836849|  0.000836849| 16.46%|    thresh = np.abs(laplacian) > 0.05
(call)|         1|   0.00480628|   0.00480628| 94.53%|# :1 laplace_numpy
6|         1|  2.81334e-05|  2.81334e-05|  0.55%|    return thresh

/Users/kappamaki/anaconda/lib/python3.6/site-packages/pprofile.py:102: UserWarning: Cannot access ".buffer", invalid entities from source files will cause errors when annotating.
'files will cause errors when annotating.' % (stream, )


pprofile confirms the previous assessment: it's the laplacian computation that takes time.

What's more, pprofile is a little bit simpler to use than line_profiler. However, it doesn't feature a cell magic (although there's already someone asking for this here).

At this point, we know that to make our function faster, the first thing to optimize is the computation of the laplacian. Let's try that using Cython.

# Cython¶

What Cython does, like the other tools we'll see below, is to compile our Python code into C code that will hopefully be very fast. Since Cython has a so-called IPython magic function, we'll be able to continue working in the notebook. If we use the -a option with the Cython magic, we can obtain an annotated version of our source code that shows where Cython uses C code and where it uses Python objects.

In :
%load_ext cython

In :
%%cython -a
import numpy as np

def laplace_cython(image):
"""Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
Cython implementation."""
laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
thresh = np.abs(laplacian) > 0.05
return thresh

Out:
Cython: _cython_magic_862e7a2cc089394207c84948a1866a79.pyx

Generated by Cython 0.27.3

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+1: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

 2:
+3: def laplace_cython(image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_862e7a2cc089394207c84948a1866a79_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_862e7a2cc089394207c84948a1866a79_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_862e7a2cc089394207c84948a1866a79_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_862e7a2cc089394207c84948a1866a79_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_862e7a2cc089394207c84948a1866a79_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_862e7a2cc089394207c84948a1866a79_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
PyObject *__pyx_r = 0;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
__pyx_r = __pyx_pf_46_cython_magic_862e7a2cc089394207c84948a1866a79_laplace_cython(__pyx_self, ((PyObject *)__pyx_v_image));

/* function exit code */
__Pyx_RefNannyFinishContext();
return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_862e7a2cc089394207c84948a1866a79_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_image) {
PyObject *__pyx_v_laplacian = NULL;
PyObject *__pyx_v_thresh = NULL;
PyObject *__pyx_r = NULL;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("laplace_cython", 0);
/* … */
/* function exit code */
__pyx_L1_error:;
__Pyx_XDECREF(__pyx_t_1);
__Pyx_XDECREF(__pyx_t_2);
__Pyx_XDECREF(__pyx_t_3);
__Pyx_XDECREF(__pyx_t_4);
__pyx_r = NULL;
__pyx_L0:;
__Pyx_XDECREF(__pyx_v_laplacian);
__Pyx_XDECREF(__pyx_v_thresh);
__Pyx_XGIVEREF(__pyx_r);
__Pyx_RefNannyFinishContext();
return __pyx_r;
}
/* … */
__pyx_tuple__16 = PyTuple_Pack(3, __pyx_n_s_image, __pyx_n_s_laplacian, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__16)) __PYX_ERR(0, 3, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__16);
__Pyx_GIVEREF(__pyx_tuple__16);
/* … */
__pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_862e7a2cc089394207c84948a1866a79_1laplace_cython, NULL, __pyx_n_s_cython_magic_862e7a2cc089394207); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 3, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 3, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

 4:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 5:     Cython implementation."""
+6:     laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
  __pyx_slice_ = PySlice_New(Py_None, __pyx_int_neg_2, Py_None); if (unlikely(!__pyx_slice_)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice_);
__Pyx_GIVEREF(__pyx_slice_);
__pyx_slice__2 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__2)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__2);
__Pyx_GIVEREF(__pyx_slice__2);
/* … */
__pyx_t_1 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__3); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_tuple__3 = PyTuple_Pack(2, __pyx_slice_, __pyx_slice__2); if (unlikely(!__pyx_tuple__3)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__3);
__Pyx_GIVEREF(__pyx_tuple__3);
__pyx_slice__4 = PySlice_New(__pyx_int_2, Py_None, Py_None); if (unlikely(!__pyx_slice__4)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__4);
__Pyx_GIVEREF(__pyx_slice__4);
__pyx_slice__5 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__5)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__5);
__Pyx_GIVEREF(__pyx_slice__5);
__pyx_t_2 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__6); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_3 = PyNumber_Add(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_tuple__6 = PyTuple_Pack(2, __pyx_slice__4, __pyx_slice__5); if (unlikely(!__pyx_tuple__6)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__6);
__Pyx_GIVEREF(__pyx_tuple__6);
__pyx_slice__7 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__7)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__7);
__Pyx_GIVEREF(__pyx_slice__7);
__pyx_slice__8 = PySlice_New(Py_None, __pyx_int_neg_2, Py_None); if (unlikely(!__pyx_slice__8)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__8);
__Pyx_GIVEREF(__pyx_slice__8);
__pyx_t_2 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__9); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_1 = PyNumber_Add(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_tuple__9 = PyTuple_Pack(2, __pyx_slice__7, __pyx_slice__8); if (unlikely(!__pyx_tuple__9)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__9);
__Pyx_GIVEREF(__pyx_tuple__9);
__pyx_slice__10 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__10)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__10);
__Pyx_GIVEREF(__pyx_slice__10);
__pyx_slice__11 = PySlice_New(__pyx_int_2, Py_None, Py_None); if (unlikely(!__pyx_slice__11)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__11);
__Pyx_GIVEREF(__pyx_slice__11);
__pyx_t_2 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__12); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_3 = PyNumber_Add(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_tuple__12 = PyTuple_Pack(2, __pyx_slice__10, __pyx_slice__11); if (unlikely(!__pyx_tuple__12)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__12);
__Pyx_GIVEREF(__pyx_tuple__12);
__pyx_slice__13 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__13)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__13);
__Pyx_GIVEREF(__pyx_slice__13);
__pyx_slice__14 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__14)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__14);
__Pyx_GIVEREF(__pyx_slice__14);
__pyx_t_2 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__15); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_1 = PyNumber_Multiply(__pyx_int_4, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_t_2 = PyNumber_Subtract(__pyx_t_3, __pyx_t_1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__pyx_v_laplacian = __pyx_t_2;
__pyx_t_2 = 0;
__pyx_tuple__15 = PyTuple_Pack(2, __pyx_slice__13, __pyx_slice__14); if (unlikely(!__pyx_tuple__15)) __PYX_ERR(0, 6, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__15);
__Pyx_GIVEREF(__pyx_tuple__15);

+7:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_abs); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__pyx_t_1 = NULL;
if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
__pyx_t_1 = PyMethod_GET_SELF(__pyx_t_3);
if (likely(__pyx_t_1)) {
PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
__Pyx_INCREF(__pyx_t_1);
__Pyx_INCREF(function);
__Pyx_DECREF_SET(__pyx_t_3, function);
}
}
if (!__pyx_t_1) {
__pyx_t_2 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_v_laplacian); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
} else {
#if CYTHON_FAST_PYCALL
if (PyFunction_Check(__pyx_t_3)) {
PyObject *__pyx_temp = {__pyx_t_1, __pyx_v_laplacian};
__pyx_t_2 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
__Pyx_GOTREF(__pyx_t_2);
} else
#endif
#if CYTHON_FAST_PYCCALL
if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
PyObject *__pyx_temp = {__pyx_t_1, __pyx_v_laplacian};
__pyx_t_2 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
__Pyx_GOTREF(__pyx_t_2);
} else
#endif
{
__pyx_t_4 = PyTuple_New(1+1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_4);
__Pyx_GIVEREF(__pyx_t_1); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1); __pyx_t_1 = NULL;
__Pyx_INCREF(__pyx_v_laplacian);
__Pyx_GIVEREF(__pyx_v_laplacian);
PyTuple_SET_ITEM(__pyx_t_4, 0+1, __pyx_v_laplacian);
__pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_4, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
}
}
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__pyx_t_3 = PyObject_RichCompare(__pyx_t_2, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_3); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_v_thresh = __pyx_t_3;
__pyx_t_3 = 0;

+8:     return thresh
  __Pyx_XDECREF(__pyx_r);
__Pyx_INCREF(__pyx_v_thresh);
__pyx_r = __pyx_v_thresh;
goto __pyx_L0;


What the above code shows is that Cython couldn't optimize our function much, as indicated by the yellow hue of the lines above. The yellow intensity in each line shows the amount of interaction with Python objects that it found. The more you have to use Python objects, the less speedup you are likely to achieve.

Let's time this first compiled version.

In :
%timeit laplace_cython(image)

3.84 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


It performs roughly at the same speed that our NumPy version. So there's no much use in switching to Cython. Can we do better? The key is to eliminate interaction with Python through different ways. A first one is to declare our input image is an array.

In :
%%cython -a
import numpy as np
cimport numpy as cnp

def laplace_cython(cnp.ndarray image):
"""Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
Cython implementation."""
laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
thresh = np.abs(laplacian) > 0.05
return thresh

In file included from /Users/kappamaki/.ipython/cython/_cython_magic_53222c7a36e00268207d0fde8ad0c365.c:546:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1809:
/Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by "          "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
^
1 warning generated.

Out:

Generated by Cython 0.27.3

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+1: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
__pyx_t_1 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

 2: cimport numpy as cnp
 3:
+4: def laplace_cython(cnp.ndarray image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyObject *__pyx_pw_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
PyObject *__pyx_r = 0;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_image), __pyx_ptype_5numpy_ndarray, 1, "image", 0))) __PYX_ERR(0, 4, __pyx_L1_error)

/* function exit code */
goto __pyx_L0;
__pyx_L1_error:;
__pyx_r = NULL;
__pyx_L0:;
__Pyx_RefNannyFinishContext();
return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_image) {
PyObject *__pyx_v_laplacian = NULL;
PyObject *__pyx_v_thresh = NULL;
PyObject *__pyx_r = NULL;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("laplace_cython", 0);
/* … */
/* function exit code */
__pyx_L1_error:;
__Pyx_XDECREF(__pyx_t_1);
__Pyx_XDECREF(__pyx_t_2);
__Pyx_XDECREF(__pyx_t_3);
__Pyx_XDECREF(__pyx_t_4);
__pyx_r = NULL;
__pyx_L0:;
__Pyx_XDECREF(__pyx_v_laplacian);
__Pyx_XDECREF(__pyx_v_thresh);
__Pyx_XGIVEREF(__pyx_r);
__Pyx_RefNannyFinishContext();
return __pyx_r;
}
/* … */
__pyx_tuple__25 = PyTuple_Pack(3, __pyx_n_s_image, __pyx_n_s_laplacian, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__25)) __PYX_ERR(0, 4, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__25);
__Pyx_GIVEREF(__pyx_tuple__25);
/* … */
__pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_1laplace_cython, NULL, __pyx_n_s_cython_magic_53222c7a36e0026820); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 4, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

 5:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 6:     Cython implementation."""
+7:     laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
  __pyx_slice_ = PySlice_New(Py_None, __pyx_int_neg_2, Py_None); if (unlikely(!__pyx_slice_)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice_);
__Pyx_GIVEREF(__pyx_slice_);
__pyx_slice__2 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__2)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__2);
__Pyx_GIVEREF(__pyx_slice__2);
/* … */
__pyx_t_1 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__3); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_tuple__3 = PyTuple_Pack(2, __pyx_slice_, __pyx_slice__2); if (unlikely(!__pyx_tuple__3)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__3);
__Pyx_GIVEREF(__pyx_tuple__3);
__pyx_slice__4 = PySlice_New(__pyx_int_2, Py_None, Py_None); if (unlikely(!__pyx_slice__4)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__4);
__Pyx_GIVEREF(__pyx_slice__4);
__pyx_slice__5 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__5)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__5);
__Pyx_GIVEREF(__pyx_slice__5);
__pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__6); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_3 = PyNumber_Add(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_tuple__6 = PyTuple_Pack(2, __pyx_slice__4, __pyx_slice__5); if (unlikely(!__pyx_tuple__6)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__6);
__Pyx_GIVEREF(__pyx_tuple__6);
__pyx_slice__7 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__7)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__7);
__Pyx_GIVEREF(__pyx_slice__7);
__pyx_slice__8 = PySlice_New(Py_None, __pyx_int_neg_2, Py_None); if (unlikely(!__pyx_slice__8)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__8);
__Pyx_GIVEREF(__pyx_slice__8);
__pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__9); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_1 = PyNumber_Add(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_tuple__9 = PyTuple_Pack(2, __pyx_slice__7, __pyx_slice__8); if (unlikely(!__pyx_tuple__9)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__9);
__Pyx_GIVEREF(__pyx_tuple__9);
__pyx_slice__10 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__10)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__10);
__Pyx_GIVEREF(__pyx_slice__10);
__pyx_slice__11 = PySlice_New(__pyx_int_2, Py_None, Py_None); if (unlikely(!__pyx_slice__11)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__11);
__Pyx_GIVEREF(__pyx_slice__11);
__pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__12); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_3 = PyNumber_Add(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_tuple__12 = PyTuple_Pack(2, __pyx_slice__10, __pyx_slice__11); if (unlikely(!__pyx_tuple__12)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__12);
__Pyx_GIVEREF(__pyx_tuple__12);
__pyx_slice__13 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__13)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__13);
__Pyx_GIVEREF(__pyx_slice__13);
__pyx_slice__14 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__14)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_slice__14);
__Pyx_GIVEREF(__pyx_slice__14);
__pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__15); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_1 = PyNumber_Multiply(__pyx_int_4, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_t_2 = PyNumber_Subtract(__pyx_t_3, __pyx_t_1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__pyx_v_laplacian = __pyx_t_2;
__pyx_t_2 = 0;
__pyx_tuple__15 = PyTuple_Pack(2, __pyx_slice__13, __pyx_slice__14); if (unlikely(!__pyx_tuple__15)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__15);
__Pyx_GIVEREF(__pyx_tuple__15);

+8:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_abs); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__pyx_t_1 = NULL;
if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
__pyx_t_1 = PyMethod_GET_SELF(__pyx_t_3);
if (likely(__pyx_t_1)) {
PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
__Pyx_INCREF(__pyx_t_1);
__Pyx_INCREF(function);
__Pyx_DECREF_SET(__pyx_t_3, function);
}
}
if (!__pyx_t_1) {
__pyx_t_2 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_v_laplacian); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
} else {
#if CYTHON_FAST_PYCALL
if (PyFunction_Check(__pyx_t_3)) {
PyObject *__pyx_temp = {__pyx_t_1, __pyx_v_laplacian};
__pyx_t_2 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
__Pyx_GOTREF(__pyx_t_2);
} else
#endif
#if CYTHON_FAST_PYCCALL
if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
PyObject *__pyx_temp = {__pyx_t_1, __pyx_v_laplacian};
__pyx_t_2 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
__Pyx_GOTREF(__pyx_t_2);
} else
#endif
{
__pyx_t_4 = PyTuple_New(1+1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_4);
__Pyx_GIVEREF(__pyx_t_1); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1); __pyx_t_1 = NULL;
__Pyx_INCREF(__pyx_v_laplacian);
__Pyx_GIVEREF(__pyx_v_laplacian);
PyTuple_SET_ITEM(__pyx_t_4, 0+1, __pyx_v_laplacian);
__pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_4, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
}
}
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__pyx_t_3 = PyObject_RichCompare(__pyx_t_2, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_3); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_v_thresh = __pyx_t_3;
__pyx_t_3 = 0;

+9:     return thresh
  __Pyx_XDECREF(__pyx_r);
__Pyx_INCREF(__pyx_v_thresh);
__pyx_r = __pyx_v_thresh;
goto __pyx_L0;

In :
%timeit laplace_cython(image)

3.98 ms ± 62 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


The timing didn't improve. It turns, if you click on the yellow lines, that Cython is not able to optimize the broadcasting operation (and its fancy bracketing) that we usually perform with NumPy. To allow Cython to optimize, let's write an explicit loop over the pixels in the image.

In :
%%cython -a
import numpy as np
cimport numpy as cnp

def laplace_cython(cnp.ndarray image):
"""Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
Cython implementation."""
cdef int h = image.shape
cdef int w = image.shape
laplacian = np.empty((w-2, h-2))
cdef int i, j
for i in range(1, h-1):
for j in range(1, w-1):
laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
thresh = np.abs(laplacian) > 0.05
return thresh

In file included from /Users/kappamaki/.ipython/cython/_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0.c:547:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1809:
/Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by "          "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
^
1 warning generated.

Out:
Cython: _cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0.pyx

Generated by Cython 0.27.3

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+01: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
__pyx_t_1 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

 02: cimport numpy as cnp
 03:
+04: def laplace_cython(cnp.ndarray image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
PyObject *__pyx_r = 0;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_image), __pyx_ptype_5numpy_ndarray, 1, "image", 0))) __PYX_ERR(0, 4, __pyx_L1_error)
__pyx_r = __pyx_pf_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_laplace_cython(__pyx_self, ((PyArrayObject *)__pyx_v_image));

/* function exit code */
goto __pyx_L0;
__pyx_L1_error:;
__pyx_r = NULL;
__pyx_L0:;
__Pyx_RefNannyFinishContext();
return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_image) {
int __pyx_v_h;
int __pyx_v_w;
PyObject *__pyx_v_laplacian = NULL;
int __pyx_v_i;
int __pyx_v_j;
PyObject *__pyx_v_thresh = NULL;
PyObject *__pyx_r = NULL;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("laplace_cython", 0);
/* … */
/* function exit code */
__pyx_L1_error:;
__Pyx_XDECREF(__pyx_t_1);
__Pyx_XDECREF(__pyx_t_2);
__Pyx_XDECREF(__pyx_t_3);
__Pyx_XDECREF(__pyx_t_4);
__Pyx_XDECREF(__pyx_t_5);
__pyx_r = NULL;
__pyx_L0:;
__Pyx_XDECREF(__pyx_v_laplacian);
__Pyx_XDECREF(__pyx_v_thresh);
__Pyx_XGIVEREF(__pyx_r);
__Pyx_RefNannyFinishContext();
return __pyx_r;
}
/* … */
__pyx_tuple__10 = PyTuple_Pack(7, __pyx_n_s_image, __pyx_n_s_h, __pyx_n_s_w, __pyx_n_s_laplacian, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 4, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__10);
__Pyx_GIVEREF(__pyx_tuple__10);
/* … */
__pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_1laplace_cython, NULL, __pyx_n_s_cython_magic_6d19cbef7cff92c3c6); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 4, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

 05:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 06:     Cython implementation."""
+07:     cdef int h = image.shape
  __pyx_v_h = (__pyx_v_image->dimensions);

+08:     cdef int w = image.shape
  __pyx_v_w = (__pyx_v_image->dimensions);

+09:     laplacian = np.empty((w-2, h-2))
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_empty); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_t_2 = __Pyx_PyInt_From_long((__pyx_v_w - 2)); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_4 = __Pyx_PyInt_From_long((__pyx_v_h - 2)); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_4);
__pyx_t_5 = PyTuple_New(2); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__Pyx_GIVEREF(__pyx_t_2);
PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_2);
__Pyx_GIVEREF(__pyx_t_4);
PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_4);
__pyx_t_2 = 0;
__pyx_t_4 = 0;
__pyx_t_4 = NULL;
if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
__pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);
if (likely(__pyx_t_4)) {
PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
__Pyx_INCREF(__pyx_t_4);
__Pyx_INCREF(function);
__Pyx_DECREF_SET(__pyx_t_3, function);
}
}
if (!__pyx_t_4) {
__pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
__Pyx_GOTREF(__pyx_t_1);
} else {
#if CYTHON_FAST_PYCALL
if (PyFunction_Check(__pyx_t_3)) {
PyObject *__pyx_temp = {__pyx_t_4, __pyx_t_5};
__pyx_t_1 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
} else
#endif
#if CYTHON_FAST_PYCCALL
if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
PyObject *__pyx_temp = {__pyx_t_4, __pyx_t_5};
__pyx_t_1 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
} else
#endif
{
__pyx_t_2 = PyTuple_New(1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __pyx_t_4 = NULL;
__Pyx_GIVEREF(__pyx_t_5);
PyTuple_SET_ITEM(__pyx_t_2, 0+1, __pyx_t_5);
__pyx_t_5 = 0;
__pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_2, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
}
}
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__pyx_v_laplacian = __pyx_t_1;
__pyx_t_1 = 0;

 10:     cdef int i, j
+11:     for i in range(1, h-1):
  __pyx_t_6 = (__pyx_v_h - 1);
for (__pyx_t_7 = 1; __pyx_t_7 < __pyx_t_6; __pyx_t_7+=1) {
__pyx_v_i = __pyx_t_7;

+12:         for j in range(1, w-1):
    __pyx_t_8 = (__pyx_v_w - 1);
for (__pyx_t_9 = 1; __pyx_t_9 < __pyx_t_8; __pyx_t_9+=1) {
__pyx_v_j = __pyx_t_9;

+13:             laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
      __pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_i - 1)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_3 = __Pyx_PyInt_From_int(__pyx_v_j); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__pyx_t_2 = PyTuple_New(2); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_GIVEREF(__pyx_t_1);
PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_1);
__Pyx_GIVEREF(__pyx_t_3);
PyTuple_SET_ITEM(__pyx_t_2, 1, __pyx_t_3);
__pyx_t_1 = 0;
__pyx_t_3 = 0;
__pyx_t_3 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_t_2 = __Pyx_PyInt_From_long((__pyx_v_i + 1)); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_j); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_5 = PyTuple_New(2); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__Pyx_GIVEREF(__pyx_t_2);
PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_2);
__Pyx_GIVEREF(__pyx_t_1);
PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_1);
__pyx_t_2 = 0;
__pyx_t_1 = 0;
__pyx_t_1 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
__pyx_t_5 = PyNumber_Add(__pyx_t_3, __pyx_t_1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_i); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_3 = __Pyx_PyInt_From_long((__pyx_v_j - 1)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__pyx_t_2 = PyTuple_New(2); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_GIVEREF(__pyx_t_1);
PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_1);
__Pyx_GIVEREF(__pyx_t_3);
PyTuple_SET_ITEM(__pyx_t_2, 1, __pyx_t_3);
__pyx_t_1 = 0;
__pyx_t_3 = 0;
__pyx_t_3 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_t_2 = PyNumber_Add(__pyx_t_5, __pyx_t_3); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__pyx_t_3 = __Pyx_PyInt_From_int(__pyx_v_i); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__pyx_t_5 = __Pyx_PyInt_From_long((__pyx_v_j + 1)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__pyx_t_1 = PyTuple_New(2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__Pyx_GIVEREF(__pyx_t_3);
PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_t_3);
__Pyx_GIVEREF(__pyx_t_5);
PyTuple_SET_ITEM(__pyx_t_1, 1, __pyx_t_5);
__pyx_t_3 = 0;
__pyx_t_5 = 0;
__pyx_t_5 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__pyx_t_1 = PyNumber_Add(__pyx_t_2, __pyx_t_5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
__pyx_t_5 = __Pyx_PyInt_From_int(__pyx_v_i); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__pyx_t_2 = __Pyx_PyInt_From_int(__pyx_v_j); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_3 = PyTuple_New(2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_GIVEREF(__pyx_t_5);
PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_5);
__Pyx_GIVEREF(__pyx_t_2);
PyTuple_SET_ITEM(__pyx_t_3, 1, __pyx_t_2);
__pyx_t_5 = 0;
__pyx_t_2 = 0;
__pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_3); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__pyx_t_3 = PyNumber_Multiply(__pyx_int_4, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_t_2 = PyNumber_Subtract(__pyx_t_1, __pyx_t_3); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__pyx_t_3 = __Pyx_PyInt_From_long((__pyx_v_i - 1)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_j - 1)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_5 = PyTuple_New(2); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__Pyx_GIVEREF(__pyx_t_3);
PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_3);
__Pyx_GIVEREF(__pyx_t_1);
PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_1);
__pyx_t_3 = 0;
__pyx_t_1 = 0;
if (unlikely(PyObject_SetItem(__pyx_v_laplacian, __pyx_t_5, __pyx_t_2) < 0)) __PYX_ERR(0, 13, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
}
}

+14:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_5 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__pyx_t_1 = __Pyx_PyObject_GetAttrStr(__pyx_t_5, __pyx_n_s_abs); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
__pyx_t_5 = NULL;
if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_1))) {
__pyx_t_5 = PyMethod_GET_SELF(__pyx_t_1);
if (likely(__pyx_t_5)) {
PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_1);
__Pyx_INCREF(__pyx_t_5);
__Pyx_INCREF(function);
__Pyx_DECREF_SET(__pyx_t_1, function);
}
}
if (!__pyx_t_5) {
__pyx_t_2 = __Pyx_PyObject_CallOneArg(__pyx_t_1, __pyx_v_laplacian); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
} else {
#if CYTHON_FAST_PYCALL
if (PyFunction_Check(__pyx_t_1)) {
PyObject *__pyx_temp = {__pyx_t_5, __pyx_v_laplacian};
__pyx_t_2 = __Pyx_PyFunction_FastCall(__pyx_t_1, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0;
__Pyx_GOTREF(__pyx_t_2);
} else
#endif
#if CYTHON_FAST_PYCCALL
if (__Pyx_PyFastCFunction_Check(__pyx_t_1)) {
PyObject *__pyx_temp = {__pyx_t_5, __pyx_v_laplacian};
__pyx_t_2 = __Pyx_PyCFunction_FastCall(__pyx_t_1, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0;
__Pyx_GOTREF(__pyx_t_2);
} else
#endif
{
__pyx_t_3 = PyTuple_New(1+1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_GIVEREF(__pyx_t_5); PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_5); __pyx_t_5 = NULL;
__Pyx_INCREF(__pyx_v_laplacian);
__Pyx_GIVEREF(__pyx_v_laplacian);
PyTuple_SET_ITEM(__pyx_t_3, 0+1, __pyx_v_laplacian);
__pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_1, __pyx_t_3, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
}
}
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__pyx_t_1 = PyObject_RichCompare(__pyx_t_2, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__pyx_v_thresh = __pyx_t_1;
__pyx_t_1 = 0;

+15:     return thresh
  __Pyx_XDECREF(__pyx_r);
__Pyx_INCREF(__pyx_v_thresh);
__pyx_r = __pyx_v_thresh;
goto __pyx_L0;

In :
%timeit laplace_cython(image)

335 ms ± 4.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Woops! The function we just wrote is 100x times slower than our NumPy implementation! The reason can be traced to the core of our function: the sum of the five pixel bound terms should be free from any Python object conversions. However, that line is all yellow and if you click it, you'll see it's full of Python object conversions.

How do we solve this problem? The answer is: we need to pass the array as a buffer that can be accessed at C speeds. Let's see:

In :
%%cython -a
import numpy as np
cimport numpy as cnp

def laplace_cython(cnp.ndarray[double, ndim=2] image):
"""Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
Cython implementation."""
cdef int h = image.shape
cdef int w = image.shape
cdef cnp.ndarray[double, ndim=2] laplacian = np.empty((w-2, h-2), dtype=np.double)
cdef int i, j
for i in range(1, h-1):
for j in range(1, w-1):
laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
thresh = np.abs(laplacian) > 0.05
return thresh

In file included from /Users/kappamaki/.ipython/cython/_cython_magic_bd5177082c943517a15103961a130b2a.c:548:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1809:
/Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by "          "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
^
1 warning generated.

Out:
Cython: _cython_magic_bd5177082c943517a15103961a130b2a.pyx

Generated by Cython 0.27.3

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+01: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
__pyx_t_1 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

 02: cimport numpy as cnp
 03:
+04: def laplace_cython(cnp.ndarray[double, ndim=2] image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_bd5177082c943517a15103961a130b2a_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_bd5177082c943517a15103961a130b2a_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_bd5177082c943517a15103961a130b2a_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_bd5177082c943517a15103961a130b2a_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_bd5177082c943517a15103961a130b2a_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_bd5177082c943517a15103961a130b2a_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
PyObject *__pyx_r = 0;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_image), __pyx_ptype_5numpy_ndarray, 1, "image", 0))) __PYX_ERR(0, 4, __pyx_L1_error)
__pyx_r = __pyx_pf_46_cython_magic_bd5177082c943517a15103961a130b2a_laplace_cython(__pyx_self, ((PyArrayObject *)__pyx_v_image));

/* function exit code */
goto __pyx_L0;
__pyx_L1_error:;
__pyx_r = NULL;
__pyx_L0:;
__Pyx_RefNannyFinishContext();
return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_bd5177082c943517a15103961a130b2a_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_image) {
int __pyx_v_h;
int __pyx_v_w;
PyArrayObject *__pyx_v_laplacian = 0;
int __pyx_v_i;
int __pyx_v_j;
PyObject *__pyx_v_thresh = NULL;
__Pyx_LocalBuf_ND __pyx_pybuffernd_image;
__Pyx_Buffer __pyx_pybuffer_image;
__Pyx_LocalBuf_ND __pyx_pybuffernd_laplacian;
__Pyx_Buffer __pyx_pybuffer_laplacian;
PyObject *__pyx_r = NULL;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("laplace_cython", 0);
__pyx_pybuffer_laplacian.pybuffer.buf = NULL;
__pyx_pybuffer_laplacian.refcount = 0;
__pyx_pybuffernd_laplacian.data = NULL;
__pyx_pybuffernd_laplacian.rcbuffer = &__pyx_pybuffer_laplacian;
__pyx_pybuffer_image.pybuffer.buf = NULL;
__pyx_pybuffer_image.refcount = 0;
__pyx_pybuffernd_image.data = NULL;
__pyx_pybuffernd_image.rcbuffer = &__pyx_pybuffer_image;
{
__Pyx_BufFmt_StackElem __pyx_stack;
if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_image.rcbuffer->pybuffer, (PyObject*)__pyx_v_image, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 2, 0, __pyx_stack) == -1)) __PYX_ERR(0, 4, __pyx_L1_error)
}
__pyx_pybuffernd_image.diminfo.strides = __pyx_pybuffernd_image.rcbuffer->pybuffer.strides; __pyx_pybuffernd_image.diminfo.shape = __pyx_pybuffernd_image.rcbuffer->pybuffer.shape; __pyx_pybuffernd_image.diminfo.strides = __pyx_pybuffernd_image.rcbuffer->pybuffer.strides; __pyx_pybuffernd_image.diminfo.shape = __pyx_pybuffernd_image.rcbuffer->pybuffer.shape;
/* … */
/* function exit code */
__pyx_L1_error:;
__Pyx_XDECREF(__pyx_t_1);
__Pyx_XDECREF(__pyx_t_2);
__Pyx_XDECREF(__pyx_t_3);
__Pyx_XDECREF(__pyx_t_4);
__Pyx_XDECREF(__pyx_t_5);
{ PyObject *__pyx_type, *__pyx_value, *__pyx_tb;
__Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb);
__Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_image.rcbuffer->pybuffer);
__Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer);
__Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);}
__pyx_r = NULL;
goto __pyx_L2;
__pyx_L0:;
__Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_image.rcbuffer->pybuffer);
__Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer);
__pyx_L2:;
__Pyx_XDECREF((PyObject *)__pyx_v_laplacian);
__Pyx_XDECREF(__pyx_v_thresh);
__Pyx_XGIVEREF(__pyx_r);
__Pyx_RefNannyFinishContext();
return __pyx_r;
}
/* … */
__pyx_tuple__10 = PyTuple_Pack(7, __pyx_n_s_image, __pyx_n_s_h, __pyx_n_s_w, __pyx_n_s_laplacian, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 4, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__10);
__Pyx_GIVEREF(__pyx_tuple__10);
/* … */
__pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_bd5177082c943517a15103961a130b2a_1laplace_cython, NULL, __pyx_n_s_cython_magic_bd5177082c943517a1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 4, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

 05:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 06:     Cython implementation."""
+07:     cdef int h = image.shape
  __pyx_v_h = (__pyx_v_image->dimensions);

+08:     cdef int w = image.shape
  __pyx_v_w = (__pyx_v_image->dimensions);

+09:     cdef cnp.ndarray[double, ndim=2] laplacian = np.empty((w-2, h-2), dtype=np.double)
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_w - 2)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_3 = __Pyx_PyInt_From_long((__pyx_v_h - 2)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__pyx_t_4 = PyTuple_New(2); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_4);
__Pyx_GIVEREF(__pyx_t_1);
PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1);
__Pyx_GIVEREF(__pyx_t_3);
PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_t_3);
__pyx_t_1 = 0;
__pyx_t_3 = 0;
__pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_GIVEREF(__pyx_t_4);
PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_4);
__pyx_t_4 = 0;
__pyx_t_4 = __Pyx_PyDict_NewPresized(1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_4);
__pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_double); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
if (PyDict_SetItem(__pyx_t_4, __pyx_n_s_dtype, __pyx_t_5) < 0) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
__pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_t_3, __pyx_t_4); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 9, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
if (!(likely(((__pyx_t_5) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_5, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 9, __pyx_L1_error)
__pyx_t_6 = ((PyArrayObject *)__pyx_t_5);
{
__Pyx_BufFmt_StackElem __pyx_stack;
if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer, (PyObject*)__pyx_t_6, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 2, 0, __pyx_stack) == -1)) {
__pyx_v_laplacian = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.buf = NULL;
__PYX_ERR(0, 9, __pyx_L1_error)
} else {__pyx_pybuffernd_laplacian.diminfo.strides = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.strides; __pyx_pybuffernd_laplacian.diminfo.shape = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.shape; __pyx_pybuffernd_laplacian.diminfo.strides = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.strides; __pyx_pybuffernd_laplacian.diminfo.shape = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.shape;
}
}
__pyx_t_6 = 0;
__pyx_v_laplacian = ((PyArrayObject *)__pyx_t_5);
__pyx_t_5 = 0;

 10:     cdef int i, j
+11:     for i in range(1, h-1):
  __pyx_t_7 = (__pyx_v_h - 1);
for (__pyx_t_8 = 1; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) {
__pyx_v_i = __pyx_t_8;

+12:         for j in range(1, w-1):
    __pyx_t_9 = (__pyx_v_w - 1);
for (__pyx_t_10 = 1; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) {
__pyx_v_j = __pyx_t_10;

+13:             laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
      __pyx_t_11 = (__pyx_v_i - 1);
__pyx_t_12 = __pyx_v_j;
__pyx_t_13 = -1;
if (__pyx_t_11 < 0) {
__pyx_t_11 += __pyx_pybuffernd_image.diminfo.shape;
if (unlikely(__pyx_t_11 < 0)) __pyx_t_13 = 0;
} else if (unlikely(__pyx_t_11 >= __pyx_pybuffernd_image.diminfo.shape)) __pyx_t_13 = 0;
if (__pyx_t_12 < 0) {
__pyx_t_12 += __pyx_pybuffernd_image.diminfo.shape;
if (unlikely(__pyx_t_12 < 0)) __pyx_t_13 = 1;
} else if (unlikely(__pyx_t_12 >= __pyx_pybuffernd_image.diminfo.shape)) __pyx_t_13 = 1;
if (unlikely(__pyx_t_13 != -1)) {
__Pyx_RaiseBufferIndexError(__pyx_t_13);
__PYX_ERR(0, 13, __pyx_L1_error)
}
__pyx_t_14 = (__pyx_v_i + 1);
__pyx_t_15 = __pyx_v_j;
__pyx_t_13 = -1;
if (__pyx_t_14 < 0) {
__pyx_t_14 += __pyx_pybuffernd_image.diminfo.shape;
if (unlikely(__pyx_t_14 < 0)) __pyx_t_13 = 0;
} else if (unlikely(__pyx_t_14 >= __pyx_pybuffernd_image.diminfo.shape)) __pyx_t_13 = 0;
if (__pyx_t_15 < 0) {
__pyx_t_15 += __pyx_pybuffernd_image.diminfo.shape;
if (unlikely(__pyx_t_15 < 0)) __pyx_t_13 = 1;
} else if (unlikely(__pyx_t_15 >= __pyx_pybuffernd_image.diminfo.shape)) __pyx_t_13 = 1;
if (unlikely(__pyx_t_13 != -1)) {
__Pyx_RaiseBufferIndexError(__pyx_t_13);
__PYX_ERR(0, 13, __pyx_L1_error)
}
__pyx_t_16 = __pyx_v_i;
__pyx_t_17 = (__pyx_v_j - 1);
__pyx_t_13 = -1;
if (__pyx_t_16 < 0) {
__pyx_t_16 += __pyx_pybuffernd_image.diminfo.shape;
if (unlikely(__pyx_t_16 < 0)) __pyx_t_13 = 0;
} else if (unlikely(__pyx_t_16 >= __pyx_pybuffernd_image.diminfo.shape)) __pyx_t_13 = 0;
if (__pyx_t_17 < 0) {
__pyx_t_17 += __pyx_pybuffernd_image.diminfo.shape;
if (unlikely(__pyx_t_17 < 0)) __pyx_t_13 = 1;
} else if (unlikely(__pyx_t_17 >= __pyx_pybuffernd_image.diminfo.shape)) __pyx_t_13 = 1;
if (unlikely(__pyx_t_13 != -1)) {
__Pyx_RaiseBufferIndexError(__pyx_t_13);
__PYX_ERR(0, 13, __pyx_L1_error)
}
__pyx_t_18 = __pyx_v_i;
__pyx_t_19 = (__pyx_v_j + 1);
__pyx_t_13 = -1;
if (__pyx_t_18 < 0) {
__pyx_t_18 += __pyx_pybuffernd_image.diminfo.shape;
if (unlikely(__pyx_t_18 < 0)) __pyx_t_13 = 0;
} else if (unlikely(__pyx_t_18 >= __pyx_pybuffernd_image.diminfo.shape)) __pyx_t_13 = 0;
if (__pyx_t_19 < 0) {
__pyx_t_19 += __pyx_pybuffernd_image.diminfo.shape;
if (unlikely(__pyx_t_19 < 0)) __pyx_t_13 = 1;
} else if (unlikely(__pyx_t_19 >= __pyx_pybuffernd_image.diminfo.shape)) __pyx_t_13 = 1;
if (unlikely(__pyx_t_13 != -1)) {
__Pyx_RaiseBufferIndexError(__pyx_t_13);
__PYX_ERR(0, 13, __pyx_L1_error)
}
__pyx_t_20 = __pyx_v_i;
__pyx_t_21 = __pyx_v_j;
__pyx_t_13 = -1;
if (__pyx_t_20 < 0) {
__pyx_t_20 += __pyx_pybuffernd_image.diminfo.shape;
if (unlikely(__pyx_t_20 < 0)) __pyx_t_13 = 0;
} else if (unlikely(__pyx_t_20 >= __pyx_pybuffernd_image.diminfo.shape)) __pyx_t_13 = 0;
if (__pyx_t_21 < 0) {
__pyx_t_21 += __pyx_pybuffernd_image.diminfo.shape;
if (unlikely(__pyx_t_21 < 0)) __pyx_t_13 = 1;
} else if (unlikely(__pyx_t_21 >= __pyx_pybuffernd_image.diminfo.shape)) __pyx_t_13 = 1;
if (unlikely(__pyx_t_13 != -1)) {
__Pyx_RaiseBufferIndexError(__pyx_t_13);
__PYX_ERR(0, 13, __pyx_L1_error)
}
__pyx_t_22 = (__pyx_v_i - 1);
__pyx_t_23 = (__pyx_v_j - 1);
__pyx_t_13 = -1;
if (__pyx_t_22 < 0) {
__pyx_t_22 += __pyx_pybuffernd_laplacian.diminfo.shape;
if (unlikely(__pyx_t_22 < 0)) __pyx_t_13 = 0;
} else if (unlikely(__pyx_t_22 >= __pyx_pybuffernd_laplacian.diminfo.shape)) __pyx_t_13 = 0;
if (__pyx_t_23 < 0) {
__pyx_t_23 += __pyx_pybuffernd_laplacian.diminfo.shape;
if (unlikely(__pyx_t_23 < 0)) __pyx_t_13 = 1;
} else if (unlikely(__pyx_t_23 >= __pyx_pybuffernd_laplacian.diminfo.shape)) __pyx_t_13 = 1;
if (unlikely(__pyx_t_13 != -1)) {
__Pyx_RaiseBufferIndexError(__pyx_t_13);
__PYX_ERR(0, 13, __pyx_L1_error)
}
*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.buf, __pyx_t_22, __pyx_pybuffernd_laplacian.diminfo.strides, __pyx_t_23, __pyx_pybuffernd_laplacian.diminfo.strides) = (((((*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_11, __pyx_pybuffernd_image.diminfo.strides, __pyx_t_12, __pyx_pybuffernd_image.diminfo.strides)) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_14, __pyx_pybuffernd_image.diminfo.strides, __pyx_t_15, __pyx_pybuffernd_image.diminfo.strides))) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_16, __pyx_pybuffernd_image.diminfo.strides, __pyx_t_17, __pyx_pybuffernd_image.diminfo.strides))) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_18, __pyx_pybuffernd_image.diminfo.strides, __pyx_t_19, __pyx_pybuffernd_image.diminfo.strides))) - (4.0 * (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_20, __pyx_pybuffernd_image.diminfo.strides, __pyx_t_21, __pyx_pybuffernd_image.diminfo.strides))));
}
}

+14:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_4);
__pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_abs); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
__pyx_t_4 = NULL;
if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
__pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);
if (likely(__pyx_t_4)) {
PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
__Pyx_INCREF(__pyx_t_4);
__Pyx_INCREF(function);
__Pyx_DECREF_SET(__pyx_t_3, function);
}
}
if (!__pyx_t_4) {
__pyx_t_5 = __Pyx_PyObject_CallOneArg(__pyx_t_3, ((PyObject *)__pyx_v_laplacian)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
} else {
#if CYTHON_FAST_PYCALL
if (PyFunction_Check(__pyx_t_3)) {
PyObject *__pyx_temp = {__pyx_t_4, ((PyObject *)__pyx_v_laplacian)};
__pyx_t_5 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
__Pyx_GOTREF(__pyx_t_5);
} else
#endif
#if CYTHON_FAST_PYCCALL
if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
PyObject *__pyx_temp = {__pyx_t_4, ((PyObject *)__pyx_v_laplacian)};
__pyx_t_5 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
__Pyx_GOTREF(__pyx_t_5);
} else
#endif
{
__pyx_t_2 = PyTuple_New(1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __pyx_t_4 = NULL;
__Pyx_INCREF(((PyObject *)__pyx_v_laplacian));
__Pyx_GIVEREF(((PyObject *)__pyx_v_laplacian));
PyTuple_SET_ITEM(__pyx_t_2, 0+1, ((PyObject *)__pyx_v_laplacian));
__pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_2, NULL); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
}
}
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__pyx_t_3 = PyObject_RichCompare(__pyx_t_5, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_3); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 14, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
__pyx_v_thresh = __pyx_t_3;
__pyx_t_3 = 0;

+15:     return thresh
  __Pyx_XDECREF(__pyx_r);
__Pyx_INCREF(__pyx_v_thresh);
__pyx_r = __pyx_v_thresh;
goto __pyx_L0;

In :
%timeit laplace_cython(image)

2.37 ms ± 68.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Finally! We have made our function faster by a factor of 2.

But can we go further? Yes, by observing that on the summation line, Cython still performs some bounds checks.

In :
%%cython -a
import numpy as np
cimport numpy as cnp
import cython

@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def laplace_cython(cnp.ndarray[double, ndim=2] image):
"""Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
Cython implementation."""
cdef int h = image.shape
cdef int w = image.shape
cdef cnp.ndarray[double, ndim=2] laplacian = np.empty((w-2, h-2), dtype=np.double)
cdef int i, j
for i in range(1, h-1):
for j in range(1, w-1):
laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
thresh = np.abs(laplacian) > 0.05
return thresh

In file included from /Users/kappamaki/.ipython/cython/_cython_magic_416c56f791dd3ffab3f99fad86fc10b5.c:549:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1809:
/Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by "          "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
^
1 warning generated.

Out:

Generated by Cython 0.27.3

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+01: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
__pyx_t_1 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

 02: cimport numpy as cnp
 03: import cython
 04:
 05: @cython.boundscheck(False) # turn off bounds-checking for entire function
 06: @cython.wraparound(False)  # turn off negative index wrapping for entire function
+07: def laplace_cython(cnp.ndarray[double, ndim=2] image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyObject *__pyx_pw_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
PyObject *__pyx_r = 0;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_image), __pyx_ptype_5numpy_ndarray, 1, "image", 0))) __PYX_ERR(0, 7, __pyx_L1_error)

/* function exit code */
goto __pyx_L0;
__pyx_L1_error:;
__pyx_r = NULL;
__pyx_L0:;
__Pyx_RefNannyFinishContext();
return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_image) {
int __pyx_v_h;
int __pyx_v_w;
PyArrayObject *__pyx_v_laplacian = 0;
int __pyx_v_i;
int __pyx_v_j;
PyObject *__pyx_v_thresh = NULL;
__Pyx_LocalBuf_ND __pyx_pybuffernd_image;
__Pyx_Buffer __pyx_pybuffer_image;
__Pyx_LocalBuf_ND __pyx_pybuffernd_laplacian;
__Pyx_Buffer __pyx_pybuffer_laplacian;
PyObject *__pyx_r = NULL;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("laplace_cython", 0);
__pyx_pybuffer_laplacian.pybuffer.buf = NULL;
__pyx_pybuffer_laplacian.refcount = 0;
__pyx_pybuffernd_laplacian.data = NULL;
__pyx_pybuffernd_laplacian.rcbuffer = &__pyx_pybuffer_laplacian;
__pyx_pybuffer_image.pybuffer.buf = NULL;
__pyx_pybuffer_image.refcount = 0;
__pyx_pybuffernd_image.data = NULL;
__pyx_pybuffernd_image.rcbuffer = &__pyx_pybuffer_image;
{
__Pyx_BufFmt_StackElem __pyx_stack;
if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_image.rcbuffer->pybuffer, (PyObject*)__pyx_v_image, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 2, 0, __pyx_stack) == -1)) __PYX_ERR(0, 7, __pyx_L1_error)
}
__pyx_pybuffernd_image.diminfo.strides = __pyx_pybuffernd_image.rcbuffer->pybuffer.strides; __pyx_pybuffernd_image.diminfo.shape = __pyx_pybuffernd_image.rcbuffer->pybuffer.shape; __pyx_pybuffernd_image.diminfo.strides = __pyx_pybuffernd_image.rcbuffer->pybuffer.strides; __pyx_pybuffernd_image.diminfo.shape = __pyx_pybuffernd_image.rcbuffer->pybuffer.shape;
/* … */
/* function exit code */
__pyx_L1_error:;
__Pyx_XDECREF(__pyx_t_1);
__Pyx_XDECREF(__pyx_t_2);
__Pyx_XDECREF(__pyx_t_3);
__Pyx_XDECREF(__pyx_t_4);
__Pyx_XDECREF(__pyx_t_5);
{ PyObject *__pyx_type, *__pyx_value, *__pyx_tb;
__Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb);
__Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_image.rcbuffer->pybuffer);
__Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer);
__Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);}
__pyx_r = NULL;
goto __pyx_L2;
__pyx_L0:;
__Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_image.rcbuffer->pybuffer);
__Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer);
__pyx_L2:;
__Pyx_XDECREF((PyObject *)__pyx_v_laplacian);
__Pyx_XDECREF(__pyx_v_thresh);
__Pyx_XGIVEREF(__pyx_r);
__Pyx_RefNannyFinishContext();
return __pyx_r;
}
/* … */
__pyx_tuple__10 = PyTuple_Pack(7, __pyx_n_s_image, __pyx_n_s_h, __pyx_n_s_w, __pyx_n_s_laplacian, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_tuple__10);
__Pyx_GIVEREF(__pyx_tuple__10);
/* … */
__pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_1laplace_cython, NULL, __pyx_n_s_cython_magic_416c56f791dd3ffab3); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 7, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

 08:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 09:     Cython implementation."""
+10:     cdef int h = image.shape
  __pyx_v_h = (__pyx_v_image->dimensions);

+11:     cdef int w = image.shape
  __pyx_v_w = (__pyx_v_image->dimensions);

+12:     cdef cnp.ndarray[double, ndim=2] laplacian = np.empty((w-2, h-2), dtype=np.double)
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 12, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
__pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_w - 2)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_3 = __Pyx_PyInt_From_long((__pyx_v_h - 2)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 12, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__pyx_t_4 = PyTuple_New(2); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 12, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_4);
__Pyx_GIVEREF(__pyx_t_1);
PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1);
__Pyx_GIVEREF(__pyx_t_3);
PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_t_3);
__pyx_t_1 = 0;
__pyx_t_3 = 0;
__pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 12, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_GIVEREF(__pyx_t_4);
PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_4);
__pyx_t_4 = 0;
__pyx_t_4 = __Pyx_PyDict_NewPresized(1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 12, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_4);
__pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_double); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 12, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
if (PyDict_SetItem(__pyx_t_4, __pyx_n_s_dtype, __pyx_t_5) < 0) __PYX_ERR(0, 12, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
__pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_t_3, __pyx_t_4); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 12, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
if (!(likely(((__pyx_t_5) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_5, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 12, __pyx_L1_error)
__pyx_t_6 = ((PyArrayObject *)__pyx_t_5);
{
__Pyx_BufFmt_StackElem __pyx_stack;
if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer, (PyObject*)__pyx_t_6, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 2, 0, __pyx_stack) == -1)) {
__pyx_v_laplacian = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.buf = NULL;
__PYX_ERR(0, 12, __pyx_L1_error)
} else {__pyx_pybuffernd_laplacian.diminfo.strides = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.strides; __pyx_pybuffernd_laplacian.diminfo.shape = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.shape; __pyx_pybuffernd_laplacian.diminfo.strides = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.strides; __pyx_pybuffernd_laplacian.diminfo.shape = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.shape;
}
}
__pyx_t_6 = 0;
__pyx_v_laplacian = ((PyArrayObject *)__pyx_t_5);
__pyx_t_5 = 0;

 13:     cdef int i, j
+14:     for i in range(1, h-1):
  __pyx_t_7 = (__pyx_v_h - 1);
for (__pyx_t_8 = 1; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) {
__pyx_v_i = __pyx_t_8;

+15:         for j in range(1, w-1):
    __pyx_t_9 = (__pyx_v_w - 1);
for (__pyx_t_10 = 1; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) {
__pyx_v_j = __pyx_t_10;

+16:             laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
      __pyx_t_11 = (__pyx_v_i - 1);
__pyx_t_12 = __pyx_v_j;
__pyx_t_13 = (__pyx_v_i + 1);
__pyx_t_14 = __pyx_v_j;
__pyx_t_15 = __pyx_v_i;
__pyx_t_16 = (__pyx_v_j - 1);
__pyx_t_17 = __pyx_v_i;
__pyx_t_18 = (__pyx_v_j + 1);
__pyx_t_19 = __pyx_v_i;
__pyx_t_20 = __pyx_v_j;
__pyx_t_21 = (__pyx_v_i - 1);
__pyx_t_22 = (__pyx_v_j - 1);
*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.buf, __pyx_t_21, __pyx_pybuffernd_laplacian.diminfo.strides, __pyx_t_22, __pyx_pybuffernd_laplacian.diminfo.strides) = (((((*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_11, __pyx_pybuffernd_image.diminfo.strides, __pyx_t_12, __pyx_pybuffernd_image.diminfo.strides)) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_13, __pyx_pybuffernd_image.diminfo.strides, __pyx_t_14, __pyx_pybuffernd_image.diminfo.strides))) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_15, __pyx_pybuffernd_image.diminfo.strides, __pyx_t_16, __pyx_pybuffernd_image.diminfo.strides))) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_17, __pyx_pybuffernd_image.diminfo.strides, __pyx_t_18, __pyx_pybuffernd_image.diminfo.strides))) - (4.0 * (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_19, __pyx_pybuffernd_image.diminfo.strides, __pyx_t_20, __pyx_pybuffernd_image.diminfo.strides))));
}
}

+17:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 17, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_4);
__pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_abs); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 17, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
__pyx_t_4 = NULL;
if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
__pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);
if (likely(__pyx_t_4)) {
PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
__Pyx_INCREF(__pyx_t_4);
__Pyx_INCREF(function);
__Pyx_DECREF_SET(__pyx_t_3, function);
}
}
if (!__pyx_t_4) {
__pyx_t_5 = __Pyx_PyObject_CallOneArg(__pyx_t_3, ((PyObject *)__pyx_v_laplacian)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
} else {
#if CYTHON_FAST_PYCALL
if (PyFunction_Check(__pyx_t_3)) {
PyObject *__pyx_temp = {__pyx_t_4, ((PyObject *)__pyx_v_laplacian)};
__pyx_t_5 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
__Pyx_GOTREF(__pyx_t_5);
} else
#endif
#if CYTHON_FAST_PYCCALL
if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
PyObject *__pyx_temp = {__pyx_t_4, ((PyObject *)__pyx_v_laplacian)};
__pyx_t_5 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
__Pyx_GOTREF(__pyx_t_5);
} else
#endif
{
__pyx_t_2 = PyTuple_New(1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 17, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __pyx_t_4 = NULL;
__Pyx_INCREF(((PyObject *)__pyx_v_laplacian));
__Pyx_GIVEREF(((PyObject *)__pyx_v_laplacian));
PyTuple_SET_ITEM(__pyx_t_2, 0+1, ((PyObject *)__pyx_v_laplacian));
__pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_2, NULL); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
}
}
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
__pyx_t_3 = PyObject_RichCompare(__pyx_t_5, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_3); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 17, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
__pyx_v_thresh = __pyx_t_3;
__pyx_t_3 = 0;

+18:     return thresh
  __Pyx_XDECREF(__pyx_r);
__Pyx_INCREF(__pyx_v_thresh);
__pyx_r = __pyx_v_thresh;
goto __pyx_L0;

In :
%timeit laplace_cython(image)

1.58 ms ± 32.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In :
np.allclose(laplace_numpy(image), laplace_cython(image))

Out:
True

What we can notice in the above function is that we have gotten rid completely of interaction in the for loop, which leads to a 4x increase in performance over our NumPy implementation.

Let's now turn to pythran.

# Pythran¶

One of the advantages of pythran is that it can optimize higher level numpy code than Cython. Luckily for us again, it features a line magic, so we keep working in the notebook.

In :
%load_ext pythran.magic


To use pythran, all you have to do is to annotate the function you want to export and give it a signature. This is done in a comment line starting the pythran file. One of the promises of pythran is that it can often handle high level broadcasting with NumPy and still optimize our function.

Annotating our NumPy function, we obtain this:

In :
%%pythran
#pythran export laplace_pythran_highlevel(float[][])
import numpy as np
def laplace_pythran_highlevel(image):
"""Laplace operator in NumPy for 2D images. Pythran accelerated."""
laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
thresh = np.abs(laplacian) > 0.05
return thresh

In :
np.allclose(laplace_numpy(image), laplace_pythran_highlevel(image))

Out:
True
In :
%timeit laplace_pythran_highlevel(image)

368 µs ± 5.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Wow! Pythran translated that to the fastest version so far!

But pythran can also translate "low level style" code, just like Cython. Let's try again in a low-level style.

In :
%%pythran
#pythran export laplace_pythran_lowlevel(float[][])
import numpy as np
def laplace_pythran_lowlevel(image):
"""Laplace operator in NumPy for 2D images. Pythran accelerated."""
h = image.shape
w = image.shape
laplacian = np.empty((h - 2, w - 2))
for i in range(1, h - 1):
for j in range(1, w - 1):
laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
thresh = np.abs(laplacian) > 0.05
return thresh

In :
np.allclose(laplace_numpy(image), laplace_pythran_lowlevel(image))

Out:
True
In :
%timeit laplace_pythran_lowlevel(image)

795 µs ± 8.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Less fast than before. What if we move the thresholding operation inside the main loop?

In :
%%pythran
#pythran export laplace_pythran_lowlevel(float[][])
import numpy as np
def laplace_pythran_lowlevel(image):
"""Laplace operator in NumPy for 2D images. Pythran accelerated."""
h = image.shape
w = image.shape
laplacian = np.empty((h - 2, w - 2))
for i in range(1, h - 1):
for j in range(1, w - 1):
laplacian[i-1, j-1] = np.abs(image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]) > 0.05
return laplacian

In :
np.allclose(laplace_numpy(image), laplace_pythran_lowlevel(image))

Out:
True
In :
%timeit laplace_pythran_lowlevel(image)

577 µs ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


So in this case, pythran with a low level approach outperforms Cython, with the best result being when it's given high-level Numpy code.

Another interesting point here, concerning debugging pythran compilation, is that pythran annotated code is still perfectly valid Python. This helps during the debugging phase, compared to the previously explained Cython approach.

Let's turn to our last tool, numba.

# Numba¶

Numba is very similar to pythran, except that it does just-in-time compilation (JIT). We have several options to for use. Let's try the standard jit approach with a decorator.

In :
from numba import jit

In :
@jit
def laplace_numba(image):
"""Laplace operator in NumPy for 2D images. Accelerated using numba."""
laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
thresh = np.abs(laplacian) > 0.05
return thresh


We call it once to avoid timing issues due to JIT compilation.

In :
laplace_numba(image);

In :
%timeit laplace_numba(image)

2.4 ms ± 22.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


This is rightaway faster than NumPy.

We can also tell numba to avoid using Python at all in its computation. This is called with @jit(nopython=True).

In :
@jit(nopython=True)
def laplace_numba(image):
"""Laplace operator in NumPy for 2D images. Accelerated using numba."""
laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
thresh = np.abs(laplacian) > 0.05
return thresh

In :
laplace_numba(image);

In :
%timeit laplace_numba(image)

2.37 ms ± 14 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


No big effect. In fact, we can infer from this that numba managed to generate pure C code from our function and that it did it already previously.

To confirm that, we can either use the inspect_types() method of our jitted function or use the annotation tool provided with numba.

In :
laplace_numba.inspect_types()

laplace_numba (array(float64, 2d, C),)
--------------------------------------------------------------------------------
# File:
# --- LINE 1 ---
# label 0
#   del $const0.3 # del$const0.2
#   del $0.4 # del$const0.7
#   del $const0.6 # del$0.8
#   del $0.9 # del$0.5
#   del $0.10 # del$const0.14
#   del $const0.13 # del$0.15
#   del $const0.18 # del$const0.17
#   del $0.19 # del$0.20
#   del $0.16 # del$0.21
#   del $const0.26 # del$const0.25
#   del $0.27 # del$const0.30
#   del $const0.29 # del$0.31
#   del $0.32 # del$0.28
#   del $0.33 # del$const0.38
#   del $const0.37 # del$0.39
#   del $const0.42 # del$const0.41
#   del $0.43 # del$0.44
#   del $0.40 # del$0.45
#   del $const0.51 # del$const0.50
#   del $0.52 # del$const0.55
#   del $const0.54 # del$0.56
#   del $0.57 # del$0.53
#   del image
#   del $0.58 # del$const0.48
#   del $0.11 # del$0.22
#   del $0.34 # del$0.46
#   del $0.59 # del$0.61
#   del $0.62 # del$0.63
#   del laplacian
#   del $const0.66 # del$0.67
#   del thresh

@jit(nopython=True)

# --- LINE 2 ---

def laplace_numba(image):

# --- LINE 3 ---

"""Laplace operator in NumPy for 2D images. Accelerated using numba."""

# --- LINE 4 ---
#   image = arg(0, name=image)  :: array(float64, 2d, C)
#   $const0.2 = const(NoneType, None) :: none #$const0.3 = const(int, -2)  :: int64
#   $0.4 = global(slice: ) :: Function() #$0.5 = call $0.4($const0.2, $const0.3, func=$0.4, args=(Var($const0.2, (4)), Var($const0.3,  (4))), kws=(), vararg=None)  :: (none, int64) -> slice
#   $const0.6 = const(int, 1) :: int64 #$const0.7 = const(int, -1)  :: int64
#   $0.8 = global(slice: ) :: Function() #$0.9 = call $0.8($const0.6, $const0.7, func=$0.8, args=(Var($const0.6, (4)), Var($const0.7,  (4))), kws=(), vararg=None)  :: (int64, int64) -> slice
#   $0.10 = build_tuple(items=[Var($0.5,  (4)), Var($0.9, (4))]) :: (slice x 2) #$0.11 = static_getitem(value=image, index=(slice(None, -2, None), slice(1, -1, None)), index_var=$0.10) :: array(float64, 2d, A) #$const0.13 = const(int, 2)  :: int64
#   $const0.14 = const(NoneType, None) :: none #$0.15 = global(slice: )  :: Function()
#   $0.16 = call$0.15($const0.13,$const0.14, func=$0.15, args=(Var($const0.13,  (4)), Var($const0.14, (4))), kws=(), vararg=None) :: (int64, none) -> slice #$const0.17 = const(int, 1)  :: int64
#   $const0.18 = const(int, -1) :: int64 #$0.19 = global(slice: )  :: Function()
#   $0.20 = call$0.19($const0.17,$const0.18, func=$0.19, args=(Var($const0.17,  (4)), Var($const0.18, (4))), kws=(), vararg=None) :: (int64, int64) -> slice #$0.21 = build_tuple(items=[Var($0.16, (4)), Var($0.20,  (4))])  :: (slice x 2)
#   $0.22 = static_getitem(value=image, index=(slice(2, None, None), slice(1, -1, None)), index_var=$0.21)  :: array(float64, 2d, A)
#   $const0.25 = const(int, 1) :: int64 #$const0.26 = const(int, -1)  :: int64
#   $0.27 = global(slice: ) :: Function() #$0.28 = call $0.27($const0.25, $const0.26, func=$0.27, args=(Var($const0.25, (4)), Var($const0.26,  (4))), kws=(), vararg=None)  :: (int64, int64) -> slice
#   $const0.29 = const(NoneType, None) :: none #$const0.30 = const(int, -2)  :: int64
#   $0.31 = global(slice: ) :: Function() #$0.32 = call $0.31($const0.29, $const0.30, func=$0.31, args=(Var($const0.29, (4)), Var($const0.30,  (4))), kws=(), vararg=None)  :: (none, int64) -> slice
#   $0.33 = build_tuple(items=[Var($0.28,  (4)), Var($0.32, (4))]) :: (slice x 2) #$0.34 = static_getitem(value=image, index=(slice(1, -1, None), slice(None, -2, None)), index_var=$0.33) :: array(float64, 2d, A) #$const0.37 = const(int, 1)  :: int64
#   $const0.38 = const(int, -1) :: int64 #$0.39 = global(slice: )  :: Function()
#   $0.40 = call$0.39($const0.37,$const0.38, func=$0.39, args=(Var($const0.37,  (4)), Var($const0.38, (4))), kws=(), vararg=None) :: (int64, int64) -> slice #$const0.41 = const(int, 2)  :: int64
#   $const0.42 = const(NoneType, None) :: none #$0.43 = global(slice: )  :: Function()
#   $0.44 = call$0.43($const0.41,$const0.42, func=$0.43, args=(Var($const0.41,  (4)), Var($const0.42, (4))), kws=(), vararg=None) :: (int64, none) -> slice #$0.45 = build_tuple(items=[Var($0.40, (4)), Var($0.44,  (4))])  :: (slice x 2)
#   $0.46 = static_getitem(value=image, index=(slice(1, -1, None), slice(2, None, None)), index_var=$0.45)  :: array(float64, 2d, A)
#   $const0.48 = const(int, 4) :: int64 #$const0.50 = const(int, 1)  :: int64
#   $const0.51 = const(int, -1) :: int64 #$0.52 = global(slice: )  :: Function()
#   $0.53 = call$0.52($const0.50,$const0.51, func=$0.52, args=(Var($const0.50,  (4)), Var($const0.51, (4))), kws=(), vararg=None) :: (int64, int64) -> slice #$const0.54 = const(int, 1)  :: int64
#   $const0.55 = const(int, -1) :: int64 #$0.56 = global(slice: )  :: Function()
#   $0.57 = call$0.56($const0.54,$const0.55, func=$0.56, args=(Var($const0.54,  (4)), Var($const0.55, (4))), kws=(), vararg=None) :: (int64, int64) -> slice #$0.58 = build_tuple(items=[Var($0.53, (4)), Var($0.57,  (4))])  :: (slice x 2)
#   $0.59 = static_getitem(value=image, index=(slice(1, -1, None), slice(1, -1, None)), index_var=$0.58)  :: array(float64, 2d, A)
#   $0.61 = arrayexpr(expr=('-', [('+', [('+', [('+', [Var($0.11,  (4)), Var($0.22, (4))]), Var($0.34,  (4))]), Var($0.46, (4))]), ('*', [const(int, 4), Var($0.59,  (4))])]), ty=array(float64, 2d, C))  :: array(float64, 2d, C)
#   laplacian = $0.61 :: array(float64, 2d, C) laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1] # --- LINE 5 --- #$0.62 = global(np: )  :: Module()
#   $0.63 = getattr(value=$0.62, attr=abs)  :: Function()
#   $const0.66 = const(float, 0.05) :: float64 #$0.67 = arrayexpr(expr=('>', [(, [Var(laplacian,  (4))]), const(float, 0.05)]), ty=array(bool, 2d, C))  :: array(bool, 2d, C)
#   thresh = $0.67 :: array(bool, 2d, C) thresh = np.abs(laplacian) > 0.05 # --- LINE 6 --- #$0.69 = cast(value=thresh)  :: array(bool, 2d, C)
#   return $0.69 return thresh ================================================================================  Admittedly, not very helpful. Let's try the annotation tool. In : %%file numba_annotation_demo.py import numpy as np import skimage.data import skimage.color from numba import jit @jit(nopython=True) def laplace_numba(image): """Laplace operator in NumPy for 2D images. Accelerated using numba.""" laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1] thresh = np.abs(laplacian) > 0.05 return thresh image = skimage.data.astronaut() image = skimage.color.rgb2gray(image) laplace_numba(image)  Overwriting numba_annotation_demo.py  We have to run the annotation tool from the command line: In : !numba --annotate-html annotation.html numba_annotation_demo.py  In : from IPython.display import HTML  In : HTML(data=open('annotation.html').read())  Out:  6: @jit(nopython=True) label 0 del$const0.3 del $const0.2 del$0.4 del $const0.7 del$const0.6 del $0.8 del$0.9 del $0.5 del$0.10 del $const0.14 del$const0.13 del $0.15 del$const0.18 del $const0.17 del$0.19 del $0.20 del$0.16 del $0.21 del$0.22 del $0.11 del$const0.26 del $const0.25 del$0.27 del $const0.30 del$const0.29 del $0.31 del$0.32 del $0.28 del$0.33 del $0.34 del$0.23 del $const0.38 del$const0.37 del $0.39 del$const0.42 del $const0.41 del$0.43 del $0.44 del$0.40 del $0.45 del$0.46 del $0.35 del$const0.51 del $const0.50 del$0.52 del $const0.55 del$const0.54 del $0.56 del$0.57 del $0.53 del image del$0.58 del $const0.48 del$0.59 del $0.60 del$0.47 del $0.61 del$0.62 del laplacian del $0.63 del$const0.66 del $0.65 del$0.67 del thresh 7: def laplace_numba(image): 8: """Laplace operator in NumPy for 2D images. Accelerated using numba.""" 9: laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1] image = arg(0, name=image) :: array(float64, 2d, C) $const0.2 = const(NoneType, None) :: none$const0.3 = const(int, -2) :: int64 $0.4 = global(slice: ) :: Function()$0.5 = call $0.4($const0.2, $const0.3, func=$0.4, args=(Var($const0.2, numba_annotation_demo.py (9)), Var($const0.3, numba_annotation_demo.py (9))), kws=(), vararg=None) :: (none, int64) -> slice $const0.6 = const(int, 1) :: int64$const0.7 = const(int, -1) :: int64 $0.8 = global(slice: ) :: Function()$0.9 = call $0.8($const0.6, $const0.7, func=$0.8, args=(Var($const0.6, numba_annotation_demo.py (9)), Var($const0.7, numba_annotation_demo.py (9))), kws=(), vararg=None) :: (int64, int64) -> slice $0.10 = build_tuple(items=[Var($0.5, numba_annotation_demo.py (9)), Var($0.9, numba_annotation_demo.py (9))]) :: (slice x 2)$0.11 = static_getitem(value=image, index=(slice(None, -2, None), slice(1, -1, None)), index_var=$0.10) :: array(float64, 2d, A)$const0.13 = const(int, 2) :: int64 $const0.14 = const(NoneType, None) :: none$0.15 = global(slice: ) :: Function() $0.16 = call$0.15($const0.13,$const0.14, func=$0.15, args=(Var($const0.13, numba_annotation_demo.py (9)), Var($const0.14, numba_annotation_demo.py (9))), kws=(), vararg=None) :: (int64, none) -> slice$const0.17 = const(int, 1) :: int64 $const0.18 = const(int, -1) :: int64$0.19 = global(slice: ) :: Function() $0.20 = call$0.19($const0.17,$const0.18, func=$0.19, args=(Var($const0.17, numba_annotation_demo.py (9)), Var($const0.18, numba_annotation_demo.py (9))), kws=(), vararg=None) :: (int64, int64) -> slice$0.21 = build_tuple(items=[Var($0.16, numba_annotation_demo.py (9)), Var($0.20, numba_annotation_demo.py (9))]) :: (slice x 2) $0.22 = static_getitem(value=image, index=(slice(2, None, None), slice(1, -1, None)), index_var=$0.21) :: array(float64, 2d, A) $0.23 =$0.11 + $0.22 :: array(float64, 2d, C)$const0.25 = const(int, 1) :: int64 $const0.26 = const(int, -1) :: int64$0.27 = global(slice: ) :: Function() $0.28 = call$0.27($const0.25,$const0.26, func=$0.27, args=(Var($const0.25, numba_annotation_demo.py (9)), Var($const0.26, numba_annotation_demo.py (9))), kws=(), vararg=None) :: (int64, int64) -> slice$const0.29 = const(NoneType, None) :: none $const0.30 = const(int, -2) :: int64$0.31 = global(slice: ) :: Function() $0.32 = call$0.31($const0.29,$const0.30, func=$0.31, args=(Var($const0.29, numba_annotation_demo.py (9)), Var($const0.30, numba_annotation_demo.py (9))), kws=(), vararg=None) :: (none, int64) -> slice$0.33 = build_tuple(items=[Var($0.28, numba_annotation_demo.py (9)), Var($0.32, numba_annotation_demo.py (9))]) :: (slice x 2) $0.34 = static_getitem(value=image, index=(slice(1, -1, None), slice(None, -2, None)), index_var=$0.33) :: array(float64, 2d, A) $0.35 =$0.23 + $0.34 :: array(float64, 2d, C)$const0.37 = const(int, 1) :: int64 $const0.38 = const(int, -1) :: int64$0.39 = global(slice: ) :: Function() $0.40 = call$0.39($const0.37,$const0.38, func=$0.39, args=(Var($const0.37, numba_annotation_demo.py (9)), Var($const0.38, numba_annotation_demo.py (9))), kws=(), vararg=None) :: (int64, int64) -> slice$const0.41 = const(int, 2) :: int64 $const0.42 = const(NoneType, None) :: none$0.43 = global(slice: ) :: Function() $0.44 = call$0.43($const0.41,$const0.42, func=$0.43, args=(Var($const0.41, numba_annotation_demo.py (9)), Var($const0.42, numba_annotation_demo.py (9))), kws=(), vararg=None) :: (int64, none) -> slice$0.45 = build_tuple(items=[Var($0.40, numba_annotation_demo.py (9)), Var($0.44, numba_annotation_demo.py (9))]) :: (slice x 2) $0.46 = static_getitem(value=image, index=(slice(1, -1, None), slice(2, None, None)), index_var=$0.45) :: array(float64, 2d, A) $0.47 =$0.35 + $0.46 :: array(float64, 2d, C)$const0.48 = const(int, 4) :: int64 $const0.50 = const(int, 1) :: int64$const0.51 = const(int, -1) :: int64 $0.52 = global(slice: ) :: Function()$0.53 = call $0.52($const0.50, $const0.51, func=$0.52, args=(Var($const0.50, numba_annotation_demo.py (9)), Var($const0.51, numba_annotation_demo.py (9))), kws=(), vararg=None) :: (int64, int64) -> slice $const0.54 = const(int, 1) :: int64$const0.55 = const(int, -1) :: int64 $0.56 = global(slice: ) :: Function()$0.57 = call $0.56($const0.54, $const0.55, func=$0.56, args=(Var($const0.54, numba_annotation_demo.py (9)), Var($const0.55, numba_annotation_demo.py (9))), kws=(), vararg=None) :: (int64, int64) -> slice $0.58 = build_tuple(items=[Var($0.53, numba_annotation_demo.py (9)), Var($0.57, numba_annotation_demo.py (9))]) :: (slice x 2)$0.59 = static_getitem(value=image, index=(slice(1, -1, None), slice(1, -1, None)), index_var=$0.58) :: array(float64, 2d, A)$0.60 = $const0.48 *$0.59 :: array(float64, 2d, C) $0.61 =$0.47 - $0.60 :: array(float64, 2d, C) laplacian =$0.61 :: array(float64, 2d, C) 10: thresh = np.abs(laplacian) > 0.05 $0.62 = global(np: ) :: Module()$0.63 = getattr(value=$0.62, attr=abs) :: Function()$0.65 = call $0.63(laplacian, func=$0.63, args=[Var(laplacian, numba_annotation_demo.py (9))], kws=(), vararg=None) :: (array(float64, 2d, C),) -> array(float64, 2d, C) $const0.66 = const(float, 0.05) :: float64$0.67 = $0.65 >$const0.66 :: array(bool, 2d, C) thresh = $0.67 :: array(bool, 2d, C) 11: return thresh$0.69 = cast(value=thresh) :: array(bool, 2d, C) return $0.69  1: import numpy as np label 0 del _0_22 del _0_11 del _0_34 del$0.3 del _0_46 del $0.5 del _0_59 del$const0.8 del $0.7 del$0.10 del $0.11 2: import skimage.data 3: import skimage.color 4: from numba import jit 5: 6: @jit(nopython=True) 7: def laplace_numba(image): 8: """Laplace operator in NumPy for 2D images. Accelerated using numba.""" 9: laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1] _0_11 = arg(0, name=_0_11) :: float64 _0_22 = arg(1, name=_0_22) :: float64 _0_34 = arg(2, name=_0_34) :: float64 _0_46 = arg(3, name=_0_46) :: float64 _0_59 = arg(4, name=_0_59) :: float64$0.3 = _0_11 + _0_22 :: float64 $0.5 =$0.3 + _0_34 :: float64 $0.7 =$0.5 + _0_46 :: float64 $const0.8 = const(int, 4) :: int64$0.10 = $const0.8 * _0_59 :: float64$0.11 = $0.7 -$0.10 :: float64 $0.12 = cast(value=$0.11) :: float64 return $0.12 10: thresh = np.abs(laplacian) > 0.05 11: return thresh  176: def _broadcast_onto(src_ndim, src_shape, dest_ndim, dest_shape): label 0 177: '''Low-level utility function used in calculating a shape for 178: an implicit output array. This function assumes that the 179: destination shape is an LLVM pointer to a C-style array that was 180: already initialized to a size of one along all axes. 181: 182: Returns an integer value: 183: >= 1 : Succeeded. Return value should equal the number of dimensions in 184: the destination shape. 185: 0 : Failed to broadcast because source shape is larger than the 186: destination shape (this case should be weeded out at type 187: checking). 188: < 0 : Failed to broadcast onto destination axis, at axis number == 189: -(return_value + 1). 190: ''' 191: if src_ndim > dest_ndim: src_ndim = arg(0, name=src_ndim) :: int64 src_shape = arg(1, name=src_shape) :: int64* dest_ndim = arg(2, name=dest_ndim) :: int64 dest_shape = arg(3, name=dest_shape) :: int64*$0.3 = src_ndim > dest_ndim :: bool branch $0.3, 8, 12 label 8 del src_shape del src_ndim del dest_shape del dest_ndim del$0.3 del $const8.1 192: # This check should have been done during type checking, but 193: # let's be defensive anyway... 194: return 0$const8.1 = const(int, 0) :: int64 $8.2 = cast(value=$const8.1) :: int64 return $8.2 label 12 del$0.3 del $const12.1 del dest_ndim del$12.4 195: else: 196: src_index = 0 $const12.1 = const(int, 0) :: int64 src_index =$const12.1 :: int64 197: dest_index = dest_ndim - src_ndim $12.4 = dest_ndim - src_ndim :: int64 dest_index =$12.4 :: int64 jump 24 label 24 198: while src_index < src_ndim: jump 26 label 26 $26.3 = src_index < src_ndim :: bool branch$26.3, 34, 120 label 34 del $26.3 del$34.3 del $34.6 del$const34.8 199: src_dim_size = src_shape[src_index] $34.3 = getitem(value=src_shape, index=src_index) :: int64 src_dim_size =$34.3 :: int64 200: dest_dim_size = dest_shape[dest_index] $34.6 = getitem(value=dest_shape, index=dest_index) :: int64 dest_dim_size =$34.6 :: int64 201: # Check to see if we've already mutated the destination 202: # shape along this axis. 203: if dest_dim_size != 1: $const34.8 = const(int, 1) :: int64$34.9 = dest_dim_size != $const34.8 :: bool branch$34.9, 58, 86 label 58 del $34.9 del dest_dim_size 204: # If we have mutated the destination shape already, 205: # then the source axis size must either be one, 206: # or the destination axis size. 207: if src_dim_size != dest_dim_size and src_dim_size != 1:$58.3 = src_dim_size != dest_dim_size :: bool branch $58.3, 66, 102 label 66 del$58.3 $const66.2 = const(int, 1) :: int64$66.3 = src_dim_size != $const66.2 :: bool del src_dim_size del$const66.2 branch $66.3, 74, 102 label 74 del src_shape del src_ndim del src_index del dest_shape del$66.3 del dest_index del $const74.2 del$74.3 del $74.4 208: return -(dest_index + 1)$const74.2 = const(int, 1) :: int64 $74.3 = dest_index +$const74.2 :: int64 $74.4 = unary(fn=-, value=$74.3) :: int64 $74.5 = cast(value=$74.4) :: int64 return $74.5 label 86 del dest_dim_size del$34.9 del $const86.2 209: elif src_dim_size != 1:$const86.2 = const(int, 1) :: int64 $86.3 = src_dim_size !=$const86.2 :: bool branch $86.3, 94, 102 label 94 del$86.3 del src_dim_size 210: # If the destination size is still its initial 211: dest_shape[dest_index] = src_dim_size dest_shape[dest_index] = src_dim_size :: (int64*, int64, int64) -> none jump 102 label 102 del src_dim_size del $86.3 del$66.3 del $58.3 del$const102.2 del $102.3 del$const102.5 del $102.6 212: src_index += 1$const102.2 = const(int, 1) :: int64 $102.3 = inplace_binop(fn=+=, immutable_fn=+, lhs=src_index, rhs=$const102.2, static_lhs=, static_rhs=) :: int64 src_index = $102.3 :: int64 213: dest_index += 1$const102.5 = const(int, 1) :: int64 $102.6 = inplace_binop(fn=+=, immutable_fn=+, lhs=dest_index, rhs=$const102.5, static_lhs=, static_rhs=) :: int64 dest_index = $102.6 :: int64 jump 26 label 120 del src_shape del src_ndim del src_index del dest_shape del$26.3 jump 122 label 122 del dest_index 214: return dest_index $122.2 = cast(value=dest_index) :: int64 return$122.2

 1: import numpy as np label 0 laplacian = arg(0, name=laplacian) :: float64 $0.1 = global(__ufunc_or_dufunc_0x7fa0b0d043e: ) :: Function() del laplacian del$0.1 del $const0.4 del$0.3 del $0.5 2: import skimage.data 3: import skimage.color 4: from numba import jit 5: 6: @jit(nopython=True) 7: def laplace_numba(image): 8: """Laplace operator in NumPy for 2D images. Accelerated using numba.""" 9: laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]$0.3 = call $0.1(laplacian, func=$0.1, args=[Var(laplacian, numba_annotation_demo.py (1))], kws=(), vararg=None) :: (float64,) -> float64 $const0.4 = const(float, 0.05) :: float64$0.5 = $0.3 >$const0.4 :: bool $0.6 = cast(value=$0.5) :: bool return \$0.6 10: thresh = np.abs(laplacian) > 0.05 11: return thresh

What this shows is that numba managed to correctly optimize the broadcasting (if it hadn't we would have seen colored source code lines).

Let's see also investigate how numba optimizes the low-level approach with explicit for loops.

In :
@jit(nopython=True)
def laplace_numba(image):
"""Laplace operator for 2D images. Numba accelerated."""
h = image.shape
w = image.shape
laplacian = np.empty((h - 2, w - 2))
for i in range(1, h - 1):
for j in range(1, w - 1):
laplacian[i-1, j-1] = np.abs(image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]) > 0.05
return laplacian

In :
laplace_numba(image);

In :
%timeit laplace_numba(image)

848 µs ± 191 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


This approach improves the performance!

Another way of approaching optimization with numba is to thing in terms of NumPy ufuncs. You can write generalized ufuncs with numba using another decorator, @guvectorize. However, this comes at the cost that we have to think in terms of an ufunc and thus we have to provide an array that gets filled by our computation routine.

In :
from numba import guvectorize

In :
@guvectorize('void(float64[:, :], float64[:, :])', "(m, n)->(m, n)")
def laplace_numba_guvectorize(image, laplacian):
"""Laplace operator in NumPy for 2D images. Numba accelerated."""
h = image.shape
w = image.shape
for i in range(1, h - 1):
for j in range(1, w - 1):
laplacian[i-1, j-1] = np.abs(4 * image[i, j] - image[i - 1, j] - image[i + 1, j] - image[i, j + 1] - image[i, j - 1]) > 0.05

In :
laplacian = np.empty_like(image)

In :
laplace_numba_guvectorize(image, laplacian);

In :
np.allclose(laplace_numpy(image), laplacian[:-2, :-2])

Out:
True
In :
%timeit laplace_numba_guvectorize(image, laplacian);

697 µs ± 150 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


This approach is even faster. However, it has some limitations: the signature for array sizes forces us to use the same shapes in the input and output. I could'nt use the "(m, n)->(m-1, n-1)" signature. Note that we also could have target parallel execution by using the target=parallel option in the decorator.

# Wrap-up and plots¶

Time for a wrap-up and some plots.

In :
timings = {}
for func in [laplace_skimage, laplace_numpy, laplace_cython, laplace_pythran_highlevel, laplace_pythran_lowlevel, laplace_numba]:
t = %timeit -o func(image)
timings[func.__name__] = t

4.62 ms ± 457 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.56 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.65 ms ± 57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
354 µs ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
578 µs ± 5.66 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
635 µs ± 5.76 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In :
t = %timeit -o laplace_numba_guvectorize(image, laplacian);
timings['laplace_numba_guvectorize'] = t

519 µs ± 8.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Here are the table and the plots.

In :
import pandas as pd

In :
pd.Series({key: timings[key].average * 1e6 for key in timings}).to_frame(name='timings (μs)').sort_values(by='timings (μs)')

Out:
timings (μs)
laplace_pythran_highlevel 354.292226
laplace_numba_guvectorize 518.729539
laplace_pythran_lowlevel 578.317114
laplace_numba 635.353504
laplace_cython 1646.003612
laplace_numpy 3564.326801
laplace_skimage 4624.369004
In :
fig, ax = plt.subplots(figsize=(10, 10))
pd.Series({key: timings[key].average * 1e6 for key in timings}).to_frame(name='timings (μs)').sort_values(by='timings (μs)').plot(kind='barh', ax=ax)

Out: conclusions

So, what have I learned during these last three days? Here are some thoughts:

• most approaches suggest you rewrite your function in a low-level style if you want a good optimization result
• most approaches rely on the type annotations which you supply and which are crucial for good results (and sometimes it's easy to forget one variable which in turn prevents the optimizer from doing its best)
• pythran, like numba, can in theory optimize high-level NumPy code (in our example, pythran did better than numba)
• pythran yields the best performance in our benchmark
• Cython, contrary to what I expected, comes in after those two (but still yields a good speed-up)
• debugging all code generators is usually frustrating, due to difficult to understand error messages, low-level stuff (try reading the Cython output by clicking the yellow lines)
• numba also allows targeting GPUs, something I haven't spoken about here since I don't have a GPU on my MacBook (this was the topic I had the most difficulty with during the training session since I've never worked with GPUs)