Optimizing your code with NumPy, Cython, pythran and numba

I had the pleasure of attending a workshop given by the groupe calcul (CNRS) this week. The topic was: how do you optimize the execution speed of your Python code, under the hypothesis that you already tried to make it fast using NumPy?

The training was held over three days and presented three interesting ways to achieve speedups: Cython, pythran and numba.

Overall, the workshop was great. The goal of this blog post is to summarize some of the key insights that I learnt while using these three tools on an practical application: image filtering.

Introducing the discrete Laplacian

When working with images, the discrete Laplacian operator is often used for edge detection.

Intuitively, if you want to find the edges of an image, you compute the Laplacian and threshold it to see the edges appear. Let's give an example using scikit-image.

First, let's load a standard grayscale image that ships with scikit-image, the astronaut.

In [1]:
import skimage.data
import skimage.color
In [2]:
image = skimage.data.astronaut()
image = skimage.color.rgb2gray(image)
image.shape
Out[2]:
(512, 512)

Now, let's write a function that uses the scikit-image implementation of the discrete Laplace operator and thresholds the result with a fixed threshold.

In [3]:
from skimage.filters import laplace
import numpy as np
In [4]:
def laplace_skimage(image):
    """Applies Laplace operator to 2D image using skimage implementation. 
    Then tresholds the result and returns boolean image."""
    laplacian = laplace(image)
    thresh = np.abs(laplacian) > 0.05
    return thresh

Let's apply the function and check its result.

In [5]:
edges = laplace_skimage(image)
In [6]:
edges.shape
Out[6]:
(512, 512)

Let's display the obtained edges, along with the original image.

In [7]:
import matplotlib.pyplot as plt
%matplotlib inline
In [8]:
def compare(left, right):
    """Compares two images, left and right."""
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    ax[0].imshow(left, cmap='gray')
    ax[1].imshow(right, cmap='gray')
In [9]:
compare(left=image, right=edges)

Since the topic of this post is to deal with optimization, we will now suppose that we're not using scikit-image.

As is usually the case when you're working on your own problems, we'll implement the Laplacian algorithm by ourselves using NumPy.

NumPy implementation

So here we go: we write a function that makes use of vectorized NumPy to perform the same operation than the scikit-image implementation.

In [10]:
def laplace_numpy(image):
    """Applies Laplace operator to 2D image using our own NumPy implementation. 
    Then tresholds the result and returns boolean image."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh

If you're paying attention closely, we're not actually implementing exactly the same function as scikit-image, since we're excluding all border pixels of the image from our calculation.

Let's check the result is qualitatively correct.

In [11]:
laplace_numpy(image).shape
Out[11]:
(510, 510)
In [12]:
compare(image, laplace_numpy(image))

And let's also check that our implementation quantitavely agrees with scikit-image on every interior pixel.

In [13]:
np.allclose(laplace_skimage(image)[1:-1, 1:-1], laplace_numpy(image))
Out[13]:
True

Good, we're now all set for the next step. Let's pretend that we think our implementation is slow. How do we objectively measure that? And then: how do we make it faster?

Profiling and timing

So, we think our NumPy version is slow. How can we put a number on that feeling? The first idea is to time the execution on our test image. We can easily do this with the %time line magic.

In [14]:
%time laplace_numpy(image)
CPU times: user 6.26 ms, sys: 2.47 ms, total: 8.73 ms
Wall time: 8.49 ms
Out[14]:
array([[False,  True,  True, ..., False, False, False],
       [False,  True, False, ..., False, False, False],
       [ True,  True, False, ..., False, False, False],
       ..., 
       [False, False, False, ...,  True, False, False],
       [False, False, False, ...,  True, False, False],
       [False, False, False, ...,  True,  True, False]], dtype=bool)

Mmmh, around 5 milliseconds per call. If we run this multiple times, the results slightly vary.

Therefore, it is better to use the %timeit line magic, which averages the timing operation and gives us a more accurate measurement.

In [15]:
%timeit laplace_numpy(image)
5.66 ms ± 727 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

How does the scikit-image implementation compare to that?

In [16]:
%timeit laplace_skimage(image)
4.99 ms ± 404 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Approximately the same, albeit a little slower.

So, our function is too slow for say, our realtime application. To make it go faster, we first need to have an idea about what takes time in it. Our function works in two steps: first we compute the laplacian and then we threshold the results. Which part of the function is taking the most time?

To answer that question, we use a profiler. Many different profilers exit. We'll look at three below: %prun, %lprun and pprofile.

Using %prun gives us a high level view of what's happening inside our call.

In [17]:
r = %prun -r laplace_numpy(image)
r.print_stats()
          4 function calls in 0.007 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.007    0.007    0.007    0.007 <ipython-input-10-d2caf1f24bf1>:1(laplace_numpy)
        1    0.001    0.001    0.007    0.007 <string>:1(<module>)
        1    0.000    0.000    0.007    0.007 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Out[17]:
<pstats.Stats at 0x118d689b0>

This is not very informative. But we understand that most of the time (column tottime) is spent inside our laplace_numpy function.

What about %lprun?

In [18]:
%load_ext line_profiler
In [19]:
r = %lprun -r -f laplace_numpy laplace_numpy(image)
r.print_stats()
Timer unit: 1e-06 s

Total time: 0.005994 s
File: <ipython-input-10-d2caf1f24bf1>
Function: laplace_numpy at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def laplace_numpy(image):
     2                                               """Applies Laplace operator to 2D image using our own NumPy implementation. 
     3                                               Then tresholds the result and returns boolean image."""
     4         1         4757   4757.0     79.4      laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
     5         1         1236   1236.0     20.6      thresh = np.abs(laplacian) > 0.05
     6         1            1      1.0      0.0      return thresh

Running %lprun -r -f laplace_numpy laplace_numpy(image) means that we aske %lprun to show how long each line took to execute inside the function laplace_numpy and return the result as an object.

Here, we can answer the earlier question: computing the laplacian is what takes the most time in this function.

Let's see if pprofile confirms that measurement.

In [20]:
import pprofile
def func_to_profile():
    prof = pprofile.Profile()
    with prof():
        laplace_numpy(image)
    prof.print_stats()
In [21]:
func_to_profile()
Total duration: 0.00623107s
File: <ipython-input-10-d2caf1f24bf1>
File duration: 0.00590301s (94.74%)
Line #|      Hits|         Time| Time per hit|      %|Source code
------+----------+-------------+-------------+-------+-----------
     1|         1|  2.59876e-05|  2.59876e-05|  0.42%|def laplace_numpy(image):
     2|         0|            0|            0|  0.00%|    """Applies Laplace operator to 2D image using our own NumPy implementation.
     3|         0|            0|            0|  0.00%|    Then tresholds the result and returns boolean image."""
     4|         1|   0.00461984|   0.00461984| 74.14%|    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
     5|         1|   0.00123024|   0.00123024| 19.74%|    thresh = np.abs(laplacian) > 0.05
(call)|         1|   0.00590301|   0.00590301| 94.74%|# <ipython-input-10-d2caf1f24bf1>:1 laplace_numpy
     6|         1|  2.69413e-05|  2.69413e-05|  0.43%|    return thresh
/Users/kappamaki/anaconda/lib/python3.6/site-packages/pprofile.py:102: UserWarning: Cannot access "<ipykernel.iostream.OutStream object at 0x109f8f860>.buffer", invalid entities from source files will cause errors when annotating.
  'files will cause errors when annotating.' % (stream, )

pprofile confirms the previous assessment: it's the laplacian computation that takes time.

What's more, pprofile is a little bit simpler to use than line_profiler. However, it doesn't feature a cell magic (although there's already someone asking for this here).

At this point, we know that to make our function faster, the first thing to optimize is the computation of the laplacian. Let's try that using Cython.

Cython

What Cython does, like the other tools we'll see below, is to compile our Python code into C code that will hopefully be very fast. Since Cython has a so-called IPython magic function, we'll be able to continue working in the notebook. If we use the -a option with the Cython magic, we can obtain an annotated version of our source code that shows where Cython uses C code and where it uses Python objects.

In [22]:
%load_ext cython
In [23]:
%%cython -a
import numpy as np
   
def laplace_cython(image):
    """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
    Cython implementation."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh
Out[23]:
Cython: _cython_magic_f49805f241626283bad2e1f40e0383d8.pyx

Generated by Cython 0.25.2

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+1: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 2: 
+3: def laplace_cython(image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_f49805f241626283bad2e1f40e0383d8_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_f49805f241626283bad2e1f40e0383d8_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_f49805f241626283bad2e1f40e0383d8_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_f49805f241626283bad2e1f40e0383d8_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_f49805f241626283bad2e1f40e0383d8_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_f49805f241626283bad2e1f40e0383d8_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
  __pyx_r = __pyx_pf_46_cython_magic_f49805f241626283bad2e1f40e0383d8_laplace_cython(__pyx_self, ((PyObject *)__pyx_v_image));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_f49805f241626283bad2e1f40e0383d8_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_image) {
  PyObject *__pyx_v_laplacian = NULL;
  PyObject *__pyx_v_thresh = NULL;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_AddTraceback("_cython_magic_f49805f241626283bad2e1f40e0383d8.laplace_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_laplacian);
  __Pyx_XDECREF(__pyx_v_thresh);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__16 = PyTuple_Pack(3, __pyx_n_s_image, __pyx_n_s_laplacian, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__16)) __PYX_ERR(0, 3, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__16);
  __Pyx_GIVEREF(__pyx_tuple__16);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_f49805f241626283bad2e1f40e0383d8_1laplace_cython, NULL, __pyx_n_s_cython_magic_f49805f241626283ba); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 3, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 3, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 4:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 5:     Cython implementation."""
+6:     laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
  __pyx_slice_ = PySlice_New(Py_None, __pyx_int_neg_2, Py_None); if (unlikely(!__pyx_slice_)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice_);
  __Pyx_GIVEREF(__pyx_slice_);
  __pyx_slice__2 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__2)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__2);
  __Pyx_GIVEREF(__pyx_slice__2);
/* … */
  __pyx_t_1 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__3); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_tuple__3 = PyTuple_Pack(2, __pyx_slice_, __pyx_slice__2); if (unlikely(!__pyx_tuple__3)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__3);
  __Pyx_GIVEREF(__pyx_tuple__3);
  __pyx_slice__4 = PySlice_New(__pyx_int_2, Py_None, Py_None); if (unlikely(!__pyx_slice__4)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__4);
  __Pyx_GIVEREF(__pyx_slice__4);
  __pyx_slice__5 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__5)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__5);
  __Pyx_GIVEREF(__pyx_slice__5);
  __pyx_t_2 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__6); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyNumber_Add(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_tuple__6 = PyTuple_Pack(2, __pyx_slice__4, __pyx_slice__5); if (unlikely(!__pyx_tuple__6)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__6);
  __Pyx_GIVEREF(__pyx_tuple__6);
  __pyx_slice__7 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__7)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__7);
  __Pyx_GIVEREF(__pyx_slice__7);
  __pyx_slice__8 = PySlice_New(Py_None, __pyx_int_neg_2, Py_None); if (unlikely(!__pyx_slice__8)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__8);
  __Pyx_GIVEREF(__pyx_slice__8);
  __pyx_t_2 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__9); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_1 = PyNumber_Add(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_tuple__9 = PyTuple_Pack(2, __pyx_slice__7, __pyx_slice__8); if (unlikely(!__pyx_tuple__9)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__9);
  __Pyx_GIVEREF(__pyx_tuple__9);
  __pyx_slice__10 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__10)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__10);
  __Pyx_GIVEREF(__pyx_slice__10);
  __pyx_slice__11 = PySlice_New(__pyx_int_2, Py_None, Py_None); if (unlikely(!__pyx_slice__11)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__11);
  __Pyx_GIVEREF(__pyx_slice__11);
  __pyx_t_2 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__12); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyNumber_Add(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_tuple__12 = PyTuple_Pack(2, __pyx_slice__10, __pyx_slice__11); if (unlikely(!__pyx_tuple__12)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__12);
  __Pyx_GIVEREF(__pyx_tuple__12);
  __pyx_slice__13 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__13)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__13);
  __Pyx_GIVEREF(__pyx_slice__13);
  __pyx_slice__14 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__14)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__14);
  __Pyx_GIVEREF(__pyx_slice__14);
  __pyx_t_2 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__15); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_1 = PyNumber_Multiply(__pyx_int_4, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_t_2 = PyNumber_Subtract(__pyx_t_3, __pyx_t_1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_v_laplacian = __pyx_t_2;
  __pyx_t_2 = 0;
  __pyx_tuple__15 = PyTuple_Pack(2, __pyx_slice__13, __pyx_slice__14); if (unlikely(!__pyx_tuple__15)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__15);
  __Pyx_GIVEREF(__pyx_tuple__15);
+7:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_abs); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_1 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_1)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_1);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  if (!__pyx_t_1) {
    __pyx_t_2 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_v_laplacian); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_1, __pyx_v_laplacian};
      __pyx_t_2 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
      __Pyx_GOTREF(__pyx_t_2);
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_1, __pyx_v_laplacian};
      __pyx_t_2 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
      __Pyx_GOTREF(__pyx_t_2);
    } else
    #endif
    {
      __pyx_t_4 = PyTuple_New(1+1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 7, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_4);
      __Pyx_GIVEREF(__pyx_t_1); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1); __pyx_t_1 = NULL;
      __Pyx_INCREF(__pyx_v_laplacian);
      __Pyx_GIVEREF(__pyx_v_laplacian);
      PyTuple_SET_ITEM(__pyx_t_4, 0+1, __pyx_v_laplacian);
      __pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_4, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_t_3 = PyObject_RichCompare(__pyx_t_2, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_3); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_v_thresh = __pyx_t_3;
  __pyx_t_3 = 0;
+8:     return thresh
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_thresh);
  __pyx_r = __pyx_v_thresh;
  goto __pyx_L0;

What the above code shows is that Cython couldn't optimize our function much, as indicated by the yellow hue of the lines above. The yellow intensity in each line shows the amount of interaction with Python objects that it found. The more you have to use Python objects, the less speedup you are likely to achieve.

Let's time this first compiled version.

In [24]:
%timeit laplace_cython(image)
4.49 ms ± 576 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It performs roughly at the same speed that our NumPy version. So there's no much use in switching to Cython. Can we do better? The key is to eliminate interaction with Python through different ways. A first one is to declare our input image is an array.

In [25]:
%%cython -a
import numpy as np
cimport numpy as cnp

def laplace_cython(cnp.ndarray image):
    """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
    Cython implementation."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh
Out[25]:
Cython: _cython_magic_b136238142d96c48628896f2f415d3f0.pyx

Generated by Cython 0.25.2

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+1: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 2: cimport numpy as cnp
 3: 
+4: def laplace_cython(cnp.ndarray image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_b136238142d96c48628896f2f415d3f0_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_b136238142d96c48628896f2f415d3f0_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_b136238142d96c48628896f2f415d3f0_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_b136238142d96c48628896f2f415d3f0_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_b136238142d96c48628896f2f415d3f0_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_b136238142d96c48628896f2f415d3f0_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
  if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_image), __pyx_ptype_5numpy_ndarray, 1, "image", 0))) __PYX_ERR(0, 4, __pyx_L1_error)
  __pyx_r = __pyx_pf_46_cython_magic_b136238142d96c48628896f2f415d3f0_laplace_cython(__pyx_self, ((PyArrayObject *)__pyx_v_image));

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_b136238142d96c48628896f2f415d3f0_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_image) {
  PyObject *__pyx_v_laplacian = NULL;
  PyObject *__pyx_v_thresh = NULL;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_AddTraceback("_cython_magic_b136238142d96c48628896f2f415d3f0.laplace_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_laplacian);
  __Pyx_XDECREF(__pyx_v_thresh);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__25 = PyTuple_Pack(3, __pyx_n_s_image, __pyx_n_s_laplacian, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__25)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__25);
  __Pyx_GIVEREF(__pyx_tuple__25);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_b136238142d96c48628896f2f415d3f0_1laplace_cython, NULL, __pyx_n_s_cython_magic_b136238142d96c4862); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 5:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 6:     Cython implementation."""
+7:     laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
  __pyx_slice_ = PySlice_New(Py_None, __pyx_int_neg_2, Py_None); if (unlikely(!__pyx_slice_)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice_);
  __Pyx_GIVEREF(__pyx_slice_);
  __pyx_slice__2 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__2);
  __Pyx_GIVEREF(__pyx_slice__2);
/* … */
  __pyx_t_1 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__3); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_tuple__3 = PyTuple_Pack(2, __pyx_slice_, __pyx_slice__2); if (unlikely(!__pyx_tuple__3)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__3);
  __Pyx_GIVEREF(__pyx_tuple__3);
  __pyx_slice__4 = PySlice_New(__pyx_int_2, Py_None, Py_None); if (unlikely(!__pyx_slice__4)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__4);
  __Pyx_GIVEREF(__pyx_slice__4);
  __pyx_slice__5 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__5)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__5);
  __Pyx_GIVEREF(__pyx_slice__5);
  __pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__6); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyNumber_Add(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_tuple__6 = PyTuple_Pack(2, __pyx_slice__4, __pyx_slice__5); if (unlikely(!__pyx_tuple__6)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__6);
  __Pyx_GIVEREF(__pyx_tuple__6);
  __pyx_slice__7 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__7)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__7);
  __Pyx_GIVEREF(__pyx_slice__7);
  __pyx_slice__8 = PySlice_New(Py_None, __pyx_int_neg_2, Py_None); if (unlikely(!__pyx_slice__8)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__8);
  __Pyx_GIVEREF(__pyx_slice__8);
  __pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__9); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_1 = PyNumber_Add(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_tuple__9 = PyTuple_Pack(2, __pyx_slice__7, __pyx_slice__8); if (unlikely(!__pyx_tuple__9)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__9);
  __Pyx_GIVEREF(__pyx_tuple__9);
  __pyx_slice__10 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__10)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__10);
  __Pyx_GIVEREF(__pyx_slice__10);
  __pyx_slice__11 = PySlice_New(__pyx_int_2, Py_None, Py_None); if (unlikely(!__pyx_slice__11)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__11);
  __Pyx_GIVEREF(__pyx_slice__11);
  __pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__12); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyNumber_Add(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_tuple__12 = PyTuple_Pack(2, __pyx_slice__10, __pyx_slice__11); if (unlikely(!__pyx_tuple__12)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__12);
  __Pyx_GIVEREF(__pyx_tuple__12);
  __pyx_slice__13 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__13)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__13);
  __Pyx_GIVEREF(__pyx_slice__13);
  __pyx_slice__14 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__14)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__14);
  __Pyx_GIVEREF(__pyx_slice__14);
  __pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__15); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_1 = PyNumber_Multiply(__pyx_int_4, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_t_2 = PyNumber_Subtract(__pyx_t_3, __pyx_t_1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_v_laplacian = __pyx_t_2;
  __pyx_t_2 = 0;
  __pyx_tuple__15 = PyTuple_Pack(2, __pyx_slice__13, __pyx_slice__14); if (unlikely(!__pyx_tuple__15)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__15);
  __Pyx_GIVEREF(__pyx_tuple__15);
+8:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_abs); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 8, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_1 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_1)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_1);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  if (!__pyx_t_1) {
    __pyx_t_2 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_v_laplacian); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_1, __pyx_v_laplacian};
      __pyx_t_2 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
      __Pyx_GOTREF(__pyx_t_2);
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_1, __pyx_v_laplacian};
      __pyx_t_2 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
      __Pyx_GOTREF(__pyx_t_2);
    } else
    #endif
    {
      __pyx_t_4 = PyTuple_New(1+1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 8, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_4);
      __Pyx_GIVEREF(__pyx_t_1); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1); __pyx_t_1 = NULL;
      __Pyx_INCREF(__pyx_v_laplacian);
      __Pyx_GIVEREF(__pyx_v_laplacian);
      PyTuple_SET_ITEM(__pyx_t_4, 0+1, __pyx_v_laplacian);
      __pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_4, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_t_3 = PyObject_RichCompare(__pyx_t_2, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_3); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 8, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_v_thresh = __pyx_t_3;
  __pyx_t_3 = 0;
+9:     return thresh
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_thresh);
  __pyx_r = __pyx_v_thresh;
  goto __pyx_L0;
In [26]:
%timeit laplace_cython(image)
3.93 ms ± 322 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The timing didn't improve. It turns, if you click on the yellow lines, that Cython is not able to optimize the broadcasting operation (and its fancy bracketing) that we usually perform with NumPy. To allow Cython to optimize, let's write an explicit loop over the pixels in the image.

In [27]:
%%cython -a
import numpy as np
cimport numpy as cnp

def laplace_cython(cnp.ndarray image):
    """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
    Cython implementation."""
    cdef int h = image.shape[0]
    cdef int w = image.shape[1]
    laplacian = np.empty((w-2, h-2))
    cdef int i, j
    for i in range(1, h-1):
        for j in range(1, w-1):
            laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
    thresh = np.abs(laplacian) > 0.05
    return thresh
Out[27]:
Cython: _cython_magic_b3659dcdcd90d4baa8cee0acb044bd45.pyx

Generated by Cython 0.25.2

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+01: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 02: cimport numpy as cnp
 03: 
+04: def laplace_cython(cnp.ndarray image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_b3659dcdcd90d4baa8cee0acb044bd45_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_b3659dcdcd90d4baa8cee0acb044bd45_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_b3659dcdcd90d4baa8cee0acb044bd45_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_b3659dcdcd90d4baa8cee0acb044bd45_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_b3659dcdcd90d4baa8cee0acb044bd45_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_b3659dcdcd90d4baa8cee0acb044bd45_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
  if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_image), __pyx_ptype_5numpy_ndarray, 1, "image", 0))) __PYX_ERR(0, 4, __pyx_L1_error)
  __pyx_r = __pyx_pf_46_cython_magic_b3659dcdcd90d4baa8cee0acb044bd45_laplace_cython(__pyx_self, ((PyArrayObject *)__pyx_v_image));

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_b3659dcdcd90d4baa8cee0acb044bd45_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_image) {
  int __pyx_v_h;
  int __pyx_v_w;
  PyObject *__pyx_v_laplacian = NULL;
  int __pyx_v_i;
  int __pyx_v_j;
  PyObject *__pyx_v_thresh = NULL;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_AddTraceback("_cython_magic_b3659dcdcd90d4baa8cee0acb044bd45.laplace_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_laplacian);
  __Pyx_XDECREF(__pyx_v_thresh);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__10 = PyTuple_Pack(7, __pyx_n_s_image, __pyx_n_s_h, __pyx_n_s_w, __pyx_n_s_laplacian, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__10);
  __Pyx_GIVEREF(__pyx_tuple__10);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_b3659dcdcd90d4baa8cee0acb044bd45_1laplace_cython, NULL, __pyx_n_s_cython_magic_b3659dcdcd90d4baa8); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 05:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 06:     Cython implementation."""
+07:     cdef int h = image.shape[0]
  __pyx_v_h = (__pyx_v_image->dimensions[0]);
+08:     cdef int w = image.shape[1]
  __pyx_v_w = (__pyx_v_image->dimensions[1]);
+09:     laplacian = np.empty((w-2, h-2))
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_empty); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_t_2 = __Pyx_PyInt_From_long((__pyx_v_w - 2)); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_4 = __Pyx_PyInt_From_long((__pyx_v_h - 2)); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_t_5 = PyTuple_New(2); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_GIVEREF(__pyx_t_2);
  PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_2);
  __Pyx_GIVEREF(__pyx_t_4);
  PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_4);
  __pyx_t_2 = 0;
  __pyx_t_4 = 0;
  __pyx_t_4 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_4)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_4);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  if (!__pyx_t_4) {
    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    __Pyx_GOTREF(__pyx_t_1);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_4, __pyx_t_5};
      __pyx_t_1 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_4, __pyx_t_5};
      __pyx_t_1 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    } else
    #endif
    {
      __pyx_t_2 = PyTuple_New(1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __pyx_t_4 = NULL;
      __Pyx_GIVEREF(__pyx_t_5);
      PyTuple_SET_ITEM(__pyx_t_2, 0+1, __pyx_t_5);
      __pyx_t_5 = 0;
      __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_2, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_v_laplacian = __pyx_t_1;
  __pyx_t_1 = 0;
 10:     cdef int i, j
+11:     for i in range(1, h-1):
  __pyx_t_6 = (__pyx_v_h - 1);
  for (__pyx_t_7 = 1; __pyx_t_7 < __pyx_t_6; __pyx_t_7+=1) {
    __pyx_v_i = __pyx_t_7;
+12:         for j in range(1, w-1):
    __pyx_t_8 = (__pyx_v_w - 1);
    for (__pyx_t_9 = 1; __pyx_t_9 < __pyx_t_8; __pyx_t_9+=1) {
      __pyx_v_j = __pyx_t_9;
+13:             laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
      __pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_i - 1)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __pyx_t_3 = __Pyx_PyInt_From_int(__pyx_v_j); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __pyx_t_2 = PyTuple_New(2); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_GIVEREF(__pyx_t_1);
      PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_1);
      __Pyx_GIVEREF(__pyx_t_3);
      PyTuple_SET_ITEM(__pyx_t_2, 1, __pyx_t_3);
      __pyx_t_1 = 0;
      __pyx_t_3 = 0;
      __pyx_t_3 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
      __pyx_t_2 = __Pyx_PyInt_From_long((__pyx_v_i + 1)); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_j); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __pyx_t_5 = PyTuple_New(2); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_GIVEREF(__pyx_t_2);
      PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_2);
      __Pyx_GIVEREF(__pyx_t_1);
      PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_1);
      __pyx_t_2 = 0;
      __pyx_t_1 = 0;
      __pyx_t_1 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
      __pyx_t_5 = PyNumber_Add(__pyx_t_3, __pyx_t_1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
      __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_i); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __pyx_t_3 = __Pyx_PyInt_From_long((__pyx_v_j - 1)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __pyx_t_2 = PyTuple_New(2); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_GIVEREF(__pyx_t_1);
      PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_1);
      __Pyx_GIVEREF(__pyx_t_3);
      PyTuple_SET_ITEM(__pyx_t_2, 1, __pyx_t_3);
      __pyx_t_1 = 0;
      __pyx_t_3 = 0;
      __pyx_t_3 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
      __pyx_t_2 = PyNumber_Add(__pyx_t_5, __pyx_t_3); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
      __pyx_t_3 = __Pyx_PyInt_From_int(__pyx_v_i); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __pyx_t_5 = __Pyx_PyInt_From_long((__pyx_v_j + 1)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __pyx_t_1 = PyTuple_New(2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_GIVEREF(__pyx_t_3);
      PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_t_3);
      __Pyx_GIVEREF(__pyx_t_5);
      PyTuple_SET_ITEM(__pyx_t_1, 1, __pyx_t_5);
      __pyx_t_3 = 0;
      __pyx_t_5 = 0;
      __pyx_t_5 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
      __pyx_t_1 = PyNumber_Add(__pyx_t_2, __pyx_t_5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
      __pyx_t_5 = __Pyx_PyInt_From_int(__pyx_v_i); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __pyx_t_2 = __Pyx_PyInt_From_int(__pyx_v_j); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __pyx_t_3 = PyTuple_New(2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_GIVEREF(__pyx_t_5);
      PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_5);
      __Pyx_GIVEREF(__pyx_t_2);
      PyTuple_SET_ITEM(__pyx_t_3, 1, __pyx_t_2);
      __pyx_t_5 = 0;
      __pyx_t_2 = 0;
      __pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_3); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
      __pyx_t_3 = PyNumber_Multiply(__pyx_int_4, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
      __pyx_t_2 = PyNumber_Subtract(__pyx_t_1, __pyx_t_3); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
      __pyx_t_3 = __Pyx_PyInt_From_long((__pyx_v_i - 1)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_j - 1)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __pyx_t_5 = PyTuple_New(2); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_GIVEREF(__pyx_t_3);
      PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_3);
      __Pyx_GIVEREF(__pyx_t_1);
      PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_1);
      __pyx_t_3 = 0;
      __pyx_t_1 = 0;
      if (unlikely(PyObject_SetItem(__pyx_v_laplacian, __pyx_t_5, __pyx_t_2) < 0)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    }
  }
+14:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_5 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(__pyx_t_5, __pyx_n_s_abs); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  __pyx_t_5 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_1))) {
    __pyx_t_5 = PyMethod_GET_SELF(__pyx_t_1);
    if (likely(__pyx_t_5)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_1);
      __Pyx_INCREF(__pyx_t_5);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_1, function);
    }
  }
  if (!__pyx_t_5) {
    __pyx_t_2 = __Pyx_PyObject_CallOneArg(__pyx_t_1, __pyx_v_laplacian); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_1)) {
      PyObject *__pyx_temp[2] = {__pyx_t_5, __pyx_v_laplacian};
      __pyx_t_2 = __Pyx_PyFunction_FastCall(__pyx_t_1, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0;
      __Pyx_GOTREF(__pyx_t_2);
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_1)) {
      PyObject *__pyx_temp[2] = {__pyx_t_5, __pyx_v_laplacian};
      __pyx_t_2 = __Pyx_PyCFunction_FastCall(__pyx_t_1, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0;
      __Pyx_GOTREF(__pyx_t_2);
    } else
    #endif
    {
      __pyx_t_3 = PyTuple_New(1+1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_GIVEREF(__pyx_t_5); PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_5); __pyx_t_5 = NULL;
      __Pyx_INCREF(__pyx_v_laplacian);
      __Pyx_GIVEREF(__pyx_v_laplacian);
      PyTuple_SET_ITEM(__pyx_t_3, 0+1, __pyx_v_laplacian);
      __pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_1, __pyx_t_3, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = PyObject_RichCompare(__pyx_t_2, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_v_thresh = __pyx_t_1;
  __pyx_t_1 = 0;
+15:     return thresh
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_thresh);
  __pyx_r = __pyx_v_thresh;
  goto __pyx_L0;
In [28]:
%timeit laplace_cython(image)
333 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Woops! The function we just wrote is 100x times slower than our NumPy implementation! The reason can be traced to the core of our function: the sum of the five pixel bound terms should be free from any Python object conversions. However, that line is all yellow and if you click it, you'll see it's full of Python object conversions.

How do we solve this problem? The answer is: we need to pass the array as a buffer that can be accessed at C speeds. Let's see:

In [29]:
%%cython -a
import numpy as np
cimport numpy as cnp

def laplace_cython(cnp.ndarray[double, ndim=2] image):
    """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
    Cython implementation."""
    cdef int h = image.shape[0]
    cdef int w = image.shape[1]
    cdef cnp.ndarray[double, ndim=2] laplacian = np.empty((w-2, h-2), dtype=np.double)
    cdef int i, j
    for i in range(1, h-1):
        for j in range(1, w-1):
            laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
    thresh = np.abs(laplacian) > 0.05
    return thresh
Out[29]:
Cython: _cython_magic_4fe2ce48caa4aef98bfa5ed5cc967bf7.pyx

Generated by Cython 0.25.2

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+01: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 02: cimport numpy as cnp
 03: 
+04: def laplace_cython(cnp.ndarray[double, ndim=2] image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_4fe2ce48caa4aef98bfa5ed5cc967bf7_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_4fe2ce48caa4aef98bfa5ed5cc967bf7_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_4fe2ce48caa4aef98bfa5ed5cc967bf7_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_4fe2ce48caa4aef98bfa5ed5cc967bf7_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_4fe2ce48caa4aef98bfa5ed5cc967bf7_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_4fe2ce48caa4aef98bfa5ed5cc967bf7_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
  if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_image), __pyx_ptype_5numpy_ndarray, 1, "image", 0))) __PYX_ERR(0, 4, __pyx_L1_error)
  __pyx_r = __pyx_pf_46_cython_magic_4fe2ce48caa4aef98bfa5ed5cc967bf7_laplace_cython(__pyx_self, ((PyArrayObject *)__pyx_v_image));

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_4fe2ce48caa4aef98bfa5ed5cc967bf7_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_image) {
  int __pyx_v_h;
  int __pyx_v_w;
  PyArrayObject *__pyx_v_laplacian = 0;
  int __pyx_v_i;
  int __pyx_v_j;
  PyObject *__pyx_v_thresh = NULL;
  __Pyx_LocalBuf_ND __pyx_pybuffernd_image;
  __Pyx_Buffer __pyx_pybuffer_image;
  __Pyx_LocalBuf_ND __pyx_pybuffernd_laplacian;
  __Pyx_Buffer __pyx_pybuffer_laplacian;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython", 0);
  __pyx_pybuffer_laplacian.pybuffer.buf = NULL;
  __pyx_pybuffer_laplacian.refcount = 0;
  __pyx_pybuffernd_laplacian.data = NULL;
  __pyx_pybuffernd_laplacian.rcbuffer = &__pyx_pybuffer_laplacian;
  __pyx_pybuffer_image.pybuffer.buf = NULL;
  __pyx_pybuffer_image.refcount = 0;
  __pyx_pybuffernd_image.data = NULL;
  __pyx_pybuffernd_image.rcbuffer = &__pyx_pybuffer_image;
  {
    __Pyx_BufFmt_StackElem __pyx_stack[1];
    if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_image.rcbuffer->pybuffer, (PyObject*)__pyx_v_image, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 2, 0, __pyx_stack) == -1)) __PYX_ERR(0, 4, __pyx_L1_error)
  }
  __pyx_pybuffernd_image.diminfo[0].strides = __pyx_pybuffernd_image.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_image.diminfo[0].shape = __pyx_pybuffernd_image.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_image.diminfo[1].strides = __pyx_pybuffernd_image.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_image.diminfo[1].shape = __pyx_pybuffernd_image.rcbuffer->pybuffer.shape[1];
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  { PyObject *__pyx_type, *__pyx_value, *__pyx_tb;
    __Pyx_PyThreadState_declare
    __Pyx_PyThreadState_assign
    __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb);
    __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_image.rcbuffer->pybuffer);
    __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer);
  __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);}
  __Pyx_AddTraceback("_cython_magic_4fe2ce48caa4aef98bfa5ed5cc967bf7.laplace_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  goto __pyx_L2;
  __pyx_L0:;
  __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_image.rcbuffer->pybuffer);
  __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer);
  __pyx_L2:;
  __Pyx_XDECREF((PyObject *)__pyx_v_laplacian);
  __Pyx_XDECREF(__pyx_v_thresh);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__10 = PyTuple_Pack(7, __pyx_n_s_image, __pyx_n_s_h, __pyx_n_s_w, __pyx_n_s_laplacian, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__10);
  __Pyx_GIVEREF(__pyx_tuple__10);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_4fe2ce48caa4aef98bfa5ed5cc967bf7_1laplace_cython, NULL, __pyx_n_s_cython_magic_4fe2ce48caa4aef98b); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 05:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 06:     Cython implementation."""
+07:     cdef int h = image.shape[0]
  __pyx_v_h = (__pyx_v_image->dimensions[0]);
+08:     cdef int w = image.shape[1]
  __pyx_v_w = (__pyx_v_image->dimensions[1]);
+09:     cdef cnp.ndarray[double, ndim=2] laplacian = np.empty((w-2, h-2), dtype=np.double)
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_w - 2)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_3 = __Pyx_PyInt_From_long((__pyx_v_h - 2)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_t_4 = PyTuple_New(2); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __Pyx_GIVEREF(__pyx_t_1);
  PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_3);
  PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_t_3);
  __pyx_t_1 = 0;
  __pyx_t_3 = 0;
  __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_GIVEREF(__pyx_t_4);
  PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_4);
  __pyx_t_4 = 0;
  __pyx_t_4 = PyDict_New(); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_double); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  if (PyDict_SetItem(__pyx_t_4, __pyx_n_s_dtype, __pyx_t_5) < 0) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_t_3, __pyx_t_4); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  if (!(likely(((__pyx_t_5) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_5, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 9, __pyx_L1_error)
  __pyx_t_6 = ((PyArrayObject *)__pyx_t_5);
  {
    __Pyx_BufFmt_StackElem __pyx_stack[1];
    if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer, (PyObject*)__pyx_t_6, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 2, 0, __pyx_stack) == -1)) {
      __pyx_v_laplacian = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.buf = NULL;
      __PYX_ERR(0, 9, __pyx_L1_error)
    } else {__pyx_pybuffernd_laplacian.diminfo[0].strides = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_laplacian.diminfo[0].shape = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_laplacian.diminfo[1].strides = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_laplacian.diminfo[1].shape = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.shape[1];
    }
  }
  __pyx_t_6 = 0;
  __pyx_v_laplacian = ((PyArrayObject *)__pyx_t_5);
  __pyx_t_5 = 0;
 10:     cdef int i, j
+11:     for i in range(1, h-1):
  __pyx_t_7 = (__pyx_v_h - 1);
  for (__pyx_t_8 = 1; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) {
    __pyx_v_i = __pyx_t_8;
+12:         for j in range(1, w-1):
    __pyx_t_9 = (__pyx_v_w - 1);
    for (__pyx_t_10 = 1; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) {
      __pyx_v_j = __pyx_t_10;
+13:             laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
      __pyx_t_11 = (__pyx_v_i - 1);
      __pyx_t_12 = __pyx_v_j;
      __pyx_t_13 = -1;
      if (__pyx_t_11 < 0) {
        __pyx_t_11 += __pyx_pybuffernd_image.diminfo[0].shape;
        if (unlikely(__pyx_t_11 < 0)) __pyx_t_13 = 0;
      } else if (unlikely(__pyx_t_11 >= __pyx_pybuffernd_image.diminfo[0].shape)) __pyx_t_13 = 0;
      if (__pyx_t_12 < 0) {
        __pyx_t_12 += __pyx_pybuffernd_image.diminfo[1].shape;
        if (unlikely(__pyx_t_12 < 0)) __pyx_t_13 = 1;
      } else if (unlikely(__pyx_t_12 >= __pyx_pybuffernd_image.diminfo[1].shape)) __pyx_t_13 = 1;
      if (unlikely(__pyx_t_13 != -1)) {
        __Pyx_RaiseBufferIndexError(__pyx_t_13);
        __PYX_ERR(0, 13, __pyx_L1_error)
      }
      __pyx_t_14 = (__pyx_v_i + 1);
      __pyx_t_15 = __pyx_v_j;
      __pyx_t_13 = -1;
      if (__pyx_t_14 < 0) {
        __pyx_t_14 += __pyx_pybuffernd_image.diminfo[0].shape;
        if (unlikely(__pyx_t_14 < 0)) __pyx_t_13 = 0;
      } else if (unlikely(__pyx_t_14 >= __pyx_pybuffernd_image.diminfo[0].shape)) __pyx_t_13 = 0;
      if (__pyx_t_15 < 0) {
        __pyx_t_15 += __pyx_pybuffernd_image.diminfo[1].shape;
        if (unlikely(__pyx_t_15 < 0)) __pyx_t_13 = 1;
      } else if (unlikely(__pyx_t_15 >= __pyx_pybuffernd_image.diminfo[1].shape)) __pyx_t_13 = 1;
      if (unlikely(__pyx_t_13 != -1)) {
        __Pyx_RaiseBufferIndexError(__pyx_t_13);
        __PYX_ERR(0, 13, __pyx_L1_error)
      }
      __pyx_t_16 = __pyx_v_i;
      __pyx_t_17 = (__pyx_v_j - 1);
      __pyx_t_13 = -1;
      if (__pyx_t_16 < 0) {
        __pyx_t_16 += __pyx_pybuffernd_image.diminfo[0].shape;
        if (unlikely(__pyx_t_16 < 0)) __pyx_t_13 = 0;
      } else if (unlikely(__pyx_t_16 >= __pyx_pybuffernd_image.diminfo[0].shape)) __pyx_t_13 = 0;
      if (__pyx_t_17 < 0) {
        __pyx_t_17 += __pyx_pybuffernd_image.diminfo[1].shape;
        if (unlikely(__pyx_t_17 < 0)) __pyx_t_13 = 1;
      } else if (unlikely(__pyx_t_17 >= __pyx_pybuffernd_image.diminfo[1].shape)) __pyx_t_13 = 1;
      if (unlikely(__pyx_t_13 != -1)) {
        __Pyx_RaiseBufferIndexError(__pyx_t_13);
        __PYX_ERR(0, 13, __pyx_L1_error)
      }
      __pyx_t_18 = __pyx_v_i;
      __pyx_t_19 = (__pyx_v_j + 1);
      __pyx_t_13 = -1;
      if (__pyx_t_18 < 0) {
        __pyx_t_18 += __pyx_pybuffernd_image.diminfo[0].shape;
        if (unlikely(__pyx_t_18 < 0)) __pyx_t_13 = 0;
      } else if (unlikely(__pyx_t_18 >= __pyx_pybuffernd_image.diminfo[0].shape)) __pyx_t_13 = 0;
      if (__pyx_t_19 < 0) {
        __pyx_t_19 += __pyx_pybuffernd_image.diminfo[1].shape;
        if (unlikely(__pyx_t_19 < 0)) __pyx_t_13 = 1;
      } else if (unlikely(__pyx_t_19 >= __pyx_pybuffernd_image.diminfo[1].shape)) __pyx_t_13 = 1;
      if (unlikely(__pyx_t_13 != -1)) {
        __Pyx_RaiseBufferIndexError(__pyx_t_13);
        __PYX_ERR(0, 13, __pyx_L1_error)
      }
      __pyx_t_20 = __pyx_v_i;
      __pyx_t_21 = __pyx_v_j;
      __pyx_t_13 = -1;
      if (__pyx_t_20 < 0) {
        __pyx_t_20 += __pyx_pybuffernd_image.diminfo[0].shape;
        if (unlikely(__pyx_t_20 < 0)) __pyx_t_13 = 0;
      } else if (unlikely(__pyx_t_20 >= __pyx_pybuffernd_image.diminfo[0].shape)) __pyx_t_13 = 0;
      if (__pyx_t_21 < 0) {
        __pyx_t_21 += __pyx_pybuffernd_image.diminfo[1].shape;
        if (unlikely(__pyx_t_21 < 0)) __pyx_t_13 = 1;
      } else if (unlikely(__pyx_t_21 >= __pyx_pybuffernd_image.diminfo[1].shape)) __pyx_t_13 = 1;
      if (unlikely(__pyx_t_13 != -1)) {
        __Pyx_RaiseBufferIndexError(__pyx_t_13);
        __PYX_ERR(0, 13, __pyx_L1_error)
      }
      __pyx_t_22 = (__pyx_v_i - 1);
      __pyx_t_23 = (__pyx_v_j - 1);
      __pyx_t_13 = -1;
      if (__pyx_t_22 < 0) {
        __pyx_t_22 += __pyx_pybuffernd_laplacian.diminfo[0].shape;
        if (unlikely(__pyx_t_22 < 0)) __pyx_t_13 = 0;
      } else if (unlikely(__pyx_t_22 >= __pyx_pybuffernd_laplacian.diminfo[0].shape)) __pyx_t_13 = 0;
      if (__pyx_t_23 < 0) {
        __pyx_t_23 += __pyx_pybuffernd_laplacian.diminfo[1].shape;
        if (unlikely(__pyx_t_23 < 0)) __pyx_t_13 = 1;
      } else if (unlikely(__pyx_t_23 >= __pyx_pybuffernd_laplacian.diminfo[1].shape)) __pyx_t_13 = 1;
      if (unlikely(__pyx_t_13 != -1)) {
        __Pyx_RaiseBufferIndexError(__pyx_t_13);
        __PYX_ERR(0, 13, __pyx_L1_error)
      }
      *__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.buf, __pyx_t_22, __pyx_pybuffernd_laplacian.diminfo[0].strides, __pyx_t_23, __pyx_pybuffernd_laplacian.diminfo[1].strides) = (((((*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_11, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_12, __pyx_pybuffernd_image.diminfo[1].strides)) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_14, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_15, __pyx_pybuffernd_image.diminfo[1].strides))) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_16, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_17, __pyx_pybuffernd_image.diminfo[1].strides))) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_18, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_19, __pyx_pybuffernd_image.diminfo[1].strides))) - (4.0 * (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_20, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_21, __pyx_pybuffernd_image.diminfo[1].strides))));
    }
  }
+14:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_abs); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  __pyx_t_4 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_4)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_4);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  if (!__pyx_t_4) {
    __pyx_t_5 = __Pyx_PyObject_CallOneArg(__pyx_t_3, ((PyObject *)__pyx_v_laplacian)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_4, ((PyObject *)__pyx_v_laplacian)};
      __pyx_t_5 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_GOTREF(__pyx_t_5);
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_4, ((PyObject *)__pyx_v_laplacian)};
      __pyx_t_5 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_GOTREF(__pyx_t_5);
    } else
    #endif
    {
      __pyx_t_2 = PyTuple_New(1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __pyx_t_4 = NULL;
      __Pyx_INCREF(((PyObject *)__pyx_v_laplacian));
      __Pyx_GIVEREF(((PyObject *)__pyx_v_laplacian));
      PyTuple_SET_ITEM(__pyx_t_2, 0+1, ((PyObject *)__pyx_v_laplacian));
      __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_2, NULL); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_t_3 = PyObject_RichCompare(__pyx_t_5, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_3); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  __pyx_v_thresh = __pyx_t_3;
  __pyx_t_3 = 0;
+15:     return thresh
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_thresh);
  __pyx_r = __pyx_v_thresh;
  goto __pyx_L0;
In [30]:
%timeit laplace_cython(image)
2.57 ms ± 320 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Finally! We have made our function faster by a factor of 2.

But can we go further? Yes, by observing that on the summation line, Cython still performs some bounds checks.

In [31]:
%%cython -a
import numpy as np
cimport numpy as cnp
import cython

@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def laplace_cython(cnp.ndarray[double, ndim=2] image):
    """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
    Cython implementation."""
    cdef int h = image.shape[0]
    cdef int w = image.shape[1]
    cdef cnp.ndarray[double, ndim=2] laplacian = np.empty((w-2, h-2), dtype=np.double)
    cdef int i, j
    for i in range(1, h-1):
        for j in range(1, w-1):
            laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
    thresh = np.abs(laplacian) > 0.05
    return thresh
Out[31]:
Cython: _cython_magic_8b773ac434435b285c70274cc23d7117.pyx

Generated by Cython 0.25.2

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+01: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 02: cimport numpy as cnp
 03: import cython
 04: 
 05: @cython.boundscheck(False) # turn off bounds-checking for entire function
 06: @cython.wraparound(False)  # turn off negative index wrapping for entire function
+07: def laplace_cython(cnp.ndarray[double, ndim=2] image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_8b773ac434435b285c70274cc23d7117_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_8b773ac434435b285c70274cc23d7117_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_8b773ac434435b285c70274cc23d7117_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_8b773ac434435b285c70274cc23d7117_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_8b773ac434435b285c70274cc23d7117_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_8b773ac434435b285c70274cc23d7117_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
  if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_image), __pyx_ptype_5numpy_ndarray, 1, "image", 0))) __PYX_ERR(0, 7, __pyx_L1_error)
  __pyx_r = __pyx_pf_46_cython_magic_8b773ac434435b285c70274cc23d7117_laplace_cython(__pyx_self, ((PyArrayObject *)__pyx_v_image));

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_8b773ac434435b285c70274cc23d7117_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_image) {
  int __pyx_v_h;
  int __pyx_v_w;
  PyArrayObject *__pyx_v_laplacian = 0;
  int __pyx_v_i;
  int __pyx_v_j;
  PyObject *__pyx_v_thresh = NULL;
  __Pyx_LocalBuf_ND __pyx_pybuffernd_image;
  __Pyx_Buffer __pyx_pybuffer_image;
  __Pyx_LocalBuf_ND __pyx_pybuffernd_laplacian;
  __Pyx_Buffer __pyx_pybuffer_laplacian;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython", 0);
  __pyx_pybuffer_laplacian.pybuffer.buf = NULL;
  __pyx_pybuffer_laplacian.refcount = 0;
  __pyx_pybuffernd_laplacian.data = NULL;
  __pyx_pybuffernd_laplacian.rcbuffer = &__pyx_pybuffer_laplacian;
  __pyx_pybuffer_image.pybuffer.buf = NULL;
  __pyx_pybuffer_image.refcount = 0;
  __pyx_pybuffernd_image.data = NULL;
  __pyx_pybuffernd_image.rcbuffer = &__pyx_pybuffer_image;
  {
    __Pyx_BufFmt_StackElem __pyx_stack[1];
    if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_image.rcbuffer->pybuffer, (PyObject*)__pyx_v_image, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 2, 0, __pyx_stack) == -1)) __PYX_ERR(0, 7, __pyx_L1_error)
  }
  __pyx_pybuffernd_image.diminfo[0].strides = __pyx_pybuffernd_image.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_image.diminfo[0].shape = __pyx_pybuffernd_image.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_image.diminfo[1].strides = __pyx_pybuffernd_image.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_image.diminfo[1].shape = __pyx_pybuffernd_image.rcbuffer->pybuffer.shape[1];
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  { PyObject *__pyx_type, *__pyx_value, *__pyx_tb;
    __Pyx_PyThreadState_declare
    __Pyx_PyThreadState_assign
    __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb);
    __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_image.rcbuffer->pybuffer);
    __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer);
  __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);}
  __Pyx_AddTraceback("_cython_magic_8b773ac434435b285c70274cc23d7117.laplace_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  goto __pyx_L2;
  __pyx_L0:;
  __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_image.rcbuffer->pybuffer);
  __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer);
  __pyx_L2:;
  __Pyx_XDECREF((PyObject *)__pyx_v_laplacian);
  __Pyx_XDECREF(__pyx_v_thresh);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__10 = PyTuple_Pack(7, __pyx_n_s_image, __pyx_n_s_h, __pyx_n_s_w, __pyx_n_s_laplacian, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__10);
  __Pyx_GIVEREF(__pyx_tuple__10);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_8b773ac434435b285c70274cc23d7117_1laplace_cython, NULL, __pyx_n_s_cython_magic_8b773ac434435b285c); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 08:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 09:     Cython implementation."""
+10:     cdef int h = image.shape[0]
  __pyx_v_h = (__pyx_v_image->dimensions[0]);
+11:     cdef int w = image.shape[1]
  __pyx_v_w = (__pyx_v_image->dimensions[1]);
+12:     cdef cnp.ndarray[double, ndim=2] laplacian = np.empty((w-2, h-2), dtype=np.double)
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_w - 2)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_3 = __Pyx_PyInt_From_long((__pyx_v_h - 2)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_t_4 = PyTuple_New(2); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __Pyx_GIVEREF(__pyx_t_1);
  PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_3);
  PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_t_3);
  __pyx_t_1 = 0;
  __pyx_t_3 = 0;
  __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_GIVEREF(__pyx_t_4);
  PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_4);
  __pyx_t_4 = 0;
  __pyx_t_4 = PyDict_New(); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_double); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  if (PyDict_SetItem(__pyx_t_4, __pyx_n_s_dtype, __pyx_t_5) < 0) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_t_3, __pyx_t_4); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  if (!(likely(((__pyx_t_5) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_5, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 12, __pyx_L1_error)
  __pyx_t_6 = ((PyArrayObject *)__pyx_t_5);
  {
    __Pyx_BufFmt_StackElem __pyx_stack[1];
    if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer, (PyObject*)__pyx_t_6, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 2, 0, __pyx_stack) == -1)) {
      __pyx_v_laplacian = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.buf = NULL;
      __PYX_ERR(0, 12, __pyx_L1_error)
    } else {__pyx_pybuffernd_laplacian.diminfo[0].strides = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_laplacian.diminfo[0].shape = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_laplacian.diminfo[1].strides = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_laplacian.diminfo[1].shape = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.shape[1];
    }
  }
  __pyx_t_6 = 0;
  __pyx_v_laplacian = ((PyArrayObject *)__pyx_t_5);
  __pyx_t_5 = 0;
 13:     cdef int i, j
+14:     for i in range(1, h-1):
  __pyx_t_7 = (__pyx_v_h - 1);
  for (__pyx_t_8 = 1; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) {
    __pyx_v_i = __pyx_t_8;
+15:         for j in range(1, w-1):
    __pyx_t_9 = (__pyx_v_w - 1);
    for (__pyx_t_10 = 1; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) {
      __pyx_v_j = __pyx_t_10;
+16:             laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
      __pyx_t_11 = (__pyx_v_i - 1);
      __pyx_t_12 = __pyx_v_j;
      __pyx_t_13 = (__pyx_v_i + 1);
      __pyx_t_14 = __pyx_v_j;
      __pyx_t_15 = __pyx_v_i;
      __pyx_t_16 = (__pyx_v_j - 1);
      __pyx_t_17 = __pyx_v_i;
      __pyx_t_18 = (__pyx_v_j + 1);
      __pyx_t_19 = __pyx_v_i;
      __pyx_t_20 = __pyx_v_j;
      __pyx_t_21 = (__pyx_v_i - 1);
      __pyx_t_22 = (__pyx_v_j - 1);
      *__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.buf, __pyx_t_21, __pyx_pybuffernd_laplacian.diminfo[0].strides, __pyx_t_22, __pyx_pybuffernd_laplacian.diminfo[1].strides) = (((((*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_11, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_12, __pyx_pybuffernd_image.diminfo[1].strides)) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_13, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_14, __pyx_pybuffernd_image.diminfo[1].strides))) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_15, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_16, __pyx_pybuffernd_image.diminfo[1].strides))) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_17, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_18, __pyx_pybuffernd_image.diminfo[1].strides))) - (4.0 * (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_19, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_20, __pyx_pybuffernd_image.diminfo[1].strides))));
    }
  }
+17:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 17, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_abs); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 17, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  __pyx_t_4 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_4)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_4);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  if (!__pyx_t_4) {
    __pyx_t_5 = __Pyx_PyObject_CallOneArg(__pyx_t_3, ((PyObject *)__pyx_v_laplacian)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_4, ((PyObject *)__pyx_v_laplacian)};
      __pyx_t_5 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_GOTREF(__pyx_t_5);
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_4, ((PyObject *)__pyx_v_laplacian)};
      __pyx_t_5 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_GOTREF(__pyx_t_5);
    } else
    #endif
    {
      __pyx_t_2 = PyTuple_New(1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 17, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __pyx_t_4 = NULL;
      __Pyx_INCREF(((PyObject *)__pyx_v_laplacian));
      __Pyx_GIVEREF(((PyObject *)__pyx_v_laplacian));
      PyTuple_SET_ITEM(__pyx_t_2, 0+1, ((PyObject *)__pyx_v_laplacian));
      __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_2, NULL); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_t_3 = PyObject_RichCompare(__pyx_t_5, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_3); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 17, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  __pyx_v_thresh = __pyx_t_3;
  __pyx_t_3 = 0;
+18:     return thresh
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_thresh);
  __pyx_r = __pyx_v_thresh;
  goto __pyx_L0;
In [32]:
%timeit laplace_cython(image)
1.61 ms ± 96.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [33]:
np.allclose(laplace_numpy(image), laplace_cython(image))
Out[33]:
True

What we can notice in the above function is that we have gotten rid completely of interaction in the for loop, which leads to a 4x increase in performance over our NumPy implementation.

Let's now turn to pythran.

Pythran

One of the advantages of pythran is that it can optimize higher level numpy code than Cython. Luckily for us again, it features a line magic, so we keep working in the notebook.

In [34]:
%load_ext pythran.magic
WARNING: Disabling color, you really want to install colorlog.
WARNING: Pythran support disabled for module: omp

To use pythran, all you have to do is to annotate the function you want to export and give it a signature. This is done in a comment line starting the pythran file. One of the promises of pythran is that it can often handle high level broadcasting with NumPy and still optimize our function.

Annotating our NumPy function, we obtain this:

In [35]:
%%pythran
#pythran export laplace_pythran(float[][])
import numpy as np
def laplace_pythran(image):
    """Laplace operator in NumPy for 2D images. Pythran accelerated."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh
In [36]:
np.allclose(laplace_numpy(image), laplace_pythran(image))
Out[36]:
True
In [37]:
%timeit laplace_pythran(image)
40.3 ms ± 9.89 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In this case, it's disappointing. The pythran translated function is 10x times slower than pure NumPy.

But pythran can also translate "low level style" code, just like Cython. Let's try again.

In [38]:
%%pythran 
#pythran export laplace_pythran(float[][])
import numpy as np
def laplace_pythran(image):
    """Laplace operator in NumPy for 2D images. Pythran accelerated."""
    h = image.shape[0]
    w = image.shape[1]
    laplacian = np.empty((h - 2, w - 2))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
    thresh = np.abs(laplacian) > 0.05
    return thresh
In [39]:
np.allclose(laplace_numpy(image), laplace_pythran(image))
Out[39]:
True
In [40]:
%timeit laplace_pythran(image)
15.6 ms ± 2.09 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

Mmmh, still not very good. What if we move the thresholding operation inside the main loop?

In [41]:
%%pythran 
#pythran export laplace_pythran(float[][])
import numpy as np
def laplace_pythran(image):
    """Laplace operator in NumPy for 2D images. Pythran accelerated."""
    h = image.shape[0]
    w = image.shape[1]
    laplacian = np.empty((h - 2, w - 2))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            laplacian[i-1, j-1] = np.abs(image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]) > 0.05
    return laplacian
In [42]:
np.allclose(laplace_numpy(image), laplace_pythran(image))
Out[42]:
True
In [43]:
%timeit laplace_pythran(image)
1.29 ms ± 85.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So in this case, pythran with a low level approach outperforms Cython (barely).

Another interesting point here, concerning debugging pythran compilation, is that pythran annotated code is still perfectly valid Python. This helps during the debugging phase, compared to the previously explained Cython approach.

Let's turn to our last tool, numba.

Numba

Numba is very similar to pythran, except that it does just-in-time compilation (JIT). We have several options to for use. Let's try the standard jit approach with a decorator.

In [44]:
from numba import jit
In [45]:
@jit
def laplace_numba(image):
    """Laplace operator in NumPy for 2D images. Accelerated using numba."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh

We call it once to avoid timing issues due to JIT compilation.

In [46]:
laplace_numba(image);
In [47]:
%timeit laplace_numba(image)
3.06 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is rightaway faster than NumPy.

We can also tell numba to avoid using Python at all in its computation. This is called with @jit(nopython=True).

In [48]:
@jit(nopython=True)
def laplace_numba(image):
    """Laplace operator in NumPy for 2D images. Accelerated using numba."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh
In [49]:
laplace_numba(image);
In [50]:
%timeit laplace_numba(image)
2.93 ms ± 371 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

No big effect. In fact, we can infer from this that numba managed to generate pure C code from our function and that it did it already previously.

To confirm that, we can either use the inspect_types() method of our jitted function or use the annotation tool provided with numba.

In [51]:
laplace_numba.inspect_types()
laplace_numba (array(float64, 2d, C),)
--------------------------------------------------------------------------------
# File: <ipython-input-48-eaa032c0e5bc>
# --- LINE 1 --- 
# label 0
#   del $const0.3
#   del $const0.2
#   del $0.4
#   del $const0.7
#   del $const0.6
#   del $0.8
#   del $0.9
#   del $0.5
#   del $0.10
#   del $const0.14
#   del $const0.13
#   del $0.15
#   del $const0.18
#   del $const0.17
#   del $0.19
#   del $0.20
#   del $0.16
#   del $0.21
#   del $const0.26
#   del $const0.25
#   del $0.27
#   del $const0.30
#   del $const0.29
#   del $0.31
#   del $0.32
#   del $0.28
#   del $0.33
#   del $const0.38
#   del $const0.37
#   del $0.39
#   del $const0.42
#   del $const0.41
#   del $0.43
#   del $0.44
#   del $0.40
#   del $0.45
#   del $const0.51
#   del $const0.50
#   del $0.52
#   del $const0.55
#   del $const0.54
#   del $0.56
#   del $0.57
#   del $0.53
#   del image
#   del $0.58
#   del $const0.48
#   del $0.11
#   del $0.22
#   del $0.34
#   del $0.46
#   del $0.59
#   del $0.61
#   del $0.62
#   del $0.63
#   del laplacian
#   del $const0.66
#   del $0.67
#   del thresh

@jit(nopython=True)

# --- LINE 2 --- 

def laplace_numba(image):

    # --- LINE 3 --- 

    """Laplace operator in NumPy for 2D images. Accelerated using numba."""

    # --- LINE 4 --- 
    #   image = arg(0, name=image)  :: array(float64, 2d, C)
    #   $const0.2 = const(NoneType, None)  :: none
    #   $const0.3 = const(int, -2)  :: int64
    #   $0.4 = global(slice: <class 'slice'>)  :: Function(<class 'slice'>)
    #   $0.5 = call $0.4($const0.2, $const0.3)  :: (none, int64) -> slice<a:b>
    #   $const0.6 = const(int, 1)  :: int64
    #   $const0.7 = const(int, -1)  :: int64
    #   $0.8 = global(slice: <class 'slice'>)  :: Function(<class 'slice'>)
    #   $0.9 = call $0.8($const0.6, $const0.7)  :: (int64, int64) -> slice<a:b>
    #   $0.10 = build_tuple(items=[Var($0.5, <ipython-input-48-eaa032c0e5bc> (4)), Var($0.9, <ipython-input-48-eaa032c0e5bc> (4))])  :: (slice<a:b> x 2)
    #   $0.11 = static_getitem(value=image, index=(slice(None, -2, None), slice(1, -1, None)), index_var=$0.10)  :: array(float64, 2d, A)
    #   $const0.13 = const(int, 2)  :: int64
    #   $const0.14 = const(NoneType, None)  :: none
    #   $0.15 = global(slice: <class 'slice'>)  :: Function(<class 'slice'>)
    #   $0.16 = call $0.15($const0.13, $const0.14)  :: (int64, none) -> slice<a:b>
    #   $const0.17 = const(int, 1)  :: int64
    #   $const0.18 = const(int, -1)  :: int64
    #   $0.19 = global(slice: <class 'slice'>)  :: Function(<class 'slice'>)
    #   $0.20 = call $0.19($const0.17, $const0.18)  :: (int64, int64) -> slice<a:b>
    #   $0.21 = build_tuple(items=[Var($0.16, <ipython-input-48-eaa032c0e5bc> (4)), Var($0.20, <ipython-input-48-eaa032c0e5bc> (4))])  :: (slice<a:b> x 2)
    #   $0.22 = static_getitem(value=image, index=(slice(2, None, None), slice(1, -1, None)), index_var=$0.21)  :: array(float64, 2d, A)
    #   $const0.25 = const(int, 1)  :: int64
    #   $const0.26 = const(int, -1)  :: int64
    #   $0.27 = global(slice: <class 'slice'>)  :: Function(<class 'slice'>)
    #   $0.28 = call $0.27($const0.25, $const0.26)  :: (int64, int64) -> slice<a:b>
    #   $const0.29 = const(NoneType, None)  :: none
    #   $const0.30 = const(int, -2)  :: int64
    #   $0.31 = global(slice: <class 'slice'>)  :: Function(<class 'slice'>)
    #   $0.32 = call $0.31($const0.29, $const0.30)  :: (none, int64) -> slice<a:b>
    #   $0.33 = build_tuple(items=[Var($0.28, <ipython-input-48-eaa032c0e5bc> (4)), Var($0.32, <ipython-input-48-eaa032c0e5bc> (4))])  :: (slice<a:b> x 2)
    #   $0.34 = static_getitem(value=image, index=(slice(1, -1, None), slice(None, -2, None)), index_var=$0.33)  :: array(float64, 2d, A)
    #   $const0.37 = const(int, 1)  :: int64
    #   $const0.38 = const(int, -1)  :: int64
    #   $0.39 = global(slice: <class 'slice'>)  :: Function(<class 'slice'>)
    #   $0.40 = call $0.39($const0.37, $const0.38)  :: (int64, int64) -> slice<a:b>
    #   $const0.41 = const(int, 2)  :: int64
    #   $const0.42 = const(NoneType, None)  :: none
    #   $0.43 = global(slice: <class 'slice'>)  :: Function(<class 'slice'>)
    #   $0.44 = call $0.43($const0.41, $const0.42)  :: (int64, none) -> slice<a:b>
    #   $0.45 = build_tuple(items=[Var($0.40, <ipython-input-48-eaa032c0e5bc> (4)), Var($0.44, <ipython-input-48-eaa032c0e5bc> (4))])  :: (slice<a:b> x 2)
    #   $0.46 = static_getitem(value=image, index=(slice(1, -1, None), slice(2, None, None)), index_var=$0.45)  :: array(float64, 2d, A)
    #   $const0.48 = const(int, 4)  :: int64
    #   $const0.50 = const(int, 1)  :: int64
    #   $const0.51 = const(int, -1)  :: int64
    #   $0.52 = global(slice: <class 'slice'>)  :: Function(<class 'slice'>)
    #   $0.53 = call $0.52($const0.50, $const0.51)  :: (int64, int64) -> slice<a:b>
    #   $const0.54 = const(int, 1)  :: int64
    #   $const0.55 = const(int, -1)  :: int64
    #   $0.56 = global(slice: <class 'slice'>)  :: Function(<class 'slice'>)
    #   $0.57 = call $0.56($const0.54, $const0.55)  :: (int64, int64) -> slice<a:b>
    #   $0.58 = build_tuple(items=[Var($0.53, <ipython-input-48-eaa032c0e5bc> (4)), Var($0.57, <ipython-input-48-eaa032c0e5bc> (4))])  :: (slice<a:b> x 2)
    #   $0.59 = static_getitem(value=image, index=(slice(1, -1, None), slice(1, -1, None)), index_var=$0.58)  :: array(float64, 2d, A)
    #   $0.61 = arrayexpr(expr=('-', [('+', [('+', [('+', [Var($0.11, <ipython-input-48-eaa032c0e5bc> (4)), Var($0.22, <ipython-input-48-eaa032c0e5bc> (4))]), Var($0.34, <ipython-input-48-eaa032c0e5bc> (4))]), Var($0.46, <ipython-input-48-eaa032c0e5bc> (4))]), ('*', [const(int, 4), Var($0.59, <ipython-input-48-eaa032c0e5bc> (4))])]), ty=array(float64, 2d, C))  :: array(float64, 2d, C)
    #   laplacian = $0.61  :: array(float64, 2d, C)

    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]

    # --- LINE 5 --- 
    #   $0.62 = global(np: <module 'numpy' from '/Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/__init__.py'>)  :: Module(<module 'numpy' from '/Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/__init__.py'>)
    #   $0.63 = getattr(value=$0.62, attr=abs)  :: Function(<ufunc 'absolute'>)
    #   $const0.66 = const(float, 0.05)  :: float64
    #   $0.67 = arrayexpr(expr=('>', [(<ufunc 'absolute'>, [Var(laplacian, <ipython-input-48-eaa032c0e5bc> (4))]), const(float, 0.05)]), ty=array(bool, 2d, C))  :: array(bool, 2d, C)
    #   thresh = $0.67  :: array(bool, 2d, C)

    thresh = np.abs(laplacian) > 0.05

    # --- LINE 6 --- 
    #   $0.69 = cast(value=thresh)  :: array(bool, 2d, C)
    #   return $0.69

    return thresh


================================================================================

Admittedly, not very helpful. Let's try the annotation tool.

In [52]:
%%file numba_annotation_demo.py
import numpy as np
import skimage.data
import skimage.color
from numba import jit

@jit(nopython=True)
def laplace_numba(image):
    """Laplace operator in NumPy for 2D images. Accelerated using numba."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh

image = skimage.data.astronaut()
image = skimage.color.rgb2gray(image)
laplace_numba(image)
Overwriting numba_annotation_demo.py

We have to run the annotation tool from the command line:

In [53]:
!numba --annotate-html annotation.html numba_annotation_demo.py
In [54]:
from IPython.display import HTML
In [55]:
HTML(data=open('annotation.html').read())
Out[55]:
show numba IR
6: @jit(nopython=True)
7: def laplace_numba(image):
8:     """Laplace operator in NumPy for 2D images. Accelerated using numba."""
9:     laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
10:     thresh = np.abs(laplacian) > 0.05
11:     return thresh



show numba IR
1: import numpy as np
2: import skimage.data
3: import skimage.color
4: from numba import jit
5:
6: @jit(nopython=True)
7: def laplace_numba(image):
8:     """Laplace operator in NumPy for 2D images. Accelerated using numba."""
9:     laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
10:     thresh = np.abs(laplacian) > 0.05
11:     return thresh



show numba IR
174: def _broadcast_onto(src_ndim, src_shape, dest_ndim, dest_shape):
175:     '''Low-level utility function used in calculating a shape for
176:     an implicit output array. This function assumes that the
177:     destination shape is an LLVM pointer to a C-style array that was
178:     already initialized to a size of one along all axes.
179:
180:     Returns an integer value:
181:     >= 1 : Succeeded. Return value should equal the number of dimensions in
182:              the destination shape.
183:     0 : Failed to broadcast because source shape is larger than the
184:              destination shape (this case should be weeded out at type
185:              checking).
186:     < 0 : Failed to broadcast onto destination axis, at axis number ==
187:              -(return_value + 1).
188:     '''
189:     if src_ndim > dest_ndim:
190:         # This check should have been done during type checking, but
191:         # let's be defensive anyway...
192:         return 0
193:     else:
194:         src_index = 0
195:         dest_index = dest_ndim - src_ndim
196:         while src_index < src_ndim:
197:             src_dim_size = src_shape[src_index]
198:             dest_dim_size = dest_shape[dest_index]
199:             # Check to see if we've already mutated the destination
200:             # shape along this axis.
201:             if dest_dim_size != 1:
202:                 # If we have mutated the destination shape already,
203:                 # then the source axis size must either be one,
204:                 # or the destination axis size.
205:                 if src_dim_size != dest_dim_size and src_dim_size != 1:
206:                     return -(dest_index + 1)
207:             elif src_dim_size != 1:
208:                 # If the destination size is still its initial
209:                 dest_shape[dest_index] = src_dim_size
210:             src_index += 1
211:             dest_index += 1
212:     return dest_index



show numba IR
1: import numpy as np
2: import skimage.data
3: import skimage.color
4: from numba import jit
5:
6: @jit(nopython=True)
7: def laplace_numba(image):
8:     """Laplace operator in NumPy for 2D images. Accelerated using numba."""
9:     laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
10:     thresh = np.abs(laplacian) > 0.05
11:     return thresh



What this shows is that numba managed to correctly optimize the broadcasting (if it hadn't we would have seen colored source code lines).

Let's see also investigate how numba optimizes the low-level approach with explicit for loops.

In [56]:
@jit(nopython=True)
def laplace_numba(image):
    """Laplace operator for 2D images. Numba accelerated."""
    h = image.shape[0]
    w = image.shape[1]
    laplacian = np.empty((h - 2, w - 2))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            laplacian[i-1, j-1] = np.abs(image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]) > 0.05
    return laplacian
In [57]:
laplace_numba(image);
In [58]:
%timeit laplace_numba(image)
924 µs ± 112 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This approach improves the performance!

Another way of approaching optimization with numba is to thing in terms of NumPy ufuncs. You can write generalized ufuncs with numba using another decorator, @guvectorize. However, this comes at the cost that we have to think in terms of an ufunc and thus we have to provide an array that gets filled by our computation routine.

In [59]:
from numba import guvectorize
In [60]:
@guvectorize('void(float64[:, :], float64[:, :])', "(m, n)->(m, n)")
def laplace_numba_guvectorize(image, laplacian):
    """Laplace operator in NumPy for 2D images. Numba accelerated."""
    h = image.shape[0]
    w = image.shape[1]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            laplacian[i-1, j-1] = np.abs(4 * image[i, j] - image[i - 1, j] - image[i + 1, j] - image[i, j + 1] - image[i, j - 1]) > 0.05
In [61]:
laplacian = np.empty_like(image)
In [62]:
laplace_numba_guvectorize(image, laplacian);
In [63]:
np.allclose(laplace_numpy(image), laplacian[:-2, :-2])
Out[63]:
True
In [64]:
%timeit laplace_numba_guvectorize(image, laplacian);
609 µs ± 156 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This approach is even faster. However, it has some limitations: the signature for array sizes forces us to use the same shapes in the input and output. I could'nt use the "(m, n)->(m-1, n-1)" signature. Note that we also could have target parallel execution by using the target=parallel option in the decorator.

Wrap-up and plots

Time for a wrap-up and some plots.

In [65]:
timings = {}
for func in [laplace_skimage, laplace_numpy, laplace_cython, laplace_pythran, laplace_numba]:
    t = %timeit -o func(image)
    timings[func.__name__] = t
4.61 ms ± 607 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.38 ms ± 899 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.22 ms ± 431 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.25 ms ± 152 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
775 µs ± 38.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [66]:
t = %timeit -o laplace_numba_guvectorize(image, laplacian);
timings['laplace_numba_guvectorize'] = t
508 µs ± 5.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Here are the table and the plots.

In [67]:
import pandas as pd
In [68]:
pd.Series({key: timings[key].average * 1e6 for key in timings}).to_frame(name='timings (μs)').sort_values(by='timings (μs)')
Out[68]:
timings (μs)
laplace_numba_guvectorize 507.651230
laplace_numba 774.716209
laplace_pythran 1246.428633
laplace_cython 2219.494851
laplace_skimage 4611.953934
laplace_numpy 5383.624663
In [69]:
fig, ax = plt.subplots(figsize=(10, 10))
pd.Series({key: timings[key].average * 1e6 for key in timings}).to_frame(name='timings (μs)').sort_values(by='timings (μs)').plot(kind='barh', ax=ax)
Out[69]:
<matplotlib.axes._subplots.AxesSubplot at 0x11df7a4a8>

conclusions

So, what have I learned during these last three days? Here are some thoughts:

  • most approaches suggest you rewrite your function in a low-level style if you want a good optimization result
  • most approaches rely on the type annotations which you supply and which are crucial for good results (and sometimes it's easy to forget one variable which in turn prevents the optimizer from doing its best)
  • pythran, like numba, can in theory optimize high-level NumPy code (in our example, numba did better than pythran)
  • numba yields the best performance in our benchmark
  • Cython, contrary to what I expected, comes in after those two (but still yields a good speed-up)
  • debugging all code generators is usually frustrating, due to difficult to understand error messages, low-level stuff (try reading the Cython output by clicking the yellow lines)
  • numba also allows targeting GPUs, something I haven't spoken about here since I don't have a GPU on my MacBook (this was the topic I had the most difficulty with during the training since I've never worked with GPUs)

further reading

This post was entirely written using the IPython notebook. Its content is BSD-licensed. You can see a static view or download this notebook with the help of nbviewer at 20170706_OptimizingYourPythonCode.ipynb.

Comments