Optimizing your code with NumPy, Cython, pythran and numba

I had the pleasure of attending a workshop given by the groupe calcul (CNRS) this week. The topic was: how do you optimize the execution speed of your Python code, under the hypothesis that you already tried to make it fast using NumPy?

The training was held over three days and presented three interesting ways to achieve speedups: Cython, pythran and numba.

Overall, the workshop was great. The goal of this blog post is to summarize some of the key insights that I learnt while using these three tools on an practical application: image filtering.

This post uses the following versions of the libraries:

In [1]:
import cython; print(cython.__version__)
import pythran; print(pythran.__version__)
import numba; print(numba.__version__)
WARNING: Disabling color, you really want to install colorlog.
0.27.3
WARNING: Pythran support disabled for module: omp
0.8.4post0
0.36.2

Introducing the discrete Laplacian

When working with images, the discrete Laplacian operator is often used for edge detection.

Intuitively, if you want to find the edges of an image, you compute the Laplacian and threshold it to see the edges appear. Let's give an example using scikit-image.

First, let's load a standard grayscale image that ships with scikit-image, the astronaut.

In [2]:
import skimage.data
import skimage.color
In [3]:
image = skimage.data.astronaut()
image = skimage.color.rgb2gray(image)
image.shape
Out[3]:
(512, 512)

Now, let's write a function that uses the scikit-image implementation of the discrete Laplace operator and thresholds the result with a fixed threshold.

In [4]:
from skimage.filters import laplace
import numpy as np
In [5]:
def laplace_skimage(image):
    """Applies Laplace operator to 2D image using skimage implementation. 
    Then tresholds the result and returns boolean image."""
    laplacian = laplace(image)
    thresh = np.abs(laplacian) > 0.05
    return thresh

Let's apply the function and check its result.

In [6]:
edges = laplace_skimage(image)
In [7]:
edges.shape
Out[7]:
(512, 512)

Let's display the obtained edges, along with the original image.

In [8]:
import matplotlib.pyplot as plt
%matplotlib inline
In [9]:
def compare(left, right):
    """Compares two images, left and right."""
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    ax[0].imshow(left, cmap='gray')
    ax[1].imshow(right, cmap='gray')
In [10]:
compare(left=image, right=edges)

Since the topic of this post is to deal with optimization, we will now suppose that we're not using scikit-image.

As is usually the case when you're working on your own problems, we'll implement the Laplacian algorithm by ourselves using NumPy.

NumPy implementation

So here we go: we write a function that makes use of vectorized NumPy to perform the same operation than the scikit-image implementation.

In [11]:
def laplace_numpy(image):
    """Applies Laplace operator to 2D image using our own NumPy implementation. 
    Then tresholds the result and returns boolean image."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh

If you're paying attention closely, we're not actually implementing exactly the same function as scikit-image, since we're excluding all border pixels of the image from our calculation.

Let's check the result is qualitatively correct.

In [12]:
laplace_numpy(image).shape
Out[12]:
(510, 510)
In [13]:
compare(image, laplace_numpy(image))

And let's also check that our implementation quantitavely agrees with scikit-image on every interior pixel.

In [14]:
np.allclose(laplace_skimage(image)[1:-1, 1:-1], laplace_numpy(image))
Out[14]:
True

Good, we're now all set for the next step. Let's pretend that we think our implementation is slow. How do we objectively measure that? And then: how do we make it faster?

Profiling and timing

So, we think our NumPy version is slow. How can we put a number on that feeling? The first idea is to time the execution on our test image. We can easily do this with the %time line magic.

In [15]:
%time laplace_numpy(image)
CPU times: user 3.43 ms, sys: 1.32 ms, total: 4.74 ms
Wall time: 5.08 ms
Out[15]:
array([[False,  True,  True, ..., False, False, False],
       [False,  True, False, ..., False, False, False],
       [ True,  True, False, ..., False, False, False],
       ..., 
       [False, False, False, ...,  True, False, False],
       [False, False, False, ...,  True, False, False],
       [False, False, False, ...,  True,  True, False]], dtype=bool)

Mmmh, around 5 milliseconds per call. If we run this multiple times, the results slightly vary.

Therefore, it is better to use the %timeit line magic, which averages the timing operation and gives us a more accurate measurement.

In [16]:
%timeit laplace_numpy(image)
3.37 ms ± 150 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

How does the scikit-image implementation compare to that?

In [17]:
%timeit laplace_skimage(image)
3.88 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Approximately the same, albeit a little slower.

So, our function is too slow for say, our realtime application. To make it go faster, we first need to have an idea about what takes time in it. Our function works in two steps: first we compute the laplacian and then we threshold the results. Which part of the function is taking the most time?

To answer that question, we use a profiler. Many different profilers exit. We'll look at three below: %prun, %lprun and pprofile.

Using %prun gives us a high level view of what's happening inside our call.

In [18]:
r = %prun -r laplace_numpy(image)
r.print_stats()
          4 function calls in 0.004 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.004    0.004    0.004    0.004 :1(laplace_numpy)
        1    0.000    0.000    0.004    0.004 :1()
        1    0.000    0.000    0.004    0.004 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Out[18]:

This is not very informative. But we understand that most of the time (column tottime) is spent inside our laplace_numpy function.

What about %lprun?

In [19]:
%load_ext line_profiler
In [20]:
r = %lprun -r -f laplace_numpy laplace_numpy(image)
r.print_stats()
Timer unit: 1e-06 s

Total time: 0.007151 s
File: 
Function: laplace_numpy at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def laplace_numpy(image):
     2                                               """Applies Laplace operator to 2D image using our own NumPy implementation. 
     3                                               Then tresholds the result and returns boolean image."""
     4         1       6403.0   6403.0     89.5      laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
     5         1        745.0    745.0     10.4      thresh = np.abs(laplacian) > 0.05
     6         1          3.0      3.0      0.0      return thresh

Running %lprun -r -f laplace_numpy laplace_numpy(image) means that we aske %lprun to show how long each line took to execute inside the function laplace_numpy and return the result as an object.

Here, we can answer the earlier question: computing the laplacian is what takes the most time in this function.

Let's see if pprofile confirms that measurement.

In [21]:
import pprofile
def func_to_profile():
    prof = pprofile.Profile()
    with prof():
        laplace_numpy(image)
    prof.print_stats()
In [22]:
func_to_profile()
Total duration: 0.00508428s
File: 
File duration: 0.00480628s (94.53%)
Line #|      Hits|         Time| Time per hit|      %|Source code
------+----------+-------------+-------------+-------+-----------
     1|         1|  1.90735e-05|  1.90735e-05|  0.38%|def laplace_numpy(image):
     2|         0|            0|            0|  0.00%|    """Applies Laplace operator to 2D image using our own NumPy implementation.
     3|         0|            0|            0|  0.00%|    Then tresholds the result and returns boolean image."""
     4|         1|   0.00392222|   0.00392222| 77.14%|    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
     5|         1|  0.000836849|  0.000836849| 16.46%|    thresh = np.abs(laplacian) > 0.05
(call)|         1|   0.00480628|   0.00480628| 94.53%|# :1 laplace_numpy
     6|         1|  2.81334e-05|  2.81334e-05|  0.55%|    return thresh
/Users/kappamaki/anaconda/lib/python3.6/site-packages/pprofile.py:102: UserWarning: Cannot access ".buffer", invalid entities from source files will cause errors when annotating.
  'files will cause errors when annotating.' % (stream, )

pprofile confirms the previous assessment: it's the laplacian computation that takes time.

What's more, pprofile is a little bit simpler to use than line_profiler. However, it doesn't feature a cell magic (although there's already someone asking for this here).

At this point, we know that to make our function faster, the first thing to optimize is the computation of the laplacian. Let's try that using Cython.

Cython

What Cython does, like the other tools we'll see below, is to compile our Python code into C code that will hopefully be very fast. Since Cython has a so-called IPython magic function, we'll be able to continue working in the notebook. If we use the -a option with the Cython magic, we can obtain an annotated version of our source code that shows where Cython uses C code and where it uses Python objects.

In [23]:
%load_ext cython
In [24]:
%%cython -a
import numpy as np
   
def laplace_cython(image):
    """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
    Cython implementation."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh
Out[24]:
Cython: _cython_magic_862e7a2cc089394207c84948a1866a79.pyx

Generated by Cython 0.27.3

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+1: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 2: 
+3: def laplace_cython(image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_862e7a2cc089394207c84948a1866a79_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_862e7a2cc089394207c84948a1866a79_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_862e7a2cc089394207c84948a1866a79_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_862e7a2cc089394207c84948a1866a79_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_862e7a2cc089394207c84948a1866a79_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_862e7a2cc089394207c84948a1866a79_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
  __pyx_r = __pyx_pf_46_cython_magic_862e7a2cc089394207c84948a1866a79_laplace_cython(__pyx_self, ((PyObject *)__pyx_v_image));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_862e7a2cc089394207c84948a1866a79_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_image) {
  PyObject *__pyx_v_laplacian = NULL;
  PyObject *__pyx_v_thresh = NULL;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_AddTraceback("_cython_magic_862e7a2cc089394207c84948a1866a79.laplace_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_laplacian);
  __Pyx_XDECREF(__pyx_v_thresh);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__16 = PyTuple_Pack(3, __pyx_n_s_image, __pyx_n_s_laplacian, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__16)) __PYX_ERR(0, 3, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__16);
  __Pyx_GIVEREF(__pyx_tuple__16);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_862e7a2cc089394207c84948a1866a79_1laplace_cython, NULL, __pyx_n_s_cython_magic_862e7a2cc089394207); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 3, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 3, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 4:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 5:     Cython implementation."""
+6:     laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
  __pyx_slice_ = PySlice_New(Py_None, __pyx_int_neg_2, Py_None); if (unlikely(!__pyx_slice_)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice_);
  __Pyx_GIVEREF(__pyx_slice_);
  __pyx_slice__2 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__2)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__2);
  __Pyx_GIVEREF(__pyx_slice__2);
/* … */
  __pyx_t_1 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__3); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_tuple__3 = PyTuple_Pack(2, __pyx_slice_, __pyx_slice__2); if (unlikely(!__pyx_tuple__3)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__3);
  __Pyx_GIVEREF(__pyx_tuple__3);
  __pyx_slice__4 = PySlice_New(__pyx_int_2, Py_None, Py_None); if (unlikely(!__pyx_slice__4)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__4);
  __Pyx_GIVEREF(__pyx_slice__4);
  __pyx_slice__5 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__5)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__5);
  __Pyx_GIVEREF(__pyx_slice__5);
  __pyx_t_2 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__6); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyNumber_Add(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_tuple__6 = PyTuple_Pack(2, __pyx_slice__4, __pyx_slice__5); if (unlikely(!__pyx_tuple__6)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__6);
  __Pyx_GIVEREF(__pyx_tuple__6);
  __pyx_slice__7 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__7)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__7);
  __Pyx_GIVEREF(__pyx_slice__7);
  __pyx_slice__8 = PySlice_New(Py_None, __pyx_int_neg_2, Py_None); if (unlikely(!__pyx_slice__8)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__8);
  __Pyx_GIVEREF(__pyx_slice__8);
  __pyx_t_2 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__9); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_1 = PyNumber_Add(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_tuple__9 = PyTuple_Pack(2, __pyx_slice__7, __pyx_slice__8); if (unlikely(!__pyx_tuple__9)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__9);
  __Pyx_GIVEREF(__pyx_tuple__9);
  __pyx_slice__10 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__10)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__10);
  __Pyx_GIVEREF(__pyx_slice__10);
  __pyx_slice__11 = PySlice_New(__pyx_int_2, Py_None, Py_None); if (unlikely(!__pyx_slice__11)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__11);
  __Pyx_GIVEREF(__pyx_slice__11);
  __pyx_t_2 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__12); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyNumber_Add(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_tuple__12 = PyTuple_Pack(2, __pyx_slice__10, __pyx_slice__11); if (unlikely(!__pyx_tuple__12)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__12);
  __Pyx_GIVEREF(__pyx_tuple__12);
  __pyx_slice__13 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__13)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__13);
  __Pyx_GIVEREF(__pyx_slice__13);
  __pyx_slice__14 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__14)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__14);
  __Pyx_GIVEREF(__pyx_slice__14);
  __pyx_t_2 = PyObject_GetItem(__pyx_v_image, __pyx_tuple__15); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_1 = PyNumber_Multiply(__pyx_int_4, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_t_2 = PyNumber_Subtract(__pyx_t_3, __pyx_t_1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_v_laplacian = __pyx_t_2;
  __pyx_t_2 = 0;
  __pyx_tuple__15 = PyTuple_Pack(2, __pyx_slice__13, __pyx_slice__14); if (unlikely(!__pyx_tuple__15)) __PYX_ERR(0, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__15);
  __Pyx_GIVEREF(__pyx_tuple__15);
+7:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_abs); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_1 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_1)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_1);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  if (!__pyx_t_1) {
    __pyx_t_2 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_v_laplacian); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_1, __pyx_v_laplacian};
      __pyx_t_2 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
      __Pyx_GOTREF(__pyx_t_2);
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_1, __pyx_v_laplacian};
      __pyx_t_2 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
      __Pyx_GOTREF(__pyx_t_2);
    } else
    #endif
    {
      __pyx_t_4 = PyTuple_New(1+1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 7, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_4);
      __Pyx_GIVEREF(__pyx_t_1); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1); __pyx_t_1 = NULL;
      __Pyx_INCREF(__pyx_v_laplacian);
      __Pyx_GIVEREF(__pyx_v_laplacian);
      PyTuple_SET_ITEM(__pyx_t_4, 0+1, __pyx_v_laplacian);
      __pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_4, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_t_3 = PyObject_RichCompare(__pyx_t_2, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_3); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_v_thresh = __pyx_t_3;
  __pyx_t_3 = 0;
+8:     return thresh
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_thresh);
  __pyx_r = __pyx_v_thresh;
  goto __pyx_L0;

What the above code shows is that Cython couldn't optimize our function much, as indicated by the yellow hue of the lines above. The yellow intensity in each line shows the amount of interaction with Python objects that it found. The more you have to use Python objects, the less speedup you are likely to achieve.

Let's time this first compiled version.

In [25]:
%timeit laplace_cython(image)
3.84 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It performs roughly at the same speed that our NumPy version. So there's no much use in switching to Cython. Can we do better? The key is to eliminate interaction with Python through different ways. A first one is to declare our input image is an array.

In [26]:
%%cython -a
import numpy as np
cimport numpy as cnp

def laplace_cython(cnp.ndarray image):
    """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
    Cython implementation."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh
In file included from /Users/kappamaki/.ipython/cython/_cython_magic_53222c7a36e00268207d0fde8ad0c365.c:546:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1809:
/Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by "          "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
 ^
1 warning generated.
Out[26]:
Cython: _cython_magic_53222c7a36e00268207d0fde8ad0c365.pyx

Generated by Cython 0.27.3

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+1: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
  __pyx_t_1 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 2: cimport numpy as cnp
 3: 
+4: def laplace_cython(cnp.ndarray image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
  if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_image), __pyx_ptype_5numpy_ndarray, 1, "image", 0))) __PYX_ERR(0, 4, __pyx_L1_error)
  __pyx_r = __pyx_pf_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_laplace_cython(__pyx_self, ((PyArrayObject *)__pyx_v_image));

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_image) {
  PyObject *__pyx_v_laplacian = NULL;
  PyObject *__pyx_v_thresh = NULL;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_AddTraceback("_cython_magic_53222c7a36e00268207d0fde8ad0c365.laplace_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_laplacian);
  __Pyx_XDECREF(__pyx_v_thresh);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__25 = PyTuple_Pack(3, __pyx_n_s_image, __pyx_n_s_laplacian, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__25)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__25);
  __Pyx_GIVEREF(__pyx_tuple__25);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_53222c7a36e00268207d0fde8ad0c365_1laplace_cython, NULL, __pyx_n_s_cython_magic_53222c7a36e0026820); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 5:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 6:     Cython implementation."""
+7:     laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
  __pyx_slice_ = PySlice_New(Py_None, __pyx_int_neg_2, Py_None); if (unlikely(!__pyx_slice_)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice_);
  __Pyx_GIVEREF(__pyx_slice_);
  __pyx_slice__2 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__2);
  __Pyx_GIVEREF(__pyx_slice__2);
/* … */
  __pyx_t_1 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__3); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_tuple__3 = PyTuple_Pack(2, __pyx_slice_, __pyx_slice__2); if (unlikely(!__pyx_tuple__3)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__3);
  __Pyx_GIVEREF(__pyx_tuple__3);
  __pyx_slice__4 = PySlice_New(__pyx_int_2, Py_None, Py_None); if (unlikely(!__pyx_slice__4)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__4);
  __Pyx_GIVEREF(__pyx_slice__4);
  __pyx_slice__5 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__5)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__5);
  __Pyx_GIVEREF(__pyx_slice__5);
  __pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__6); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyNumber_Add(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_tuple__6 = PyTuple_Pack(2, __pyx_slice__4, __pyx_slice__5); if (unlikely(!__pyx_tuple__6)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__6);
  __Pyx_GIVEREF(__pyx_tuple__6);
  __pyx_slice__7 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__7)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__7);
  __Pyx_GIVEREF(__pyx_slice__7);
  __pyx_slice__8 = PySlice_New(Py_None, __pyx_int_neg_2, Py_None); if (unlikely(!__pyx_slice__8)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__8);
  __Pyx_GIVEREF(__pyx_slice__8);
  __pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__9); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_1 = PyNumber_Add(__pyx_t_3, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_tuple__9 = PyTuple_Pack(2, __pyx_slice__7, __pyx_slice__8); if (unlikely(!__pyx_tuple__9)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__9);
  __Pyx_GIVEREF(__pyx_tuple__9);
  __pyx_slice__10 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__10)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__10);
  __Pyx_GIVEREF(__pyx_slice__10);
  __pyx_slice__11 = PySlice_New(__pyx_int_2, Py_None, Py_None); if (unlikely(!__pyx_slice__11)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__11);
  __Pyx_GIVEREF(__pyx_slice__11);
  __pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__12); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyNumber_Add(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_tuple__12 = PyTuple_Pack(2, __pyx_slice__10, __pyx_slice__11); if (unlikely(!__pyx_tuple__12)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__12);
  __Pyx_GIVEREF(__pyx_tuple__12);
  __pyx_slice__13 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__13)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__13);
  __Pyx_GIVEREF(__pyx_slice__13);
  __pyx_slice__14 = PySlice_New(__pyx_int_1, __pyx_int_neg_1, Py_None); if (unlikely(!__pyx_slice__14)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__14);
  __Pyx_GIVEREF(__pyx_slice__14);
  __pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_tuple__15); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_1 = PyNumber_Multiply(__pyx_int_4, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_t_2 = PyNumber_Subtract(__pyx_t_3, __pyx_t_1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_v_laplacian = __pyx_t_2;
  __pyx_t_2 = 0;
  __pyx_tuple__15 = PyTuple_Pack(2, __pyx_slice__13, __pyx_slice__14); if (unlikely(!__pyx_tuple__15)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__15);
  __Pyx_GIVEREF(__pyx_tuple__15);
+8:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_abs); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 8, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_1 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_1)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_1);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  if (!__pyx_t_1) {
    __pyx_t_2 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_v_laplacian); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_1, __pyx_v_laplacian};
      __pyx_t_2 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
      __Pyx_GOTREF(__pyx_t_2);
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_1, __pyx_v_laplacian};
      __pyx_t_2 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
      __Pyx_GOTREF(__pyx_t_2);
    } else
    #endif
    {
      __pyx_t_4 = PyTuple_New(1+1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 8, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_4);
      __Pyx_GIVEREF(__pyx_t_1); PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1); __pyx_t_1 = NULL;
      __Pyx_INCREF(__pyx_v_laplacian);
      __Pyx_GIVEREF(__pyx_v_laplacian);
      PyTuple_SET_ITEM(__pyx_t_4, 0+1, __pyx_v_laplacian);
      __pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_4, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_t_3 = PyObject_RichCompare(__pyx_t_2, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_3); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 8, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_v_thresh = __pyx_t_3;
  __pyx_t_3 = 0;
+9:     return thresh
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_thresh);
  __pyx_r = __pyx_v_thresh;
  goto __pyx_L0;
In [27]:
%timeit laplace_cython(image)
3.98 ms ± 62 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The timing didn't improve. It turns, if you click on the yellow lines, that Cython is not able to optimize the broadcasting operation (and its fancy bracketing) that we usually perform with NumPy. To allow Cython to optimize, let's write an explicit loop over the pixels in the image.

In [28]:
%%cython -a
import numpy as np
cimport numpy as cnp

def laplace_cython(cnp.ndarray image):
    """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
    Cython implementation."""
    cdef int h = image.shape[0]
    cdef int w = image.shape[1]
    laplacian = np.empty((w-2, h-2))
    cdef int i, j
    for i in range(1, h-1):
        for j in range(1, w-1):
            laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
    thresh = np.abs(laplacian) > 0.05
    return thresh
In file included from /Users/kappamaki/.ipython/cython/_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0.c:547:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1809:
/Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by "          "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
 ^
1 warning generated.
Out[28]:
Cython: _cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0.pyx

Generated by Cython 0.27.3

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+01: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
  __pyx_t_1 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 02: cimport numpy as cnp
 03: 
+04: def laplace_cython(cnp.ndarray image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
  if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_image), __pyx_ptype_5numpy_ndarray, 1, "image", 0))) __PYX_ERR(0, 4, __pyx_L1_error)
  __pyx_r = __pyx_pf_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_laplace_cython(__pyx_self, ((PyArrayObject *)__pyx_v_image));

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_image) {
  int __pyx_v_h;
  int __pyx_v_w;
  PyObject *__pyx_v_laplacian = NULL;
  int __pyx_v_i;
  int __pyx_v_j;
  PyObject *__pyx_v_thresh = NULL;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_AddTraceback("_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0.laplace_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_laplacian);
  __Pyx_XDECREF(__pyx_v_thresh);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__10 = PyTuple_Pack(7, __pyx_n_s_image, __pyx_n_s_h, __pyx_n_s_w, __pyx_n_s_laplacian, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__10);
  __Pyx_GIVEREF(__pyx_tuple__10);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_6d19cbef7cff92c3c6b985a9b7119dc0_1laplace_cython, NULL, __pyx_n_s_cython_magic_6d19cbef7cff92c3c6); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 05:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 06:     Cython implementation."""
+07:     cdef int h = image.shape[0]
  __pyx_v_h = (__pyx_v_image->dimensions[0]);
+08:     cdef int w = image.shape[1]
  __pyx_v_w = (__pyx_v_image->dimensions[1]);
+09:     laplacian = np.empty((w-2, h-2))
  __pyx_t_2 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_empty); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_t_2 = __Pyx_PyInt_From_long((__pyx_v_w - 2)); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_4 = __Pyx_PyInt_From_long((__pyx_v_h - 2)); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_t_5 = PyTuple_New(2); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_GIVEREF(__pyx_t_2);
  PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_2);
  __Pyx_GIVEREF(__pyx_t_4);
  PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_4);
  __pyx_t_2 = 0;
  __pyx_t_4 = 0;
  __pyx_t_4 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_4)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_4);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  if (!__pyx_t_4) {
    __pyx_t_1 = __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    __Pyx_GOTREF(__pyx_t_1);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_4, __pyx_t_5};
      __pyx_t_1 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_4, __pyx_t_5};
      __pyx_t_1 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    } else
    #endif
    {
      __pyx_t_2 = PyTuple_New(1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __pyx_t_4 = NULL;
      __Pyx_GIVEREF(__pyx_t_5);
      PyTuple_SET_ITEM(__pyx_t_2, 0+1, __pyx_t_5);
      __pyx_t_5 = 0;
      __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_2, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_v_laplacian = __pyx_t_1;
  __pyx_t_1 = 0;
 10:     cdef int i, j
+11:     for i in range(1, h-1):
  __pyx_t_6 = (__pyx_v_h - 1);
  for (__pyx_t_7 = 1; __pyx_t_7 < __pyx_t_6; __pyx_t_7+=1) {
    __pyx_v_i = __pyx_t_7;
+12:         for j in range(1, w-1):
    __pyx_t_8 = (__pyx_v_w - 1);
    for (__pyx_t_9 = 1; __pyx_t_9 < __pyx_t_8; __pyx_t_9+=1) {
      __pyx_v_j = __pyx_t_9;
+13:             laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
      __pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_i - 1)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __pyx_t_3 = __Pyx_PyInt_From_int(__pyx_v_j); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __pyx_t_2 = PyTuple_New(2); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_GIVEREF(__pyx_t_1);
      PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_1);
      __Pyx_GIVEREF(__pyx_t_3);
      PyTuple_SET_ITEM(__pyx_t_2, 1, __pyx_t_3);
      __pyx_t_1 = 0;
      __pyx_t_3 = 0;
      __pyx_t_3 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
      __pyx_t_2 = __Pyx_PyInt_From_long((__pyx_v_i + 1)); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_j); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __pyx_t_5 = PyTuple_New(2); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_GIVEREF(__pyx_t_2);
      PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_2);
      __Pyx_GIVEREF(__pyx_t_1);
      PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_1);
      __pyx_t_2 = 0;
      __pyx_t_1 = 0;
      __pyx_t_1 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
      __pyx_t_5 = PyNumber_Add(__pyx_t_3, __pyx_t_1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
      __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_i); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __pyx_t_3 = __Pyx_PyInt_From_long((__pyx_v_j - 1)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __pyx_t_2 = PyTuple_New(2); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_GIVEREF(__pyx_t_1);
      PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_1);
      __Pyx_GIVEREF(__pyx_t_3);
      PyTuple_SET_ITEM(__pyx_t_2, 1, __pyx_t_3);
      __pyx_t_1 = 0;
      __pyx_t_3 = 0;
      __pyx_t_3 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
      __pyx_t_2 = PyNumber_Add(__pyx_t_5, __pyx_t_3); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
      __pyx_t_3 = __Pyx_PyInt_From_int(__pyx_v_i); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __pyx_t_5 = __Pyx_PyInt_From_long((__pyx_v_j + 1)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __pyx_t_1 = PyTuple_New(2); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_GIVEREF(__pyx_t_3);
      PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_t_3);
      __Pyx_GIVEREF(__pyx_t_5);
      PyTuple_SET_ITEM(__pyx_t_1, 1, __pyx_t_5);
      __pyx_t_3 = 0;
      __pyx_t_5 = 0;
      __pyx_t_5 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
      __pyx_t_1 = PyNumber_Add(__pyx_t_2, __pyx_t_5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
      __pyx_t_5 = __Pyx_PyInt_From_int(__pyx_v_i); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __pyx_t_2 = __Pyx_PyInt_From_int(__pyx_v_j); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __pyx_t_3 = PyTuple_New(2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_GIVEREF(__pyx_t_5);
      PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_5);
      __Pyx_GIVEREF(__pyx_t_2);
      PyTuple_SET_ITEM(__pyx_t_3, 1, __pyx_t_2);
      __pyx_t_5 = 0;
      __pyx_t_2 = 0;
      __pyx_t_2 = PyObject_GetItem(((PyObject *)__pyx_v_image), __pyx_t_3); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
      __pyx_t_3 = PyNumber_Multiply(__pyx_int_4, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
      __pyx_t_2 = PyNumber_Subtract(__pyx_t_1, __pyx_t_3); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
      __pyx_t_3 = __Pyx_PyInt_From_long((__pyx_v_i - 1)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_j - 1)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_1);
      __pyx_t_5 = PyTuple_New(2); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_GIVEREF(__pyx_t_3);
      PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_3);
      __Pyx_GIVEREF(__pyx_t_1);
      PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_1);
      __pyx_t_3 = 0;
      __pyx_t_1 = 0;
      if (unlikely(PyObject_SetItem(__pyx_v_laplacian, __pyx_t_5, __pyx_t_2) < 0)) __PYX_ERR(0, 13, __pyx_L1_error)
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    }
  }
+14:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_5 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(__pyx_t_5, __pyx_n_s_abs); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  __pyx_t_5 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_1))) {
    __pyx_t_5 = PyMethod_GET_SELF(__pyx_t_1);
    if (likely(__pyx_t_5)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_1);
      __Pyx_INCREF(__pyx_t_5);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_1, function);
    }
  }
  if (!__pyx_t_5) {
    __pyx_t_2 = __Pyx_PyObject_CallOneArg(__pyx_t_1, __pyx_v_laplacian); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_1)) {
      PyObject *__pyx_temp[2] = {__pyx_t_5, __pyx_v_laplacian};
      __pyx_t_2 = __Pyx_PyFunction_FastCall(__pyx_t_1, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0;
      __Pyx_GOTREF(__pyx_t_2);
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_1)) {
      PyObject *__pyx_temp[2] = {__pyx_t_5, __pyx_v_laplacian};
      __pyx_t_2 = __Pyx_PyCFunction_FastCall(__pyx_t_1, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0;
      __Pyx_GOTREF(__pyx_t_2);
    } else
    #endif
    {
      __pyx_t_3 = PyTuple_New(1+1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_GIVEREF(__pyx_t_5); PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_5); __pyx_t_5 = NULL;
      __Pyx_INCREF(__pyx_v_laplacian);
      __Pyx_GIVEREF(__pyx_v_laplacian);
      PyTuple_SET_ITEM(__pyx_t_3, 0+1, __pyx_v_laplacian);
      __pyx_t_2 = __Pyx_PyObject_Call(__pyx_t_1, __pyx_t_3, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = PyObject_RichCompare(__pyx_t_2, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_v_thresh = __pyx_t_1;
  __pyx_t_1 = 0;
+15:     return thresh
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_thresh);
  __pyx_r = __pyx_v_thresh;
  goto __pyx_L0;
In [29]:
%timeit laplace_cython(image)
335 ms ± 4.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Woops! The function we just wrote is 100x times slower than our NumPy implementation! The reason can be traced to the core of our function: the sum of the five pixel bound terms should be free from any Python object conversions. However, that line is all yellow and if you click it, you'll see it's full of Python object conversions.

How do we solve this problem? The answer is: we need to pass the array as a buffer that can be accessed at C speeds. Let's see:

In [30]:
%%cython -a
import numpy as np
cimport numpy as cnp

def laplace_cython(cnp.ndarray[double, ndim=2] image):
    """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
    Cython implementation."""
    cdef int h = image.shape[0]
    cdef int w = image.shape[1]
    cdef cnp.ndarray[double, ndim=2] laplacian = np.empty((w-2, h-2), dtype=np.double)
    cdef int i, j
    for i in range(1, h-1):
        for j in range(1, w-1):
            laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
    thresh = np.abs(laplacian) > 0.05
    return thresh
In file included from /Users/kappamaki/.ipython/cython/_cython_magic_bd5177082c943517a15103961a130b2a.c:548:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1809:
/Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by "          "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
 ^
1 warning generated.
Out[30]:
Cython: _cython_magic_bd5177082c943517a15103961a130b2a.pyx

Generated by Cython 0.27.3

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+01: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
  __pyx_t_1 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 02: cimport numpy as cnp
 03: 
+04: def laplace_cython(cnp.ndarray[double, ndim=2] image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_bd5177082c943517a15103961a130b2a_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_bd5177082c943517a15103961a130b2a_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_bd5177082c943517a15103961a130b2a_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_bd5177082c943517a15103961a130b2a_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_bd5177082c943517a15103961a130b2a_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_bd5177082c943517a15103961a130b2a_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
  if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_image), __pyx_ptype_5numpy_ndarray, 1, "image", 0))) __PYX_ERR(0, 4, __pyx_L1_error)
  __pyx_r = __pyx_pf_46_cython_magic_bd5177082c943517a15103961a130b2a_laplace_cython(__pyx_self, ((PyArrayObject *)__pyx_v_image));

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_bd5177082c943517a15103961a130b2a_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_image) {
  int __pyx_v_h;
  int __pyx_v_w;
  PyArrayObject *__pyx_v_laplacian = 0;
  int __pyx_v_i;
  int __pyx_v_j;
  PyObject *__pyx_v_thresh = NULL;
  __Pyx_LocalBuf_ND __pyx_pybuffernd_image;
  __Pyx_Buffer __pyx_pybuffer_image;
  __Pyx_LocalBuf_ND __pyx_pybuffernd_laplacian;
  __Pyx_Buffer __pyx_pybuffer_laplacian;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython", 0);
  __pyx_pybuffer_laplacian.pybuffer.buf = NULL;
  __pyx_pybuffer_laplacian.refcount = 0;
  __pyx_pybuffernd_laplacian.data = NULL;
  __pyx_pybuffernd_laplacian.rcbuffer = &__pyx_pybuffer_laplacian;
  __pyx_pybuffer_image.pybuffer.buf = NULL;
  __pyx_pybuffer_image.refcount = 0;
  __pyx_pybuffernd_image.data = NULL;
  __pyx_pybuffernd_image.rcbuffer = &__pyx_pybuffer_image;
  {
    __Pyx_BufFmt_StackElem __pyx_stack[1];
    if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_image.rcbuffer->pybuffer, (PyObject*)__pyx_v_image, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 2, 0, __pyx_stack) == -1)) __PYX_ERR(0, 4, __pyx_L1_error)
  }
  __pyx_pybuffernd_image.diminfo[0].strides = __pyx_pybuffernd_image.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_image.diminfo[0].shape = __pyx_pybuffernd_image.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_image.diminfo[1].strides = __pyx_pybuffernd_image.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_image.diminfo[1].shape = __pyx_pybuffernd_image.rcbuffer->pybuffer.shape[1];
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  { PyObject *__pyx_type, *__pyx_value, *__pyx_tb;
    __Pyx_PyThreadState_declare
    __Pyx_PyThreadState_assign
    __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb);
    __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_image.rcbuffer->pybuffer);
    __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer);
  __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);}
  __Pyx_AddTraceback("_cython_magic_bd5177082c943517a15103961a130b2a.laplace_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  goto __pyx_L2;
  __pyx_L0:;
  __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_image.rcbuffer->pybuffer);
  __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer);
  __pyx_L2:;
  __Pyx_XDECREF((PyObject *)__pyx_v_laplacian);
  __Pyx_XDECREF(__pyx_v_thresh);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__10 = PyTuple_Pack(7, __pyx_n_s_image, __pyx_n_s_h, __pyx_n_s_w, __pyx_n_s_laplacian, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__10);
  __Pyx_GIVEREF(__pyx_tuple__10);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_bd5177082c943517a15103961a130b2a_1laplace_cython, NULL, __pyx_n_s_cython_magic_bd5177082c943517a1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 4, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 05:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 06:     Cython implementation."""
+07:     cdef int h = image.shape[0]
  __pyx_v_h = (__pyx_v_image->dimensions[0]);
+08:     cdef int w = image.shape[1]
  __pyx_v_w = (__pyx_v_image->dimensions[1]);
+09:     cdef cnp.ndarray[double, ndim=2] laplacian = np.empty((w-2, h-2), dtype=np.double)
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_w - 2)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_3 = __Pyx_PyInt_From_long((__pyx_v_h - 2)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_t_4 = PyTuple_New(2); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __Pyx_GIVEREF(__pyx_t_1);
  PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_3);
  PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_t_3);
  __pyx_t_1 = 0;
  __pyx_t_3 = 0;
  __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_GIVEREF(__pyx_t_4);
  PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_4);
  __pyx_t_4 = 0;
  __pyx_t_4 = __Pyx_PyDict_NewPresized(1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_double); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  if (PyDict_SetItem(__pyx_t_4, __pyx_n_s_dtype, __pyx_t_5) < 0) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_t_3, __pyx_t_4); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  if (!(likely(((__pyx_t_5) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_5, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 9, __pyx_L1_error)
  __pyx_t_6 = ((PyArrayObject *)__pyx_t_5);
  {
    __Pyx_BufFmt_StackElem __pyx_stack[1];
    if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer, (PyObject*)__pyx_t_6, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 2, 0, __pyx_stack) == -1)) {
      __pyx_v_laplacian = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.buf = NULL;
      __PYX_ERR(0, 9, __pyx_L1_error)
    } else {__pyx_pybuffernd_laplacian.diminfo[0].strides = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_laplacian.diminfo[0].shape = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_laplacian.diminfo[1].strides = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_laplacian.diminfo[1].shape = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.shape[1];
    }
  }
  __pyx_t_6 = 0;
  __pyx_v_laplacian = ((PyArrayObject *)__pyx_t_5);
  __pyx_t_5 = 0;
 10:     cdef int i, j
+11:     for i in range(1, h-1):
  __pyx_t_7 = (__pyx_v_h - 1);
  for (__pyx_t_8 = 1; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) {
    __pyx_v_i = __pyx_t_8;
+12:         for j in range(1, w-1):
    __pyx_t_9 = (__pyx_v_w - 1);
    for (__pyx_t_10 = 1; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) {
      __pyx_v_j = __pyx_t_10;
+13:             laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
      __pyx_t_11 = (__pyx_v_i - 1);
      __pyx_t_12 = __pyx_v_j;
      __pyx_t_13 = -1;
      if (__pyx_t_11 < 0) {
        __pyx_t_11 += __pyx_pybuffernd_image.diminfo[0].shape;
        if (unlikely(__pyx_t_11 < 0)) __pyx_t_13 = 0;
      } else if (unlikely(__pyx_t_11 >= __pyx_pybuffernd_image.diminfo[0].shape)) __pyx_t_13 = 0;
      if (__pyx_t_12 < 0) {
        __pyx_t_12 += __pyx_pybuffernd_image.diminfo[1].shape;
        if (unlikely(__pyx_t_12 < 0)) __pyx_t_13 = 1;
      } else if (unlikely(__pyx_t_12 >= __pyx_pybuffernd_image.diminfo[1].shape)) __pyx_t_13 = 1;
      if (unlikely(__pyx_t_13 != -1)) {
        __Pyx_RaiseBufferIndexError(__pyx_t_13);
        __PYX_ERR(0, 13, __pyx_L1_error)
      }
      __pyx_t_14 = (__pyx_v_i + 1);
      __pyx_t_15 = __pyx_v_j;
      __pyx_t_13 = -1;
      if (__pyx_t_14 < 0) {
        __pyx_t_14 += __pyx_pybuffernd_image.diminfo[0].shape;
        if (unlikely(__pyx_t_14 < 0)) __pyx_t_13 = 0;
      } else if (unlikely(__pyx_t_14 >= __pyx_pybuffernd_image.diminfo[0].shape)) __pyx_t_13 = 0;
      if (__pyx_t_15 < 0) {
        __pyx_t_15 += __pyx_pybuffernd_image.diminfo[1].shape;
        if (unlikely(__pyx_t_15 < 0)) __pyx_t_13 = 1;
      } else if (unlikely(__pyx_t_15 >= __pyx_pybuffernd_image.diminfo[1].shape)) __pyx_t_13 = 1;
      if (unlikely(__pyx_t_13 != -1)) {
        __Pyx_RaiseBufferIndexError(__pyx_t_13);
        __PYX_ERR(0, 13, __pyx_L1_error)
      }
      __pyx_t_16 = __pyx_v_i;
      __pyx_t_17 = (__pyx_v_j - 1);
      __pyx_t_13 = -1;
      if (__pyx_t_16 < 0) {
        __pyx_t_16 += __pyx_pybuffernd_image.diminfo[0].shape;
        if (unlikely(__pyx_t_16 < 0)) __pyx_t_13 = 0;
      } else if (unlikely(__pyx_t_16 >= __pyx_pybuffernd_image.diminfo[0].shape)) __pyx_t_13 = 0;
      if (__pyx_t_17 < 0) {
        __pyx_t_17 += __pyx_pybuffernd_image.diminfo[1].shape;
        if (unlikely(__pyx_t_17 < 0)) __pyx_t_13 = 1;
      } else if (unlikely(__pyx_t_17 >= __pyx_pybuffernd_image.diminfo[1].shape)) __pyx_t_13 = 1;
      if (unlikely(__pyx_t_13 != -1)) {
        __Pyx_RaiseBufferIndexError(__pyx_t_13);
        __PYX_ERR(0, 13, __pyx_L1_error)
      }
      __pyx_t_18 = __pyx_v_i;
      __pyx_t_19 = (__pyx_v_j + 1);
      __pyx_t_13 = -1;
      if (__pyx_t_18 < 0) {
        __pyx_t_18 += __pyx_pybuffernd_image.diminfo[0].shape;
        if (unlikely(__pyx_t_18 < 0)) __pyx_t_13 = 0;
      } else if (unlikely(__pyx_t_18 >= __pyx_pybuffernd_image.diminfo[0].shape)) __pyx_t_13 = 0;
      if (__pyx_t_19 < 0) {
        __pyx_t_19 += __pyx_pybuffernd_image.diminfo[1].shape;
        if (unlikely(__pyx_t_19 < 0)) __pyx_t_13 = 1;
      } else if (unlikely(__pyx_t_19 >= __pyx_pybuffernd_image.diminfo[1].shape)) __pyx_t_13 = 1;
      if (unlikely(__pyx_t_13 != -1)) {
        __Pyx_RaiseBufferIndexError(__pyx_t_13);
        __PYX_ERR(0, 13, __pyx_L1_error)
      }
      __pyx_t_20 = __pyx_v_i;
      __pyx_t_21 = __pyx_v_j;
      __pyx_t_13 = -1;
      if (__pyx_t_20 < 0) {
        __pyx_t_20 += __pyx_pybuffernd_image.diminfo[0].shape;
        if (unlikely(__pyx_t_20 < 0)) __pyx_t_13 = 0;
      } else if (unlikely(__pyx_t_20 >= __pyx_pybuffernd_image.diminfo[0].shape)) __pyx_t_13 = 0;
      if (__pyx_t_21 < 0) {
        __pyx_t_21 += __pyx_pybuffernd_image.diminfo[1].shape;
        if (unlikely(__pyx_t_21 < 0)) __pyx_t_13 = 1;
      } else if (unlikely(__pyx_t_21 >= __pyx_pybuffernd_image.diminfo[1].shape)) __pyx_t_13 = 1;
      if (unlikely(__pyx_t_13 != -1)) {
        __Pyx_RaiseBufferIndexError(__pyx_t_13);
        __PYX_ERR(0, 13, __pyx_L1_error)
      }
      __pyx_t_22 = (__pyx_v_i - 1);
      __pyx_t_23 = (__pyx_v_j - 1);
      __pyx_t_13 = -1;
      if (__pyx_t_22 < 0) {
        __pyx_t_22 += __pyx_pybuffernd_laplacian.diminfo[0].shape;
        if (unlikely(__pyx_t_22 < 0)) __pyx_t_13 = 0;
      } else if (unlikely(__pyx_t_22 >= __pyx_pybuffernd_laplacian.diminfo[0].shape)) __pyx_t_13 = 0;
      if (__pyx_t_23 < 0) {
        __pyx_t_23 += __pyx_pybuffernd_laplacian.diminfo[1].shape;
        if (unlikely(__pyx_t_23 < 0)) __pyx_t_13 = 1;
      } else if (unlikely(__pyx_t_23 >= __pyx_pybuffernd_laplacian.diminfo[1].shape)) __pyx_t_13 = 1;
      if (unlikely(__pyx_t_13 != -1)) {
        __Pyx_RaiseBufferIndexError(__pyx_t_13);
        __PYX_ERR(0, 13, __pyx_L1_error)
      }
      *__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.buf, __pyx_t_22, __pyx_pybuffernd_laplacian.diminfo[0].strides, __pyx_t_23, __pyx_pybuffernd_laplacian.diminfo[1].strides) = (((((*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_11, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_12, __pyx_pybuffernd_image.diminfo[1].strides)) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_14, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_15, __pyx_pybuffernd_image.diminfo[1].strides))) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_16, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_17, __pyx_pybuffernd_image.diminfo[1].strides))) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_18, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_19, __pyx_pybuffernd_image.diminfo[1].strides))) - (4.0 * (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_20, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_21, __pyx_pybuffernd_image.diminfo[1].strides))));
    }
  }
+14:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_abs); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  __pyx_t_4 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_4)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_4);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  if (!__pyx_t_4) {
    __pyx_t_5 = __Pyx_PyObject_CallOneArg(__pyx_t_3, ((PyObject *)__pyx_v_laplacian)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_4, ((PyObject *)__pyx_v_laplacian)};
      __pyx_t_5 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_GOTREF(__pyx_t_5);
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_4, ((PyObject *)__pyx_v_laplacian)};
      __pyx_t_5 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_GOTREF(__pyx_t_5);
    } else
    #endif
    {
      __pyx_t_2 = PyTuple_New(1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __pyx_t_4 = NULL;
      __Pyx_INCREF(((PyObject *)__pyx_v_laplacian));
      __Pyx_GIVEREF(((PyObject *)__pyx_v_laplacian));
      PyTuple_SET_ITEM(__pyx_t_2, 0+1, ((PyObject *)__pyx_v_laplacian));
      __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_2, NULL); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 14, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_t_3 = PyObject_RichCompare(__pyx_t_5, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_3); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  __pyx_v_thresh = __pyx_t_3;
  __pyx_t_3 = 0;
+15:     return thresh
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_thresh);
  __pyx_r = __pyx_v_thresh;
  goto __pyx_L0;
In [31]:
%timeit laplace_cython(image)
2.37 ms ± 68.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Finally! We have made our function faster by a factor of 2.

But can we go further? Yes, by observing that on the summation line, Cython still performs some bounds checks.

In [32]:
%%cython -a
import numpy as np
cimport numpy as cnp
import cython

@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def laplace_cython(cnp.ndarray[double, ndim=2] image):
    """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
    Cython implementation."""
    cdef int h = image.shape[0]
    cdef int w = image.shape[1]
    cdef cnp.ndarray[double, ndim=2] laplacian = np.empty((w-2, h-2), dtype=np.double)
    cdef int i, j
    for i in range(1, h-1):
        for j in range(1, w-1):
            laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
    thresh = np.abs(laplacian) > 0.05
    return thresh
In file included from /Users/kappamaki/.ipython/cython/_cython_magic_416c56f791dd3ffab3f99fad86fc10b5.c:549:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1809:
/Users/kappamaki/anaconda/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by "          "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
 ^
1 warning generated.
Out[32]:
Cython: _cython_magic_416c56f791dd3ffab3f99fad86fc10b5.pyx

Generated by Cython 0.27.3

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+01: import numpy as np
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
/* … */
  __pyx_t_1 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 02: cimport numpy as cnp
 03: import cython
 04: 
 05: @cython.boundscheck(False) # turn off bounds-checking for entire function
 06: @cython.wraparound(False)  # turn off negative index wrapping for entire function
+07: def laplace_cython(cnp.ndarray[double, ndim=2] image):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image); /*proto*/
static char __pyx_doc_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_laplace_cython[] = "Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.\n    Cython implementation.";
static PyMethodDef __pyx_mdef_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_1laplace_cython = {"laplace_cython", (PyCFunction)__pyx_pw_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_1laplace_cython, METH_O, __pyx_doc_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_laplace_cython};
static PyObject *__pyx_pw_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_1laplace_cython(PyObject *__pyx_self, PyObject *__pyx_v_image) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython (wrapper)", 0);
  if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_image), __pyx_ptype_5numpy_ndarray, 1, "image", 0))) __PYX_ERR(0, 7, __pyx_L1_error)
  __pyx_r = __pyx_pf_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_laplace_cython(__pyx_self, ((PyArrayObject *)__pyx_v_image));

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_laplace_cython(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_image) {
  int __pyx_v_h;
  int __pyx_v_w;
  PyArrayObject *__pyx_v_laplacian = 0;
  int __pyx_v_i;
  int __pyx_v_j;
  PyObject *__pyx_v_thresh = NULL;
  __Pyx_LocalBuf_ND __pyx_pybuffernd_image;
  __Pyx_Buffer __pyx_pybuffer_image;
  __Pyx_LocalBuf_ND __pyx_pybuffernd_laplacian;
  __Pyx_Buffer __pyx_pybuffer_laplacian;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("laplace_cython", 0);
  __pyx_pybuffer_laplacian.pybuffer.buf = NULL;
  __pyx_pybuffer_laplacian.refcount = 0;
  __pyx_pybuffernd_laplacian.data = NULL;
  __pyx_pybuffernd_laplacian.rcbuffer = &__pyx_pybuffer_laplacian;
  __pyx_pybuffer_image.pybuffer.buf = NULL;
  __pyx_pybuffer_image.refcount = 0;
  __pyx_pybuffernd_image.data = NULL;
  __pyx_pybuffernd_image.rcbuffer = &__pyx_pybuffer_image;
  {
    __Pyx_BufFmt_StackElem __pyx_stack[1];
    if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_image.rcbuffer->pybuffer, (PyObject*)__pyx_v_image, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 2, 0, __pyx_stack) == -1)) __PYX_ERR(0, 7, __pyx_L1_error)
  }
  __pyx_pybuffernd_image.diminfo[0].strides = __pyx_pybuffernd_image.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_image.diminfo[0].shape = __pyx_pybuffernd_image.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_image.diminfo[1].strides = __pyx_pybuffernd_image.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_image.diminfo[1].shape = __pyx_pybuffernd_image.rcbuffer->pybuffer.shape[1];
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  { PyObject *__pyx_type, *__pyx_value, *__pyx_tb;
    __Pyx_PyThreadState_declare
    __Pyx_PyThreadState_assign
    __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb);
    __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_image.rcbuffer->pybuffer);
    __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer);
  __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);}
  __Pyx_AddTraceback("_cython_magic_416c56f791dd3ffab3f99fad86fc10b5.laplace_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  goto __pyx_L2;
  __pyx_L0:;
  __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_image.rcbuffer->pybuffer);
  __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer);
  __pyx_L2:;
  __Pyx_XDECREF((PyObject *)__pyx_v_laplacian);
  __Pyx_XDECREF(__pyx_v_thresh);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__10 = PyTuple_Pack(7, __pyx_n_s_image, __pyx_n_s_h, __pyx_n_s_w, __pyx_n_s_laplacian, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_thresh); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__10);
  __Pyx_GIVEREF(__pyx_tuple__10);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_416c56f791dd3ffab3f99fad86fc10b5_1laplace_cython, NULL, __pyx_n_s_cython_magic_416c56f791dd3ffab3); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_laplace_cython, __pyx_t_1) < 0) __PYX_ERR(0, 7, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 08:     """Applies Laplace operator to 2D image, then tresholds the result and returns boolean image.
 09:     Cython implementation."""
+10:     cdef int h = image.shape[0]
  __pyx_v_h = (__pyx_v_image->dimensions[0]);
+11:     cdef int w = image.shape[1]
  __pyx_v_w = (__pyx_v_image->dimensions[1]);
+12:     cdef cnp.ndarray[double, ndim=2] laplacian = np.empty((w-2, h-2), dtype=np.double)
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = __Pyx_PyInt_From_long((__pyx_v_w - 2)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_3 = __Pyx_PyInt_From_long((__pyx_v_h - 2)); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_t_4 = PyTuple_New(2); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __Pyx_GIVEREF(__pyx_t_1);
  PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_3);
  PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_t_3);
  __pyx_t_1 = 0;
  __pyx_t_3 = 0;
  __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_GIVEREF(__pyx_t_4);
  PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_4);
  __pyx_t_4 = 0;
  __pyx_t_4 = __Pyx_PyDict_NewPresized(1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_double); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  if (PyDict_SetItem(__pyx_t_4, __pyx_n_s_dtype, __pyx_t_5) < 0) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_t_3, __pyx_t_4); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  if (!(likely(((__pyx_t_5) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_5, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 12, __pyx_L1_error)
  __pyx_t_6 = ((PyArrayObject *)__pyx_t_5);
  {
    __Pyx_BufFmt_StackElem __pyx_stack[1];
    if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_laplacian.rcbuffer->pybuffer, (PyObject*)__pyx_t_6, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 2, 0, __pyx_stack) == -1)) {
      __pyx_v_laplacian = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.buf = NULL;
      __PYX_ERR(0, 12, __pyx_L1_error)
    } else {__pyx_pybuffernd_laplacian.diminfo[0].strides = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_laplacian.diminfo[0].shape = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.shape[0]; __pyx_pybuffernd_laplacian.diminfo[1].strides = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.strides[1]; __pyx_pybuffernd_laplacian.diminfo[1].shape = __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.shape[1];
    }
  }
  __pyx_t_6 = 0;
  __pyx_v_laplacian = ((PyArrayObject *)__pyx_t_5);
  __pyx_t_5 = 0;
 13:     cdef int i, j
+14:     for i in range(1, h-1):
  __pyx_t_7 = (__pyx_v_h - 1);
  for (__pyx_t_8 = 1; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) {
    __pyx_v_i = __pyx_t_8;
+15:         for j in range(1, w-1):
    __pyx_t_9 = (__pyx_v_w - 1);
    for (__pyx_t_10 = 1; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) {
      __pyx_v_j = __pyx_t_10;
+16:             laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
      __pyx_t_11 = (__pyx_v_i - 1);
      __pyx_t_12 = __pyx_v_j;
      __pyx_t_13 = (__pyx_v_i + 1);
      __pyx_t_14 = __pyx_v_j;
      __pyx_t_15 = __pyx_v_i;
      __pyx_t_16 = (__pyx_v_j - 1);
      __pyx_t_17 = __pyx_v_i;
      __pyx_t_18 = (__pyx_v_j + 1);
      __pyx_t_19 = __pyx_v_i;
      __pyx_t_20 = __pyx_v_j;
      __pyx_t_21 = (__pyx_v_i - 1);
      __pyx_t_22 = (__pyx_v_j - 1);
      *__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_laplacian.rcbuffer->pybuffer.buf, __pyx_t_21, __pyx_pybuffernd_laplacian.diminfo[0].strides, __pyx_t_22, __pyx_pybuffernd_laplacian.diminfo[1].strides) = (((((*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_11, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_12, __pyx_pybuffernd_image.diminfo[1].strides)) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_13, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_14, __pyx_pybuffernd_image.diminfo[1].strides))) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_15, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_16, __pyx_pybuffernd_image.diminfo[1].strides))) + (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_17, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_18, __pyx_pybuffernd_image.diminfo[1].strides))) - (4.0 * (*__Pyx_BufPtrStrided2d(double *, __pyx_pybuffernd_image.rcbuffer->pybuffer.buf, __pyx_t_19, __pyx_pybuffernd_image.diminfo[0].strides, __pyx_t_20, __pyx_pybuffernd_image.diminfo[1].strides))));
    }
  }
+17:     thresh = np.abs(laplacian) > 0.05
  __pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 17, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_abs); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 17, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  __pyx_t_4 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_4)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_4);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  if (!__pyx_t_4) {
    __pyx_t_5 = __Pyx_PyObject_CallOneArg(__pyx_t_3, ((PyObject *)__pyx_v_laplacian)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
  } else {
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_4, ((PyObject *)__pyx_v_laplacian)};
      __pyx_t_5 = __Pyx_PyFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_GOTREF(__pyx_t_5);
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_3)) {
      PyObject *__pyx_temp[2] = {__pyx_t_4, ((PyObject *)__pyx_v_laplacian)};
      __pyx_t_5 = __Pyx_PyCFunction_FastCall(__pyx_t_3, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_GOTREF(__pyx_t_5);
    } else
    #endif
    {
      __pyx_t_2 = PyTuple_New(1+1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 17, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __pyx_t_4 = NULL;
      __Pyx_INCREF(((PyObject *)__pyx_v_laplacian));
      __Pyx_GIVEREF(((PyObject *)__pyx_v_laplacian));
      PyTuple_SET_ITEM(__pyx_t_2, 0+1, ((PyObject *)__pyx_v_laplacian));
      __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_3, __pyx_t_2, NULL); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    }
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_t_3 = PyObject_RichCompare(__pyx_t_5, __pyx_float_0_05, Py_GT); __Pyx_XGOTREF(__pyx_t_3); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 17, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  __pyx_v_thresh = __pyx_t_3;
  __pyx_t_3 = 0;
+18:     return thresh
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_thresh);
  __pyx_r = __pyx_v_thresh;
  goto __pyx_L0;
In [33]:
%timeit laplace_cython(image)
1.58 ms ± 32.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [34]:
np.allclose(laplace_numpy(image), laplace_cython(image))
Out[34]:
True

What we can notice in the above function is that we have gotten rid completely of interaction in the for loop, which leads to a 4x increase in performance over our NumPy implementation.

Let's now turn to pythran.

Pythran

One of the advantages of pythran is that it can optimize higher level numpy code than Cython. Luckily for us again, it features a line magic, so we keep working in the notebook.

In [35]:
%load_ext pythran.magic

To use pythran, all you have to do is to annotate the function you want to export and give it a signature. This is done in a comment line starting the pythran file. One of the promises of pythran is that it can often handle high level broadcasting with NumPy and still optimize our function.

Annotating our NumPy function, we obtain this:

In [36]:
%%pythran
#pythran export laplace_pythran_highlevel(float[][])
import numpy as np
def laplace_pythran_highlevel(image):
    """Laplace operator in NumPy for 2D images. Pythran accelerated."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh
In [37]:
np.allclose(laplace_numpy(image), laplace_pythran_highlevel(image))
Out[37]:
True
In [39]:
%timeit laplace_pythran_highlevel(image)
368 µs ± 5.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Wow! Pythran translated that to the fastest version so far!

But pythran can also translate "low level style" code, just like Cython. Let's try again in a low-level style.

In [40]:
%%pythran 
#pythran export laplace_pythran_lowlevel(float[][])
import numpy as np
def laplace_pythran_lowlevel(image):
    """Laplace operator in NumPy for 2D images. Pythran accelerated."""
    h = image.shape[0]
    w = image.shape[1]
    laplacian = np.empty((h - 2, w - 2))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            laplacian[i-1, j-1] = image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]
    thresh = np.abs(laplacian) > 0.05
    return thresh
In [41]:
np.allclose(laplace_numpy(image), laplace_pythran_lowlevel(image))
Out[41]:
True
In [42]:
%timeit laplace_pythran_lowlevel(image)
795 µs ± 8.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Less fast than before. What if we move the thresholding operation inside the main loop?

In [43]:
%%pythran 
#pythran export laplace_pythran_lowlevel(float[][])
import numpy as np
def laplace_pythran_lowlevel(image):
    """Laplace operator in NumPy for 2D images. Pythran accelerated."""
    h = image.shape[0]
    w = image.shape[1]
    laplacian = np.empty((h - 2, w - 2))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            laplacian[i-1, j-1] = np.abs(image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]) > 0.05
    return laplacian
In [44]:
np.allclose(laplace_numpy(image), laplace_pythran_lowlevel(image))
Out[44]:
True
In [45]:
%timeit laplace_pythran_lowlevel(image)
577 µs ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So in this case, pythran with a low level approach outperforms Cython, with the best result being when it's given high-level Numpy code.

Another interesting point here, concerning debugging pythran compilation, is that pythran annotated code is still perfectly valid Python. This helps during the debugging phase, compared to the previously explained Cython approach.

Let's turn to our last tool, numba.

Numba

Numba is very similar to pythran, except that it does just-in-time compilation (JIT). We have several options to for use. Let's try the standard jit approach with a decorator.

In [46]:
from numba import jit
In [47]:
@jit
def laplace_numba(image):
    """Laplace operator in NumPy for 2D images. Accelerated using numba."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh

We call it once to avoid timing issues due to JIT compilation.

In [48]:
laplace_numba(image);
In [49]:
%timeit laplace_numba(image)
2.4 ms ± 22.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is rightaway faster than NumPy.

We can also tell numba to avoid using Python at all in its computation. This is called with @jit(nopython=True).

In [50]:
@jit(nopython=True)
def laplace_numba(image):
    """Laplace operator in NumPy for 2D images. Accelerated using numba."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh
In [51]:
laplace_numba(image);
In [52]:
%timeit laplace_numba(image)
2.37 ms ± 14 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

No big effect. In fact, we can infer from this that numba managed to generate pure C code from our function and that it did it already previously.

To confirm that, we can either use the inspect_types() method of our jitted function or use the annotation tool provided with numba.

In [53]:
laplace_numba.inspect_types()
laplace_numba (array(float64, 2d, C),)
--------------------------------------------------------------------------------
# File: 
# --- LINE 1 --- 
# label 0
#   del $const0.3
#   del $const0.2
#   del $0.4
#   del $const0.7
#   del $const0.6
#   del $0.8
#   del $0.9
#   del $0.5
#   del $0.10
#   del $const0.14
#   del $const0.13
#   del $0.15
#   del $const0.18
#   del $const0.17
#   del $0.19
#   del $0.20
#   del $0.16
#   del $0.21
#   del $const0.26
#   del $const0.25
#   del $0.27
#   del $const0.30
#   del $const0.29
#   del $0.31
#   del $0.32
#   del $0.28
#   del $0.33
#   del $const0.38
#   del $const0.37
#   del $0.39
#   del $const0.42
#   del $const0.41
#   del $0.43
#   del $0.44
#   del $0.40
#   del $0.45
#   del $const0.51
#   del $const0.50
#   del $0.52
#   del $const0.55
#   del $const0.54
#   del $0.56
#   del $0.57
#   del $0.53
#   del image
#   del $0.58
#   del $const0.48
#   del $0.11
#   del $0.22
#   del $0.34
#   del $0.46
#   del $0.59
#   del $0.61
#   del $0.62
#   del $0.63
#   del laplacian
#   del $const0.66
#   del $0.67
#   del thresh

@jit(nopython=True)

# --- LINE 2 --- 

def laplace_numba(image):

    # --- LINE 3 --- 

    """Laplace operator in NumPy for 2D images. Accelerated using numba."""

    # --- LINE 4 --- 
    #   image = arg(0, name=image)  :: array(float64, 2d, C)
    #   $const0.2 = const(NoneType, None)  :: none
    #   $const0.3 = const(int, -2)  :: int64
    #   $0.4 = global(slice: )  :: Function()
    #   $0.5 = call $0.4($const0.2, $const0.3, func=$0.4, args=(Var($const0.2,  (4)), Var($const0.3,  (4))), kws=(), vararg=None)  :: (none, int64) -> slice
    #   $const0.6 = const(int, 1)  :: int64
    #   $const0.7 = const(int, -1)  :: int64
    #   $0.8 = global(slice: )  :: Function()
    #   $0.9 = call $0.8($const0.6, $const0.7, func=$0.8, args=(Var($const0.6,  (4)), Var($const0.7,  (4))), kws=(), vararg=None)  :: (int64, int64) -> slice
    #   $0.10 = build_tuple(items=[Var($0.5,  (4)), Var($0.9,  (4))])  :: (slice x 2)
    #   $0.11 = static_getitem(value=image, index=(slice(None, -2, None), slice(1, -1, None)), index_var=$0.10)  :: array(float64, 2d, A)
    #   $const0.13 = const(int, 2)  :: int64
    #   $const0.14 = const(NoneType, None)  :: none
    #   $0.15 = global(slice: )  :: Function()
    #   $0.16 = call $0.15($const0.13, $const0.14, func=$0.15, args=(Var($const0.13,  (4)), Var($const0.14,  (4))), kws=(), vararg=None)  :: (int64, none) -> slice
    #   $const0.17 = const(int, 1)  :: int64
    #   $const0.18 = const(int, -1)  :: int64
    #   $0.19 = global(slice: )  :: Function()
    #   $0.20 = call $0.19($const0.17, $const0.18, func=$0.19, args=(Var($const0.17,  (4)), Var($const0.18,  (4))), kws=(), vararg=None)  :: (int64, int64) -> slice
    #   $0.21 = build_tuple(items=[Var($0.16,  (4)), Var($0.20,  (4))])  :: (slice x 2)
    #   $0.22 = static_getitem(value=image, index=(slice(2, None, None), slice(1, -1, None)), index_var=$0.21)  :: array(float64, 2d, A)
    #   $const0.25 = const(int, 1)  :: int64
    #   $const0.26 = const(int, -1)  :: int64
    #   $0.27 = global(slice: )  :: Function()
    #   $0.28 = call $0.27($const0.25, $const0.26, func=$0.27, args=(Var($const0.25,  (4)), Var($const0.26,  (4))), kws=(), vararg=None)  :: (int64, int64) -> slice
    #   $const0.29 = const(NoneType, None)  :: none
    #   $const0.30 = const(int, -2)  :: int64
    #   $0.31 = global(slice: )  :: Function()
    #   $0.32 = call $0.31($const0.29, $const0.30, func=$0.31, args=(Var($const0.29,  (4)), Var($const0.30,  (4))), kws=(), vararg=None)  :: (none, int64) -> slice
    #   $0.33 = build_tuple(items=[Var($0.28,  (4)), Var($0.32,  (4))])  :: (slice x 2)
    #   $0.34 = static_getitem(value=image, index=(slice(1, -1, None), slice(None, -2, None)), index_var=$0.33)  :: array(float64, 2d, A)
    #   $const0.37 = const(int, 1)  :: int64
    #   $const0.38 = const(int, -1)  :: int64
    #   $0.39 = global(slice: )  :: Function()
    #   $0.40 = call $0.39($const0.37, $const0.38, func=$0.39, args=(Var($const0.37,  (4)), Var($const0.38,  (4))), kws=(), vararg=None)  :: (int64, int64) -> slice
    #   $const0.41 = const(int, 2)  :: int64
    #   $const0.42 = const(NoneType, None)  :: none
    #   $0.43 = global(slice: )  :: Function()
    #   $0.44 = call $0.43($const0.41, $const0.42, func=$0.43, args=(Var($const0.41,  (4)), Var($const0.42,  (4))), kws=(), vararg=None)  :: (int64, none) -> slice
    #   $0.45 = build_tuple(items=[Var($0.40,  (4)), Var($0.44,  (4))])  :: (slice x 2)
    #   $0.46 = static_getitem(value=image, index=(slice(1, -1, None), slice(2, None, None)), index_var=$0.45)  :: array(float64, 2d, A)
    #   $const0.48 = const(int, 4)  :: int64
    #   $const0.50 = const(int, 1)  :: int64
    #   $const0.51 = const(int, -1)  :: int64
    #   $0.52 = global(slice: )  :: Function()
    #   $0.53 = call $0.52($const0.50, $const0.51, func=$0.52, args=(Var($const0.50,  (4)), Var($const0.51,  (4))), kws=(), vararg=None)  :: (int64, int64) -> slice
    #   $const0.54 = const(int, 1)  :: int64
    #   $const0.55 = const(int, -1)  :: int64
    #   $0.56 = global(slice: )  :: Function()
    #   $0.57 = call $0.56($const0.54, $const0.55, func=$0.56, args=(Var($const0.54,  (4)), Var($const0.55,  (4))), kws=(), vararg=None)  :: (int64, int64) -> slice
    #   $0.58 = build_tuple(items=[Var($0.53,  (4)), Var($0.57,  (4))])  :: (slice x 2)
    #   $0.59 = static_getitem(value=image, index=(slice(1, -1, None), slice(1, -1, None)), index_var=$0.58)  :: array(float64, 2d, A)
    #   $0.61 = arrayexpr(expr=('-', [('+', [('+', [('+', [Var($0.11,  (4)), Var($0.22,  (4))]), Var($0.34,  (4))]), Var($0.46,  (4))]), ('*', [const(int, 4), Var($0.59,  (4))])]), ty=array(float64, 2d, C))  :: array(float64, 2d, C)
    #   laplacian = $0.61  :: array(float64, 2d, C)

    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]

    # --- LINE 5 --- 
    #   $0.62 = global(np: )  :: Module()
    #   $0.63 = getattr(value=$0.62, attr=abs)  :: Function()
    #   $const0.66 = const(float, 0.05)  :: float64
    #   $0.67 = arrayexpr(expr=('>', [(, [Var(laplacian,  (4))]), const(float, 0.05)]), ty=array(bool, 2d, C))  :: array(bool, 2d, C)
    #   thresh = $0.67  :: array(bool, 2d, C)

    thresh = np.abs(laplacian) > 0.05

    # --- LINE 6 --- 
    #   $0.69 = cast(value=thresh)  :: array(bool, 2d, C)
    #   return $0.69

    return thresh


================================================================================

Admittedly, not very helpful. Let's try the annotation tool.

In [54]:
%%file numba_annotation_demo.py
import numpy as np
import skimage.data
import skimage.color
from numba import jit

@jit(nopython=True)
def laplace_numba(image):
    """Laplace operator in NumPy for 2D images. Accelerated using numba."""
    laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
    thresh = np.abs(laplacian) > 0.05
    return thresh

image = skimage.data.astronaut()
image = skimage.color.rgb2gray(image)
laplace_numba(image)
Overwriting numba_annotation_demo.py

We have to run the annotation tool from the command line:

In [55]:
!numba --annotate-html annotation.html numba_annotation_demo.py
In [56]:
from IPython.display import HTML
In [57]:
HTML(data=open('annotation.html').read())
Out[57]:
show numba IR
6: @jit(nopython=True)
7: def laplace_numba(image):
8:     """Laplace operator in NumPy for 2D images. Accelerated using numba."""
9:     laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
10:     thresh = np.abs(laplacian) > 0.05
11:     return thresh



show numba IR
1: import numpy as np
2: import skimage.data
3: import skimage.color
4: from numba import jit
5:
6: @jit(nopython=True)
7: def laplace_numba(image):
8:     """Laplace operator in NumPy for 2D images. Accelerated using numba."""
9:     laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
10:     thresh = np.abs(laplacian) > 0.05
11:     return thresh



show numba IR
176: def _broadcast_onto(src_ndim, src_shape, dest_ndim, dest_shape):
177:     '''Low-level utility function used in calculating a shape for
178:     an implicit output array. This function assumes that the
179:     destination shape is an LLVM pointer to a C-style array that was
180:     already initialized to a size of one along all axes.
181:
182:     Returns an integer value:
183:     >= 1 : Succeeded. Return value should equal the number of dimensions in
184:              the destination shape.
185:     0 : Failed to broadcast because source shape is larger than the
186:              destination shape (this case should be weeded out at type
187:              checking).
188:     < 0 : Failed to broadcast onto destination axis, at axis number ==
189:              -(return_value + 1).
190:     '''
191:     if src_ndim > dest_ndim:
192:         # This check should have been done during type checking, but
193:         # let's be defensive anyway...
194:         return 0
195:     else:
196:         src_index = 0
197:         dest_index = dest_ndim - src_ndim
198:         while src_index < src_ndim:
199:             src_dim_size = src_shape[src_index]
200:             dest_dim_size = dest_shape[dest_index]
201:             # Check to see if we've already mutated the destination
202:             # shape along this axis.
203:             if dest_dim_size != 1:
204:                 # If we have mutated the destination shape already,
205:                 # then the source axis size must either be one,
206:                 # or the destination axis size.
207:                 if src_dim_size != dest_dim_size and src_dim_size != 1:
208:                     return -(dest_index + 1)
209:             elif src_dim_size != 1:
210:                 # If the destination size is still its initial
211:                 dest_shape[dest_index] = src_dim_size
212:             src_index += 1
213:             dest_index += 1
214:     return dest_index



show numba IR
1: import numpy as np
2: import skimage.data
3: import skimage.color
4: from numba import jit
5:
6: @jit(nopython=True)
7: def laplace_numba(image):
8:     """Laplace operator in NumPy for 2D images. Accelerated using numba."""
9:     laplacian = image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2] + image[1:-1, 2:] - 4*image[1:-1, 1:-1]
10:     thresh = np.abs(laplacian) > 0.05
11:     return thresh



What this shows is that numba managed to correctly optimize the broadcasting (if it hadn't we would have seen colored source code lines).

Let's see also investigate how numba optimizes the low-level approach with explicit for loops.

In [58]:
@jit(nopython=True)
def laplace_numba(image):
    """Laplace operator for 2D images. Numba accelerated."""
    h = image.shape[0]
    w = image.shape[1]
    laplacian = np.empty((h - 2, w - 2))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            laplacian[i-1, j-1] = np.abs(image[i-1, j] + image[i+1, j] + image[i, j-1] + image[i, j+1] - 4*image[i, j]) > 0.05
    return laplacian
In [59]:
laplace_numba(image);
In [60]:
%timeit laplace_numba(image)
848 µs ± 191 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This approach improves the performance!

Another way of approaching optimization with numba is to thing in terms of NumPy ufuncs. You can write generalized ufuncs with numba using another decorator, @guvectorize. However, this comes at the cost that we have to think in terms of an ufunc and thus we have to provide an array that gets filled by our computation routine.

In [61]:
from numba import guvectorize
In [62]:
@guvectorize('void(float64[:, :], float64[:, :])', "(m, n)->(m, n)")
def laplace_numba_guvectorize(image, laplacian):
    """Laplace operator in NumPy for 2D images. Numba accelerated."""
    h = image.shape[0]
    w = image.shape[1]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            laplacian[i-1, j-1] = np.abs(4 * image[i, j] - image[i - 1, j] - image[i + 1, j] - image[i, j + 1] - image[i, j - 1]) > 0.05
In [63]:
laplacian = np.empty_like(image)
In [64]:
laplace_numba_guvectorize(image, laplacian);
In [65]:
np.allclose(laplace_numpy(image), laplacian[:-2, :-2])
Out[65]:
True
In [66]:
%timeit laplace_numba_guvectorize(image, laplacian);
697 µs ± 150 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This approach is even faster. However, it has some limitations: the signature for array sizes forces us to use the same shapes in the input and output. I could'nt use the "(m, n)->(m-1, n-1)" signature. Note that we also could have target parallel execution by using the target=parallel option in the decorator.

Wrap-up and plots

Time for a wrap-up and some plots.

In [67]:
timings = {}
for func in [laplace_skimage, laplace_numpy, laplace_cython, laplace_pythran_highlevel, laplace_pythran_lowlevel, laplace_numba]:
    t = %timeit -o func(image)
    timings[func.__name__] = t
4.62 ms ± 457 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.56 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.65 ms ± 57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
354 µs ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
578 µs ± 5.66 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
635 µs ± 5.76 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [68]:
t = %timeit -o laplace_numba_guvectorize(image, laplacian);
timings['laplace_numba_guvectorize'] = t
519 µs ± 8.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Here are the table and the plots.

In [69]:
import pandas as pd
In [70]:
pd.Series({key: timings[key].average * 1e6 for key in timings}).to_frame(name='timings (μs)').sort_values(by='timings (μs)')
Out[70]:
timings (μs)
laplace_pythran_highlevel 354.292226
laplace_numba_guvectorize 518.729539
laplace_pythran_lowlevel 578.317114
laplace_numba 635.353504
laplace_cython 1646.003612
laplace_numpy 3564.326801
laplace_skimage 4624.369004
In [71]:
fig, ax = plt.subplots(figsize=(10, 10))
pd.Series({key: timings[key].average * 1e6 for key in timings}).to_frame(name='timings (μs)').sort_values(by='timings (μs)').plot(kind='barh', ax=ax)
Out[71]:

conclusions

So, what have I learned during these last three days? Here are some thoughts:

  • most approaches suggest you rewrite your function in a low-level style if you want a good optimization result
  • most approaches rely on the type annotations which you supply and which are crucial for good results (and sometimes it's easy to forget one variable which in turn prevents the optimizer from doing its best)
  • pythran, like numba, can in theory optimize high-level NumPy code (in our example, pythran did better than numba)
  • pythran yields the best performance in our benchmark
  • Cython, contrary to what I expected, comes in after those two (but still yields a good speed-up)
  • debugging all code generators is usually frustrating, due to difficult to understand error messages, low-level stuff (try reading the Cython output by clicking the yellow lines)
  • numba also allows targeting GPUs, something I haven't spoken about here since I don't have a GPU on my MacBook (this was the topic I had the most difficulty with during the training session since I've never worked with GPUs)

further reading

This post was entirely written using the IPython notebook. Its content is BSD-licensed. You can see a static view or download this notebook with the help of nbviewer at 20170706_OptimizingYourPythonCode.ipynb.

Comments