Optimizing a Function with Cython, Complex Numbers and Parallel Execution
This post was entirely written using the IPython notebook. Its content is BSD-licensed. You can see a static view or download this notebook with the help of nbviewer at 20170731_CythonAndComplexNumbers.ipynb.
Last week, I had the pleasure to dive deep into the Cython world in order to solve a physics problem involving complex numbers. This post describes some of the things I've learnt concerning Cython, complex numbers and parallelization.
What problem am I trying to solve?¶
The problem I was trying to solve was the following: given a vector of complex numbers ($w_i$, "weights"), I wanted to compute its inverse Fourier transform given a certain angular frequency, $\omega$.
So what I want to compute is this:
$$ \mathcal{Re}(\sum_{i=1}^n w_i \exp(i \omega t)) $$
Beware that $t$ is actually a vector.
Reference implementation with NumPy¶
First, let's see our reference implementation. We make use of NumPy broadcasting to effectively perform the summation part in a single operation over all time steps.
import numpy as np
def synthesis(weights, omega, time_vector):
"""Sums weighted complex exponentials."""
return (weights[:, np.newaxis] * np.exp(1j * omega * time_vector[np.newaxis, :])).sum(axis=0).real
Let's apply this on some sample data.
weights = (np.random.randn(1000) + 1j * np.random.randn(1000)) * np.exp(-np.arange(1000))
omega = 2 * np.pi * 1e-3
time_vector = np.arange(2000, dtype=np.float)
ref_result = synthesis(weights, omega, time_vector)
Let's inspect the shape and the dtype of the result :
ref_result.shape
(2000,)
ref_result.dtype
dtype('float64')
This is the expected shape since time step vector is of size 2000. Also, our output is floats, meaning the dtype is as expected. Let's plot the result to have some fun.
import matplotlib.pyplot as plt
plt.style.use('bmh')
%matplotlib inline
plt.plot(time_vector, ref_result)
[<matplotlib.lines.Line2D at 0x83cb9b0>]
Let's move on and port this synthesis function to Cython. We will then time the execution speed and see if Cython allows us to make this function faster.
But first, let's make a detour. To port our function to Cython, we need to be able to do two things:
- use complex number types in Cython
- use the complex exponential "from the C/C++ world"
Prerequesites for Cython & complex numbers¶
Declaring complex numbers in Cython¶
So how do you declare a complex number in Cython? Luckily, the Cython tutorial from Scipy 2017 by Kurt Smith comes to the rescue (specifically, this notebook, but check out the whole tutorial, it's a great learning resource).
%load_ext cython
%%cython -a
# `double complex` is preferred for compatibility with Python's `complex` type:
# https://docs.python.org/3/c-api/complex.html
cdef:
float complex fc = 1+1j
double complex dc = 1+1j
long double complex ldc = 1+1j
print(fc, dc, ldc)
print(fc.real, dc.imag, ldc.conjugate())
(1+1j) (1+1j) (1+1j) 1.0 1.0 (1-1j)
Generated by Cython 0.25.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
01:
02: # `double complex` is preferred for compatibility with Python's `complex` type:
03: # https://docs.python.org/3/c-api/complex.html
04:
05: cdef:
+06: float complex fc = 1+1j
__pyx_t_1 = __Pyx_c_sum_double(__pyx_t_double_complex_from_parts(1, 0), __pyx_t_double_complex_from_parts(0, 1.0)); __pyx_v_46_cython_magic_535820ea99a6bba3370e40ba9afac1e3_fc = __pyx_t_float_complex_from_parts(__Pyx_CREAL(__pyx_t_1), __Pyx_CIMAG(__pyx_t_1));
+07: double complex dc = 1+1j
__pyx_v_46_cython_magic_535820ea99a6bba3370e40ba9afac1e3_dc = __Pyx_c_sum_double(__pyx_t_double_complex_from_parts(1, 0), __pyx_t_double_complex_from_parts(0, 1.0));
+08: long double complex ldc = 1+1j
__pyx_t_1 = __Pyx_c_sum_double(__pyx_t_double_complex_from_parts(1, 0), __pyx_t_double_complex_from_parts(0, 1.0)); __pyx_v_46_cython_magic_535820ea99a6bba3370e40ba9afac1e3_ldc = __pyx_t_long_double_complex_from_parts(__Pyx_CREAL(__pyx_t_1), __Pyx_CIMAG(__pyx_t_1));
09:
+10: print(fc, dc, ldc)
__pyx_t_2 = __pyx_PyComplex_FromComplex(__pyx_v_46_cython_magic_535820ea99a6bba3370e40ba9afac1e3_fc); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __pyx_t_3 = __pyx_PyComplex_FromComplex(__pyx_v_46_cython_magic_535820ea99a6bba3370e40ba9afac1e3_dc); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_4 = __pyx_PyComplex_FromComplex(__pyx_v_46_cython_magic_535820ea99a6bba3370e40ba9afac1e3_ldc); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_5 = PyTuple_New(3); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_GIVEREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_2); __Pyx_GIVEREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_3); __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_5, 2, __pyx_t_4); __pyx_t_2 = 0; __pyx_t_3 = 0; __pyx_t_4 = 0; __pyx_t_4 = __Pyx_PyObject_Call(__pyx_builtin_print, __pyx_t_5, NULL); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
+11: print(fc.real, dc.imag, ldc.conjugate())
__pyx_t_4 = PyFloat_FromDouble(__Pyx_CREAL(__pyx_v_46_cython_magic_535820ea99a6bba3370e40ba9afac1e3_fc)); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 11, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_5 = PyFloat_FromDouble(__Pyx_CIMAG(__pyx_v_46_cython_magic_535820ea99a6bba3370e40ba9afac1e3_dc)); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 11, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __pyx_t_6 = __Pyx_c_conj_long__double(__pyx_v_46_cython_magic_535820ea99a6bba3370e40ba9afac1e3_ldc); __pyx_t_3 = __pyx_PyComplex_FromComplex(__pyx_t_6); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 11, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_2 = PyTuple_New(3); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 11, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_4); __Pyx_GIVEREF(__pyx_t_5); PyTuple_SET_ITEM(__pyx_t_2, 1, __pyx_t_5); __Pyx_GIVEREF(__pyx_t_3); PyTuple_SET_ITEM(__pyx_t_2, 2, __pyx_t_3); __pyx_t_4 = 0; __pyx_t_5 = 0; __pyx_t_3 = 0; __pyx_t_3 = __Pyx_PyObject_Call(__pyx_builtin_print, __pyx_t_2, NULL); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 11, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
Okay, but what about complex numbers in a NumPy array? Well, we can use the memoryview syntax for this.
%%cython -a
def identity(complex [::1] weights):
return weights
Generated by Cython 0.25.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
1:
+2: def identity(complex [::1] weights):
/* Python wrapper */ static PyObject *__pyx_pw_46_cython_magic_100be49ff458200c629e7c893dc56a0d_1identity(PyObject *__pyx_self, PyObject *__pyx_arg_weights); /*proto*/ static PyMethodDef __pyx_mdef_46_cython_magic_100be49ff458200c629e7c893dc56a0d_1identity = {"identity", (PyCFunction)__pyx_pw_46_cython_magic_100be49ff458200c629e7c893dc56a0d_1identity, METH_O, 0}; static PyObject *__pyx_pw_46_cython_magic_100be49ff458200c629e7c893dc56a0d_1identity(PyObject *__pyx_self, PyObject *__pyx_arg_weights) { __Pyx_memviewslice __pyx_v_weights = { 0, 0, { 0 }, { 0 }, { 0 } }; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("identity (wrapper)", 0); assert(__pyx_arg_weights); { __pyx_v_weights = __Pyx_PyObject_to_MemoryviewSlice_dc___pyx_t_double_complex(__pyx_arg_weights); if (unlikely(!__pyx_v_weights.memview)) __PYX_ERR(0, 2, __pyx_L3_error) } goto __pyx_L4_argument_unpacking_done; __pyx_L3_error:; __Pyx_AddTraceback("_cython_magic_100be49ff458200c629e7c893dc56a0d.identity", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; __pyx_r = __pyx_pf_46_cython_magic_100be49ff458200c629e7c893dc56a0d_identity(__pyx_self, __pyx_v_weights); /* function exit code */ __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_46_cython_magic_100be49ff458200c629e7c893dc56a0d_identity(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_weights) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("identity", 0); /* … */ /* function exit code */ __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("_cython_magic_100be49ff458200c629e7c893dc56a0d.identity", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __PYX_XDEC_MEMVIEW(&__pyx_v_weights, 1); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__14 = PyTuple_Pack(2, __pyx_n_s_weights, __pyx_n_s_weights); if (unlikely(!__pyx_tuple__14)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__14); __Pyx_GIVEREF(__pyx_tuple__14); /* … */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_100be49ff458200c629e7c893dc56a0d_1identity, NULL, __pyx_n_s_cython_magic_100be49ff458200c62); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_identity, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_codeobj__15 = (PyObject*)__Pyx_PyCode_New(1, 0, 2, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__14, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_C_Users_FL232714_ipython_cython, __pyx_n_s_identity, 2, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__15)) __PYX_ERR(0, 2, __pyx_L1_error)
+3: return weights
__Pyx_XDECREF(__pyx_r); __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_weights, 1, (PyObject *(*)(char *)) __pyx_memview_get___pyx_t_double_complex, (int (*)(char *, PyObject *)) __pyx_memview_set___pyx_t_double_complex, 0);; if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0;
np.allclose(weights, identity(weights))
True
But we can also use the ndarray syntax:
%%cython -a
cimport numpy as np
def identity(np.ndarray [np.complex128_t, ndim=1] weights):
return weights
Generated by Cython 0.25.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
+1: cimport numpy as np
__pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
2:
+3: def identity(np.ndarray [np.complex128_t, ndim=1] weights):
/* Python wrapper */ static PyObject *__pyx_pw_46_cython_magic_d549c2f878d7aebcfe68c88542420167_1identity(PyObject *__pyx_self, PyObject *__pyx_v_weights); /*proto*/ static PyMethodDef __pyx_mdef_46_cython_magic_d549c2f878d7aebcfe68c88542420167_1identity = {"identity", (PyCFunction)__pyx_pw_46_cython_magic_d549c2f878d7aebcfe68c88542420167_1identity, METH_O, 0}; static PyObject *__pyx_pw_46_cython_magic_d549c2f878d7aebcfe68c88542420167_1identity(PyObject *__pyx_self, PyObject *__pyx_v_weights) { PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("identity (wrapper)", 0); if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_weights), __pyx_ptype_5numpy_ndarray, 1, "weights", 0))) __PYX_ERR(0, 3, __pyx_L1_error) __pyx_r = __pyx_pf_46_cython_magic_d549c2f878d7aebcfe68c88542420167_identity(__pyx_self, ((PyArrayObject *)__pyx_v_weights)); /* function exit code */ goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_46_cython_magic_d549c2f878d7aebcfe68c88542420167_identity(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_weights) { __Pyx_LocalBuf_ND __pyx_pybuffernd_weights; __Pyx_Buffer __pyx_pybuffer_weights; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("identity", 0); __pyx_pybuffer_weights.pybuffer.buf = NULL; __pyx_pybuffer_weights.refcount = 0; __pyx_pybuffernd_weights.data = NULL; __pyx_pybuffernd_weights.rcbuffer = &__pyx_pybuffer_weights; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_weights.rcbuffer->pybuffer, (PyObject*)__pyx_v_weights, &__Pyx_TypeInfo___pyx_t_double_complex, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 3, __pyx_L1_error) } __pyx_pybuffernd_weights.diminfo[0].strides = __pyx_pybuffernd_weights.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_weights.diminfo[0].shape = __pyx_pybuffernd_weights.rcbuffer->pybuffer.shape[0]; /* … */ /* function exit code */ __pyx_L1_error:; { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_PyThreadState_declare __Pyx_PyThreadState_assign __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("_cython_magic_d549c2f878d7aebcfe68c88542420167.identity", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__10 = PyTuple_Pack(1, __pyx_n_s_weights); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__10); __Pyx_GIVEREF(__pyx_tuple__10); /* … */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_d549c2f878d7aebcfe68c88542420167_1identity, NULL, __pyx_n_s_cython_magic_d549c2f878d7aebcfe); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_identity, __pyx_t_1) < 0) __PYX_ERR(0, 3, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+4: return weights
__Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_weights)); __pyx_r = ((PyObject *)__pyx_v_weights); goto __pyx_L0;
np.allclose(weights, identity(weights))
True
We will stick to the above syntax (np.ndarray) in what follows.
The complex exponential¶
Another thing we need is to have access to the complex exponential in the Cython code. A naive approach is to use the NumPy exponetial:
%%cython -a
cimport numpy as np
import numpy as np
import cython
@cython.boundscheck(False)
@cython.wraparound(False)
def apply_complex_exp(np.ndarray [np.complex128_t, ndim=1] weights, np.ndarray [np.complex128_t, ndim=1] out):
cdef int i
for i in range(weights.shape[0]):
out[i] = np.exp(1j * weights[i])
Generated by Cython 0.25.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
+01: cimport numpy as np
__pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+02: import numpy as np
__pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
03: import cython
04:
05: @cython.boundscheck(False)
06: @cython.wraparound(False)
+07: def apply_complex_exp(np.ndarray [np.complex128_t, ndim=1] weights, np.ndarray [np.complex128_t, ndim=1] out):
/* Python wrapper */ static PyObject *__pyx_pw_46_cython_magic_e6c0f8824314565e8d5c73d53395537a_1apply_complex_exp(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_46_cython_magic_e6c0f8824314565e8d5c73d53395537a_1apply_complex_exp = {"apply_complex_exp", (PyCFunction)__pyx_pw_46_cython_magic_e6c0f8824314565e8d5c73d53395537a_1apply_complex_exp, METH_VARARGS|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_46_cython_magic_e6c0f8824314565e8d5c73d53395537a_1apply_complex_exp(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_weights = 0; PyArrayObject *__pyx_v_out = 0; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("apply_complex_exp (wrapper)", 0); { static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_weights,&__pyx_n_s_out,0}; PyObject* values[2] = {0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_weights)) != 0)) kw_args--; else goto __pyx_L5_argtuple_error; case 1: if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_out)) != 0)) kw_args--; else { __Pyx_RaiseArgtupleInvalid("apply_complex_exp", 1, 2, 2, 1); __PYX_ERR(0, 7, __pyx_L3_error) } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "apply_complex_exp") < 0)) __PYX_ERR(0, 7, __pyx_L3_error) } } else if (PyTuple_GET_SIZE(__pyx_args) != 2) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); } __pyx_v_weights = ((PyArrayObject *)values[0]); __pyx_v_out = ((PyArrayObject *)values[1]); } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("apply_complex_exp", 1, 2, 2, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 7, __pyx_L3_error) __pyx_L3_error:; __Pyx_AddTraceback("_cython_magic_e6c0f8824314565e8d5c73d53395537a.apply_complex_exp", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_weights), __pyx_ptype_5numpy_ndarray, 1, "weights", 0))) __PYX_ERR(0, 7, __pyx_L1_error) if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_out), __pyx_ptype_5numpy_ndarray, 1, "out", 0))) __PYX_ERR(0, 7, __pyx_L1_error) __pyx_r = __pyx_pf_46_cython_magic_e6c0f8824314565e8d5c73d53395537a_apply_complex_exp(__pyx_self, __pyx_v_weights, __pyx_v_out); /* function exit code */ goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_46_cython_magic_e6c0f8824314565e8d5c73d53395537a_apply_complex_exp(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_weights, PyArrayObject *__pyx_v_out) { int __pyx_v_i; __Pyx_LocalBuf_ND __pyx_pybuffernd_out; __Pyx_Buffer __pyx_pybuffer_out; __Pyx_LocalBuf_ND __pyx_pybuffernd_weights; __Pyx_Buffer __pyx_pybuffer_weights; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("apply_complex_exp", 0); __pyx_pybuffer_weights.pybuffer.buf = NULL; __pyx_pybuffer_weights.refcount = 0; __pyx_pybuffernd_weights.data = NULL; __pyx_pybuffernd_weights.rcbuffer = &__pyx_pybuffer_weights; __pyx_pybuffer_out.pybuffer.buf = NULL; __pyx_pybuffer_out.refcount = 0; __pyx_pybuffernd_out.data = NULL; __pyx_pybuffernd_out.rcbuffer = &__pyx_pybuffer_out; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_weights.rcbuffer->pybuffer, (PyObject*)__pyx_v_weights, &__Pyx_TypeInfo___pyx_t_double_complex, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 7, __pyx_L1_error) } __pyx_pybuffernd_weights.diminfo[0].strides = __pyx_pybuffernd_weights.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_weights.diminfo[0].shape = __pyx_pybuffernd_weights.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_out.rcbuffer->pybuffer, (PyObject*)__pyx_v_out, &__Pyx_TypeInfo___pyx_t_double_complex, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 7, __pyx_L1_error) } __pyx_pybuffernd_out.diminfo[0].strides = __pyx_pybuffernd_out.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_out.diminfo[0].shape = __pyx_pybuffernd_out.rcbuffer->pybuffer.shape[0]; /* … */ /* function exit code */ __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); __Pyx_XDECREF(__pyx_t_8); __Pyx_XDECREF(__pyx_t_9); { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_PyThreadState_declare __Pyx_PyThreadState_assign __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_out.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("_cython_magic_e6c0f8824314565e8d5c73d53395537a.apply_complex_exp", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_out.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__10 = PyTuple_Pack(3, __pyx_n_s_weights, __pyx_n_s_out, __pyx_n_s_i); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 7, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__10); __Pyx_GIVEREF(__pyx_tuple__10); /* … */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_e6c0f8824314565e8d5c73d53395537a_1apply_complex_exp, NULL, __pyx_n_s_cython_magic_e6c0f8824314565e8d); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 7, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_apply_complex_exp, __pyx_t_1) < 0) __PYX_ERR(0, 7, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
08: cdef int i
+09: for i in range(weights.shape[0]):
__pyx_t_1 = (__pyx_v_weights->dimensions[0]); for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2+=1) { __pyx_v_i = __pyx_t_2;
+10: out[i] = np.exp(1j * weights[i])
__pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_exp); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __pyx_t_6 = __pyx_v_i; __pyx_t_7 = __Pyx_c_prod_double(__pyx_t_double_complex_from_parts(0, 1.0), (*__Pyx_BufPtrStrided1d(__pyx_t_double_complex *, __pyx_pybuffernd_weights.rcbuffer->pybuffer.buf, __pyx_t_6, __pyx_pybuffernd_weights.diminfo[0].strides))); __pyx_t_4 = __pyx_PyComplex_FromComplex(__pyx_t_7); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_8 = NULL; if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_5))) { __pyx_t_8 = PyMethod_GET_SELF(__pyx_t_5); if (likely(__pyx_t_8)) { PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_5); __Pyx_INCREF(__pyx_t_8); __Pyx_INCREF(function); __Pyx_DECREF_SET(__pyx_t_5, function); } } if (!__pyx_t_8) { __pyx_t_3 = __Pyx_PyObject_CallOneArg(__pyx_t_5, __pyx_t_4); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; __Pyx_GOTREF(__pyx_t_3); } else { #if CYTHON_FAST_PYCALL if (PyFunction_Check(__pyx_t_5)) { PyObject *__pyx_temp[2] = {__pyx_t_8, __pyx_t_4}; __pyx_t_3 = __Pyx_PyFunction_FastCall(__pyx_t_5, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_XDECREF(__pyx_t_8); __pyx_t_8 = 0; __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; } else #endif #if CYTHON_FAST_PYCCALL if (__Pyx_PyFastCFunction_Check(__pyx_t_5)) { PyObject *__pyx_temp[2] = {__pyx_t_8, __pyx_t_4}; __pyx_t_3 = __Pyx_PyCFunction_FastCall(__pyx_t_5, __pyx_temp+1-1, 1+1); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_XDECREF(__pyx_t_8); __pyx_t_8 = 0; __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; } else #endif { __pyx_t_9 = PyTuple_New(1+1); if (unlikely(!__pyx_t_9)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_9); __Pyx_GIVEREF(__pyx_t_8); PyTuple_SET_ITEM(__pyx_t_9, 0, __pyx_t_8); __pyx_t_8 = NULL; __Pyx_GIVEREF(__pyx_t_4); PyTuple_SET_ITEM(__pyx_t_9, 0+1, __pyx_t_4); __pyx_t_4 = 0; __pyx_t_3 = __Pyx_PyObject_Call(__pyx_t_5, __pyx_t_9, NULL); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0; } } __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_7 = __Pyx_PyComplex_As___pyx_t_double_complex(__pyx_t_3); if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; __pyx_t_10 = __pyx_v_i; *__Pyx_BufPtrStrided1d(__pyx_t_double_complex *, __pyx_pybuffernd_out.rcbuffer->pybuffer.buf, __pyx_t_10, __pyx_pybuffernd_out.diminfo[0].strides) = __pyx_t_7; }
out = np.empty_like(weights)
out.dtype
dtype('complex128')
apply_complex_exp(weights, out)
Let's check the result:
np.allclose(np.exp(1j * weights), out)
True
However, the problem is that we don't get pure C speed on this operation due to the call to a numpy function. There must be a better way. Searching the internet, I found this StackOverflow thread, which suggests to do the following.
%%cython -a --cplus
cimport numpy as np
import numpy as np
import cython
cdef extern from "complex.h":
double complex exp(double complex)
@cython.boundscheck(False)
@cython.wraparound(False)
def apply_complex_exp2(np.ndarray [np.complex128_t, ndim=1] weights, np.ndarray [np.complex128_t, ndim=1] out):
cdef int i
for i in range(weights.shape[0]):
out[i] = exp(1j * weights[i])
Generated by Cython 0.25.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
+01: cimport numpy as np
__pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+02: import numpy as np
__pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
03: import cython
04:
05: cdef extern from "complex.h":
06: double complex exp(double complex)
07:
08: @cython.boundscheck(False)
09: @cython.wraparound(False)
+10: def apply_complex_exp2(np.ndarray [np.complex128_t, ndim=1] weights, np.ndarray [np.complex128_t, ndim=1] out):
/* Python wrapper */ static PyObject *__pyx_pw_46_cython_magic_d867f5e67c8f54a1604814be92f853ab_1apply_complex_exp2(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_46_cython_magic_d867f5e67c8f54a1604814be92f853ab_1apply_complex_exp2 = {"apply_complex_exp2", (PyCFunction)__pyx_pw_46_cython_magic_d867f5e67c8f54a1604814be92f853ab_1apply_complex_exp2, METH_VARARGS|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_46_cython_magic_d867f5e67c8f54a1604814be92f853ab_1apply_complex_exp2(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_weights = 0; PyArrayObject *__pyx_v_out = 0; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("apply_complex_exp2 (wrapper)", 0); { static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_weights,&__pyx_n_s_out,0}; PyObject* values[2] = {0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_weights)) != 0)) kw_args--; else goto __pyx_L5_argtuple_error; case 1: if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_out)) != 0)) kw_args--; else { __Pyx_RaiseArgtupleInvalid("apply_complex_exp2", 1, 2, 2, 1); __PYX_ERR(0, 10, __pyx_L3_error) } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "apply_complex_exp2") < 0)) __PYX_ERR(0, 10, __pyx_L3_error) } } else if (PyTuple_GET_SIZE(__pyx_args) != 2) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); } __pyx_v_weights = ((PyArrayObject *)values[0]); __pyx_v_out = ((PyArrayObject *)values[1]); } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("apply_complex_exp2", 1, 2, 2, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 10, __pyx_L3_error) __pyx_L3_error:; __Pyx_AddTraceback("_cython_magic_d867f5e67c8f54a1604814be92f853ab.apply_complex_exp2", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_weights), __pyx_ptype_5numpy_ndarray, 1, "weights", 0))) __PYX_ERR(0, 10, __pyx_L1_error) if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_out), __pyx_ptype_5numpy_ndarray, 1, "out", 0))) __PYX_ERR(0, 10, __pyx_L1_error) __pyx_r = __pyx_pf_46_cython_magic_d867f5e67c8f54a1604814be92f853ab_apply_complex_exp2(__pyx_self, __pyx_v_weights, __pyx_v_out); /* function exit code */ goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_46_cython_magic_d867f5e67c8f54a1604814be92f853ab_apply_complex_exp2(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_weights, PyArrayObject *__pyx_v_out) { int __pyx_v_i; __Pyx_LocalBuf_ND __pyx_pybuffernd_out; __Pyx_Buffer __pyx_pybuffer_out; __Pyx_LocalBuf_ND __pyx_pybuffernd_weights; __Pyx_Buffer __pyx_pybuffer_weights; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("apply_complex_exp2", 0); __pyx_pybuffer_weights.pybuffer.buf = NULL; __pyx_pybuffer_weights.refcount = 0; __pyx_pybuffernd_weights.data = NULL; __pyx_pybuffernd_weights.rcbuffer = &__pyx_pybuffer_weights; __pyx_pybuffer_out.pybuffer.buf = NULL; __pyx_pybuffer_out.refcount = 0; __pyx_pybuffernd_out.data = NULL; __pyx_pybuffernd_out.rcbuffer = &__pyx_pybuffer_out; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_weights.rcbuffer->pybuffer, (PyObject*)__pyx_v_weights, &__Pyx_TypeInfo___pyx_t_double_complex, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 10, __pyx_L1_error) } __pyx_pybuffernd_weights.diminfo[0].strides = __pyx_pybuffernd_weights.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_weights.diminfo[0].shape = __pyx_pybuffernd_weights.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_out.rcbuffer->pybuffer, (PyObject*)__pyx_v_out, &__Pyx_TypeInfo___pyx_t_double_complex, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 10, __pyx_L1_error) } __pyx_pybuffernd_out.diminfo[0].strides = __pyx_pybuffernd_out.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_out.diminfo[0].shape = __pyx_pybuffernd_out.rcbuffer->pybuffer.shape[0]; /* … */ /* function exit code */ __pyx_r = Py_None; __Pyx_INCREF(Py_None); goto __pyx_L0; __pyx_L1_error:; { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_PyThreadState_declare __Pyx_PyThreadState_assign __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_out.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("_cython_magic_d867f5e67c8f54a1604814be92f853ab.apply_complex_exp2", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_out.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__10 = PyTuple_Pack(3, __pyx_n_s_weights, __pyx_n_s_out, __pyx_n_s_i); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__10); __Pyx_GIVEREF(__pyx_tuple__10); /* … */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_d867f5e67c8f54a1604814be92f853ab_1apply_complex_exp2, NULL, __pyx_n_s_cython_magic_d867f5e67c8f54a160); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_apply_complex_exp2, __pyx_t_1) < 0) __PYX_ERR(0, 10, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
11: cdef int i
+12: for i in range(weights.shape[0]):
__pyx_t_1 = (__pyx_v_weights->dimensions[0]); for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2+=1) { __pyx_v_i = __pyx_t_2;
+13: out[i] = exp(1j * weights[i])
__pyx_t_3 = __pyx_v_i; __pyx_t_4 = __pyx_v_i; *__Pyx_BufPtrStrided1d(__pyx_t_double_complex *, __pyx_pybuffernd_out.rcbuffer->pybuffer.buf, __pyx_t_4, __pyx_pybuffernd_out.diminfo[0].strides) = exp(__Pyx_c_prod_double(__pyx_t_double_complex_from_parts(0, 1.0), (*__Pyx_BufPtrStrided1d(__pyx_t_double_complex *, __pyx_pybuffernd_weights.rcbuffer->pybuffer.buf, __pyx_t_3, __pyx_pybuffernd_weights.diminfo[0].strides)))); }
This allows us translating to pure C... wait-for-it... ++! If you look closely at the above code cell, I had to add the --cplus
flag to the compiler since the extern definition we are referring to is the built-in header for the a file that on my computer can be found in Anaconda\Lib\site-packages\Cython\Includes\libcpp
! If I don't add this, I get an error from the compiler...
Let's check that this returns the right result.
apply_complex_exp2(weights, out)
np.allclose(np.exp(1j * weights), out)
True
Let's also check that this is faster:
%timeit apply_complex_exp(weights, out)
1.24 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit apply_complex_exp2(weights, out)
40.8 µs ± 3.87 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Indeed, the second approach that uses C++ is much faster!
With these things in mind, let's now write the Cython version of our reference implementation.
Cython optimization of the synthesis function¶
%%cython -a --cplus
cimport numpy as np
import numpy as np
import cython
cdef extern from "<complex.h>" namespace "std" nogil:
double complex exp(double complex z)
double real(double complex z)
cdef double complex I = 1j
@cython.boundscheck(False)
@cython.wraparound(False)
def synthesis_cython1(np.ndarray [np.complex128_t, ndim=1] weights,
double omega,
np.ndarray [np.float64_t, ndim=1] time_vector):
cdef int i, j
cdef np.ndarray [np.float64_t, ndim=1] out = np.empty_like(time_vector, dtype=np.float)
cdef double complex temp_sum
for i in range(time_vector.shape[0]):
temp_sum = 0
for j in range(weights.shape[0]):
temp_sum += weights[j] * exp(I * time_vector[i] * omega )
out[i] = real(temp_sum)
return out
Generated by Cython 0.25.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
+01: cimport numpy as np
__pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+02: import numpy as np
__pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
03: import cython
04:
05: cdef extern from "<complex.h>" namespace "std" nogil:
06: double complex exp(double complex z)
07: double real(double complex z)
08:
+09: cdef double complex I = 1j
__pyx_v_46_cython_magic_180265c64445be1e3c6d6a7fae5a6faf_I = __pyx_t_double_complex_from_parts(0, 1.0);
10:
11: @cython.boundscheck(False)
12: @cython.wraparound(False)
+13: def synthesis_cython1(np.ndarray [np.complex128_t, ndim=1] weights,
/* Python wrapper */ static PyObject *__pyx_pw_46_cython_magic_180265c64445be1e3c6d6a7fae5a6faf_1synthesis_cython1(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_46_cython_magic_180265c64445be1e3c6d6a7fae5a6faf_1synthesis_cython1 = {"synthesis_cython1", (PyCFunction)__pyx_pw_46_cython_magic_180265c64445be1e3c6d6a7fae5a6faf_1synthesis_cython1, METH_VARARGS|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_46_cython_magic_180265c64445be1e3c6d6a7fae5a6faf_1synthesis_cython1(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_weights = 0; double __pyx_v_omega; PyArrayObject *__pyx_v_time_vector = 0; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("synthesis_cython1 (wrapper)", 0); { static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_weights,&__pyx_n_s_omega,&__pyx_n_s_time_vector,0}; PyObject* values[3] = {0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_weights)) != 0)) kw_args--; else goto __pyx_L5_argtuple_error; case 1: if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_omega)) != 0)) kw_args--; else { __Pyx_RaiseArgtupleInvalid("synthesis_cython1", 1, 3, 3, 1); __PYX_ERR(0, 13, __pyx_L3_error) } case 2: if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_time_vector)) != 0)) kw_args--; else { __Pyx_RaiseArgtupleInvalid("synthesis_cython1", 1, 3, 3, 2); __PYX_ERR(0, 13, __pyx_L3_error) } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "synthesis_cython1") < 0)) __PYX_ERR(0, 13, __pyx_L3_error) } } else if (PyTuple_GET_SIZE(__pyx_args) != 3) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); } __pyx_v_weights = ((PyArrayObject *)values[0]); __pyx_v_omega = __pyx_PyFloat_AsDouble(values[1]); if (unlikely((__pyx_v_omega == (double)-1) && PyErr_Occurred())) __PYX_ERR(0, 14, __pyx_L3_error) __pyx_v_time_vector = ((PyArrayObject *)values[2]); } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("synthesis_cython1", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 13, __pyx_L3_error) __pyx_L3_error:; __Pyx_AddTraceback("_cython_magic_180265c64445be1e3c6d6a7fae5a6faf.synthesis_cython1", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_weights), __pyx_ptype_5numpy_ndarray, 1, "weights", 0))) __PYX_ERR(0, 13, __pyx_L1_error) if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_time_vector), __pyx_ptype_5numpy_ndarray, 1, "time_vector", 0))) __PYX_ERR(0, 15, __pyx_L1_error) __pyx_r = __pyx_pf_46_cython_magic_180265c64445be1e3c6d6a7fae5a6faf_synthesis_cython1(__pyx_self, __pyx_v_weights, __pyx_v_omega, __pyx_v_time_vector); /* function exit code */ goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_46_cython_magic_180265c64445be1e3c6d6a7fae5a6faf_synthesis_cython1(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_weights, double __pyx_v_omega, PyArrayObject *__pyx_v_time_vector) { int __pyx_v_i; int __pyx_v_j; PyArrayObject *__pyx_v_out = 0; __pyx_t_double_complex __pyx_v_temp_sum; __Pyx_LocalBuf_ND __pyx_pybuffernd_out; __Pyx_Buffer __pyx_pybuffer_out; __Pyx_LocalBuf_ND __pyx_pybuffernd_time_vector; __Pyx_Buffer __pyx_pybuffer_time_vector; __Pyx_LocalBuf_ND __pyx_pybuffernd_weights; __Pyx_Buffer __pyx_pybuffer_weights; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("synthesis_cython1", 0); __pyx_pybuffer_out.pybuffer.buf = NULL; __pyx_pybuffer_out.refcount = 0; __pyx_pybuffernd_out.data = NULL; __pyx_pybuffernd_out.rcbuffer = &__pyx_pybuffer_out; __pyx_pybuffer_weights.pybuffer.buf = NULL; __pyx_pybuffer_weights.refcount = 0; __pyx_pybuffernd_weights.data = NULL; __pyx_pybuffernd_weights.rcbuffer = &__pyx_pybuffer_weights; __pyx_pybuffer_time_vector.pybuffer.buf = NULL; __pyx_pybuffer_time_vector.refcount = 0; __pyx_pybuffernd_time_vector.data = NULL; __pyx_pybuffernd_time_vector.rcbuffer = &__pyx_pybuffer_time_vector; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_weights.rcbuffer->pybuffer, (PyObject*)__pyx_v_weights, &__Pyx_TypeInfo___pyx_t_double_complex, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 13, __pyx_L1_error) } __pyx_pybuffernd_weights.diminfo[0].strides = __pyx_pybuffernd_weights.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_weights.diminfo[0].shape = __pyx_pybuffernd_weights.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_time_vector.rcbuffer->pybuffer, (PyObject*)__pyx_v_time_vector, &__Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 13, __pyx_L1_error) } __pyx_pybuffernd_time_vector.diminfo[0].strides = __pyx_pybuffernd_time_vector.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_time_vector.diminfo[0].shape = __pyx_pybuffernd_time_vector.rcbuffer->pybuffer.shape[0]; /* … */ /* function exit code */ __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_PyThreadState_declare __Pyx_PyThreadState_assign __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_out.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_time_vector.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("_cython_magic_180265c64445be1e3c6d6a7fae5a6faf.synthesis_cython1", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_out.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_time_vector.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_out); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__10 = PyTuple_Pack(7, __pyx_n_s_weights, __pyx_n_s_omega, __pyx_n_s_time_vector, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_out, __pyx_n_s_temp_sum); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 13, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__10); __Pyx_GIVEREF(__pyx_tuple__10); /* … */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_180265c64445be1e3c6d6a7fae5a6faf_1synthesis_cython1, NULL, __pyx_n_s_cython_magic_180265c64445be1e3c); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_synthesis_cython1, __pyx_t_1) < 0) __PYX_ERR(0, 13, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
14: double omega,
15: np.ndarray [np.float64_t, ndim=1] time_vector):
16: cdef int i, j
+17: cdef np.ndarray [np.float64_t, ndim=1] out = np.empty_like(time_vector, dtype=np.float)
__pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty_like); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyTuple_New(1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __Pyx_INCREF(((PyObject *)__pyx_v_time_vector)); __Pyx_GIVEREF(((PyObject *)__pyx_v_time_vector)); PyTuple_SET_ITEM(__pyx_t_1, 0, ((PyObject *)__pyx_v_time_vector)); __pyx_t_3 = PyDict_New(); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_float); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; if (PyDict_SetItem(__pyx_t_3, __pyx_n_s_dtype, __pyx_t_5) < 0) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_t_1, __pyx_t_3); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (!(likely(((__pyx_t_5) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_5, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 17, __pyx_L1_error) __pyx_t_6 = ((PyArrayObject *)__pyx_t_5); { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_out.rcbuffer->pybuffer, (PyObject*)__pyx_t_6, &__Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) { __pyx_v_out = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_out.rcbuffer->pybuffer.buf = NULL; __PYX_ERR(0, 17, __pyx_L1_error) } else {__pyx_pybuffernd_out.diminfo[0].strides = __pyx_pybuffernd_out.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_out.diminfo[0].shape = __pyx_pybuffernd_out.rcbuffer->pybuffer.shape[0]; } } __pyx_t_6 = 0; __pyx_v_out = ((PyArrayObject *)__pyx_t_5); __pyx_t_5 = 0;
18: cdef double complex temp_sum
+19: for i in range(time_vector.shape[0]):
__pyx_t_7 = (__pyx_v_time_vector->dimensions[0]); for (__pyx_t_8 = 0; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) { __pyx_v_i = __pyx_t_8;
+20: temp_sum = 0
__pyx_v_temp_sum = __pyx_t_double_complex_from_parts(0, 0);
+21: for j in range(weights.shape[0]):
__pyx_t_9 = (__pyx_v_weights->dimensions[0]); for (__pyx_t_10 = 0; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) { __pyx_v_j = __pyx_t_10;
+22: temp_sum += weights[j] * exp(I * time_vector[i] * omega )
__pyx_t_11 = __pyx_v_j; __pyx_t_12 = __pyx_v_i; __pyx_t_13 = __Pyx_c_prod_npy_float64(__Pyx_c_prod_npy_float64(__pyx_t_npy_float64_complex_from_parts(__Pyx_CREAL(__pyx_v_46_cython_magic_180265c64445be1e3c6d6a7fae5a6faf_I), __Pyx_CIMAG(__pyx_v_46_cython_magic_180265c64445be1e3c6d6a7fae5a6faf_I)), __pyx_t_npy_float64_complex_from_parts((*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_float64_t *, __pyx_pybuffernd_time_vector.rcbuffer->pybuffer.buf, __pyx_t_12, __pyx_pybuffernd_time_vector.diminfo[0].strides)), 0)), __pyx_t_npy_float64_complex_from_parts(__pyx_v_omega, 0)); __pyx_v_temp_sum = __Pyx_c_sum_double(__pyx_v_temp_sum, __Pyx_c_prod_double((*__Pyx_BufPtrStrided1d(__pyx_t_double_complex *, __pyx_pybuffernd_weights.rcbuffer->pybuffer.buf, __pyx_t_11, __pyx_pybuffernd_weights.diminfo[0].strides)), std::exp(__pyx_t_double_complex_from_parts(__Pyx_CREAL(__pyx_t_13), __Pyx_CIMAG(__pyx_t_13))))); }
+23: out[i] = real(temp_sum)
__pyx_t_14 = __pyx_v_i; *__Pyx_BufPtrStrided1d(__pyx_t_5numpy_float64_t *, __pyx_pybuffernd_out.rcbuffer->pybuffer.buf, __pyx_t_14, __pyx_pybuffernd_out.diminfo[0].strides) = std::real(__pyx_v_temp_sum); }
+24: return out
__Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_out)); __pyx_r = ((PyObject *)__pyx_v_out); goto __pyx_L0;
out = synthesis_cython1(weights, omega, time_vector)
np.allclose(ref_result, out)
True
plt.plot(ref_result)
plt.plot(out)
[<matplotlib.lines.Line2D at 0x8ae6a90>]
Let's now do some timings.
%timeit synthesis(weights, omega, time_vector)
30.1 ms ± 625 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit synthesis_cython1(weights, omega, time_vector)
127 ms ± 5.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
As we can see, our Cython version is almost ten times slower than our vectorized implementation. Let's try tweaking it.
%%cython -a --cplus
cimport numpy as np
import numpy as np
import cython
cdef extern from "<complex.h>" namespace "std" nogil:
double complex exp(double complex z)
double real(double complex z)
cdef double complex I = 1j
@cython.boundscheck(False)
@cython.wraparound(False)
def synthesis_cython2(np.ndarray [np.complex128_t, ndim=1] weights,
double omega,
np.ndarray [np.float64_t, ndim=1] time_vector):
cdef int i, j
cdef np.ndarray [np.float64_t, ndim=1] out = np.empty_like(time_vector, dtype=np.float)
cdef double complex temp_sum
cdef double complex temp_mult
for i in range(time_vector.shape[0]):
temp_sum = 0
temp_mult = exp(I * time_vector[i] * omega)
for j in range(weights.shape[0]):
temp_sum += weights[j] * temp_mult
out[i] = real(temp_sum)
return out
Generated by Cython 0.25.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
+01: cimport numpy as np
__pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+02: import numpy as np
__pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
03: import cython
04:
05: cdef extern from "<complex.h>" namespace "std" nogil:
06: double complex exp(double complex z)
07: double real(double complex z)
08:
+09: cdef double complex I = 1j
__pyx_v_46_cython_magic_59850e01233c87321a5c3910c7be7295_I = __pyx_t_double_complex_from_parts(0, 1.0);
10:
11: @cython.boundscheck(False)
12: @cython.wraparound(False)
+13: def synthesis_cython2(np.ndarray [np.complex128_t, ndim=1] weights,
/* Python wrapper */ static PyObject *__pyx_pw_46_cython_magic_59850e01233c87321a5c3910c7be7295_1synthesis_cython2(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_46_cython_magic_59850e01233c87321a5c3910c7be7295_1synthesis_cython2 = {"synthesis_cython2", (PyCFunction)__pyx_pw_46_cython_magic_59850e01233c87321a5c3910c7be7295_1synthesis_cython2, METH_VARARGS|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_46_cython_magic_59850e01233c87321a5c3910c7be7295_1synthesis_cython2(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_weights = 0; double __pyx_v_omega; PyArrayObject *__pyx_v_time_vector = 0; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("synthesis_cython2 (wrapper)", 0); { static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_weights,&__pyx_n_s_omega,&__pyx_n_s_time_vector,0}; PyObject* values[3] = {0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_weights)) != 0)) kw_args--; else goto __pyx_L5_argtuple_error; case 1: if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_omega)) != 0)) kw_args--; else { __Pyx_RaiseArgtupleInvalid("synthesis_cython2", 1, 3, 3, 1); __PYX_ERR(0, 13, __pyx_L3_error) } case 2: if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_time_vector)) != 0)) kw_args--; else { __Pyx_RaiseArgtupleInvalid("synthesis_cython2", 1, 3, 3, 2); __PYX_ERR(0, 13, __pyx_L3_error) } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "synthesis_cython2") < 0)) __PYX_ERR(0, 13, __pyx_L3_error) } } else if (PyTuple_GET_SIZE(__pyx_args) != 3) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); } __pyx_v_weights = ((PyArrayObject *)values[0]); __pyx_v_omega = __pyx_PyFloat_AsDouble(values[1]); if (unlikely((__pyx_v_omega == (double)-1) && PyErr_Occurred())) __PYX_ERR(0, 14, __pyx_L3_error) __pyx_v_time_vector = ((PyArrayObject *)values[2]); } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("synthesis_cython2", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 13, __pyx_L3_error) __pyx_L3_error:; __Pyx_AddTraceback("_cython_magic_59850e01233c87321a5c3910c7be7295.synthesis_cython2", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_weights), __pyx_ptype_5numpy_ndarray, 1, "weights", 0))) __PYX_ERR(0, 13, __pyx_L1_error) if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_time_vector), __pyx_ptype_5numpy_ndarray, 1, "time_vector", 0))) __PYX_ERR(0, 15, __pyx_L1_error) __pyx_r = __pyx_pf_46_cython_magic_59850e01233c87321a5c3910c7be7295_synthesis_cython2(__pyx_self, __pyx_v_weights, __pyx_v_omega, __pyx_v_time_vector); /* function exit code */ goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_46_cython_magic_59850e01233c87321a5c3910c7be7295_synthesis_cython2(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_weights, double __pyx_v_omega, PyArrayObject *__pyx_v_time_vector) { int __pyx_v_i; int __pyx_v_j; PyArrayObject *__pyx_v_out = 0; __pyx_t_double_complex __pyx_v_temp_sum; __pyx_t_double_complex __pyx_v_temp_mult; __Pyx_LocalBuf_ND __pyx_pybuffernd_out; __Pyx_Buffer __pyx_pybuffer_out; __Pyx_LocalBuf_ND __pyx_pybuffernd_time_vector; __Pyx_Buffer __pyx_pybuffer_time_vector; __Pyx_LocalBuf_ND __pyx_pybuffernd_weights; __Pyx_Buffer __pyx_pybuffer_weights; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("synthesis_cython2", 0); __pyx_pybuffer_out.pybuffer.buf = NULL; __pyx_pybuffer_out.refcount = 0; __pyx_pybuffernd_out.data = NULL; __pyx_pybuffernd_out.rcbuffer = &__pyx_pybuffer_out; __pyx_pybuffer_weights.pybuffer.buf = NULL; __pyx_pybuffer_weights.refcount = 0; __pyx_pybuffernd_weights.data = NULL; __pyx_pybuffernd_weights.rcbuffer = &__pyx_pybuffer_weights; __pyx_pybuffer_time_vector.pybuffer.buf = NULL; __pyx_pybuffer_time_vector.refcount = 0; __pyx_pybuffernd_time_vector.data = NULL; __pyx_pybuffernd_time_vector.rcbuffer = &__pyx_pybuffer_time_vector; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_weights.rcbuffer->pybuffer, (PyObject*)__pyx_v_weights, &__Pyx_TypeInfo___pyx_t_double_complex, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 13, __pyx_L1_error) } __pyx_pybuffernd_weights.diminfo[0].strides = __pyx_pybuffernd_weights.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_weights.diminfo[0].shape = __pyx_pybuffernd_weights.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_time_vector.rcbuffer->pybuffer, (PyObject*)__pyx_v_time_vector, &__Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 13, __pyx_L1_error) } __pyx_pybuffernd_time_vector.diminfo[0].strides = __pyx_pybuffernd_time_vector.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_time_vector.diminfo[0].shape = __pyx_pybuffernd_time_vector.rcbuffer->pybuffer.shape[0]; /* … */ /* function exit code */ __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_PyThreadState_declare __Pyx_PyThreadState_assign __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_out.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_time_vector.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("_cython_magic_59850e01233c87321a5c3910c7be7295.synthesis_cython2", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_out.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_time_vector.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_out); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__10 = PyTuple_Pack(8, __pyx_n_s_weights, __pyx_n_s_omega, __pyx_n_s_time_vector, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_out, __pyx_n_s_temp_sum, __pyx_n_s_temp_mult); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 13, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__10); __Pyx_GIVEREF(__pyx_tuple__10); /* … */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_59850e01233c87321a5c3910c7be7295_1synthesis_cython2, NULL, __pyx_n_s_cython_magic_59850e01233c87321a); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_synthesis_cython2, __pyx_t_1) < 0) __PYX_ERR(0, 13, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
14: double omega,
15: np.ndarray [np.float64_t, ndim=1] time_vector):
16: cdef int i, j
+17: cdef np.ndarray [np.float64_t, ndim=1] out = np.empty_like(time_vector, dtype=np.float)
__pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty_like); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyTuple_New(1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __Pyx_INCREF(((PyObject *)__pyx_v_time_vector)); __Pyx_GIVEREF(((PyObject *)__pyx_v_time_vector)); PyTuple_SET_ITEM(__pyx_t_1, 0, ((PyObject *)__pyx_v_time_vector)); __pyx_t_3 = PyDict_New(); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_float); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; if (PyDict_SetItem(__pyx_t_3, __pyx_n_s_dtype, __pyx_t_5) < 0) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_t_1, __pyx_t_3); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (!(likely(((__pyx_t_5) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_5, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 17, __pyx_L1_error) __pyx_t_6 = ((PyArrayObject *)__pyx_t_5); { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_out.rcbuffer->pybuffer, (PyObject*)__pyx_t_6, &__Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) { __pyx_v_out = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_out.rcbuffer->pybuffer.buf = NULL; __PYX_ERR(0, 17, __pyx_L1_error) } else {__pyx_pybuffernd_out.diminfo[0].strides = __pyx_pybuffernd_out.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_out.diminfo[0].shape = __pyx_pybuffernd_out.rcbuffer->pybuffer.shape[0]; } } __pyx_t_6 = 0; __pyx_v_out = ((PyArrayObject *)__pyx_t_5); __pyx_t_5 = 0;
18: cdef double complex temp_sum
19: cdef double complex temp_mult
+20: for i in range(time_vector.shape[0]):
__pyx_t_7 = (__pyx_v_time_vector->dimensions[0]); for (__pyx_t_8 = 0; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) { __pyx_v_i = __pyx_t_8;
+21: temp_sum = 0
__pyx_v_temp_sum = __pyx_t_double_complex_from_parts(0, 0);
+22: temp_mult = exp(I * time_vector[i] * omega)
__pyx_t_9 = __pyx_v_i; __pyx_t_10 = __Pyx_c_prod_npy_float64(__Pyx_c_prod_npy_float64(__pyx_t_npy_float64_complex_from_parts(__Pyx_CREAL(__pyx_v_46_cython_magic_59850e01233c87321a5c3910c7be7295_I), __Pyx_CIMAG(__pyx_v_46_cython_magic_59850e01233c87321a5c3910c7be7295_I)), __pyx_t_npy_float64_complex_from_parts((*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_float64_t *, __pyx_pybuffernd_time_vector.rcbuffer->pybuffer.buf, __pyx_t_9, __pyx_pybuffernd_time_vector.diminfo[0].strides)), 0)), __pyx_t_npy_float64_complex_from_parts(__pyx_v_omega, 0)); __pyx_v_temp_mult = std::exp(__pyx_t_double_complex_from_parts(__Pyx_CREAL(__pyx_t_10), __Pyx_CIMAG(__pyx_t_10)));
+23: for j in range(weights.shape[0]):
__pyx_t_11 = (__pyx_v_weights->dimensions[0]); for (__pyx_t_12 = 0; __pyx_t_12 < __pyx_t_11; __pyx_t_12+=1) { __pyx_v_j = __pyx_t_12;
+24: temp_sum += weights[j] * temp_mult
__pyx_t_13 = __pyx_v_j; __pyx_v_temp_sum = __Pyx_c_sum_double(__pyx_v_temp_sum, __Pyx_c_prod_double((*__Pyx_BufPtrStrided1d(__pyx_t_double_complex *, __pyx_pybuffernd_weights.rcbuffer->pybuffer.buf, __pyx_t_13, __pyx_pybuffernd_weights.diminfo[0].strides)), __pyx_v_temp_mult)); }
+25: out[i] = real(temp_sum)
__pyx_t_14 = __pyx_v_i; *__Pyx_BufPtrStrided1d(__pyx_t_5numpy_float64_t *, __pyx_pybuffernd_out.rcbuffer->pybuffer.buf, __pyx_t_14, __pyx_pybuffernd_out.diminfo[0].strides) = std::real(__pyx_v_temp_sum); }
+26: return out
__Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_out)); __pyx_r = ((PyObject *)__pyx_v_out); goto __pyx_L0;
Let's check it's correct again:
out = synthesis_cython2(weights, omega, time_vector)
np.allclose(ref_result, out)
True
%timeit synthesis(weights, omega, time_vector)
29.7 ms ± 573 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit synthesis_cython2(weights, omega, time_vector)
27.6 ms ± 392 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Nice! Moving the computation of the complex exponent outside of the loop helped us match NumPy. As you can see above however, line 22 is still yellow, meaning there is some Python interaction going on. Can we eliminate that?
%%cython -a --cplus
cimport numpy as np
import numpy as np
import cython
cdef extern from "<complex.h>" namespace "std" nogil:
double complex exp(double complex z)
double real(double complex z)
cdef double complex I = 1j
@cython.boundscheck(False)
@cython.wraparound(False)
def synthesis_cython3(np.ndarray [complex, ndim=1] weights,
double omega,
np.ndarray [double, ndim=1] time_vector):
cdef int i, j
cdef np.ndarray [np.float64_t, ndim=1] out = np.empty_like(time_vector, dtype=np.float)
cdef double complex temp_sum
cdef double complex temp_mult
for i in range(time_vector.shape[0]):
temp_sum = 0
temp_mult = exp(I * time_vector[i] * omega)
for j in range(weights.shape[0]):
temp_sum += weights[j] * temp_mult
out[i] = real(temp_sum)
return out
Generated by Cython 0.25.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
+01: cimport numpy as np
__pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+02: import numpy as np
__pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
03: import cython
04:
05: cdef extern from "<complex.h>" namespace "std" nogil:
06: double complex exp(double complex z)
07: double real(double complex z)
08:
+09: cdef double complex I = 1j
__pyx_v_46_cython_magic_9686ecbb26e36aaeeb31076454f5407d_I = __pyx_t_double_complex_from_parts(0, 1.0);
10:
11: @cython.boundscheck(False)
12: @cython.wraparound(False)
+13: def synthesis_cython3(np.ndarray [complex, ndim=1] weights,
/* Python wrapper */ static PyObject *__pyx_pw_46_cython_magic_9686ecbb26e36aaeeb31076454f5407d_1synthesis_cython3(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_46_cython_magic_9686ecbb26e36aaeeb31076454f5407d_1synthesis_cython3 = {"synthesis_cython3", (PyCFunction)__pyx_pw_46_cython_magic_9686ecbb26e36aaeeb31076454f5407d_1synthesis_cython3, METH_VARARGS|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_46_cython_magic_9686ecbb26e36aaeeb31076454f5407d_1synthesis_cython3(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_weights = 0; double __pyx_v_omega; PyArrayObject *__pyx_v_time_vector = 0; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("synthesis_cython3 (wrapper)", 0); { static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_weights,&__pyx_n_s_omega,&__pyx_n_s_time_vector,0}; PyObject* values[3] = {0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_weights)) != 0)) kw_args--; else goto __pyx_L5_argtuple_error; case 1: if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_omega)) != 0)) kw_args--; else { __Pyx_RaiseArgtupleInvalid("synthesis_cython3", 1, 3, 3, 1); __PYX_ERR(0, 13, __pyx_L3_error) } case 2: if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_time_vector)) != 0)) kw_args--; else { __Pyx_RaiseArgtupleInvalid("synthesis_cython3", 1, 3, 3, 2); __PYX_ERR(0, 13, __pyx_L3_error) } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "synthesis_cython3") < 0)) __PYX_ERR(0, 13, __pyx_L3_error) } } else if (PyTuple_GET_SIZE(__pyx_args) != 3) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); } __pyx_v_weights = ((PyArrayObject *)values[0]); __pyx_v_omega = __pyx_PyFloat_AsDouble(values[1]); if (unlikely((__pyx_v_omega == (double)-1) && PyErr_Occurred())) __PYX_ERR(0, 14, __pyx_L3_error) __pyx_v_time_vector = ((PyArrayObject *)values[2]); } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("synthesis_cython3", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 13, __pyx_L3_error) __pyx_L3_error:; __Pyx_AddTraceback("_cython_magic_9686ecbb26e36aaeeb31076454f5407d.synthesis_cython3", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_weights), __pyx_ptype_5numpy_ndarray, 1, "weights", 0))) __PYX_ERR(0, 13, __pyx_L1_error) if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_time_vector), __pyx_ptype_5numpy_ndarray, 1, "time_vector", 0))) __PYX_ERR(0, 15, __pyx_L1_error) __pyx_r = __pyx_pf_46_cython_magic_9686ecbb26e36aaeeb31076454f5407d_synthesis_cython3(__pyx_self, __pyx_v_weights, __pyx_v_omega, __pyx_v_time_vector); /* function exit code */ goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_46_cython_magic_9686ecbb26e36aaeeb31076454f5407d_synthesis_cython3(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_weights, double __pyx_v_omega, PyArrayObject *__pyx_v_time_vector) { int __pyx_v_i; int __pyx_v_j; PyArrayObject *__pyx_v_out = 0; __pyx_t_double_complex __pyx_v_temp_sum; __pyx_t_double_complex __pyx_v_temp_mult; __Pyx_LocalBuf_ND __pyx_pybuffernd_out; __Pyx_Buffer __pyx_pybuffer_out; __Pyx_LocalBuf_ND __pyx_pybuffernd_time_vector; __Pyx_Buffer __pyx_pybuffer_time_vector; __Pyx_LocalBuf_ND __pyx_pybuffernd_weights; __Pyx_Buffer __pyx_pybuffer_weights; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("synthesis_cython3", 0); __pyx_pybuffer_out.pybuffer.buf = NULL; __pyx_pybuffer_out.refcount = 0; __pyx_pybuffernd_out.data = NULL; __pyx_pybuffernd_out.rcbuffer = &__pyx_pybuffer_out; __pyx_pybuffer_weights.pybuffer.buf = NULL; __pyx_pybuffer_weights.refcount = 0; __pyx_pybuffernd_weights.data = NULL; __pyx_pybuffernd_weights.rcbuffer = &__pyx_pybuffer_weights; __pyx_pybuffer_time_vector.pybuffer.buf = NULL; __pyx_pybuffer_time_vector.refcount = 0; __pyx_pybuffernd_time_vector.data = NULL; __pyx_pybuffernd_time_vector.rcbuffer = &__pyx_pybuffer_time_vector; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_weights.rcbuffer->pybuffer, (PyObject*)__pyx_v_weights, &__Pyx_TypeInfo___pyx_t_double_complex, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 13, __pyx_L1_error) } __pyx_pybuffernd_weights.diminfo[0].strides = __pyx_pybuffernd_weights.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_weights.diminfo[0].shape = __pyx_pybuffernd_weights.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_time_vector.rcbuffer->pybuffer, (PyObject*)__pyx_v_time_vector, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 13, __pyx_L1_error) } __pyx_pybuffernd_time_vector.diminfo[0].strides = __pyx_pybuffernd_time_vector.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_time_vector.diminfo[0].shape = __pyx_pybuffernd_time_vector.rcbuffer->pybuffer.shape[0]; /* … */ /* function exit code */ __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_PyThreadState_declare __Pyx_PyThreadState_assign __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_out.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_time_vector.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("_cython_magic_9686ecbb26e36aaeeb31076454f5407d.synthesis_cython3", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_out.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_time_vector.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_out); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__10 = PyTuple_Pack(8, __pyx_n_s_weights, __pyx_n_s_omega, __pyx_n_s_time_vector, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_out, __pyx_n_s_temp_sum, __pyx_n_s_temp_mult); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 13, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__10); __Pyx_GIVEREF(__pyx_tuple__10); /* … */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_9686ecbb26e36aaeeb31076454f5407d_1synthesis_cython3, NULL, __pyx_n_s_cython_magic_9686ecbb26e36aaeeb); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 13, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_synthesis_cython3, __pyx_t_1) < 0) __PYX_ERR(0, 13, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
14: double omega,
15: np.ndarray [double, ndim=1] time_vector):
16: cdef int i, j
+17: cdef np.ndarray [np.float64_t, ndim=1] out = np.empty_like(time_vector, dtype=np.float)
__pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty_like); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyTuple_New(1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __Pyx_INCREF(((PyObject *)__pyx_v_time_vector)); __Pyx_GIVEREF(((PyObject *)__pyx_v_time_vector)); PyTuple_SET_ITEM(__pyx_t_1, 0, ((PyObject *)__pyx_v_time_vector)); __pyx_t_3 = PyDict_New(); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_float); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; if (PyDict_SetItem(__pyx_t_3, __pyx_n_s_dtype, __pyx_t_5) < 0) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_t_1, __pyx_t_3); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 17, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (!(likely(((__pyx_t_5) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_5, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 17, __pyx_L1_error) __pyx_t_6 = ((PyArrayObject *)__pyx_t_5); { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_out.rcbuffer->pybuffer, (PyObject*)__pyx_t_6, &__Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) { __pyx_v_out = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_out.rcbuffer->pybuffer.buf = NULL; __PYX_ERR(0, 17, __pyx_L1_error) } else {__pyx_pybuffernd_out.diminfo[0].strides = __pyx_pybuffernd_out.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_out.diminfo[0].shape = __pyx_pybuffernd_out.rcbuffer->pybuffer.shape[0]; } } __pyx_t_6 = 0; __pyx_v_out = ((PyArrayObject *)__pyx_t_5); __pyx_t_5 = 0;
18: cdef double complex temp_sum
19: cdef double complex temp_mult
+20: for i in range(time_vector.shape[0]):
__pyx_t_7 = (__pyx_v_time_vector->dimensions[0]); for (__pyx_t_8 = 0; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) { __pyx_v_i = __pyx_t_8;
+21: temp_sum = 0
__pyx_v_temp_sum = __pyx_t_double_complex_from_parts(0, 0);
+22: temp_mult = exp(I * time_vector[i] * omega)
__pyx_t_9 = __pyx_v_i; __pyx_v_temp_mult = std::exp(__Pyx_c_prod_double(__Pyx_c_prod_double(__pyx_v_46_cython_magic_9686ecbb26e36aaeeb31076454f5407d_I, __pyx_t_double_complex_from_parts((*__Pyx_BufPtrStrided1d(double *, __pyx_pybuffernd_time_vector.rcbuffer->pybuffer.buf, __pyx_t_9, __pyx_pybuffernd_time_vector.diminfo[0].strides)), 0)), __pyx_t_double_complex_from_parts(__pyx_v_omega, 0)));
+23: for j in range(weights.shape[0]):
__pyx_t_10 = (__pyx_v_weights->dimensions[0]); for (__pyx_t_11 = 0; __pyx_t_11 < __pyx_t_10; __pyx_t_11+=1) { __pyx_v_j = __pyx_t_11;
+24: temp_sum += weights[j] * temp_mult
__pyx_t_12 = __pyx_v_j; __pyx_v_temp_sum = __Pyx_c_sum_double(__pyx_v_temp_sum, __Pyx_c_prod_double((*__Pyx_BufPtrStrided1d(__pyx_t_double_complex *, __pyx_pybuffernd_weights.rcbuffer->pybuffer.buf, __pyx_t_12, __pyx_pybuffernd_weights.diminfo[0].strides)), __pyx_v_temp_mult)); }
+25: out[i] = real(temp_sum)
__pyx_t_13 = __pyx_v_i; *__Pyx_BufPtrStrided1d(__pyx_t_5numpy_float64_t *, __pyx_pybuffernd_out.rcbuffer->pybuffer.buf, __pyx_t_13, __pyx_pybuffernd_out.diminfo[0].strides) = std::real(__pyx_v_temp_sum); }
+26: return out
__Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_out)); __pyx_r = ((PyObject *)__pyx_v_out); goto __pyx_L0;
Yes we can! All we did was changing the dtype of the input array. Let's check our function produces correct results again:
out = synthesis_cython3(weights, omega, time_vector)
np.allclose(ref_result, out)
True
%timeit synthesis(weights, omega, time_vector)
29.9 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit synthesis_cython3(weights, omega, time_vector)
28.1 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Interestingly, even though our function is now "completely white", meaning that Cython has translated it to pure C++, it is not actually faster than our previous version. The last thing we can do in our quest for performance is to make the loop parallel.
%%cython -a --cplus --compile-args=/openmp
cimport numpy as np
import numpy as np
import cython
from cython.parallel import prange
cdef extern from "<complex.h>" namespace "std" nogil:
double complex exp(double complex z)
double real(double complex z)
cdef double complex I = 1j
@cython.boundscheck(False)
@cython.wraparound(False)
def synthesis_cython4(np.ndarray [complex, ndim=1] weights,
double omega,
np.ndarray [double, ndim=1] time_vector):
cdef int i, j
cdef np.ndarray [np.float64_t, ndim=1] out = np.empty_like(time_vector, dtype=np.float)
cdef double complex temp_sum
cdef double complex temp_mult
for i in prange(time_vector.shape[0], nogil=True):
temp_sum = 0
temp_mult = exp(I * time_vector[i] * omega)
for j in range(weights.shape[0]):
temp_sum = temp_sum + weights[j] * temp_mult
out[i] = real(temp_sum)
return out
Generated by Cython 0.25.2
Yellow lines hint at Python interaction.
Click on a line that starts with a "+
" to see the C code that Cython generated for it.
+01: cimport numpy as np
__pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+02: import numpy as np
__pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
03: import cython
04: from cython.parallel import prange
05:
06: cdef extern from "<complex.h>" namespace "std" nogil:
07: double complex exp(double complex z)
08: double real(double complex z)
09:
+10: cdef double complex I = 1j
__pyx_v_46_cython_magic_e7cc766498993d3947c8c5bbc875e6a8_I = __pyx_t_double_complex_from_parts(0, 1.0);
11:
12: @cython.boundscheck(False)
13: @cython.wraparound(False)
+14: def synthesis_cython4(np.ndarray [complex, ndim=1] weights,
/* Python wrapper */ static PyObject *__pyx_pw_46_cython_magic_e7cc766498993d3947c8c5bbc875e6a8_1synthesis_cython4(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/ static PyMethodDef __pyx_mdef_46_cython_magic_e7cc766498993d3947c8c5bbc875e6a8_1synthesis_cython4 = {"synthesis_cython4", (PyCFunction)__pyx_pw_46_cython_magic_e7cc766498993d3947c8c5bbc875e6a8_1synthesis_cython4, METH_VARARGS|METH_KEYWORDS, 0}; static PyObject *__pyx_pw_46_cython_magic_e7cc766498993d3947c8c5bbc875e6a8_1synthesis_cython4(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) { PyArrayObject *__pyx_v_weights = 0; double __pyx_v_omega; PyArrayObject *__pyx_v_time_vector = 0; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("synthesis_cython4 (wrapper)", 0); { static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_weights,&__pyx_n_s_omega,&__pyx_n_s_time_vector,0}; PyObject* values[3] = {0,0,0}; if (unlikely(__pyx_kwds)) { Py_ssize_t kw_args; const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args); switch (pos_args) { case 3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2); case 2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1); case 1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0); case 0: break; default: goto __pyx_L5_argtuple_error; } kw_args = PyDict_Size(__pyx_kwds); switch (pos_args) { case 0: if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_weights)) != 0)) kw_args--; else goto __pyx_L5_argtuple_error; case 1: if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_omega)) != 0)) kw_args--; else { __Pyx_RaiseArgtupleInvalid("synthesis_cython4", 1, 3, 3, 1); __PYX_ERR(0, 14, __pyx_L3_error) } case 2: if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_time_vector)) != 0)) kw_args--; else { __Pyx_RaiseArgtupleInvalid("synthesis_cython4", 1, 3, 3, 2); __PYX_ERR(0, 14, __pyx_L3_error) } } if (unlikely(kw_args > 0)) { if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "synthesis_cython4") < 0)) __PYX_ERR(0, 14, __pyx_L3_error) } } else if (PyTuple_GET_SIZE(__pyx_args) != 3) { goto __pyx_L5_argtuple_error; } else { values[0] = PyTuple_GET_ITEM(__pyx_args, 0); values[1] = PyTuple_GET_ITEM(__pyx_args, 1); values[2] = PyTuple_GET_ITEM(__pyx_args, 2); } __pyx_v_weights = ((PyArrayObject *)values[0]); __pyx_v_omega = __pyx_PyFloat_AsDouble(values[1]); if (unlikely((__pyx_v_omega == (double)-1) && PyErr_Occurred())) __PYX_ERR(0, 15, __pyx_L3_error) __pyx_v_time_vector = ((PyArrayObject *)values[2]); } goto __pyx_L4_argument_unpacking_done; __pyx_L5_argtuple_error:; __Pyx_RaiseArgtupleInvalid("synthesis_cython4", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 14, __pyx_L3_error) __pyx_L3_error:; __Pyx_AddTraceback("_cython_magic_e7cc766498993d3947c8c5bbc875e6a8.synthesis_cython4", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_weights), __pyx_ptype_5numpy_ndarray, 1, "weights", 0))) __PYX_ERR(0, 14, __pyx_L1_error) if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_time_vector), __pyx_ptype_5numpy_ndarray, 1, "time_vector", 0))) __PYX_ERR(0, 16, __pyx_L1_error) __pyx_r = __pyx_pf_46_cython_magic_e7cc766498993d3947c8c5bbc875e6a8_synthesis_cython4(__pyx_self, __pyx_v_weights, __pyx_v_omega, __pyx_v_time_vector); /* function exit code */ goto __pyx_L0; __pyx_L1_error:; __pyx_r = NULL; __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_46_cython_magic_e7cc766498993d3947c8c5bbc875e6a8_synthesis_cython4(CYTHON_UNUSED PyObject *__pyx_self, PyArrayObject *__pyx_v_weights, double __pyx_v_omega, PyArrayObject *__pyx_v_time_vector) { int __pyx_v_i; int __pyx_v_j; PyArrayObject *__pyx_v_out = 0; __pyx_t_double_complex __pyx_v_temp_sum; __pyx_t_double_complex __pyx_v_temp_mult; __Pyx_LocalBuf_ND __pyx_pybuffernd_out; __Pyx_Buffer __pyx_pybuffer_out; __Pyx_LocalBuf_ND __pyx_pybuffernd_time_vector; __Pyx_Buffer __pyx_pybuffer_time_vector; __Pyx_LocalBuf_ND __pyx_pybuffernd_weights; __Pyx_Buffer __pyx_pybuffer_weights; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("synthesis_cython4", 0); __pyx_pybuffer_out.pybuffer.buf = NULL; __pyx_pybuffer_out.refcount = 0; __pyx_pybuffernd_out.data = NULL; __pyx_pybuffernd_out.rcbuffer = &__pyx_pybuffer_out; __pyx_pybuffer_weights.pybuffer.buf = NULL; __pyx_pybuffer_weights.refcount = 0; __pyx_pybuffernd_weights.data = NULL; __pyx_pybuffernd_weights.rcbuffer = &__pyx_pybuffer_weights; __pyx_pybuffer_time_vector.pybuffer.buf = NULL; __pyx_pybuffer_time_vector.refcount = 0; __pyx_pybuffernd_time_vector.data = NULL; __pyx_pybuffernd_time_vector.rcbuffer = &__pyx_pybuffer_time_vector; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_weights.rcbuffer->pybuffer, (PyObject*)__pyx_v_weights, &__Pyx_TypeInfo___pyx_t_double_complex, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 14, __pyx_L1_error) } __pyx_pybuffernd_weights.diminfo[0].strides = __pyx_pybuffernd_weights.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_weights.diminfo[0].shape = __pyx_pybuffernd_weights.rcbuffer->pybuffer.shape[0]; { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_time_vector.rcbuffer->pybuffer, (PyObject*)__pyx_v_time_vector, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) __PYX_ERR(0, 14, __pyx_L1_error) } __pyx_pybuffernd_time_vector.diminfo[0].strides = __pyx_pybuffernd_time_vector.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_time_vector.diminfo[0].shape = __pyx_pybuffernd_time_vector.rcbuffer->pybuffer.shape[0]; /* … */ /* function exit code */ __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_XDECREF(__pyx_t_4); __Pyx_XDECREF(__pyx_t_5); { PyObject *__pyx_type, *__pyx_value, *__pyx_tb; __Pyx_PyThreadState_declare __Pyx_PyThreadState_assign __Pyx_ErrFetch(&__pyx_type, &__pyx_value, &__pyx_tb); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_out.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_time_vector.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __Pyx_ErrRestore(__pyx_type, __pyx_value, __pyx_tb);} __Pyx_AddTraceback("_cython_magic_e7cc766498993d3947c8c5bbc875e6a8.synthesis_cython4", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; goto __pyx_L2; __pyx_L0:; __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_out.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_time_vector.rcbuffer->pybuffer); __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_weights.rcbuffer->pybuffer); __pyx_L2:; __Pyx_XDECREF((PyObject *)__pyx_v_out); __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple__10 = PyTuple_Pack(8, __pyx_n_s_weights, __pyx_n_s_omega, __pyx_n_s_time_vector, __pyx_n_s_i, __pyx_n_s_j, __pyx_n_s_out, __pyx_n_s_temp_sum, __pyx_n_s_temp_mult); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(0, 14, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple__10); __Pyx_GIVEREF(__pyx_tuple__10); /* … */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_e7cc766498993d3947c8c5bbc875e6a8_1synthesis_cython4, NULL, __pyx_n_s_cython_magic_e7cc766498993d3947); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 14, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_synthesis_cython4, __pyx_t_1) < 0) __PYX_ERR(0, 14, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
15: double omega,
16: np.ndarray [double, ndim=1] time_vector):
17: cdef int i, j
+18: cdef np.ndarray [np.float64_t, ndim=1] out = np.empty_like(time_vector, dtype=np.float)
__pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 18, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty_like); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 18, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __pyx_t_1 = PyTuple_New(1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 18, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); __Pyx_INCREF(((PyObject *)__pyx_v_time_vector)); __Pyx_GIVEREF(((PyObject *)__pyx_v_time_vector)); PyTuple_SET_ITEM(__pyx_t_1, 0, ((PyObject *)__pyx_v_time_vector)); __pyx_t_3 = PyDict_New(); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 18, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 18, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_4); __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_float); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 18, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; if (PyDict_SetItem(__pyx_t_3, __pyx_n_s_dtype, __pyx_t_5) < 0) __PYX_ERR(0, 18, __pyx_L1_error) __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0; __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_t_1, __pyx_t_3); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 18, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_5); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; if (!(likely(((__pyx_t_5) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_5, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 18, __pyx_L1_error) __pyx_t_6 = ((PyArrayObject *)__pyx_t_5); { __Pyx_BufFmt_StackElem __pyx_stack[1]; if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_out.rcbuffer->pybuffer, (PyObject*)__pyx_t_6, &__Pyx_TypeInfo_nn___pyx_t_5numpy_float64_t, PyBUF_FORMAT| PyBUF_STRIDES| PyBUF_WRITABLE, 1, 0, __pyx_stack) == -1)) { __pyx_v_out = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_out.rcbuffer->pybuffer.buf = NULL; __PYX_ERR(0, 18, __pyx_L1_error) } else {__pyx_pybuffernd_out.diminfo[0].strides = __pyx_pybuffernd_out.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_out.diminfo[0].shape = __pyx_pybuffernd_out.rcbuffer->pybuffer.shape[0]; } } __pyx_t_6 = 0; __pyx_v_out = ((PyArrayObject *)__pyx_t_5); __pyx_t_5 = 0;
19: cdef double complex temp_sum
20: cdef double complex temp_mult
+21: for i in prange(time_vector.shape[0], nogil=True):
{ #ifdef WITH_THREAD PyThreadState *_save; Py_UNBLOCK_THREADS #endif /*try:*/ { __pyx_t_7 = (__pyx_v_time_vector->dimensions[0]); if (1 == 0) abort(); { #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95))))) #undef likely #undef unlikely #define likely(x) (x) #define unlikely(x) (x) #endif __pyx_t_9 = (__pyx_t_7 - 0 + 1 - 1/abs(1)) / 1; if (__pyx_t_9 > 0) { #ifdef _OPENMP #pragma omp parallel #endif /* _OPENMP */ { #ifdef _OPENMP #pragma omp for firstprivate(__pyx_v_i) lastprivate(__pyx_v_i) lastprivate(__pyx_v_j) lastprivate(__pyx_v_temp_mult) lastprivate(__pyx_v_temp_sum) #endif /* _OPENMP */ for (__pyx_t_8 = 0; __pyx_t_8 < __pyx_t_9; __pyx_t_8++){ { __pyx_v_i = (int)(0 + 1 * __pyx_t_8); /* Initialize private variables to invalid values */ __pyx_v_j = ((int)0xbad0bad0); /* … */ /*finally:*/ { /*normal exit:*/{ #ifdef WITH_THREAD Py_BLOCK_THREADS #endif goto __pyx_L5; } __pyx_L5:; } }
+22: temp_sum = 0
__pyx_v_temp_sum = __pyx_t_double_complex_from_parts(0, 0);
+23: temp_mult = exp(I * time_vector[i] * omega)
__pyx_t_10 = __pyx_v_i; __pyx_v_temp_mult = std::exp(__Pyx_c_prod_double(__Pyx_c_prod_double(__pyx_v_46_cython_magic_e7cc766498993d3947c8c5bbc875e6a8_I, __pyx_t_double_complex_from_parts((*__Pyx_BufPtrStrided1d(double *, __pyx_pybuffernd_time_vector.rcbuffer->pybuffer.buf, __pyx_t_10, __pyx_pybuffernd_time_vector.diminfo[0].strides)), 0)), __pyx_t_double_complex_from_parts(__pyx_v_omega, 0)));
+24: for j in range(weights.shape[0]):
__pyx_t_11 = (__pyx_v_weights->dimensions[0]); for (__pyx_t_12 = 0; __pyx_t_12 < __pyx_t_11; __pyx_t_12+=1) { __pyx_v_j = __pyx_t_12;
+25: temp_sum = temp_sum + weights[j] * temp_mult
__pyx_t_13 = __pyx_v_j; __pyx_v_temp_sum = __Pyx_c_sum_double(__pyx_v_temp_sum, __Pyx_c_prod_double((*__Pyx_BufPtrStrided1d(__pyx_t_double_complex *, __pyx_pybuffernd_weights.rcbuffer->pybuffer.buf, __pyx_t_13, __pyx_pybuffernd_weights.diminfo[0].strides)), __pyx_v_temp_mult)); }
+26: out[i] = real(temp_sum)
__pyx_t_14 = __pyx_v_i; *__Pyx_BufPtrStrided1d(__pyx_t_5numpy_float64_t *, __pyx_pybuffernd_out.rcbuffer->pybuffer.buf, __pyx_t_14, __pyx_pybuffernd_out.diminfo[0].strides) = std::real(__pyx_v_temp_sum); } } } } } #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95))))) #undef likely #undef unlikely #define likely(x) __builtin_expect(!!(x), 1) #define unlikely(x) __builtin_expect(!!(x), 0) #endif }
+27: return out
__Pyx_XDECREF(__pyx_r); __Pyx_INCREF(((PyObject *)__pyx_v_out)); __pyx_r = ((PyObject *)__pyx_v_out); goto __pyx_L0;
Note that I had to give up my +=
term to compile this since I ran into this error message "Cannot read reduction variable in loop body" (fix found here).
Also, note that I supplied the compiler with the --/openmp flag, which is valid under Windows only (for Linux it's --fopenmp).
out = synthesis_cython4(weights, omega, time_vector)
np.allclose(ref_result, out)
True
%timeit synthesis(weights, omega, time_vector)
29.1 ms ± 250 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit synthesis_cython4(weights, omega, time_vector)
9.23 ms ± 521 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
We achieve a nice little speedup to parallel execution on four cores.
Timings¶
Let's time the different versions of our functions!
timings = {}
for func, label in zip([synthesis, synthesis_cython1, synthesis_cython2, synthesis_cython3, synthesis_cython4],
['reference', 'cython1-naive', 'cython2-clever', 'cython3-clever-pure-c++-loop', 'cython-parallel']):
obj = %timeit -o func(weights, omega, time_vector)
timings[label] = obj
29.3 ms ± 639 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 123 ms ± 3.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 27.5 ms ± 352 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 27.3 ms ± 218 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 8.76 ms ± 599 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
import pandas as pd
s = pd.Series({key: timings[key].average * 1e3 for key in timings}).to_frame(name='timings (ms)').sort_values(by='timings (ms)')
s
timings (ms) | |
---|---|
cython-parallel | 8.757900 |
cython3-clever-pure-c++-loop | 27.341302 |
cython2-clever | 27.519439 |
reference | 29.275485 |
cython1-naive | 123.381547 |
fig, ax = plt.subplots(figsize=(10, 5))
s.plot(kind='bar', ax=ax, rot=45)
plt.legend(loc='upper left')
<matplotlib.legend.Legend at 0xb901b38>
Conclusions¶
So here's the list of the things I learnt while doing this work:
- declaring complex variables within Cython functions
- using C++ std lib functions using cdef extern in Cython code
- Cython allows one to easily parallelize code
- the Cython error messages are often cryptic
- it is useful to read the generated Cython code to understand what's going on under the hood of the translation