Getting codes in different languages to interact

In a Code Coffee meeting last month (9th July), we discussed ways of getting codes in different languages to talk to one another. This is quite a common task; for example, if you need your modern data analysis pipeline to interact with some legacy code (normally Fortran!), or if you have a mostly Python code that has a computationally-intensive bit that would be faster written in C. Our focus was on Python, C, and Fortran, since they seem to be the most commonly used languages in the group.

People brought a number of different solutions with them. Some were very easy to implement, just as a simple few-line block in your Python code that could be used to speed up a loop. Others were quite a bit more complicated, requiring extra work on building modules and getting your hands dirty with memory management, but having the virtue of being much more flexible. The methods we discussed are summarised below, in increasing order of complexity. If you skip to the bottom, there’s also a link to a tarball, with some examples.

(Edit: Be sure to check out some of the interesting comments, below, too. They mostly concern Cython.)

Weave

Weave is a way of writing C code “inline” – directly into a Python script.  Python variables are automatically made available to the C code (you sometimes need to do a bit of extra work to cast them to a specific type, but not much), which makes it nice and easy to use, with little boilerplate required in the C code. Essentially, you can just stick a fast, few-line C loop in the middle of a Python function, then call Weave and it’ll handle compilation and shuffling data between Python and C for you automagically, without requiring any complicated wrapper code or makefile magic. There’s also a “blitz” mode, which takes a Python/NumPy expression and converts it into C++ in the background, which can also help speed things up (although it’s generally slower than the inline mode).

Unfortunately, the documentation isn’t great, but there are plenty of examples out there (SageMath have a couple of very clear ones). It should definitely be your first port of call if all you want to do is speed up part of your Python code. It’s not so good if you want to interface with legacy code, or have something more complicated that you need to do in C (e.g. something that’s split into different functions). See also Cython, which seems to be a bit more flexible, but not too much more difficult to use. There’s a nice speed comparison with NumPy, Weave blitz and inline, and MATLAB here, and a Cython/Weave example (with benchmarks) here. There’s also an interesting blog post on getting GSL to work with Weave/Cython.

f2py

f2py is a way of automatically generating Python modules from Fortran code. It seems pretty easy to use – all you need to do is run it over your Fortran source code, and it will produce a compiled module that you can simply import into your Python code and use as you would any other. The f2py program scans the Fortran code and figures out how to deal with function arguments and convert between types itself, which is very handy. It even generates function documentation (docstrings) for the Python module, telling you what format the function arguments need to be in. One problem with f2py, pointed out by Ryan, is that it tends to fail silently, making it difficult to debug! It’s apparently quite robust though, so debugging probably won’t be necessary that often. The documentation looks pretty good, and you can find other resources here. (N.B. They ask you to cite their paper if you use f2py in research code.)

Python ctypes

ctypes is a Python module for handling calls to shared libraries (whether they are written in C/C++ or Fortran). The syntax for loading libraries and calling functions is exceptionally simple – little different from importing a native Python module and calling its functions. However, there’s some added complexity that comes as a result of having to convert between C and Python data types in function arguments. For the most part, all you need to do is call a simple ctypes helper function to make the conversion; e.g. c_double(42.3) will give you something that can be passed to a function that expects a C double as an argument. In more recent versions of NumPy, ndarrays also have ctypes conversion methods (see the ndarray.ctypes.data_as() method; you probably want to get a pointer to the array, for which you should use ctypes.POINTER), although there are also some subtleties to be aware of in terms of how Python/NumPy and C/C++ or Fortran store multidimensional arrays in memory (see the order property of NumPy arrays, for example). The Sage website has a good tutorial on working with ndarrays in ctypes. A slightly nicer way of handling type conversion is to write a small Python wrapper for the C function (or whatever), and use the argtypes attribute to specify what arguments the function accepts. A comprehensive tutorial can be found here, which explains most of the workings of ctypes, and which has good examples.
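To make the two styles of type conversion concrete, here is a minimal sketch that calls two functions from the system C maths library. It assumes a Unix-like system where that library can be found under the name "m"; the wrapper name power is made up for illustration and isn't part of ctypes.

```python
import ctypes
import ctypes.util

# Locate and load the C maths library (assumption: a Unix-like system).
# If find_library returns None, CDLL(None) falls back to the symbols
# already loaded into the running Python process (POSIX only).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Style 1: convert each argument by hand with a ctypes helper.
# The return type must still be declared, since ctypes assumes int.
libm.cos.restype = ctypes.c_double
cos_of_zero = libm.cos(ctypes.c_double(0.0))

# Style 2: declare the signature once with argtypes/restype, then hide
# the C function behind a small Python wrapper (hypothetical name).
_c_pow = libm.pow
_c_pow.argtypes = [ctypes.c_double, ctypes.c_double]
_c_pow.restype = ctypes.c_double


def power(base, exponent):
    """Return base**exponent, computed by the C library's pow()."""
    # Plain Python floats are converted automatically thanks to argtypes.
    return _c_pow(base, exponent)


two_to_the_ten = power(2.0, 10.0)
```

With argtypes set, ctypes will also raise an error if you pass something that can't be converted to a double, rather than quietly handing garbage to the C function.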

The tutorial on the Python website mostly concerns calling existing system libraries, but of course you can write your own libraries too. For C/C++ code compiled with GCC, this is pretty simple – all you need to do is add a couple of compiler flags (see Section 3.4 of the TDLP Library Program tutorial). Of course, your C code should be set out like a library first – just compiling something that’s meant to be a standalone program as a library won’t work too well, and you’ll need to write header files etc. That’s easy to do, and existing “non-library” code can often be refactored into a working library with very little effort. There’s a very nice, simple tutorial on writing and compiling libraries here. Note that the libraries don’t need any “special treatment” to interface with Python using ctypes – they’re just standard C/Fortran libraries, and can happily be called by other C/Fortran programs too.

Python C API

Big chunks of Python are written in C, so it should come as no surprise that there is a C library which provides access to Python data types and functions. Access it by including Python.h in your C code. I’ll deliberately keep discussion of this one high-level, because it’s significantly more complicated than the others, and doesn’t seem to be fantastically well documented (although see the official Python docs for it here and here). This method allows you to write new Python modules directly in C; the end result will be something that looks and behaves exactly like a standard Python module (of the sort that you may have written before, in Python), which requires no fiddling about with data types or what have you in the Python code that uses it. That’s because all of the fiddling around is done in the C code! I find this quite a bit trickier to write – your C files need some boilerplate to get themselves (and their constituent functions) recognised by the Python interpreter, and there’s a bit of makefile magic required to actually build and install the module too. Plus, you have to figure out how to handle type conversions between native C types and Python types (mostly PyObjects). It’s not super-difficult, but it is fiddly in places. Take a look at this nice tutorial by Dan Foreman-Mackey (and another one from JPL which looks more specifically at handling NumPy arrays).

Some of the more confusing issues that you’re likely to run into are to do with reference counting. This is a reasonably technical computer science concept, which has to do with how Python stores and labels variables. (Recall that, in Python, the same data can be referred to by multiple variable names.) It’s important to make sure your C code properly tracks references to Python objects as it manipulates them; otherwise, you’re likely to run into a host of problems, from memory leaks to weird crashes. If you’re going to use the Python C API, I strongly recommend that you invest a good amount of time in understanding how this works.
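You can get a feel for reference counting from the Python side before touching any C. The sketch below uses sys.getrefcount, which is specific to the standard CPython interpreter; the variable names are just for illustration. The absolute numbers it reports vary between Python versions (the call itself temporarily holds a reference), so only the differences are meaningful.

```python
import sys

# One list object, initially referred to by a single name.
data = [1.0, 2.0, 3.0]
count_before = sys.getrefcount(data)

# Binding a second name to the same object adds one reference;
# no data is copied.
alias = data
count_with_alias = sys.getrefcount(data)

# Deleting the name removes that reference again. The object itself
# is only freed once its reference count drops to zero.
del alias
count_after_del = sys.getrefcount(data)

print(count_with_alias - count_before)  # one extra reference
print(count_after_del - count_before)   # back where we started
```

In C extension code, this bookkeeping is what you take over by hand with the Py_INCREF and Py_DECREF macros: miss a decrement and the object is never freed (a memory leak); do one too many and it can be freed while still in use (a crash).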

All in all, I think the C API is only the way forward if you’re specifically setting out to write a new Python module for use elsewhere; if all you have is some existing C code that you want to quickly plug in to some Python code, it’s going to be a lot of hassle. Still, once you’ve set up a project once, and used the C API a little bit, it’s a lot quicker to get up and running with the next project.

Summary

All in all, ctypes is probably the best route to go for most people, unless you have something particularly trivial you want to speed up. It’s fast (native C function calls), easy enough to set up, and conceptually simple to understand. It’s not quite as “automatic” as Weave/Cython or f2py, but you’ll probably end up with cleaner, more flexible, and more robust code for your troubles. And it’s much easier to work with than the Python C API.

The Performance Python page has a nice comparison of some of the methods mentioned above. A tarball containing example code for most of the above methods can be downloaded from here; thanks to Ryan Houghton, Neale Gibson and Joe Zuntz for providing these.

About Phil Bull

I'm a theoretical cosmologist, currently working as a NASA NPP fellow at JPL/Caltech in Pasadena, CA. My research focuses on the effects of inhomogeneities on the evolution of the Universe and how we measure it. I'm also keen on stochastic processes, scientific computing, the philosophy of science, and open source stuff.

9 responses to “Getting codes in different languages to interact”

  • Gael Varoquaux

    Weave has been obsolete for many years and should not be used.

    I strongly suggest using Cython to do C binding.
    http://gael-varoquaux.info/blog/?p=157

    • Jean-Louis Durrieu

      Hi there!

      Gael, you indeed told me so some time ago. However, I am not sure, as an after-thought, that I would say that Cython does “C binding”, well not in the way that the user explicitly binds some C code to her Python code. As I understand it, it rather consists in adding some keywords in the Python code so that the Cython program can create a C version of the code. Anyway, that’s probably only a terminology issue. Note however that Cython does not allow the Python user to “interact with different languages”: its ease of use comes from the fact that you sort of write something like Python, declaring a bit more than with Python to actually get the desired speed up effect. Well, for most cases, I guess Cython is a good way, except that a mistake (forgetting to declare some variable) seems to result in a substantial drop of performance. That’s why I would prefer to stick with a solution where I can control (most of) the C code that is produced. At least, if it does not work, I do not have to go through an “obscure” wrapper layer to debug.

      For C/C++, I therefore tried SWIG (http://www.swig.org/), and that is probably more what is comparable to Weave: you have some C/C++ code, and you want to be able to call it from Python. It’s very close to Weave, to this aspect, but in practice, I have found so far that one somewhat better controls the process with SWIG: Weave does not exactly tell you where or when it produces the callable compiled library, and it is not easy to know when the library is compiled again (after editing the C code? after editing the support code?? of course, when I force compiling, but do I really want to compile my library each of the 5000 times I call the routine???). With SWIG, since one has to compile it “by hand” (with setuptools, for example), one better knows what is done. The produced library can be called just like a module, such that it’s very convenient.

      As a matter of fact, some time ago, I also started a post on the topic, but never came up with a very structured article. I reported a few thoughts, and started a table in which I report all the errors I encounter, and the solutions I found (if ever…). The link is there: http://durrieu.ch/wordpress/?p=176. I guess the most important thing I had to work out was to declare the right ndarray input/output to the C code in the SWIG interface file. For SWIG version 1.3.40, I had to use an interface file (numpy.i) from an old NumPy repository (seems to have disappeared in more recent versions). Not sure whether that means that SWIG now supports NumPy by default or not… Anyway, that worked for me. Now, of course, I would be interested to know the reasons why one should or should not use SWIG, but for now, I guess I am going to stick with it! (except if someone gives a _very_ good reason not to!) Admittedly, with SWIG, I do not control all the code that the wrapper generates, but at least, the original C code that you feed SWIG with is what matters (hopefully, and if you trust the wrapper devs :D) and is plain C code, so usable anywhere else “as is”.

      Additionally, I’d like to point to an “issue” that I had with weave.inline (and probably with SWIG or even Cython, for that matter): beware of the use of 2d-arrays, and the transpose operator. In numpy, transpose does not create a new array or rearrange the data in memory, it simply tells that the data should be read in another order (e.g., if originally C-CONTIGUOUS, then the transposed array becomes F-CONTIGUOUS). The program will still run, no bug, no error, but the results are going to be completely wrong – well, except if you’re lucky, let’s say the results won’t be what you were expecting!

      At last… sorry for the very long reply! Have a nice week!

      • Jorge

        “However, I am not sure […] that I would say that Cython does “C binding”, well not in the way that the user explicitly binds some C code to her Python code”

        Cython does allow you to do “C binding” (in a very easy way). I know it because I just did that.

        It’s as simple as doing

        cdef extern from "my_own_c_code.c":
            void spam(int parrot, double *eggs)

        and then using it, after calling np.ascontiguousarray on any array you’re about to hand in to the C code (to avoid the last issue you mention).

        That is, I join Gael in his strong suggestion.

        Regards

      • Jean-Louis Durrieu

        My mistake! I must confess I did not go through Gael’s post, but now, that sounds interesting.

        Is there however any reason to go for Cython rather than SWIG? Maybe the way they interact with NumPy arrays? I only tried SWIG with numpy.i, and some tutorial here and there. It worked, so that was fine!

        As for the contiguity, that’s good to know, thanks!

      • Jorge

        I haven’t used SWIG so I can’t really comment on that. But IIRC petsc4py used to be wrapped using SWIG and then moved to Cython.

        Thank you!

  • Michael Meinel

    A nice overview on how to mix C and Python. However, I cannot fully agree with the conclusion. Even though ctypes is a good choice for easy incorporation of C libraries it is actually one of the least performing options. Most of the type conversion is done in Python code which leads to an important hit when invoking functions.
    I also think there are some more techniques missing: I would at least state interface generators like SWIG or SIP. They should be on par with ctypes when it gets to complexity. However, as they generate native Python modules they have remarkably less function calling overhead.
    After all, I want to support Gael as Cython is by far my favourite option for getting the most out of both worlds.

  • philliphelbig

    Legacy = stuff that works. 🙂

  • Dag Sverre Seljebotn

    Since you mention Fortran; Kurt Smith (and me a bit) also worked on Fwrap, which aimed to get smooth Fortran 90+ integration (since f2py is only really smooth for F77 code). Unfortunately there’s about another 2-4 weeks of work to make it perfect, and that’s the way it’s been for a year or so.

    As for ctypes vs. Cython, I’m too emotionally attached to Cython to comment perhaps, but ctypes should definitely be a lot slower (since speed was mentioned — of course, usually you’re passing an array and it doesn’t matter one bit — the times it does matter is when you make a “primitive” that you use a lot in Python code, e.g., Sage wrapping the multi-precision integers of GMP.)

  • Dag Sverre Seljebotn

    As for ctypes vs. Cython vs. SWIG:

    ctypes you don’t need to bother with compilation, while with Cython and SWIG you do (especially Cython + Fortran + NumPy C-API can be a pain to build without the right tools and know-how).

    The flip side of that is that if you get a definition wrong in ctypes you get a crash rather than a compiler error, and any portability across platforms must be hand-coded by you.

    SWIG is the only tool that will automatically wrap C++ code without having to repeat the definitions, which can be a big deal in some cases.
