Tag Archives: fortran

Calling the WMAP likelihood code from C/C++

If you’re interested in cosmological parameters, chances are you’ll want to include WMAP CMB constraints in your parameter estimation code at some point. Happily, the WMAP team have made their likelihood code publicly available, so it’s relatively simple to add it to your own MCMC software (or whatever else you’re using). Less happily, the likelihood code is written in Fortran 90, so users of more modern languages will need to do a bit of fiddling to get it to play ball with their code.

Joe Zuntz was kind enough to share a little C wrapper that he wrote for the WMAP likelihood code. It’s pretty easy to follow, although you should of course read the documentation for the likelihood code to see how to use it properly. First of all, compile the original Fortran WMAP likelihood code. Then, compile this wrapper function as a static library (libwmapwrapper) as follows:

gfortran -O2 -c WMAP_likelihood_wrapper.F90
ar rc libwmapwrapper.a WMAP_likelihood_wrapper.o

You can call the wrapper function from your C code by linking to that library. Joe has written a test implementation that shows how it works. To compile this, you’ll need to make sure you’re linking against everything the WMAP code needs, including a BLAS/LAPACK library; the following should work on Mac OS X (using vecLib):

gcc wmap_test.c -std=c99 -L. -lwmap -lwmapwrapper -lcfitsio -framework vecLib -lgfortran -o test_wmap

(N.B. Joe’s code is written for the WMAP 7-year release, so you may need to change a couple of numbers to get it working with the 9-year release.)


Handling Fortran binary files with Python

Today’s top tip, straight from the front lines of, err, cosmology, concerns the handling of Fortran binary files with Python. If you have to deal with legacy Fortran code, you might find that it has a penchant for outputting files in an “unformatted” binary format. Exactly how the data is stored in this format can apparently depend on a lot of factors, such as the endian-ness of your machine. Hurrah! The files are non-portable. The need to figure out exactly what is being dumped into the file can also cause headaches if you’re trying to read Fortran output into a program written in another language (for example, C/C++).

And it gets worse – what if you’re trying to write a file to be read in by the Fortran code, using a different language? Well, this particular problem cropped up for me, and of course I wanted to use Python to solve it.

For Python, fortunately, Neil Martinsen-Burrell has written FortranFile, which is subclassed off the standard Python file object, and which provides some convenient functions for reading and writing Fortran-formatted files. (There’s also an entry in the SciPy Cookbook that is relevant for Fortran I/O.) The pertinent function for me was writeReals(), which writes a NumPy array to a file in the appropriate format. The only subtlety I came across while briefly fiddling with this was that, when constructing a new FortranFile object for writing, you have to pass the mode keyword (e.g. mode="w"), which is inherited from file. (There’s a thread with a couple of tips on using FortranFile here.)
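
Something along these lines should do the trick (the file name and array are purely illustrative, and the exact module name may depend on where you downloaded FortranFile from):

import numpy as np
from fortranfile import FortranFile  # Neil Martinsen-Burrell's module; the import name may vary

data = np.linspace(0., 1., 100)

# Write a single unformatted record of double-precision reals
f = FortranFile("input_for_fortran.dat", mode="w")  # mode is inherited from file
f.writeReals(data, prec='d')
f.close()

# Reading the record back in works much the same way
f = FortranFile("input_for_fortran.dat", mode="r")
check = f.readReals(prec='d')
f.close()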

There are probably better ways of doing this than downloading someone else’s code off the internet, but like I said, I didn’t want to spend all afternoon on it.


Weird gfortran compiler error of the week

When faced with the following compiler error from gfortran, the correct response is apparently to run make clean and then make again:

Error: More actual than formal arguments in procedure call at (1)

God knows what was actually going wrong.


Getting codes in different languages to interact

In a Code Coffee meeting last month (9th July), we discussed ways of getting codes in different languages to talk to one another. This is quite a common task; for example, if you need your modern data analysis pipeline to interact with some legacy code (normally Fortran!), or if you have a mostly Python code that has a computationally-intensive bit that would be faster written in C. Our focus was on Python, C, and Fortran, since they seem to be the most commonly used languages in the group.

People brought a number of different solutions with them. Some were very easy to implement – just a simple few-line block in your Python code that could be used to speed up a loop. Others were quite a bit more complicated, requiring extra work on building modules and getting your hands dirty with memory management, but having the virtue of being much more flexible. The methods we discussed are summarised below, in increasing order of complexity. If you skip to the bottom, there’s also a link to a tarball with some examples.

(Edit: Be sure to check out some of the interesting comments, below, too. They mostly concern Cython.)

Weave

Weave is a way of writing C code “inline” – directly into a Python script.  Python variables are automatically made available to the C code (you sometimes need to do a bit of extra work to cast them to a specific type, but not much), which makes it nice and easy to use, with little boilerplate required in the C code. Essentially, you can just stick a fast, few-line C loop in the middle of a Python function, then call Weave and it’ll handle compilation and shuffling data between Python and C for you automagically, without requiring any complicated wrapper code or makefile magic. There’s also a “blitz” mode, which takes a Python/NumPy expression and converts it into C++ in the background, which can also help speed things up (although it’s generally slower than the inline mode).
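
As a rough sketch of what the inline mode looks like (this uses the old scipy.weave module, which is Python 2 only and has since been removed from SciPy; the function and variable names are just for illustration):

import numpy as np
from scipy import weave

def weighted_sum(x, w):
    # The NumPy arrays x and w appear in the C code as plain C arrays, and the
    # Python int n as a C int; return_val is what gets handed back to Python.
    n = len(x)
    code = """
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += w[i] * x[i];
    }
    return_val = total;
    """
    return weave.inline(code, ['x', 'w', 'n'])

x = np.random.rand(1000)
w = np.random.rand(1000)
print(weighted_sum(x, w))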

Unfortunately, the documentation isn’t great, but there are plenty of examples out there (SageMath have a couple of very clear ones). It should definitely be your first port of call if all you want to do is speed up part of your Python code. It’s not so good if you want to interface with legacy code, or have something more complicated that you need to do in C (e.g. something that’s split into different functions). See also Cython, which seems to be a bit more flexible, but not too much more difficult to use. There’s a nice speed comparison with NumPy, Weave blitz and inline, and MATLAB here, and a Cython/Weave example (with benchmarks) here. There’s also an interesting blog post on getting GSL to work with Weave/Cython.

f2py

f2py is a way of automatically generating Python modules from Fortran code. It seems pretty easy to use – all you need to do is run it over your Fortran source code, and it will produce a compiled module that you can simply import into your Python code and use as you would any other. The f2py program scans the Fortran code and figures out how to deal with function arguments and convert between types itself, which is very handy. It even generates function documentation (docstrings) for the Python module, telling you what format the function arguments need to be in. One problem with f2py, pointed out by Ryan, is that it tends to fail silently, making it difficult to debug! It’s apparently quite robust though, so debugging probably won’t be necessary that often. The documentation looks pretty good, and you can find other resources here. (N.B. They ask you to cite their paper if you use f2py in research code.)
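
A rough sketch of the workflow, assuming a hypothetical Fortran file growth.f90 containing a simple subroutine:

# Suppose growth.f90 contains something like:
#
#   subroutine scale_array(x, a, n)
#     integer, intent(in) :: n
#     double precision, intent(in) :: a
#     double precision, intent(inout) :: x(n)
#     x = a * x
#   end subroutine scale_array
#
# Build a Python extension module from it (run from the shell):
#   f2py -c -m growth growth.f90
#
# Then use it like any other Python module:
import numpy as np
import growth

print(growth.scale_array.__doc__)   # f2py-generated docstring describing the arguments

x = np.arange(10, dtype=np.float64)
growth.scale_array(x, 2.0)          # n is picked up from the shape of x
print(x)                            # modified in place, since x is intent(inout)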

Python ctypes

ctypes is a Python module for handling calls to shared libraries (whether they are written in C/C++ or Fortran). The syntax for loading libraries and calling functions is exceptionally simple – little different from importing a native Python module and calling functions from it. However, there’s some added complexity that comes as a result of having to convert between C and Python data types in function arguments. For the most part, all you need to do is call a simple ctypes helper function to make the conversion; e.g. c_double(42.3) will give you something that can be passed to a function that expects a C double as an argument. In more recent versions of NumPy, ndarrays also have ctypes conversion methods (see the ndarray.ctypes.data_as() method; you probably want to get a pointer to the array, for which you should use ctypes.POINTER), although there are also some subtleties to be aware of in terms of how Python/NumPy and C/C++ or Fortran store multidimensional arrays in memory (see the order property of NumPy arrays, for example). The Sage website has a good tutorial on working with ndarrays in ctypes. A slightly nicer way of handling type conversions is to write a small Python wrapper for the C function (or whatever), and use the argtypes attribute to specify what arguments the function accepts. A comprehensive tutorial can be found here, which explains most of the workings of ctypes, and which has good examples.

The tutorial on the Python website mostly concerns calling existing system libraries, but of course you can write your own libraries too. For C/C++ code compiled with GCC, this is pretty simple – all you need to do is add a couple of compiler flags (see Section 3.4 of the TLDP Program Library HOWTO). Of course, your C code should be set out like a library first – just compiling something that’s meant to be a standalone program as a library won’t work too well, and you’ll need to write header files etc. That’s easy to do, and existing “non-library” code can often be refactored into a working library with very little effort. There’s a very nice, simple tutorial on writing and compiling libraries here. Note that the libraries don’t need any “special treatment” to interface with Python using ctypes – they’re just standard C/Fortran libraries, and can happily be called by other C/Fortran programs too.
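
Putting the pieces together, a minimal sketch might look something like this (the C file, library name and function are all made up for the purposes of illustration):

# Suppose mylib.c contains:
#
#   double mean(const double *x, int n) {
#       double total = 0.0;
#       for (int i = 0; i < n; i++) total += x[i];
#       return total / n;
#   }
#
# compiled into a shared library with, e.g.:
#   gcc -shared -fPIC -std=c99 -o libmylib.so mylib.c
#
import ctypes
import numpy as np

lib = ctypes.CDLL("./libmylib.so")   # .dylib on Mac OS X

# Declare the argument and return types so ctypes can check and convert them
lib.mean.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int]
lib.mean.restype = ctypes.c_double

x = np.random.rand(1000)             # float64 and C-contiguous by default
x_ptr = x.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
print(lib.mean(x_ptr, len(x)))       # with argtypes set, the Python int is converted for you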

Python C API

Big chunks of Python are written in C, so it should come as no surprise that there is a C library which provides access to Python data types and functions. Access it by including Python.h in your C code. I’ll keep discussion of this one at a deliberately high level, because it’s significantly more complicated than the others, and doesn’t seem to be fantastically well documented (although see the official Python docs for it here and here). This method allows you to write new Python modules directly in C; the end result will be something that looks and behaves exactly like a standard Python module (of the sort that you may have written before, in Python), which requires no fiddling about with data types or what have you in the Python code that uses it. That’s because all of the fiddling around is done in the C code! I find this quite a bit trickier to write – your C files need some boilerplate to get themselves (and their constituent functions) recognised by the Python interpreter, and there’s a bit of makefile magic required to actually build and install the module too. Plus, you have to figure out how to handle type conversions between native C types and Python types (mostly PyObjects). It’s not super-difficult, but it is fiddly in places. Take a look at this nice tutorial by Dan Foreman-Mackey (and another one from JPL which looks more specifically at handling NumPy arrays).
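
For the build step, the usual trick is a small setup.py based on distutils rather than a hand-written makefile; something like the sketch below (the module name and source file are hypothetical, and the C file itself still needs the boilerplate described in those tutorials):

from distutils.core import setup, Extension
import numpy as np

setup(
    name="mymodule",
    version="0.1",
    ext_modules=[
        Extension(
            "mymodule",                        # import name of the module
            sources=["mymodule.c"],            # your C-API source file
            include_dirs=[np.get_include()],   # only needed if you use NumPy arrays
        )
    ],
)

# Build and install with:
#   python setup.py build
#   python setup.py install --user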

Some of the more confusing issues that you’re likely to run into are to do with reference counting. This is a reasonably technical computer science concept, which has to do with how Python stores and labels variables. (Recall that, in Python, the same data can be referred to by multiple variable names.) It’s important to make sure your C code properly tracks references to Python objects as it manipulates them; otherwise, you’re likely to run into a host of problems, from memory leaks to weird crashes. If you’re going to use the Python C API, I strongly recommend that you invest a good amount of time in understanding how this works.

All in all, I think the C API is only the way forward if you’re specifically setting out to write a new Python module for use elsewhere; if all you have is some existing C code that you want to quickly plug in to some Python code, it’s going to be a lot of hassle. Still, once you’ve set up one project and used the C API a little bit, it’s a lot quicker to get up and running with the next.

Summary

All in all, ctypes is probably the best route to go for most people, unless you have something particularly trivial you want to speed up. It’s fast (native C function calls), easy enough to set up, and conceptually simple to understand. It’s not quite as “automatic” as Weave/Cython or f2py, but you’ll probably end up with cleaner, more flexible, and more robust code for your troubles. And it’s much easier to work with than the Python C API.

The Performance Python page has a nice comparison of some of the methods mentioned above. A tarball containing example code for most of the above methods can be downloaded from here; thanks to Ryan Houghton, Neale Gibson and Joe Zuntz for providing these.


Numerical Cosmology titbits

I’m still at the Numerical Cosmology 2012 workshop in Cambridge (the last day is tomorrow). I’m pretty sleepy by now (9am start for talks makes Phil sad), but I figured I’d squeeze in a few stories before I retire to my bed.

Yesterday was one of those weird days that I seem to have every so often which confirms to me that physics makes for an interesting life. After a full-ish programme of talks, we all watched Stephen Hawking unveil his new supercomputer, via video link, to an extremely chilly room just around the corner that we subsequently all piled into (along with grinning computer industry representatives and a TV crew) to prod the shiny new computer and gurn at the camera. Supercomputers are much smaller than they used to be. Later on, we had the conference dinner in Trinity Hall, where I sat opposite John Reid, one of the people involved in setting Fortran standards. Being the Python aficionado that I am (and also, having discovered earlier in the evening that Fortran 77 uses magic numbers for certain I/O tasks*), I solicited his thoughts on designing programming languages to encourage good programming style. Didn’t really get much out of him on that, but he did give an interesting talk on coarrays earlier in the day (coarrays are an extension to Fortran that are intended to make parallel computing a bit more transparent, and thus easier). Ho hum. I topped the evening off by introducing myself to Dick Bond as an emeritus professor. He didn’t seem convinced.

I also learned an awful lot about High Performance Computing (HPC). By that, I don’t mean running your average MCMC code on the cluster in the basement – oh no, proper tens-of-thousands-of-cores stuff. It turns out that programming models and tools are beginning to look a little outdated in the face of modern HPC systems – technologies like OpenMP were apparently designed with a “many single-core nodes” architecture in mind, whereas we now have systems consisting of many nodes, each of which has many cores. Of course, it’s faster to transfer things between cores on a given node than it is to pass data between nodes, but OpenMP doesn’t make it easy to differentiate between the two situations. David Henty (Edinburgh) talked about the consequences of this for HPC – his guiding principle was to “keep data on the same device for as long as possible”. Transferring data and other communication between nodes is what kills performance on these massively parallel systems, since communication buses are intrinsically slower than working on-chip. Oh, and cache misses – if your parallel code isn’t scaling too well, or otherwise seems sluggish, the number one suspect has to be cache misses.

Another little performance-related tip comes courtesy of Hal Finkel (Argonne), who’s attempting trillion particle simulations (about 100x larger than the current state of the art) by using a whole host of clever computational tricks and optimisations. He mentioned that some architectures have specific instructions for providing rapid estimates of certain common floating point operations (e.g. square roots). If you can afford the reduced accuracy, using these prevents your code from having to branch and call sqrt() or whatever from a library. If this is happening in an inner loop – kerching! Easy instant speed-up.

And one final little note: Supercomputers break all of the time! The Mean Time Between Failures (MTBF) of a big computer system is typically a few days, so HPC codes have to store snapshots of their state (checkpoints) every few hours, and must support resuming from these checkpoints in order to be able to do anything useful.

* Why? Aaaaagh! Aaagh! (OK, it probably made slightly more sense at the time.)


Code Coffee: Parallelisation with OpenMP

Rich Booth gave us an introduction to using OpenMP to parallelise our code. It turns out to be surprisingly easy – all you need to do is add a specially formed comment or pragma here and there, and OpenMP will do the rest. At its most basic, OpenMP just takes a serial code (code meant to be run on a single processor), and splits the work between multiple threads, which can be run on multiple processors. This (hopefully) speeds up code execution by sharing the load. Rich gave examples for OpenMP in C and Fortran, but there are other parallelisation tools out there for other languages, like the multiprocessing module in Python. It was also mentioned that MATLAB has excellent multi-processing capabilities, which tend to be quite easy to use.
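
For the Python route, a rough sketch of the same idea – sharing a loop out between workers – using multiprocessing might look something like this (the work function is just a placeholder):

import multiprocessing as mp

def work(i):
    # Some expensive, independent calculation for iteration i
    return i * i

if __name__ == "__main__":
    pool = mp.Pool(processes=4)            # roughly analogous to picking the number of threads
    results = pool.map(work, range(1000))  # the loop iterations are shared out between processes
    pool.close()
    pool.join()
    print(sum(results))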

Rich worked off this introduction to OpenMP, and showed us some basic examples of how for loops can be parallelised. He also discussed concepts like:

  • Scheduling: how the workload should be split up between different threads to make things as fast as possible (different scheduling strategies are available)
  • Scope: which variables should be private to a thread, which should be public and shared between all threads, and how to specify which is which
  • Preventing race conditions: ensuring that shared variables are updated in a coherent way by individual threads, e.g. by using atomic operations
  • Functions: Bundling code into a function which is then called from inside the parallelised block of code, to make it easier to keep track of private and shared variables

Other topics that were touched on include the difficulty of debugging parallelised code (the best way is to avoid having to debug by following a few good practices), the fact that OpenMP-enabled code can be run in “serial mode” by compiling with a standard compiler instead (the compiler will just throw up a few spurious warnings when it gets to the OpenMP bits), and that it’s best to only use OpenMP to parallelise a few well-defined tasks, like computationally-intensive for loops, rather than trying to parallelise everything (which can take a long time to code properly).

All in all, OpenMP looks like a nice, relatively stable way of speeding up operations that can be parallelised. It’s a lot simpler to implement than I thought, and doesn’t seem to require a load of code rewrites or anything as serious as that. There’s quite a bit of introductory tutorial material on the web, along with a few blogs dedicated to parallel programming. As usual, Wikipedia is helpful on concepts.