Tag Archives: c++

Book review: Think Like a Programmer, by V. Anton Spraul

My publisher, No Starch Press, sent me a review copy of another of their books a few months ago. Regrettably I’ve been a bit slow in getting around to properly reading it, but here, finally, is my review of Think Like a Programmer, by V. Anton Spraul.

[Image: front cover of Think Like a Programmer]

Programming is as much an art as it is a science. When you’re starting out as a programmer, there’s a big mess of concepts and rules to get into your head before you can do anything much more complicated than printing out a shopping list on-screen – things like the differences between variable types, and how pointers work. Even if you’re dealing with a language that hides all of these icky details, like Python, you’re still going to find yourself spending most of your time learning about specific structures like for loops or class declarations.

Most books for the newbie programmer focus on these mechanical details, generally with specific application to only one programming language. And quite rightly so, in my opinion; after all, it’s only by learning this stuff that you’re ever going to be able to do anything interesting. And so it’s only when you start programming regularly, or try to do anything more substantial, that the “artistic” side of programming starts to become really important.

With Think Like a Programmer, Spraul has his sights firmly set on the people who’ve already put the time in with the initial “science” bit. They have a decent amount of experience with one, maybe two, programming languages, and have progressed beyond textbook exercises or copy-pasting together code snippets to actually writing their own functioning code from scratch. But it’s at this point that a lot of people get stuck, with some of them never moving far beyond this predominantly mechanical understanding. Without a firm push in the right direction, habits can develop, experience reinforces them, and the art of programming never blooms.

It’s a philosophy, not a cookbook

Think Like a Programmer is a 233-page push in the right direction. It’s not a pattern book, with endless lists, block diagrams, and flowcharts for deciding when to use one tried-and-tested program structure over another. It’s not a cookbook, with myriad clever, practical examples to use as “inspiration”. Nor is it an “advanced programming” textbook, with detailed treatises on using the obscure, God-level features of whatever programming language you’re most concerned with. No. While it does have elements of all of these things here and there (they’re useful, after all), it’s much more concerned with your attitude to programming – how you approach solving problems with code. Your coding philosophy.

It would be easy to churn out a book on this subject that goes little further than lecturing you into adopting good coding style (how to indent blocks? where to add comments? how to name variables?), and pointing at a few useful patterns that relative newcomers often don’t know about. But Spraul has gone far beyond this. He starts off with a general discussion of how to solve problems, using examples of the type you might find somewhere near the back of a newspaper. The puzzles he walks through were really fun, and it felt good to go from blindly fiddling around with them to successfully applying the strategies he suggested. The lesson he’s teaching here is to sit down, think about the problem, and start using powerful general-purpose approaches like breaking it down into smaller sub-problems, being systematic, and so forth. And, hey, whaddaya know? It works. Thinking works!

It’s a journey, not a destination

Things progress from there into the programming domain. In each successive chapter, Spraul builds on the discussion that has gone before, introducing new, generic, problem-solving approaches and combining them with the methods discussed previously to solve progressively harder example problems. The examples are followed through in excellent explanatory detail, with an emphasis on the structure and logic behind the code rather than the particular language features that are used (though all the examples are in C++, which has its fair share of relatively opaque syntax). He also strongly encourages that you try the numerous exercises at the end of each chapter to cement what you’ve learned, an approach that, while it may sound dry and textbooky, really does help you to get to grips with things.

There are chapters on general programming/problem-solving approaches, plus more specialised ones on using important tools like arrays, classes, pointers, and recursion to your advantage (the recursion one is available to download for free). The discussion is kept general – there’s little in here about specific optimisation or debugging tricks, and rather more of an emphasis on just writing good code, regardless of the specific application you might have in mind. As such, if all you’re after is quick tips for making your code run better, you’re not going to get that much out of the book. If you’re willing to sit down and patiently follow it through, however, you’ll find that what it’s teaching you is an enlightened approach to programming – essentially, how to become a Good programmer, with a capital G. And in the long run, that’s what will make the difference between scraping by with cobbled-together solutions that just about work, and producing truly elegant, effective ones.

That troublesome audience

This is all well and good – noble, even – but I can’t help but wonder if the book will get through to its intended audience. If a novice programmer picks this up, there’s a good chance that they’ll struggle with the choice of language. While it’s a sensible pick for showing off the concepts the author is interested in, C++ isn’t the easiest language in the world, and Spraul isn’t afraid of using some of its more obscure syntax with only the briefest of explanations. For someone with experience only of Python, for example, the one-page overview of how pointers work in C++ isn’t going to leave them much wiser on the subject. There’s absolutely nothing to help the non-C++-using reader get set up with compilers and IDEs either, which could cause some serious headaches for those actually wanting to play with the examples themselves. As a result, those familiar with C++ will find the book a considerably easier ride than those who are not, which is a shame.

The purpose of the book is also a bit more subtle than your average programming book (the blurb describes it as a “one-of-a-kind text”). You know what you’re getting with a cookbook, whereas the benefits brought by Think Like a Programmer are somewhat less tangible. The readers who’ll get the most out of this are the patient, motivated learners, whereas those who’re looking for shortcuts and quick fixes to “becoming a better programmer” will likely find it frustrating. Ultimately, I guess that’s fine – you can only help those who will be helped – but I suspect this sort of presentation would be more effective for a broader range of people in the context of a taught course than as self-study.

Verdict

The book is well-written, with tons of excellent advice and solid, well-thought-out examples. If you’re willing to devote some time to studying the material (perhaps, depending on your background, with a C++ reference in hand), you’ll soon find yourself equipped with an impressive array of problem-solving strategies and, maybe, a new outlook on programming. Recommended.


Calling the WMAP likelihood code from C/C++

If you’re interested in cosmological parameters, chances are you’ll want to include WMAP CMB constraints in your parameter estimation code at some point. Happily, the WMAP team have made their likelihood code publicly available, so it’s relatively simple to add the constraints into your own MCMC software (or whatever else you’re using). Less happily, the likelihood code is written in Fortran 90, so users of more modern languages will need to do a bit of fiddling to get it to play ball with their code.

Joe Zuntz was kind enough to share a little wrapper that he wrote to make the WMAP likelihood code callable from C. It’s pretty easy to follow, although you should of course read the documentation for the likelihood code to see how to use it properly. First of all, compile the original Fortran WMAP likelihood code. Then, compile the wrapper as a static library (libwmapwrapper) as follows:

gfortran -O2 -c WMAP_likelihood_wrapper.F90
ar rc libwmapwrapper.a WMAP_likelihood_wrapper.o

You can call the wrapper function from your C code by linking to that library. Joe has written a test implementation that shows how it works. To compile this, you’ll need to make sure you’re linking against everything the WMAP code needs, including a BLAS/LAPACK library; the following should work on Mac OS X (using veclib):

gcc wmap_test.c -std=c99 -L. -lwmap -lwmapwrapper -lcfitsio -framework veclib -lgfortran -o test_wmap

(N.B. Joe’s code is written for the WMAP 7-year release, so you may need to change a couple of numbers to get it working with the 9-year release.)


Update: Floating-point exception handling on Mac OS X

A few months ago, I was having a few problems porting some C++ code over to Mac OS X because of some non-standard floating-point exception handling functions that are present in glibc on Linux. Well, it just so happened that a colleague of mine, Rich Booth, recently ran into the same problem, only from a different angle. He wanted to keep track of floating point exceptions in a simulation code of his, but found that he couldn’t do this on his Mac.

After a bit of digging around, we found a portable implementation of floating-point exception handling that would happily run on both Linux and Mac OS X. It was written by David N. Williams in 2009, and includes some good documentation on the implementation in the comments. There’s also an example program right at the end of the file, so you can test it out right away.

The code should be simple enough to figure out pretty quickly, but Rich split out a header file anyway, just to make everyone’s lives that bit easier. You can find a tarball with Rich’s modifications here.


Getting codes in different languages to interact

In a Code Coffee meeting last month (9th July), we discussed ways of getting codes in different languages to talk to one another. This is quite a common task; for example, if you need your modern data analysis pipeline to interact with some legacy code (normally Fortran!), or if you have a mostly Python code that has a computationally-intensive bit that would be faster written in C. Our focus was on Python, C, and Fortran, since they seem to be the most commonly used languages in the group.

People brought a number of different solutions with them. Some were very easy to implement – just a simple few-line block in your Python code that could be used to speed up a loop. Others were quite a bit more complicated, requiring extra work on building modules and getting your hands dirty with memory management, but having the virtue of being much more flexible. The methods we discussed are summarised below, in increasing order of complexity. If you skip to the bottom, there’s also a link to a tarball with some examples.

(Edit: Be sure to check out some of the interesting comments below, too. They mostly concern Cython.)

Weave

Weave is a way of writing C code “inline” – directly into a Python script. Python variables are automatically made available to the C code (you sometimes need to do a bit of extra work to cast them to a specific type, but not much), which makes it nice and easy to use, with little boilerplate required in the C code. Essentially, you can just stick a fast, few-line C loop in the middle of a Python function, then call Weave and it’ll handle compilation and shuffling data between Python and C for you automagically, without requiring any complicated wrapper code or makefile magic. There’s also a “blitz” mode, which takes a Python/NumPy expression and converts it into C++ behind the scenes; this can also help speed things up, although it’s generally slower than the inline mode.
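
To give a flavour of it, here’s a minimal sketch of the inline mode (it assumes an older, Python 2-era SciPy installation, since scipy.weave has since been removed, and the function itself is just a made-up example):

# Minimal Weave inline sketch: sum the squares of a NumPy array with a C loop.
# Assumes scipy.weave is available (older, Python 2-era SciPy only).
import numpy as np
from scipy import weave

def sum_of_squares(x):
    n = int(len(x))
    code = """
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += x[i] * x[i];   // x and n are exposed to the C code automatically
    }
    return_val = total;         // sent back to Python as the return value
    """
    return weave.inline(code, ['x', 'n'])

x = np.arange(5, dtype=np.float64)
print(sum_of_squares(x))   # 30.0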

Unfortunately, the documentation isn’t great, but there are plenty of examples out there (SageMath have a couple of very clear ones). It should definitely be your first port of call if all you want to do is speed up part of your Python code. It’s not so good if you want to interface with legacy code, or have something more complicated that you need to do in C (e.g. something that’s split into different functions). See also Cython, which seems to be a bit more flexible, but not too much more difficult to use. There’s a nice speed comparison with NumPy, Weave blitz and inline, and MATLAB here, and a Cython/Weave example (with benchmarks) here. There’s also an interesting blog post on getting GSL to work with Weave/Cython.

f2py

f2py is a way of automatically generating Python modules from Fortran code. It seems pretty easy to use – all you need to do is run it over your Fortran source code, and it will produce a compiled module that you can simply import into your Python code and use as you would any other. The f2py program scans the Fortran code and figures out how to deal with function arguments and convert between types itself, which is very handy. It even generates function documentation (docstrings) for the Python module, telling you what format the function arguments need to be in. One problem with f2py, pointed out by Ryan, is that it tends to fail silently, making it difficult to debug! It’s apparently quite robust though, so hopefully you won’t need to do much debugging anyway. The documentation looks pretty good, and you can find other resources here. (N.B. They ask you to cite their paper if you use f2py in research code.)
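
As a rough sketch of the workflow (the Fortran subroutine, file and module names here are made up, so adapt to taste):

# Hypothetical f2py workflow. Suppose scale.f90 contains:
#
#   subroutine scale(x, a, n)
#     integer, intent(in) :: n
#     real(8), intent(in) :: a
#     real(8), intent(inout) :: x(n)
#     x = a * x
#   end subroutine scale
#
# Build a Python module from it with:
#   f2py -c -m fortmod scale.f90
#
# Then use it from Python like any other module:
import numpy as np
import fortmod                      # the module f2py just built

x = np.arange(5, dtype=np.float64)
fortmod.scale(x, 2.0)               # n is optional; f2py infers it from x
print(fortmod.scale.__doc__)        # auto-generated docstring
print(x)                            # [ 0.  2.  4.  6.  8.]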

Python ctypes

ctypes is a Python module for handling calls to shared libraries (whether they are written in C/C++ or Fortran). The syntax for loading libraries and calling functions is exceptionally simple – little different from importing a native Python module and calling functions in that. However, there’s some added complexity that comes as a result of having to convert between C and Python data types in function arguments. For the most part, all you need to do is call a simple ctypes helper function to make the conversion; e.g. c_double(42.3) will give you something that can be passed to a function that expects a C double as an argument. In more recent versions of NumPy, ndarrays also have ctypes conversion methods (see the ndarray.ctypes.data_as() method; you probably want to get a pointer to the array, for which you should use ctypes.POINTER), although there are also some subtleties to be aware of in terms of how Python/NumPy and C/C++ or Fortran store multidimensional arrays in memory (see the order property of NumPy arrays, for example). The Sage website has a good tutorial on working with ndarrays in ctypes. A slightly nicer way of handling type conversions is to write a small Python wrapper for the C function (or whatever), and use the argtypes attribute to specify what arguments the function accepts. A comprehensive tutorial can be found here, which explains most of the workings of ctypes and has some good examples.
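
As a hypothetical example, suppose you’ve built a shared library libmystuff.so containing a C function with the prototype double weighted_sum(double *x, int n, double w); calling it from Python might look something like this:

# ctypes sketch: the library name and function here are made-up placeholders.
# (The C side could be built with something like: gcc -fPIC -shared -o libmystuff.so mystuff.c)
import ctypes
import numpy as np

lib = ctypes.CDLL("./libmystuff.so")

# Tell ctypes what argument and return types the C function expects
lib.weighted_sum.argtypes = [ctypes.POINTER(ctypes.c_double),
                             ctypes.c_int,
                             ctypes.c_double]
lib.weighted_sum.restype = ctypes.c_double

x = np.arange(10, dtype=np.float64)
result = lib.weighted_sum(x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
                          ctypes.c_int(len(x)),
                          ctypes.c_double(0.5))
print(result)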

The tutorial on the Python website mostly concerns calling existing system libraries, but of course you can write your own libraries too. For C/C++ code compiled with GCC, this is pretty simple – all you need to do is add a couple of compiler flags (see Section 3.4 of the TLDP Library Program tutorial). Of course, your C code should be set out like a library first – just compiling something that’s meant to be a standalone program as a library won’t work too well, and you’ll need to write header files etc. That’s easy to do, and existing “non-library” code can often be refactored into a working library with very little effort. There’s a very nice, simple tutorial on writing and compiling libraries here. Note that the libraries don’t need any “special treatment” to interface with Python using ctypes – they’re just standard C/Fortran libraries, and can happily be called by other C/Fortran programs too.

Python C API

Big chunks of Python are written in C, so it should come as no surprise that there is a C library which provides access to Python data types and functions. Access it by including Python.h in your C code. I’ll keep the discussion of this one at a deliberately high level, because it’s significantly more complicated than the others, and doesn’t seem to be fantastically well documented (although see the official Python docs for it here and here). This method allows you to write new Python modules directly in C; the end result will be something that looks and behaves exactly like a standard Python module (of the sort that you may have written before, in Python), which requires no fiddling about with data types or what have you in the Python code that uses it. That’s because all of the fiddling around is done in the C code! I find this quite a bit trickier to write – your C files need some boilerplate to get themselves (and their constituent functions) recognised by the Python interpreter, and there’s a bit of makefile magic required to actually build and install the module too. Plus, you have to figure out how to handle type conversions between native C types and Python types (mostly PyObjects). It’s not super-difficult, but it is fiddly in places. Take a look at this nice tutorial by Dan Foreman-Mackey (and another one from JPL which looks more specifically at handling NumPy arrays).
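
Just to illustrate the build step, the makefile magic can usually be replaced by a short distutils setup script – something along these lines (the module and file names are placeholders):

# setup.py sketch for building a C-extension module; "mymodule.c" would
# contain the usual Python C API boilerplate (PyMethodDef table, init function).
from distutils.core import setup, Extension

setup(name="mymodule",
      version="0.1",
      ext_modules=[Extension("mymodule", sources=["mymodule.c"])])

# Build and install with:
#   python setup.py build
#   python setup.py install --user
# After that, "import mymodule" works just like any other Python module.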

Some of the more confusing issues that you’re likely to run into are to do with reference counting. This is a reasonably technical computer science concept, which has to do with how Python stores and labels variables. (Recall that, in Python, the same data can be referred to by multiple variable names.) It’s important to make sure your C code properly tracks references to Python objects as it manipulates them; otherwise, you’re likely to run into a host of problems, from memory leaks to weird crashes. If you’re going to use the Python C API, I strongly recommend that you invest a good amount of time in understanding how this works.

All in all, I think the C API is only the way forward if you’re specifically setting out to write a new Python module for use elsewhere; if all you have is some existing C code that you want to quickly plug in to some Python code, it’s going to be a lot of hassle. Still, once you’ve set up one project and used the C API a little, it’s a lot quicker to get up and running with the next.

Summary

All in all, ctypes is probably the best route to go for most people, unless you have something particularly trivial you want to speed up. It’s fast (native C function calls), easy enough to set up, and conceptually simple to understand. It’s not quite as “automatic” as Weave/Cython or f2py, but you’ll probably end up with cleaner, more flexible, and more robust code for your troubles. And it’s much easier to work with than the Python C API.

The Performance Python page has a nice comparison of some of the methods mentioned above. A tarball containing example code for most of the above methods can be downloaded from here; thanks to Ryan Houghton, Neale Gibson and Joe Zuntz for providing these.


Code Coffee: Parallelisation with OpenMP

Rich Booth gave us an introduction to using OpenMP to parallelise our code. It turns out to be surprisingly easy – all you need to do is add a specially formed comment or pragma here and there, and OpenMP will do the rest. At its most basic, OpenMP just takes a serial code (code meant to be run on a single processor), and splits the work between multiple threads, which can be run on multiple processors. This (hopefully) speeds up code execution by sharing the load. Rich gave examples for OpenMP in C and Fortran, but there are other parallelisation tools out there for other languages, like the multiprocessing module in Python. It was also mentioned that MATLAB has excellent multi-processing capabilities, which tend to be quite easy to use.
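
For the Python users, here’s a rough sketch of what the analogous multiprocessing approach looks like for a simple loop (note that this isn’t OpenMP itself, just the same idea in Python):

# Rough Python analogue of parallelising a loop, using the multiprocessing
# module mentioned above; each iteration must be independent of the others.
from multiprocessing import Pool

def work(i):
    return i * i          # stand-in for an expensive calculation

if __name__ == "__main__":
    pool = Pool(processes=4)               # number of worker processes
    results = pool.map(work, range(1000))  # parallel "for" loop
    pool.close()
    pool.join()
    print(sum(results))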

Rich worked off this introduction to OpenMP, and showed us some basic examples of how for loops can be parallelised. He also discussed concepts like:

  • Scheduling: how the workload should be split up between different threads to make things as fast as possible (different scheduling strategies are available)
  • Scope: which variables should be private to a thread, which should be public and shared between all threads, and how to specify which is which
  • Preventing race conditions: ensuring that shared variables are updated in a coherent way by individual threads, e.g. by using atomic operations
  • Functions: bundling code into a function which is then called from inside the parallelised block of code, to make it easier to keep track of private and shared variables

Other topics that were touched on include the difficulty of debugging parallelised code (the best way is to avoid having to debug by following a few good practices), the fact that OpenMP-enabled code can be run in “serial mode” by compiling with a standard compiler instead (the compiler will just throw up a few spurious warnings when it gets to the OpenMP bits), and that it’s best to only use OpenMP to parallelise a few well-defined tasks, like computationally-intensive for loops, rather than trying to parallelise everything (which can take a long time to code properly).

All in all, OpenMP looks like a nice, relatively stable way of speeding up operations that can easily be parallelised. It’s a lot simpler to implement than I thought, and doesn’t seem to require a load of code rewrites or anything as serious as that. There’s quite a bit of introductory tutorial material on the web, along with a few blogs dedicated to parallel programming. As usual, Wikipedia is helpful on concepts.


Code release: LTB in Python, spherical collapse, and Buchert averaging

The release of our next paper is imminent (yay!), and so it’s time for another code release. I try to make all of my code, or at least a substantial fraction of it, publicly available. This enables other people to reproduce and check my work if they want to. It also allows them to build off my code and do cool new things, rather than having to spend months solving problems that, well, have already been solved. That’s the theory, anyway – I only know of a couple of people who’ve actually poked around in the code, or tried to use it for something. But hey, you’ve got to start somewhere. For posterity, I’ve posted the closest thing I have to release notes below.
