I’m still at the Numerical Cosmology 2012 workshop in Cambridge (the last day is tomorrow). I’m pretty sleepy by now (9am start for talks makes Phil sad), but I figured I’d squeeze in a few stories before I retire to my bed.
Yesterday was one of those weird days that I seem to have every so often which confirms to me that physics makes for an interesting life. After a full-ish programme of talks, we all watched Stephen Hawking unveil his new supercomputer, via video link, to an extremely chilly room just around the corner that we subsequently all piled into (along with grinning computer industry representatives and a TV crew) to prod the shiny new computer and gurn at the camera. Supercomputers are much smaller than they used to be. Later on, we had the conference dinner in Trinity Hall, where I sat opposite John Reid, one of the people involved in setting Fortran standards. Being the Python aficionado that I am (and also, having discovered earlier in the evening that Fortran 77 uses magic numbers for certain I/O tasks*), I solicited his thoughts on designing programming languages to encourage good programming style. Didn’t really get much out of him on that, but he did give an interesting talk on coarrays earlier in the day (coarrays are an extension to Fortran intended to make parallel computing a bit more transparent, and thus easier). Ho hum. I topped the evening off by introducing myself to Dick Bond as an emeritus professor. He didn’t seem convinced.
I also learned a great deal about High Performance Computing (HPC). By that, I don’t mean running your average MCMC code on the cluster in the basement – oh no, proper tens-of-thousands-of-cores stuff. It turns out that programming models and tools are beginning to look a little outdated in the face of modern HPC systems – technologies like MPI were apparently designed with a “many single-core nodes” architecture in mind, whereas we now have systems consisting of many nodes, each of which has many cores. Of course, it’s faster to transfer things between cores on a given node than it is to pass data between nodes, but MPI doesn’t make it easy to differentiate between the two situations. David Henty (Edinburgh) talked about the consequences of this for HPC – his guiding principle was to “keep data on the same device for as long as possible”. Transferring data and other communication between nodes is what kills performance on these massively parallel systems, since communication buses are intrinsically slower than working on-chip. Oh, and cache misses – if your parallel code isn’t scaling too well, or otherwise seems sluggish, the number one suspect has to be cache misses.
Another little performance-related tip comes courtesy of Hal Finkel (Argonne), who’s attempting trillion-particle simulations (about 100x larger than the current state of the art) by using a whole host of clever computational tricks and optimisations. He mentioned that some architectures have specific instructions for providing rapid estimates of certain common floating-point operations (e.g. square roots). If you can afford the reduced accuracy, using these saves your code from having to branch and call sqrt() or whatever from a library. If this is happening in an inner loop – kerching! Easy instant speed-up.
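To get a feel for the accuracy side of that trade-off, here is a minimal Python sketch. It is not taken from the talk: it uses the well-known bit-shifting trick for single-precision floats as a software stand-in for a hardware estimate instruction, on the assumption that the two are similar in spirit (a crude guess, optionally polished with a Newton-Raphson step). The function names are made up for illustration, and in Python this will not be faster than math.sqrt; the point is only to see what you give up.

```python
import math
import struct


def rsqrt_estimate(x: float) -> float:
    """Crude estimate of 1/sqrt(x) for positive x in single-precision range.

    Software stand-in (an assumption, for illustration) for a hardware
    reciprocal-square-root estimate instruction.
    """
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Halving the bit pattern roughly halves the exponent; the magic
    # constant corrects the offset and negates it.
    bits = 0x5F3759DF - (bits >> 1)
    # Reinterpret the integer back as a float32.
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y


def rsqrt_refined(x: float, steps: int = 1) -> float:
    """Polish the crude estimate with Newton-Raphson steps on 1/sqrt(x)."""
    y = rsqrt_estimate(x)
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def relative_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    for x in (0.5, 2.0, 10.0, 12345.678):
        exact = 1.0 / math.sqrt(x)
        print(x,
              relative_error(rsqrt_estimate(x), exact),
              relative_error(rsqrt_refined(x), exact))
```

An approximate square root then falls out as `x * rsqrt_refined(x)`. The raw estimate is good to within a few percent, and a single Newton-Raphson step brings that down to a fraction of a percent, which is the sort of accuracy budget you would need to be comfortable with before swapping out sqrt() in an inner loop.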
And one final little note: Supercomputers break all of the time! The Mean Time Between Failures (MTBF) of a big computer system is typically a few days, so HPC codes have to store snapshots of their state (checkpoints) every few hours, and must support resuming from these checkpoints in order to be able to do anything useful.
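The checkpoint-and-resume pattern can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real HPC code does it: the state is a small dictionary, the "work" is a running sum, the failure is simulated with a hypothetical `fail_at` argument, and all the function names are invented for illustration. Real codes write far larger snapshots, usually in parallel.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically, so a crash mid-write cannot
    clobber the last good checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename over the old checkpoint only once the new one is fully on disk.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if there is no checkpoint yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, n_steps, checkpoint_every, fail_at=None):
    """Toy simulation loop that resumes from the last checkpoint if present.

    fail_at is a hypothetical knob that simulates a hardware failure.
    """
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated hardware failure")
        state["total"] += state["step"]  # stand-in for the real work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state
```

If a run dies partway through, calling `run` again with the same path picks up from the most recent checkpoint rather than from step zero, so only the work done since that checkpoint is lost. Writing to a temporary file and renaming it is a deliberate choice: the machine is just as likely to fall over while you are writing a checkpoint as at any other time.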
* Why? Aaaaagh! Aaagh! (OK, it probably made slightly more sense at the time.)