
Book review: The Linux Command Line, by William E. Shotts

No Starch Press kindly sent me a copy of The Linux Command Line to review a little while back. I already posted a review on Amazon, but figured that the book might be of interest to people here, too. Read on for a more detailed review.

The Linux Command Line book cover

Although point-and-click graphical interfaces may initially seem more intuitive, many tasks are much better suited to command line applications. I first started using the command line mostly out of necessity, back in the days when desktop Linux was still rather unfriendly to new users. After a little while, I had gained enough familiarity with the system to start cooking up useful combinations of commands for myself, without having to constantly copy them from books or webpages. I soon found myself preferring to use the command line for many tasks – it became quicker and easier than using a GUI. I even gained a rudimentary understanding of how the command line itself works, and started using a few of its more advanced features. And that’s the stage I’m at now – I know how to use a range of useful commands, and can achieve a lot using the command line in an efficient and satisfying manner. But anything beyond the basics of powerful concepts like regular expressions, commands like sed and find, and the ins and outs of Bash scripting, is still a bit of a mystery to me. If I need to do anything fancy, I fall back on doing it manually, or else write a Python script, simply because I’m more comfortable doing things that way. “The Linux Command Line” is a very useful book, in that it has helped me to take my command line skills to the next level, making it much less of a necessity to develop my own tools.

Something for every eventuality

As with most books from No Starch*, the book is beautifully presented and well written. All of the common command-line tasks are covered, many of them in detail, and there are a number of nice, friendly introductory chapters (pdf) for those who have never used a command line before. But the real value, for me at least, came from the treasure trove of more obscure commands that the book contains. UNIX is a very mature system, and practically every computer-based task you can think of can be performed using existing command line tools. Indeed, on first opening the book to a random page, I immediately found a tool to automatically format text files in a rather specific way, a task that I had been wasting quite a bit of time doing manually (and hadn’t yet got around to automating). Even old hands will be able to find something new in here, such is the diversity of the topics covered.

Other highlights include a guide to programming in Bash, which cuts through the quirks of the scripting language in a rather transparent manner, and a number of chapters on the common system admin tasks that occupy most people’s command line time. The latter part is slightly hampered by being rather distro-specific in places, choosing only to cover Debian (Ubuntu) and Red Hat (Fedora) where non-generic commands must be used. A couple of other distros could have been added to the list without too much fuss, allowing the book to serve as a useful cross-distro reference.

Covering all the bases

And that brings us to one of the difficulties with books like this: completeness. The cover’s promise of a complete introduction is a bold one, and Shotts can be forgiven for not providing a comprehensive guide to every last command; there are just so many of them, and they can get quite complex. Nevertheless, there are some interesting omissions here. Terminal-based text editors are extremely advanced, offering real productivity gains for programmers and other would-be command line users, but they go largely unmentioned. A comprehensive introduction to emacs is certainly outside the scope of this book, but it could at least have been mentioned! vim fares a little better: a brief introduction is included, which suffices for basic text editing tasks. Additionally, important networking tools like SSH, which are vital for remote working and system administration, are covered only very briefly. In an ideal world, I would have liked to have seen a whole chapter with detailed information on SSH, tunnelling, and the like. This lack of total completeness, understandable though it may be, will leave some readers a little disappointed. Still, most topics are covered in enough depth to satisfy most readers.

It’s probably best to treat this book as an advanced introduction – reading it will introduce you to a whole range of useful tools, which you can then go on to learn more about using specialised documentation, or more focused technical manuals. The Linux Command Line will get you started, and quite a lot more, but you’ll eventually need to move to something more specific if you really need to delve into the murkier recesses of a certain command.

Verdict

If you would like to start using the command line, improve your existing skills, or simply want to discover tools that you were never even aware existed, this book has everything you need, and I wholeheartedly recommend it. If you want to learn about the specifics of some particular command line tool, you’ll at least get a good introduction here, but you’ll eventually have to read its manual all the same.

* Including my own; I’ve authored two titles with them, and love the house style to pieces.


Syntax highlighting in xemacs

For whatever reason, I’m reduced to remotely debugging some code with the random version of xemacs that happens to be installed on the remote Mac system. (I’d rather use my favourite editor, the wonderful gedit, but that’s not really practical in this particular instance.)

Anyway, it turns out that I can’t live happily without syntax highlighting, which isn’t turned on by default. To turn it on, all you need to do is type M-x font-lock-mode (I have no idea why it’s called font-lock rather than “syntax highlighting”).
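To make this stick across sessions, something like the following in your init file should do it (untested on my part – I believe XEmacs fontifies new buffers automatically when font-lock-auto-fontify is set, but treat this as a sketch):

```lisp
;; in ~/.xemacs/init.el: turn on syntax highlighting automatically
(require 'font-lock)
(setq font-lock-auto-fontify t)
```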


Portability issues with floating point exception handling

Well, this is a fun one for a balmy Friday afternoon – it turns out that there are somewhat subtle differences between fenv.h on Linux (with glibc) and on Mac OS X. A number of functions defined in fenv.h are there to control the handling of floating point errors, and many of them are specified by the C99 standard. However, glibc also defines some useful non-standard extras, like feenableexcept().

So, I’m looking at someone else’s code, developed on a Linux machine, which needs to enable/disable floating point exceptions at various junctures, but which I need to compile on a Mac that I’m using as a test rig (it has 8 cores, which is handy). As you can see here, fenv.h on Mac OS X is missing feenableexcept(), which is rather irksome. Hmm.

StackOverflow to the rescue: There’s a thread with a few bits of advice on how one can replace these functions on Mac OS X. I’ll try the xmmintrin.h one first; hardly ideal, but I’m hoping it will work so I can go to the pub…

Update: It turns out that there’s quite a nice implementation of glibc’s floating point exception handling functions that’s portable to Mac OS X, written by David N. Williams. See this update for more information.


NAM Jodcast interview

Hmmm, don’t think I’ve mentioned this on here before, but I was interviewed by Close Personal Friend (TM) Christina Smith for the Jodcast a few months back, while I was at NAM. Listen agape as I, ahem, masterfully discuss, erm, whatever it is my research is about.

And no, I refuse to believe that I sound like that in real life.


Numerical Cosmology titbits

I’m still at the Numerical Cosmology 2012 workshop in Cambridge (the last day is tomorrow). I’m pretty sleepy by now (9am start for talks makes Phil sad), but I figured I’d squeeze in a few stories before I retire to my bed.

Yesterday was one of those weird days that I seem to have every so often which confirms to me that physics makes for an interesting life. After a full-ish programme of talks, we all watched Stephen Hawking unveil his new supercomputer, via video link, to an extremely chilly room just around the corner that we subsequently all piled into (along with grinning computer industry representatives and a TV crew) to prod the shiny new computer and gurn at the camera. Supercomputers are much smaller than they used to be. Later on, we had the conference dinner in Trinity Hall, where I sat opposite John Reid, one of the people involved in setting Fortran standards. Being the Python aficionado that I am (and also, having discovered earlier in the evening that Fortran 77 uses magic numbers for certain I/O tasks*), I solicited his thoughts on designing programming languages to encourage good programming style. Didn’t really get much out of him on that, but he did give an interesting talk on coarrays earlier in the day (coarrays are an extension to Fortran that are intended to make parallel computing a bit more transparent, and thus easier). Ho hum. I topped the evening off by introducing myself to Dick Bond as an emeritus professor. He didn’t seem convinced.

I also learned an awful lot about High Performance Computing (HPC). By that, I don’t mean running your average MCMC code on the cluster in the basement – oh no, proper tens-of-thousands-of-cores stuff. It turns out that programming models and tools are beginning to look a little outdated in the face of modern HPC systems – technologies like OpenMP were apparently designed with a “many single-core nodes” architecture in mind, whereas we now have systems consisting of many nodes, each of which has many cores. Of course, it’s faster to transfer things between cores on a given node than it is to pass data between nodes, but OpenMP doesn’t make it easy to differentiate between the two situations. David Henty (Edinburgh) talked about the consequences of this for HPC – his guiding principle was to “keep data on the same device for as long as possible”. Transferring data and other communication between nodes is what kills performance on these massively parallel systems, since communication buses are intrinsically slower than working on-chip. Oh, and cache misses – if your parallel code isn’t scaling too well, or otherwise seems sluggish, the number one suspect has to be cache misses.

Another little performance-related tip comes courtesy of Hal Finkel (Argonne), who’s attempting trillion particle simulations (about 100x larger than the current state of the art) by using a whole host of clever computational tricks and optimisations. He mentioned that some architectures have specific instructions for providing rapid estimates of certain common floating point operations (e.g. square roots). If you can afford the reduced accuracy, using these prevents your code from having to branch and call sqrt() or whatever from a library. If this is happening in an inner loop – kerching! Easy instant speed-up.
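As a rough illustration of the trick (the function names are mine, and this assumes an x86 machine with SSE): the RSQRTSS instruction gives a fast, roughly 12-bit-accurate estimate of 1/sqrt(x) in a single instruction, with no branch and no library call, and you can buy back precision with a Newton step if you need it.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Fast ~12-bit estimate of 1/sqrt(x) via the RSQRTSS instruction. */
static float fast_rsqrt(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* One Newton-Raphson iteration roughly doubles the number of
 * correct bits, if the raw estimate is too coarse. */
static float fast_rsqrt_refined(float x)
{
    float y = fast_rsqrt(x);
    return y * (1.5f - 0.5f * x * y * y);
}
```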

And one final little note: Supercomputers break all of the time! The Mean Time Between Failures (MTBF) of a big computer system is typically a few days, so HPC codes have to store snapshots of their state (checkpoints) every few hours, and must support resuming from these checkpoints in order to be able to do anything useful.
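The idea is simple enough to sketch in a few lines of C (all the names here are illustrative, not from any real HPC code): periodically serialise the full simulation state to disk, and on start-up try to read it back before falling back to initial conditions.

```c
#include <stdio.h>

/* Toy simulation state; a real code would dump far more than this. */
typedef struct { long step; double state[4]; } Sim;

/* Write a checkpoint; returns 0 on success, -1 on failure. */
int checkpoint(const Sim *s, const char *path)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Restore from a checkpoint; returns 0 on success, -1 on failure. */
int restore(Sim *s, const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(s, sizeof *s, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

(Real codes also have to worry about writing checkpoints atomically, and about the sheer I/O cost of dumping terabytes of state every few hours.)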

* Why? Aaaaagh! Aaagh! (OK, it probably made slightly more sense at the time.)


Cosmological simulation showdown

I’m at the Numerical Cosmology 2012 workshop in Cambridge this week. Today, the first day, was pretty much dedicated to simulations, focusing mostly on comparisons between different techniques.

Volker Springel was showing off his moving mesh code, AREPO, which uses a Voronoi tessellation scheme to optimally resize mesh cells. The videos on his website are mesmerising. (Neat Voronoi fact: the Voronoi tessellation of a set of points is dual to the Delaunay triangulation.) Why go to all that trouble to make the mesh move around? Well, as was repeatedly pointed out during the day’s talks, Cartesian mesh methods aren’t Galilean invariant (simple translations of objects change their properties, when they shouldn’t), and smooth particle hydrodynamics (SPH) methods suppress (physical) fluid instabilities due to a spurious numerical “surface tension” property that they have. The moving mesh helps to get around these problems – the mesh follows the particle velocities, which ends up guaranteeing Galilean invariance. The use of Voronoi diagrams to define the mesh turns out not to add too much computational expense (efficient algorithms exist), but it does have problems with uniqueness of tessellations (e.g. there’s more than one valid way to tessellate a 4-point square), and solving the Riemann problem for irregularly-shaped cells seems to be harder too. Still, it looks neat.


Kelvin-Helmholtz instability in Springel’s AREPO moving mesh code (image by V. Springel).

Volker has apparently been quite critical of SPH methods in recent years – the method has issues with modelling contact discontinuities, due to the surface tension issue I mentioned. These are potentially important in cosmological simulations, where you get large density contrasts, for example. The classical test seems to be the (rather photogenic) Kelvin-Helmholtz instability, where two initially laminar flows in different fluids are set up to slip past one another. What should happen is that you get turbulent waves forming at the interface of the fluids, but the numerical “tension” in SPH models stops this from happening. Daniel Price took up the gauntlet and attempted to show that SPH can be fixed by properly taking into account discontinuities in the density. First of all, he showed that a standard SPH code will happily produce realistic Kelvin-Helmholtz instability if the two fluids have comparable densities – the problem is not that SPH can’t deal with instability, it’s that it can’t deal with density discontinuities, since SPH normally assumes that the density is differentiable at boundaries. By switching a viscosity term on and off depending on the situation (e.g. Price 2011; Read & Hayfield 2011), Daniel claims that SPH can be made viable for cosmological simulations again. Volker disagreed in the questions afterwards, and I think they ended up “agreeing to disagree”.

There were also talks by Romain Teyssier and Brian O’Shea discussing the differences between different types of simulation, and comparing different codes (amongst other things). There have been a number of code comparisons in recent years, for a number of different physical situations (e.g. Frenk et al. 1999; Agertz et al. 2007; Scannapieco et al. 2011), which seem to show that most codes get broadly similar results on larger scales (for massive dark matter halos), but have divergent results at smaller scales, where baryonic physics becomes important.

Broadly similar sentiments were expressed by Rob Thacker, who talked about modelling AGN feedback. To do feedback, you need an extremely large dynamical range – in feedback studies, processes from galaxy scales down to the supermassive black hole event horizon (or thereabouts) are important. Adaptive Mesh Refinement codes (like Teyssier’s RAMSES) can get something like the dynamical range required, and not-bad results on large scales, but it sounds like there are lots of problems and uncertainties with the models of the small-scale physics. (Fun RAMSES fact: It uses Peano-Hilbert space-filling curves to do “domain decomposition” – deciding which particles are governed by which processor for massively parallel execution.) He also noted, somewhat surprisingly, that black hole advection might be important during galaxy mergers (the BH gets moved around under the gravitational pull of nearby matter, thus experiencing a changing local density, which has an effect on the accretion rate, and so on), even though the BH is extremely massive compared to anything nearby. A final interesting nugget came from the audience – apparently, during BH mergers you should expect to lose about 5% or so of the total BH-BH system mass in the form of gravitational waves, but Rob was confident in saying that current AGN simulations are nowhere near having to worry about 5% effects like that!
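For the curious, the classic bit-twiddling construction of the Hilbert curve index is short enough to sketch in 2D (this is the textbook algorithm, not RAMSES's actual implementation, and the function names are mine). Mapping each cell (x, y) of an n × n grid (n a power of two) to its distance d along the curve, then sorting cells by d, keeps spatially nearby cells close together in the 1D ordering that gets chopped up between processors.

```c
/* Rotate/flip a quadrant so that the sub-curve has the right orientation. */
static void hilbert_rot(int n, int *x, int *y, int rx, int ry)
{
    if (ry == 0) {
        if (rx == 1) {              /* flip the quadrant */
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        int t = *x; *x = *y; *y = t; /* swap x and y */
    }
}

/* Distance along the Hilbert curve of cell (x, y) on an n x n grid. */
static int hilbert_xy2d(int n, int x, int y)
{
    int d = 0;
    for (int s = n / 2; s > 0; s /= 2) {
        int rx = (x & s) > 0;
        int ry = (y & s) > 0;
        d += s * s * ((3 * rx) ^ ry);  /* which quadrant at this scale */
        hilbert_rot(n, &x, &y, rx, ry);
    }
    return d;
}
```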

Lots of other interesting things went on today, but I’ll wrap up with a couple of brief notes. Ilian Iliev talked about simulating reionisation, and the challenges involved in getting photoionisation right. Apparently, many people use a purely local form for the photoionisation rate equation, which does not take into account finite volume effects, and thus does not properly conserve photon number. He wrote down the correct version, which includes an extra term in the frequency integral. Romain also mentioned that many ray-tracing codes assume that the speed of light is infinite, which is often a good approximation, but which sometimes isn’t, and gets forgotten. This can lead to up to a factor of two discrepancy in some calculations. Finally, Brian mentioned a paper by Tasker et al. (2008), which lists a suite of useful test cases for numerical simulation codes. Many of these tests were wheeled out today.

Andreas Bauer & Volker Springel (2012). Shocking results without shocks: Subsonic turbulence in smoothed particle hydrodynamics and moving-mesh simulations. MNRAS. arXiv: http://arxiv.org/abs/1109.4413v1


Code Coffee: Prototyping with Arduino

Adam Coates gave us an introduction to the Arduino platform on Monday 25th June. It’s an open-source hardware kit for rapidly (and cheaply) prototyping electronic control systems, and seems to be pretty simple to use. Adam brought in a couple of boards that had suffered fiery deaths at the hands of a rogue motor for us to inspect, as well as a functioning one which he was using to drive an LED. There are different types of board, depending on what exactly you want to hook it up to – for example, you can get an extra “shield” board that allows you to connect a wireless module. Pre-assembled boards are readily available for purchase from many locations, and cost in the region of £20-£50, but all of the boards have open hardware specifications, which means that you can build them yourself if you like.

Coding for the Arduino seems to be quite simple. The controller on the board itself seems to be relatively limited, in that it only has a small amount of memory and is somewhat slow. For more computationally-intensive operations, the idea seems to be to use the Arduino as an interface between a full-size computer and your hardware rather than as a standalone controller. Simple operations, like blinking an LED, can be achieved in only a few lines of code, with very little boilerplate. Adam showed us the serial interface, which is used to upload code onto the board, and can also be used in an interactive mode for communicating with it.
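For a flavour of what that looks like, here is the canonical “blink” starter sketch (the standard Arduino hello-world, not Adam’s code; it assumes the usual on-board LED wired to pin 13):

```cpp
// Minimal Arduino sketch: blink the on-board LED once a second.
const int ledPin = 13;  // most boards wire an LED to pin 13

void setup() {
    pinMode(ledPin, OUTPUT);     // runs once at power-up
}

void loop() {                    // runs forever after setup()
    digitalWrite(ledPin, HIGH);  // LED on
    delay(1000);                 // wait 1000 ms
    digitalWrite(ledPin, LOW);   // LED off
    delay(1000);
}
```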

From what I’ve seen of it, the Arduino documentation appears to be excellent; certainly, the hardware looks to be well-documented, and there’s a good amount of example code available. And there are plenty of things that you can do with it – Adam uses it to test a telescope turntable driver, I think. I’ve not quite figured out what I’d use one for myself (my Raspberry Pi is still in its box too…), but an idea that was floated during Code Coffee was reproducing the groundbreaking Holmberg lightbulb N-body simulation. Now that would be cool.