Why you can ignore reviews of scientific code by commercial software developers

tl;dr: Many scientists write code that is crappy stylistically, but which is nevertheless scientifically correct (following rigorous checking/validation of outputs etc). Professional commercial software developers are well-qualified to review code style, but most don’t have a clue about checking scientific validity or what counts as good scientific practice. Criticisms of the Imperial Covid-Sim model from some of the latter are overstated at best.

Update (2020-06-02): The CODECHECK project has independently reproduced the results of one of the key reports (“Report 9”) that was based on the Imperial code, addressing some of the objections raised in the spurious “reviews” that are the subject of this article.

I’ve been watching with increasing horror as the most credible providers of scientific evidence and advice surrounding the Coronavirus outbreak have come under attack by various politically-motivated parties — ranging from the UK’s famously partisan newspapers and their allied politicians, to curious “grassroots” organisations that have sprung up overnight, to armies of numerically-handled Twitter accounts. While there will surely be cause for a sturdy review of the UK’s SAGE (Scientific Advisory Group for Emergencies) system at some point very soon, it seems clear that a number of misinformation campaigns are in full flow that are trying to undermine and discredit this important source of independent, public-interest scientific advice in order to advance particular causes. Needless to say, this could be very dangerous — discrediting the very people we most need to listen to could produce very dire results.

The strategies being used to undermine SAGE advisers will be familiar to anyone who has worked in fields related to climate change or vaccination in recent decades. I will focus on one in particular here — the use of “experts” in other fields to cast doubt on the soundness of the actual experts in the field itself. In particular, this is an attempt to explain what’s so problematic about articles like this one, which are being used as ammunition for disingenuous political pieces like this one [paywall]. Both articles are clearly written with a particular political viewpoint in mind, but they have a germ of credibility in that the critique is supposedly coming from an expert in something that seems relevant. In this case, the (anonymous) expert claims to be a professional software developer with 30 years’ experience, including working at a well-regarded software company. They are credentialled, so they must be credible, right? Who better to review the Imperial group’s epidemiology code than a software developer?

Most of what I’m going to say is a less succinct restatement of an old article by John D. Cook, a maths/statistics/computing consultant. Cook explains how scientists use their code as an “exoskeleton” — a constantly-evolving tool to help themselves (and perhaps a small group around them) answer particular questions — rather than as an engineered “product” intended to solve a pre-specified problem for a separate set of users who will likely never see or modify the code themselves. While both scientists and software developers may write code for a living — perhaps even using the same programming language and similar development tools — that doesn’t mean they are trying to achieve similar aims with their code. A software developer will care more about maintainability and end-user experience than a scientific coder, who will likely prize flexibility and control instead. Importantly, this means that programming patterns and norms that work for one may not work for the other — exhortations to keep code simple, to remove cruft, to limit the number of parameters and settings, might actually interfere with the intended applications of a scientific code for example.

Software development heuristics that don’t apply to scientific code

The key flaw in the “Lockdown Sceptics” article is that it applies software engineering heuristics to assess the quality of the code, when those heuristics simply don’t apply here. As someone who has spent a good fraction of my scientific career writing and working with modelling codes of different types, let me first try to set out the desirable properties of a high-quality scientific code. For most modelling applications, these will be:

  • Scientific correctness: A mathematically and logically correct representation of the model(s) that are being studied, as well as a correct handling and interpretation of any input data. This is distinct from “correctly fitting observations” — while finding a best-fit model to some data might be one aim of a code, the ability to explore counterfactuals is also important. A code may implement a model that is a terrible fit to the data, but can still be “high quality” in the sense that it correctly implements a counterfactual model.
  • Flexibility: The ability to add, adjust, turn on/off different effects, try different assumptions etc. These codes are normally exploratory, and will be used for studying a number of different questions over time, including many that will only arise long after the initial development. Large numbers of parameters and copious if statements are the norm.
  • Performance: Sufficient speed and/or precision to allow the scientist to answer questions satisfactorily. Repeatability and numerical precision fall under this category, as well as raw computational performance. (It is also common for scientific codes to have settings that allow the user to trade off speed vs accuracy, depending on the application.)

Note that I have in mind the kinds of codes used by specialist groups that are usually seeking to model certain classes of phenomena. There are other types of scientific code intended for more general use with goals that hew closer to what software engineers are generally trying to achieve. The Imperial code does not fall into this second category however.

What things are missing from this list that would be high priority for a professional software developer? Here are a few:

  • Maintainability: Most scientific codes aren’t developed with future maintainers in mind. As per John Cook, they are more likely to be developed as “exoskeletons” by and for a particular scientist, growing organically over time as new questions come up. Maintainability is a nice-to-have, especially if others will use the code, but it has little bearing on the code’s scientific quality. Some scientifically valuable codes are really annoying to modify!
  • Documentation: Providing code and end-user documentation is a good practice, but it’s not essential for scientific codes. Different fields have different norms surrounding whether code is open sourced on publication or simply available on request for example, and plenty of scientists do a bad job of including comments or writing nice-to-read code. This is because the code is rarely the end product in itself — it is normally just a means to run some particular mathematical model that is then presented (and defended and dissected) in a journal article. The methods, assumptions, and consistency and accuracy of the results are what matter — the code itself can be an ugly mess as long as it’s scientifically correct in this sense.
  • User-proofing/error checking: To a software developer, a well-engineered code shouldn’t require any knowledge of internal implementation details on the part of the end user. The code should check user inputs for validity, and, to the greatest extent possible, prevent them from doing things that are wrong, or invalid, or that will produce nonsense results, for the widest possible range of possible inputs. Some level of error-checking is nice to have in scientific codes too, but in many cases the code is presented “as-is” — the user is expected to determine what correct and valid inputs are themselves, through understanding of the internals and the scientific principles behind them. In fact, a code may even, intentionally, produce an output that is known to be “wrong” in some ways and “right” in others — e.g. the amplitude of a curve is wrong, but its shape is right. In essence, the user is assumed to understand all of the (known/intended) limitations of the code and its outputs. This will generally be the case if you run the code yourself and are an expert in your particular field.
  • Formal testing: Software developers know the value of a test suite: Write unit tests for everything; throw lots of invalid inputs at the code to check it doesn’t fall over; use continuous integration or similar to routinely test for regressions. This is good practice that can often catch bugs. Setting up such infrastructure is still not the norm in scientific code development however. So how do scientists deal with regressions and so on? The answer is that most use ad hoc methods to check for issues. When a new result first comes out of the code, we tend to study it to death. Does it make sense? If I change this setting, does it respond as expected? Can it reproduce idealised/previous results? Does it agree with this alternative but equivalent approach? Again, this is a key part of the scientific process. We also output meaningful intermediate results of the code as a matter of course. Remember that we are generally dealing with quantities that correspond to something in the real world — you can make sure you aren’t propagating a negative number of deaths in the middle of your calculation for example. While these checks could also be handled by unit tests, most scientists generally just end up with their own weird set of ad hoc test outputs and print statements. It’s ugly, and not infallible, but it tends to work well given the intensive nature of our result-testing behaviour and community cross-checking.
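
The ad hoc checking described in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy model, the function names (step_epidemic, run_with_checks) and every parameter value are invented here, and none of it is taken from the Imperial code.

```python
import numpy as np

def step_epidemic(infected, deaths, growth_rate, fatality_rate):
    """One step of a toy, hypothetical epidemic model (not the Imperial model)."""
    new_infected = infected * growth_rate
    new_deaths = deaths + fatality_rate * infected  # cumulative deaths
    return new_infected, new_deaths

def run_with_checks(n_steps=20, growth_rate=1.2, fatality_rate=0.01, verbose=False):
    """Run the toy model, checking intermediate quantities at every step."""
    infected, deaths = 100.0, 0.0  # made-up starting values
    history = []
    for step in range(n_steps):
        infected, deaths = step_epidemic(infected, deaths, growth_rate, fatality_rate)
        # Physical sanity checks on intermediate results
        assert np.isfinite(infected) and np.isfinite(deaths), "non-finite value"
        assert infected >= 0.0, "negative number of infections"
        assert deaths >= 0.0, "negative number of deaths"
        if history:
            assert deaths >= history[-1][1], "cumulative deaths went down"
        if verbose:
            # The classic scientist's print statement
            print(f"step {step}: infected={infected:.1f}, deaths={deaths:.2f}")
        history.append((infected, deaths))
    return np.array(history)

# Idealised limits with known answers: no growth and no fatalities
# should leave infections constant and deaths at zero.
flat = run_with_checks(growth_rate=1.0, fatality_rate=0.0)
assert np.allclose(flat[:, 0], 100.0)
assert np.all(flat[:, 1] == 0.0)
```

The checks live inside the calculation rather than in a separate test suite, which is roughly what the "weird set of ad hoc test outputs" tends to look like in practice.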

These last four points will horrify most software developers (and I know quite a few — I was active in the FOSS movement for a solid decade; buy my book etc etc). Skipping these things is terrible practice if you’re developing software for end-users. But for scientific software, it’s not so important. If you have users other than yourself, they will figure things out after a while (a favourite project for starting grad students!) or email you to ask. If you put in invalid inputs, your testing and other types of scientific examination of the results will generally uncover the error. And, really, who cares if your code is ugly and messy? As long as you are doing the right things to properly check the scientific results before publishing them, it doesn’t matter if you wrote it in bloody Perl with Russian comments — the quality of the scientific results is what matters, not the quality of the code itself. This is well understood throughout the scientific community.

In summary, most scientific modelling codes are expected to be used by user-developers with extensive internal knowledge of the code, the model, and the assumptions behind it, and who are routinely performing a wide variety of checks for correctness before doing anything with the results. In the right hands, you can have a lot of confidence that sensible, rigorous results are being obtained; however they are not for non-expert users.

Specific misunderstandings in the “Lockdown Sceptics” article

I will caveat this section with the fact that I am an astrophysicist and not an epidemiologist, so can’t critique the model assumptions or even really the extent to which it has been implemented well in the Imperial code. I can explain where I think the Lockdown Sceptics article has missed the point of this kind of code though.

Non-deterministic outputs: This is the most important one, as it could, in particular circumstances, be a valid criticism. The model implemented by this code is a stochastic model, and so is expected to produce outputs with some level of randomness (it is exploring a particular realisation of some probability distribution; running it many times will allow us to reconstruct that distribution, a method called Monte Carlo). Computers deal in “pseudo-randomness” though; given the same starting “seed” they will produce the same random-looking sequence. A review by a competing group in Edinburgh found a bug that resulted in different results for the same seed, which is generally not what you’d want to happen. As you can see at that link, a developer of the Imperial code acknowledged the bug and gave some explanation of its impact.

The key question here is whether the bug could have caused materially incorrect results in published papers or advice. Based on the response of the developer, I would expect not. They are clearly aware of similar types of behaviour happening before, which implies that they have run the code in ways that could pick up this kind of behaviour (i.e. they are running some reproducibility tests — standard scientific practice). The bug is not unknown. A particular workaround here appears to be re-running the model many times with different seeds, which is what you’d do with this code anyway; or using different settings that don’t seem to suffer from this bug. My guess is that the “false stochasticity” caused by this bug is simply inconsequential, or that it doesn’t occur with the way they normally run the code. They aren’t worried about it — not because this is a disaster they are trying to cover up, but because this is a routine bug that doesn’t really affect anything important.
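
For readers unfamiliar with seeds and Monte Carlo runs, here is a minimal Python sketch of the two behaviours discussed above: the same seed reproducing the same output, and many different seeds mapping out a distribution. The toy outbreak model, the function name simulate_outbreak and all the numbers are assumptions made up for illustration; this is not how the Imperial code works internally.

```python
import numpy as np

def simulate_outbreak(seed, n_days=30, initial_cases=10, mean_growth=1.1):
    """Toy stochastic outbreak; a hypothetical stand-in, not the Imperial model."""
    rng = np.random.default_rng(seed)  # all randomness flows from this one seed
    cases = initial_cases
    for _ in range(n_days):
        # Each day's case count is a Poisson draw around the expected growth
        cases = rng.poisson(cases * mean_growth)
    return int(cases)

# Reproducibility check: the same seed must give exactly the same answer.
assert simulate_outbreak(seed=42) == simulate_outbreak(seed=42)

# Monte Carlo: many different seeds map out the distribution of outcomes.
finals = np.array([simulate_outbreak(seed) for seed in range(500)])
low, median, high = np.percentile(finals, [2.5, 50, 97.5])
print(f"median final cases: {median:.0f} (95% interval {low:.0f} to {high:.0f})")
```

The quantity of interest is the distribution across seeds rather than any single run, which is why a bug that perturbs individual realisations need not change the published conclusions.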

Again, this is bread and butter for scientific programming. They have seen the issue before, and so are aware of this limitation of the code. Ideally they would have fixed the bug, yes, but with this sort of code we’re not normally trying to reach a state of near-perfection ready for a point release or some such, as with commercial software. Instead, the code is being used in a constantly evolving state. So perhaps, being aware of it, it’s just not a very high priority to fix given how they are using the code. Indeed, why would they run the code in such a way that the bug arises and knowingly invalidates their results? It’s pretty clear this is not a major result-invalidating bug from their behaviour (and the behaviour of the reporter from Edinburgh) alone.

Undocumented equations: See above regarding the approach to documentation. It would definitely be much more user-friendly to document the equations, but does it mean that the code is bad? No. For all we know, there is a scruffy old LaTeX note explaining the equations, or they are in one of the early papers (either is common). This is totally normal — ugly, and not helpful for the non-expert trying to make sense of the code, but not an indicator of poor code quality.

Continuing development: As per the above, scientific codes of this kind generally evolve as they need to, rather than aiming for a particular release date or set of features. Continuing development is the norm, and things like bugfixes are applied as and when they crop up. Serious issues that affect previously published results would normally prompt an erratum (e.g. see this one of mine); some scientists are less good about issuing errata (or corrective follow-up papers) than others, especially for more minor issues, although covering up a really serious issue would be a career-ending ethical violation for most. As I hope I’m making clear from the above, the article’s charge of serious “quality problems” isn’t actually borne out though; the Imperial group are just (harmlessly!) violating norms that the article’s author is used to from a completely different field.

Some other misunderstandings from that article:

  • “the original program was ‘a single 15,000 line file that had been worked on for a decade’ (this is considered extremely poor practice)” — Not to a scientist it’s not! If anything, the fact that the group have been plugging away at this code for a decade, with an increasing number of collaborators, confronting it with more and more peer reviewers, and withstanding more and more comparisons from other groups, gives me more confidence in it. It certainly improves the chances that substantial bugs would have been found and resolved over time, or structural flaws noticed. Young, unproven codes are the dangerous ones! And while the large mono-file structure will surely be annoying to work with (and so is poor in that sense), it has no bearing on the actual scientific correctness of the code.
  • “A request for the original code was made 8 days ago but ignored, and it will probably take some kind of legal compulsion to make them release it. Clearly, Imperial are too embarrassed by the state of it ever to release it of their own free will, which is unacceptable given that it was paid for by the taxpayer and belongs to them.” — This is a tell-tale sign that this person isn’t a scientist. First, the motto of most academics is “Apologies for the late reply”! Waiting 8 days for a reply to a potentially complicated and labour-intensive request is nothing, especially as the group is obviously busy with more urgent matters. Second, there’s no saying that the taxpayer paid for most of the code (it could be funded by a charitable foundation like the Wellcome Trust for example), and the code will likely remain the IP of the author, but with Imperial retaining a perpetual license to it. Instead, the obligation for openness here comes from the publications that use the code. Most journals require that authors make code and data used to produce results in particular journal articles available on request. Note that some scientists are cagey about releasing their code fully publicly because they worry about competitors co-opting it (and not without good reason). I personally have made all of my scientific code available by default however, and it’s good that this group are making theirs fully public too. It’s the right thing to do in this scenario (and we should also recognise previously opened codes, such as the one from the LSHTM group also used by SAGE).
  • “What it’s doing is best described as ‘SimCity without the graphics'” — Fantastic! The original SimCity used a tremendously sophisticated model for what it did, and has even been used in teaching town planners (I remember reading the manual in the 90s).
  • “The people in the Imperial team would quickly do a lot better if placed in the context of a well run software company….the difference between ICL and the software industry is the latter has processes to detect and prevent mistakes” — Now really, this is teaching grandma to suck eggs. Remember that much of modern programming emerged from academic science. This is not to say that your average scientific programmer couldn’t stand to learn some cleaner coding practices, but to accuse scientists of not having processes to detect and prevent mistakes — ludicrous! The bedrock of the scientific process is in validation and self-correction of results, and we have plenty of highly effective tools in our arsenal to handle that thank you very much. Now I have found unit testing, continuous integration etc. to be useful in some of my more infrastructural projects, but they are conveniences rather than necessities. Practically every scientist I know spends most of their time checking their results rather than coding, and I sincerely doubt that the Imperial group is any different. If anything, in most fields there is a culture of “conspicuous correctness” — finding mistakes in the work of others is a highly prized activity (especially if they are direct competitors).
  • “Models that consume their own outputs as inputs is problem well known to the private sector – it can lead to rapid divergence and incorrect predictions” — This is a highly simplistic way of looking at things, and I suspect the author doesn’t know much about this kind of method. I don’t know specifically what the Imperial folks are doing here, but there are important classes of methods that use feedback loops of this kind called iterative methods that are common in solving complicated systems of coupled equations and so on, which are mathematically highly rigorous. I have used them in a statistical modelling context on occasion. Ferguson and co are from highly numerate backgrounds, and so I think it’s safe to assume they’re not missing obvious problems of this kind.
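
As a concrete example of a rigorous method that feeds its own output back in as input, here is a minimal Python sketch of fixed-point iteration, one of the simplest iterative methods. It says nothing about what the Imperial code actually does; the function name fixed_point and the example equation x = cos(x) are chosen purely for illustration.

```python
import math

def fixed_point(update, x0, tol=1e-12, max_iter=1000):
    """Solve x = update(x) by repeatedly feeding the output back in as input.

    Returns (solution, number_of_iterations). Raises RuntimeError if the
    iterates have not settled down within max_iter steps.
    """
    x = x0
    for i in range(max_iter):
        x_new = update(x)
        if abs(x_new - x) < tol:  # successive iterates agree: converged
            return x_new, i + 1
        x = x_new
    raise RuntimeError("iteration did not converge")

# Solve x = cos(x). The iteration converges because |d/dx cos(x)| < 1
# near the solution, i.e. each pass shrinks the remaining error.
root, n_iter = fixed_point(math.cos, x0=1.0)
print(f"x = {root:.10f} after {n_iter} iterations")
assert abs(root - math.cos(root)) < 1e-10
```

Whether such a loop converges or diverges is a well-understood mathematical property of the update rule (roughly, whether it shrinks or amplifies errors), not an unavoidable hazard of feedback as such.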

I could go on, but hopefully this is enough to establish my case — the author of that article is out of their depth, and clearly unaware of many of the basics of numerical modelling or the way this kind of science is done (with great historical success across many fields, might I add). In fact, they are so far out that they don’t even realise how silly this all sounds to someone with even a cursory knowledge of this kind of thing — it is an almost perfect study in the Dunning-Kruger effect. How they reached the conclusion that scientists must be so incompetent that “all academic epidemiology [should] be defunded” and that “This sort of work is best done by the insurance sector” is truly remarkable — there is a striking arrogance in overlooking the possibility that, just maybe, the failure is in their own understanding.

The peculiarly implausible nature of the accusations

What I have discussed above is a (by no means complete) explanation of how many, if not most, scientific modellers approach their job. I have no special insight into the workings of the Imperial group; this is more an attempt to explain some of the sociology, attitude, and norms of quantitative scientific modelling. Professional software developers will hate some of these norms because they are bad end-user software engineering — but this doesn’t actually matter, since scientific correctness of a code typically owes little to the state of its engineering, and we have a very different notion of who the end user is compared to a company like Google. Instead, we have our own well-worn methods of checking that our codes — our exoskeletons — are scientifically correct in the most pertinent ways, backed up by decades of experience and rabid, competitive cross-checking of results. This, really, is all that matters in terms of scientific code quality, whether you’re publishing prospective theories of particle physics or informing the public health response to an unprecedented pandemic.

Let’s not lose sight of the bigger picture here though. The point of the Lockdown Sceptics article is to challenge the validity of the Imperial code by painting it as shoddy, and presumably, therefore, to undermine the basis of particular actions that may have taken note of SAGE advice. For this to actually be the case, though, (a) the Imperial group must have evaded detection for over 10 years from a global community of competing experts; (b) they must be almost criminally negligent as scientists, by having ignored easily-discovered but consequential bugs; (c) they must be almost criminally arrogant to suppose that their unchecked/flawed model should be used to inform such big decisions; and (d) the entire scientific advisory establishment must have been taken for a ride without any thought to question what they were being told. Boiling this all down, the author is calling several hundred eminent scientists — in the UK and elsewhere — complete idiots, while they, through maybe half an hour of cursory inspection, have found the “true, flawed” nature of the code through a series of noddy issues.

This is clearly very silly, as even an ounce of introspection would have revealed to the author of that article had they started out with innocent motives. Their “code review” is no such thing however — instead, it is a blatant hatchet job, of the kind we have come to expect from climate change deniers and other anti-science types. The article author’s expertise in software development (which I will recognise, despite their anonymity) is of entirely the wrong type to actually, meaningfully, review this codebase, and is clearly misapplied. You may as well ask a Java UI programmer to review security bugs in the Linux kernel. To then rejoice in the fact that this has been picked up by an obviously lop-sided wing of the press and used to push a potentially harmful agenda, against the best scientific evidence we have, is chilling.


Prioritising modified gravity models

Just a brief note this week. One thing looming on the horizon is the LSST Dark Energy Science Collaboration meeting in Stanford, where we’ll spend a couple of sessions discussing which models beyond the standard LambdaCDM model we should prioritise to be tested when the first data arrive.

In an ideal world, we’d test everything that seemed interesting, for a very loose and inclusive definition of the word interesting. Wouldn’t want to miss out that crazy theory that might just be the answer to all the big problems in physics now, would we? The problem is, testing models takes a lot of time and effort, with the effort required becoming increasingly prohibitive as we begin to push to sub-1% precision on cosmological parameters. Modern survey data are incredibly complex, and so it takes a lot to ensure that the analysis you’re doing is robust – a lot of computing time, a lot of validation, a lot of simulations, a lot of model complexity… It’s just hard.

So, we need to prioritise. I think the best bet for now would be to whittle down the vast array of possible models into a very short but diverse list. This could cover some neat examples of very different physics, but without any attempt at being comprehensive. The diversity will give us a handful of “example” implementations of testable models that can be used as templates for future, more comprehensive, analyses. Sticking to a short list is crucial for now however, as it will allow us to focus our development and simulation effort without overreaching.


Gamma ray background and CO galaxy lensing

A productive week, mostly spent kicking off a few new projects.

At the beginning of the week, Stefano Camera was visiting from Manchester. He gave a great talk about putting constraints on particle dark matter by looking for annihilation signatures in the gamma-ray background (as observed by Fermi). Various other processes can contribute to the background, so ideally one would apply some filter to extract only the dark matter contribution (which currently has unknown amplitude). One answer is to cross-correlate the Fermi map with a reconstructed map of the weak lensing potential, which is probably one of the purest tracers of the dark matter distribution you can get. This should get pretty good with future weak lensing datasets, and future Fermi data. A really nice idea.

I was also asked to give an overview of the cosmology landscape in 2030, for the benefit of the ngVLA (next-generation VLA) cosmology working group. They’re in the process of building a science case for ngVLA, which looks to have many similarities to SKA in terms of size (~200 dishes) and approach (they want a big “facility”-style observatory), but over the 5-100 GHz band instead (the SKA1-MID dish array should effectively cover 350 MHz – 15 GHz). The question is how this could be interesting for cosmology.

Speaking from experience with the SKA, it can be difficult to carve out a really compelling cosmology science case, mostly because there’s just so much competition from other surveys and observational methods. People can often do much of what you want to do, but sooner, or with better-understood (though not necessarily superior) methods. The question is whether your experiment can do something really novel and interesting in the space that’s left.

One suggestion is to do CO intensity mapping with ngVLA, which I think would be neat. Except – its field of view will be small, leading to low survey speeds, so it won’t be able to measure the large volumes that are most useful for cosmology. It’d be handy for constraining the star-formation history of the Universe though, as CO is thought to be a very good tracer of that. There’s also going to be competition from smaller, cheaper (and sooner) CO-IM experiments like COMAP (which I was surprised to learn already has a prototype running, and is hoping to be fully commissioned around the end of 2017).

My proposal was to consider doing a weak lensing survey with ngVLA, perhaps using the CO line. The Manchester group have done a lot of work on radio weak lensing recently, mostly targeting the SKA. They plan to perform a continuum survey at ~1 GHz over a few thousand square degrees, yielding an acceptably high source number density and sufficient angular resolution to measure galaxy ellipticities. Redshift information for continuum sources is very scant however, so there’s a significant loss of information due to effectively averaging over all the radial Fourier modes; an SKA1 survey should still have comparable performance to DES though. In any case, the real power of this approach is in cross-correlating the radio and optical lensing data, which would have the effect of removing many systematics that could be extremely difficult to identify and remove with sufficient precision in a single survey. Radio and optical lensing systematics are expected to look quite different; even the atmosphere has a very different effect between the two.
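
The reason cross-correlation helps can be shown with a toy calculation in Python. This is a deliberately crude sketch: the "maps" are just arrays of independent Gaussian numbers, the amplitudes are made up, and the systematics are assumed to be completely uncorrelated between the two surveys, which is the idealised limit of "quite different". The function name cross_vs_auto is hypothetical.

```python
import numpy as np

def cross_vs_auto(n_pix=200_000, seed=0):
    """Compare auto- and cross-correlation estimates of a common signal.

    Toy assumption: both surveys see the same signal (variance 1), but each
    has its own independent systematic contamination (variance 4).
    """
    rng = np.random.default_rng(seed)
    signal = rng.normal(0.0, 1.0, n_pix)       # common lensing signal
    sys_radio = rng.normal(0.0, 2.0, n_pix)    # radio-only systematics
    sys_optical = rng.normal(0.0, 2.0, n_pix)  # optical-only systematics

    radio = signal + sys_radio
    optical = signal + sys_optical

    auto_radio = np.mean(radio * radio)  # picks up signal + systematics
    cross = np.mean(radio * optical)     # systematics average away
    return auto_radio, cross

auto_radio, cross = cross_vs_auto()
print(f"true signal variance: 1.0, auto: {auto_radio:.2f}, cross: {cross:.2f}")
```

In this toy setup the auto-correlation should come out near 5 (signal plus systematics) while the cross-correlation should land close to the true value of 1, since terms that appear in only one dataset average to zero in the product.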

While I haven’t done the full calculation yet (in progress!), my suspicion is that ngVLA could be even better at weak lensing than SKA1, if it has sufficient sensitivity. By targeting the CO line, one gets precise redshift information about the detected galaxies, which should allow much more information to be recovered from the lensing signal than in a continuum survey. By virtue of working at higher frequency, the ngVLA should also have a higher angular resolution, presumably making shape measurement easier too. Most of the other advantages of radio weak lensing are retained, and so this could be a nice dataset to cross-correlate with (e.g.) LSST, and thereby convincingly validate their lensing analysis. The question really is whether ngVLA would have the sensitivity (and survey speed) for this to be practical, however. Stay tuned.


Neutral hydrogen

In an attempt to blog more often, I’ve decided to try writing brief, weekly-ish research updates. Let’s see how long this lasts…

Beyond BAO with autocorrelation intensity mapping experiments

This week, I’ve been at the Cosmology with Neutral Hydrogen workshop in Berkeley, where I gave a talk about autocorrelation (“single-dish”) 21cm intensity mapping experiments, of the kind we’re planning for SKA. As far as the US community is concerned, low-redshift (z < 6) intensity mapping is synonymous with (a) interferometric experiments, like CHIME and HIRAX, and (b) BAO surveys. My argument is that there are distinct advantages to exploring non-BAO science with IM experiments, and that autocorrelation experiments have significant benefits when it comes to those other science cases.

A more provocative statement is that “BAO is [ultimately] a dead end for 21cm IM”, which led to a rather passionate discussion at the end of the first day. I think this is a fair statement – while detecting the BAO will be an excellent, and probably necessary, validation of the 21cm IM technique (you either see the BAO bump feature at the right scale in the correlation function or don’t), contemporary spectroscopic galaxy surveys will cover a big chunk of the interesting redshift range (0 < z < 3) over a similar timeframe, and people will probably trust their results more. That is, counting galaxies is simpler than subtracting IM foregrounds. Perhaps something more can be gained by the ability of IM surveys to reach larger volumes and higher redshifts with tractable amounts of survey time (spectroscopic galaxy surveys are slow and expensive!), but I doubt this will lead to much more than mild improvements on parameter constraints.

Once IM has shown that it can detect the BAO, and is therefore a viable method, where do we go from there? I advocated targeting science for which the IM technique has definitive advantages over other methods. In particular, I suggested IM as being particularly promising for constraining extremely large-scale clustering (e.g. to detect new general relativistic effects and scale-dependent bias from primordial non-Gaussianity), and putting integral constraints on faint emission (i.e. sources deep into the faint end of the luminosity function). Galaxy surveys can’t do the latter unless they’re incredibly deep, and can’t do the former without excessive amounts of survey time. Autocorrelation IM is a better fit for these science cases than interferometric IM because (a) autocorrelation sees all scales in the survey area larger than the beam size, while interferometers filter out large scales unless you have a high density of very short baselines, and (b) there is no “missing flux” due to missing baselines (and therefore missing Fourier modes), which would screw up integral constraints on total emission. That said, interferometers are probably a safer way to get an initial IM BAO detection, owing to the relative difficulty of calibrating autocorrelation experiments. My money is still on CHIME to get the initial 21cm BAO detection.

There are a few autocorrelation IM experiments on the slate right now, including BINGO (a purpose-built IM experiment that will start operations in the ~2018 timeframe), MeerKAT (for which a 4,000 hour, 4,000 sq. deg. IM survey called MeerKLASS, matched to the DES footprint, has been proposed), and SKA1-MID (which I’ve spent a lot of time working on; it’s due to switch on around 2020), in addition to existing surveys with GBT and Parkes. If the various hard data analysis challenges can be solved for these experiments (which I think they can be), this will open up several exciting scientific possibilities that are almost unique to IM, like measuring ultra-large scales. And I think this should be recognised as a more promising niche for the technique – BAO detections are a medium-term validation strategy that will likely provide interesting (but not Earth-shattering) science, but ultimately they’re not its raison d’être.

Validating 21cm results – how can you trust the auto-spectrum?

Another thing that provoked much hand-wringing was the difficulty of definitively verifying 21cm auto-spectrum detections. The GBT experiment has been trying to do this, but it’s hard. Perhaps the power spectrum they’ve detected is just systematics? Or maybe they’ve over-subtracted the foregrounds and thus taken some signal with them? They claim to be able to combine upper and lower bounds from their auto- and (WiggleZ) cross-spectra respectively to measure parameters like the HI bias and HI fraction, but I have my reservations. As I said, it’s hard.

In my opinion, we need to wait for experiments with greater data volumes (wider survey areas, higher SNR). Then, we gain the ability to perform a much wider array of null tests than are currently possible with the GBT data. This is what people do to validate any other precision experiment, like Planck. It’s not a silver bullet, sure, but it’ll be a good, informative way to build confidence in any claimed auto-power detection.
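To make the idea of a null test a little more concrete, here is a minimal sketch of one common type, a split-half difference test. Everything in it is an assumption made for illustration (the toy Gaussian "sky", the noise level, the function names); it is not the GBT or Planck pipeline. The idea is that two independent halves of the data see the same sky, so their difference should contain only noise.

```python
import random

def half_difference_null_test(map_a, map_b, noise_sigma):
    """Reduced chi-squared of the half-difference map (a - b) / 2.

    If both maps contain the same sky signal plus independent Gaussian
    noise of standard deviation noise_sigma, the signal cancels and the
    result should be close to 1. Values well above 1 point to something
    that differs between the two halves (i.e. a systematic).
    """
    diff = [(a - b) / 2.0 for a, b in zip(map_a, map_b)]
    # Var[(a - b) / 2] = (sigma^2 + sigma^2) / 4 = sigma^2 / 2
    diff_variance = noise_sigma ** 2 / 2.0
    chi2 = sum(d * d for d in diff) / diff_variance
    return chi2 / len(diff)

def simulate_split(n_pix, noise_sigma, offset_b=0.0, seed=1):
    """Toy data split (hypothetical): same signal in both halves,
    independent noise, and an optional additive systematic in half B."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n_pix)]
    map_a = [s + rng.gauss(0.0, noise_sigma) for s in signal]
    map_b = [s + offset_b + rng.gauss(0.0, noise_sigma) for s in signal]
    return map_a, map_b

clean = half_difference_null_test(*simulate_split(5000, 0.5), noise_sigma=0.5)
contaminated = half_difference_null_test(
    *simulate_split(5000, 0.5, offset_b=1.0), noise_sigma=0.5
)
print(f"clean split: {clean:.2f}, contaminated split: {contaminated:.2f}")
```

With more data you can split in more ways (by time, by frequency, by dish, by scan direction), and each split is sensitive to a different kind of systematic. That is why larger data volumes make this sort of validation so much more informative.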

So, why worry? Just wait for more data, then do the statistical tests.


Book Review: How Software Works

Another month, another book kindly sent for review from No Starch. This one’s of the more conceptual variety, and sets out to explain – in layman’s terms – the algorithms that are responsible for much of the “magic” of modern technology.

When I was younger, and just getting into computers, I used to spend hours reading second-hand software and hardware manuals. I think I must have read the manual for my old computer’s motherboard 50 times. A kindly network engineer from the PC Pro forums (ah, those were the days) sent me an old 400-page networking manual that I inhaled too. Manuals were the best. Manuals showed me what computers were capable of.

It wasn’t until later that I became aware of where the real action is. Algorithms. Manuals are a fine thing, but they’re typically written at a high-level. They tell you what’s happening – and which buttons to press – but, for expediency, often skip over the “how” – the way the mysterious feats that first got me fired up about computers are actually achieved.

This book, How Software Works (by V. Anton Spraul), is all about the “how”. You won’t find any practical manual-type information in here at all, so don’t expect to come out the other side of this book with a finely-honed knowledge of printer troubleshooting or anything like that. No, this is a very pure book that explains, in uncompromisingly non-technical terms, how computers achieve their magic.

Each chapter covers a broad but real-world relevant topic, such as web security, movie CGI, or mapping. After some background on each topic, Spraul sketches out the most important pieces of the algorithmic puzzle needed to produce the “everyday” results we now take for granted in movies, on the web, and in our smartphone apps. This might include a walkthrough of the logic behind a trapdoor function, of the sort that makes public key encryption possible (which in turn makes internet shopping practical). Or perhaps the step-by-step process by which a rendering program builds up a realistic virtual scene in a movie, through ray tracing.
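For the tinkerers, here is a rough sketch of the trapdoor idea in the style of textbook RSA. It is my own toy illustration, not an example taken from the book, and the tiny primes and variable names are assumptions chosen so the numbers stay readable. Real systems use enormous primes and padding schemes that are left out here.

```python
# Toy RSA-style trapdoor function. Multiplying p and q is easy, but
# recovering them from n is hard when they are large, and that
# asymmetry is the "trapdoor". These primes are far too small to be secure.
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)    # only computable if you know p and q
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

def encrypt(message):
    """Anyone can do this using only the public values (e, n)."""
    return pow(message, e, n)

def decrypt(ciphertext):
    """Undoing it needs d, which depends on the secret factors of n."""
    return pow(ciphertext, d, n)

secret = 65
ciphertext = encrypt(secret)
print(ciphertext, decrypt(ciphertext))
```

The public pair (e, n) can be handed to anyone, which is what lets a shop's web server receive encrypted card details from a customer it has never met before.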

The writing is very clear and non-technical, almost without exception, and assumes very little prior knowledge. You do not need a technical background to understand this book, but you’ll want to spend some time to follow the examples and ruminate on them a little to really get everything. The examples themselves are plentiful, and include step-by-step illustrations of simplified situations that, when linked together, demonstrate how each algorithm works as a whole.

Given that this is a book on software, it’s slightly disappointing that the presentation is completely “dead-tree traditional”. By this, I mean that there’s no supplementary material in the form of working code snippets that one could play with, or interactive demonstrations. This feels like a missed opportunity, at least for those of us who learn best by tinkering (c.f. the excellent W3Schools “Try It Yourself” tutorials). It’d also turn the book into a more direct educational tool, perhaps something that a class could be based on – and there are enough simple web-based programming systems out there to remove much of the burden of having to “teach” programming in the first place. This is more of a wishlist item than a crucially missing piece, however.

Another minor criticism is the length of the book. It would have been nice to see a few more topics covered, or perhaps a little more detail in the final chapters. The material on searching could go into more detail in explaining how web search works, for example, including things like how robots/crawlers and ranking algorithms (e.g. PageRank) actually do their thing. As it is, it feels like the author ran out of steam before getting to the real crux of this topic.

All in all, it’s a very nice book, and I learned a lot about some interesting, highly-relevant techniques that I was only dimly aware existed. The material on encryption in particular outlines a clever and essentially mathematical topic that will speak to those of you who enjoy logic puzzles, for example. I’m not quite sure who the intended audience is for the book as a whole, but it’s definitely something to keep in mind for an aspiring techie – a teenager who’s still reading the manuals, perhaps, and is ready to have their horizons broadened. The mechanically-minded, those with a fundamental curiosity about how things work, will also enjoy it.


Book Review: How Linux Works 2

I received a review copy of How Linux Works 2, by Brian Ward, from the lovely folks at No Starch Press (my publisher) late last year. Inexcusably, it’s taken me until now to put together a proper review; here it is, with profuse apologies for the delay!

How Linux Works 2 is a very nice technical read. I’ve been a user and administrator of Linux systems for over a decade now, and can safely say I learned a lot of new stuff from both angles. Newer users will probably get even more from it – although absolute beginners with less of a technical bent might be better off looking elsewhere.

The book fills something of a niche; it’s not a standard manual-type offering, nor is it a technical system reference. It’s more impressionistic than either of those, written as a sort of overview of the organisation and concepts that go into a generic Linux system, although with specific details scattered throughout that really get into the nuts and bolts of things. If you’re looking for “how-to”-type instructions, you’re unlikely to find everything you need here, and it isn’t a comprehensive reference guide either. But if you’re technically-minded and want to understand the essentials of how most Linux distros work in considerable (but not absolute) depth, with a bit of getting your hands dirty, then it’s a great book to have on your shelf.

Various technical concepts are covered ably and concisely, and I was left with a much better feeling for more mysterious Linux components – like the networking subsystem – than I had before. There are practical details here as well though, and you’ll find brief, high-level overviews of a number of useful commands and utilities that are sufficient to give a flavour for what they’re good for without getting too caught up in the (often idiosyncratic) specifics of their usage.

That said, the author does sometimes slip into “how-to” mode, giving more details about how to use certain tools. While this is fine in moderation, the choice of digression is sometimes unusual – for example, file sharing with Samba is awarded a whole six pages (and ten subsections) of usage-specifics, while the arguably more fundamental CUPS printing subsystem has to make do with less than 2 pages. The discussion of SSH is also quite limited, despite the importance of this tool from both the user’s and administrator’s perspective, and desktop environments probably could have done with a bit more than a brief single-chapter overview. Still, this book really isn’t intended as a manual, and the author has done well not to stray too far in this direction.

A common difficulty for Linux books is the great deal of variation between distros. Authors often struggle with where to draw the line between complete (but superficial) distro-agnostic generality and more useful, but audience-limiting, distro specifics. How Linux Works succeeds admirably in walking this tightrope, providing sufficient detail to be useful to users of more or less any Linux system without repeatedly dropping into tiresome list-like “distro by distro” discussions. This isn’t always successful – the proliferation of init systems in modern distros has necessitated a long and somewhat dull enumeration of three of the most common options, for example – but HLW2 does much better at handling this than most books I’ve seen. The upshot is that the writing is fluid and interesting for the most part, without too many of the “painful but necessary” digressions that plague technical writing.

Overall, this book is an enjoyable and informative read for anyone interested in, well, how Linux works! You’ll get an essential understanding of what’s going on under the hood without getting bogged down in minutiae – making this a very refreshing (and wholly recommended) addition to the Linux literature.

You can find a sample chapter and table of contents/index on the No Starch website.


NAM 2015 Radio Surveys session

I co-chaired a series of parallel sessions on radio surveys at the 2015 UK National Astronomy Meeting in Llandudno earlier this month. It was a fun session, with lots of nice talks. We’ve now made the talk slides available online – take a look!


Cosmology with the Square Kilometre Array

A large fraction of my time over the last 18 months has been spent working out parts of the cosmology science case for the Square Kilometre Array, a gigantic new radio telescope that will be built (mostly) across South Africa and Australia over the coming decade. It’s been in the works since the early 90’s and – after surviving the compulsory planning, political wrangling, and cost-cutting phases that all Big Science projects are subjected to – will soon be moving to the stage where metal is actually put into the ground. (Well, soon-ish – the first phase of construction is due for completion in 2023.)

Infographic: SKA will have 8x the sensitivity of LOFAR.

A detailed science case for the SKA was developed around a decade ago, but of course a lot has changed since then. There was a conference in Sicily around this time last year where preliminary updates on all sorts of scientific possibilities were presented, which were then fleshed out into more detailed chapters for the conference proceedings. While a lot of the chapters were put on arXiv in January, it’s good to see that all of them have now been published (online, for free). This is, effectively, the new SKA science book, and it’s interesting to see how it’s grown since its first incarnation.

My contribution has mostly been the stuff on using surveys of neutral hydrogen (HI) to constrain cosmological parameters. I think it’s fair to say that most cosmologists haven’t paid too much attention to the SKA in recent years, apart from those working on the Epoch of Reionisation. This is presumably because it all seemed a bit futuristic; the headline “billion galaxy” spectroscopic redshift survey – one of the original motivations for the SKA – requires Phase 2 of the array, which isn’t due to enter operation until closer to 2030. Other (smaller) large-scale structure experiments will return interesting data long before this.

Artist's impression of the SKA1-MID dish array.

We’ve recently realised that we can do a lot of competitive cosmology with Phase 1 though, using a couple of different survey methods. One option is to perform a continuum survey [pdf], which can be used to detect extremely large numbers of galaxies, albeit without the ability to measure their redshifts. HI spectroscopic galaxy surveys rely on detecting the redshifted 21cm line in the frequency spectrum of a galaxy, which requires narrow frequency channels (and thus high sensitivity/long integration times). This is time consuming, and Phase 1 of the SKA simply isn’t sensitive enough to detect a large enough number of galaxies in this way in a reasonable amount of time.

Radio galaxy spectra also exhibit a broad, relatively smooth continuum, however, which can be integrated over a wide frequency range, thus enabling the array to see many more (and fainter) galaxies for a given survey time. Redshift information can’t be extracted, as there are no features in the spectra whose shift can be measured, meaning that one essentially sees a 2D map of the galaxies, instead of the full 3D distribution. This loss of information is felt acutely for some purposes – precise constraints on the equation of state of dark energy, w(z), can’t be achieved, for example. But other questions – like whether the matter distribution violates statistical isotropy [pdf], or whether the initial conditions of the Universe were non-Gaussian – can be answered using this technique. The performance of SKA1 in these domains will be highly competitive.
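The sensitivity argument can be sketched with the ideal radiometer equation, in which the thermal noise scales as the system temperature divided by the square root of bandwidth times integration time. The numbers below (system temperature, channel width, total band) are illustrative assumptions of mine rather than actual SKA1 specifications, and the sketch compares thermal noise only, ignoring how bright the line or continuum emission itself is.

```python
import math

def radiometer_noise(t_sys_kelvin, bandwidth_hz, integration_s):
    """Ideal radiometer equation: sigma_T = T_sys / sqrt(bandwidth * time)."""
    return t_sys_kelvin / math.sqrt(bandwidth_hz * integration_s)

# Illustrative values only (assumed, not SKA1 specifications).
T_SYS = 25.0               # system temperature in kelvin
T_INT = 3600.0             # integration time in seconds
LINE_CHANNEL_HZ = 10e3     # narrow channel needed to resolve the 21cm line
CONTINUUM_BAND_HZ = 300e6  # wide band usable for a continuum detection

line_noise = radiometer_noise(T_SYS, LINE_CHANNEL_HZ, T_INT)
continuum_noise = radiometer_noise(T_SYS, CONTINUUM_BAND_HZ, T_INT)
gain = line_noise / continuum_noise  # equals sqrt(bandwidth ratio)

print(f"noise per line channel: {line_noise:.2e} K")
print(f"noise over full band:   {continuum_noise:.2e} K")
print(f"continuum noise is lower by a factor of {gain:.0f}")
```

For these assumed numbers the thermal noise over the full band comes out roughly 170 times lower than in a single narrow channel, which gives a feel for why a continuum survey can reach so many more faint galaxies in the same observing time.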

Another option is to perform an intensity mapping survey. This gets around the sensitivity issue by detecting the integrated HI emission from many galaxies over a comparatively large patch of the sky. Redshift information is retained – the redshifted 21cm line is still the cause of the emission – but angular resolution is sacrificed, so that individual galaxies cannot be detected. The resulting maps are of the large-scale matter distribution as traced by the HI distribution. Since the large-scale information is what cosmologists are usually looking for (for example, the baryon acoustic scale, which is used to measure cosmological distances, is something like 10,000 times the size of an individual galaxy), the loss of small angular scales is not so severe, and so this technique can be used to precisely measure quantities like w(z). We explored the relative performance of intensity mapping surveys in a paper last year, and found that, while not quite as good as its spectroscopic galaxy survey contemporaries like Euclid, SKA1 will still be able to put strong (and useful!) constraints on dark energy and other cosmological parameters. This is contingent on solving a number of sticky problems to do with foreground contamination and instrumental effects, however.

The comoving volumes and redshift ranges covered by various future surveys.

The thing I’m probably most excited about is the possibility of measuring the matter distribution on extremely large scales, though. This will let us study perturbation modes of order the cosmological horizon at relatively late times (redshifts below ~3), where a bunch of neat relativistic effects kick in. These can be used to test fundamental physics in exciting new ways – we can get new handles on inflation, dark energy, and the nature of gravity using them. With collaborators, I recently put out two papers on this topic – one more general forecast paper, where we look at the detectability of these effects with various survey techniques, and another where we tried to figure out how these effects would change if the theory of gravity were something other than General Relativity. To see these modes, you need an extremely large survey, over a wide redshift range and survey area – and this is just what the SKA will be able to provide, in Phase 1 as well as Phase 2. While it turns out that a photometric galaxy survey with LSST (also a prospect for ~2030) will give the best constraints on the parameters we considered, an intensity mapping survey with SKA1 isn’t far behind, and can happen much sooner.

Cool stuff, no?


Press complaint: Daily Mail vs. BICEP2 commentators

In March of this year, immediately following the jubilation surrounding the BICEP2 results, the Daily Mail published a bizarre opinion piece on two scientists who were interviewed about the experiment on BBC’s Newsnight programme. The gist of the article was that the Beeb was cynically polishing its “political correctness” credentials by inviting the scientists onto the programme, because they were both non-white and non-male. More details about the debacle can be found in this Guardian article.

Now, I’m not much of a Daily Mail fan at the best of times, but this struck me as particularly egregious; not only were their facts wrong and their tone borderline racist and sexist (in my opinion, at least), but they also seemed to be mistaking science for some sort of all-white, all-boys club that women and people of other ethnic groups have no right to involve themselves with. This is damaging to all of us in science, not just those who were personally attacked – so I complained.

I just received word back on my complaint, which was sent to the Press Complaints Commission in the UK, who have the job of (sort of) regulating the press. Their response is reproduced below in full; my allegation of factual inaccuracy was upheld, but they declined to act on the allegation of inappropriate racial/gender commentary because I wasn’t one of the parties being discussed.

Commission’s decision in the case of

A man [me] v Daily Mail

The complainant expressed concern about an article which he considered to have been inaccurate and discriminatory, in breach of Clauses 1 (Accuracy) and 12 (Discrimination) of the Editors’ Code of Practice. The article was a comment piece, in which the columnist had critically noted Newsnight’s selection of “two women….to comment on [a] report about (white, male) American scientists who’ve detected the origins of the universe”.

Under the terms of Clause 1 (i) of the Code, newspapers must take care not to publish inaccurate information, and under Clause 1 (ii) a significant inaccuracy or misleading statement must be corrected promptly, and with “due prominence”.

The newspaper explained that its columnist’s focus on gender and ethnicity was designed to be nothing more than a “cheeky reference” to the BBC’s alleged political correctness. In the columnist’s view, the selection of Dr Maggie Aderin-Pocock and Dr Hiranya Peiris to comment on the BICEP2 (Background Imaging of Cosmic Extragalactic Polarisation) study was another such example of this institutional approach.

The complainant, however, noted the BICEP2 team were, in fact, a diverse, multi-ethnic, multi-national group which included women, something which the newspaper accepted. Furthermore, he said that white, male scientists had been interviewed on Newsnight as well, which undermined the columnist’s claim that Dr Maggie Aderin-Pocock and Dr Hiranya Peiris had been specifically selected. The suggestion that the BICEP2 team were all white and male was a basic error of fact and one which appropriate checks could have helped to prevent. There had been a clear failure to take care not to publish inaccurate information, and a corresponding breach of Clause 1 (i) of the Code.

The newspaper took a number of measures to address the situation: the managing editor wrote to both Dr Aderin-Pocock and Dr Peiris; a letter criticising the columnist’s argument was published the following day; its columnist later explicitly noted both scientists’ expertise, and competence to comment on the study; and, a correction was published promptly in the newspaper’s Corrections & clarifications column which acknowledged that the BICEP2 study was “conducted by a diverse team of astronomers from around the world”, and which “apologis[ed] for any suggestion to the contrary”. The latter measure was sufficient to meet the newspaper’s obligation under Clause 1 (ii) of the Code, to correct significantly misleading information.

The columnist’s suggestion that Dr Aderin-Pocock and Dr Peiris were specifically selected for the Newsnight programme because of “political correctness” was clearly presented as his own comment and conjecture which, under Clause 1 (iii) and the principle of freedom of expression, he was entitled to share with readers. There was, therefore, no breach of the Code in publishing that suggestion. However, the subsequent correction of the factual inaccuracy regarding the BICEP2 team and the acknowledgment of both experts’ expertise will have allowed readers to assess the suggestion in a new light.

Under Clause 12 (Discrimination) (ii) of the Code, “details of an individual’s race, colour, religion, sexual orientation, physical or mental illness or disability must be avoided unless genuinely relevant to the story”. The complainant’s concerns under this Clause were twofold; he believed that the references to the gender and ethnic background of both Dr Aderin-Pocock and Dr Peiris, and the BICEP2 team members, were irrelevant in a column about a scientific study. While the terms of Clause 12 (ii) do not cover irrelevant references to gender, the Commission would need to have received a complaint from a member, or members of the BICEP2 team, or Dr Aderin-Pocock or Dr Peiris in order to consider the complaint under this Clause. In the absence of any such complaints, the Commission could not comment further.


BICEP2: Impact on cosmological parameters

Antony Lewis and Shaun Hotchkiss are posting/tweeting some preliminary CosmoMC results for Planck + BICEP2. Here’s a brief list of what they’ve put out so far:

Check out Shaun’s Twitter feed for the latest, plus some initial analysis.

[Update: Antony has now posted a PDF with several tables of joint constraints from Planck + BICEP2.]