Sunday 18 October 2020

Problems in the Culture of Modern Science

 

I first began to suspect that there might be something wrong with the culture of modern science in the mid-1980s, when I taught at the University of Jyväskylä in Finland and sought to augment my lowly junior lecturer’s salary by editing the English language versions of scientific papers being submitted for publication by my Finnish colleagues. As most Finns speak excellent English, this task largely consisted in tidying up the grammar a little, mostly by cutting out a number of commonplace ‘Finnishisms’ which arise in Finnish English as a result of the structural differences between the two languages, with Finnish, for instance, using case endings instead of prepositions, having only one word for the third person singular instead of three, and being entirely without a verb ‘to have’. In order to satisfy myself that any given sentence or paragraph made sense, however (and, just as importantly, actually expressed what I thought the author intended), I nevertheless found that I usually had to spend quite a lot of time learning the language of the science itself, which primarily consists in the nouns and verbs which denote a science’s objects and the ways in which these objects relate to or act upon each other.

Given how much time I usually had to spend on this background research, I was always especially pleased, therefore, whenever I received repeat business within the same scientific field, in that while I could still charge my regular fee, I didn’t have to learn a whole new scientific language from scratch. It was when dealing with one particular repeat customer, however (an ophthalmologist), that I began to notice something odd about the way in which he set about communicating the results of his work to the rest of the world. For the empirical study upon which he based the second paper he brought me appeared to be exactly the same as the study he had used for the first paper I had edited for him. In fact, for a while, it even made me wonder whether I’d picked up the wrong document, especially when I found myself correcting sentences I was fairly sure I’d corrected before.

For a while, indeed, it even made me feel slightly annoyed. For once I’d checked that the second document was, in fact, a different paper (in that it had a different title, at least), I felt that the minimum the author could have done was incorporate my previous corrections into the new work. By not doing so, it was almost as if he didn’t care about the improvements I’d made to the previous paper’s presentation, in which I took some pride, trivial though my contribution may have been to the work as a whole. Because so much of the second paper was more or less identical to the first, however, I refrained from saying anything, happy to hand my client a substantial invoice for what had essentially been a few hours’ work copying my previous corrections from one document into the other.

Then he brought me a third paper and, lo and behold, it was based on exactly the same empirical study as the previous two. The methodology was the same, the test subjects were the same, and so were the results as set out in the various tables. The only differences between the three papers I could discern were a number of additional findings and conclusions set out towards the end of each one, which, in my naivety, I thought could have been presented better had they all been in the same paper: something I was foolish enough to suggest to the author when he came to pick up my final set of revisions.

It was one of those rare moments in life when the scales fall from one’s eyes and one suddenly sees the world for what it truly is rather than what one had childishly supposed it to be. For instead of thanking me for such an insightful suggestion, my client simply looked at me as if I were a complete idiot. And then the penny dropped. For these ophthalmological papers, over which I had diligently laboured for many more evenings than I now suspected they were worth, were not about imparting new information to fellow scientists working in the field. Even less were they about adding to the total sum of human knowledge. They were simply about adding three more titles to the list of publications on the author’s curriculum vitae in order to advance his career, either by helping him to secure further funding for his research or by enabling him to obtain a better position within the university system: two measures of career success which were already connected back in the 1980s, but are now almost inseparable.

I say this because most university graduates today leave university with a far greater level of debt than they did forty years ago, such that, in most cases, they cannot even contemplate post-graduate research unless their tuition fees are paid and some level of maintenance support provided. Because most western countries regard their science base as essential to their economic prosperity (and therefore require a large pool of science graduates to undertake post-graduate training), many of them, especially in Europe, have developed public funding regimes for both scientific research and post-graduate support which basically tie the two together, such that if a student wins a place on a publicly funded research programme, their own funding automatically comes with it, their eligibility for funding actually being determined by their selection for the programme.

What this also means, however, is that if a university science department wishes to run a post-graduate programme, as most university science departments do (such programmes figuring prominently in all their marketing material), then funding for some area of research must first be secured, with the further consequence that those members of the department who are particularly good at this tend to be more highly regarded and enjoy a higher status than those who aren’t. What’s more, this then affects recruitment. For in recruiting members of a science department, particularly the department head, most universities tend to place far more emphasis on a candidate’s track record in obtaining research funding than on their teaching ability, thus making this the key attribute in building a successful academic career.

For universities and scientists alike, however, this whole system of funding has become something of a treadmill. For if universities want to continue their prestigious research and post-graduate programmes, and if scientists wish to maintain their high-status positions, then they have to continually win more funding, constantly going back to the relevant funding agencies in a never-ending cycle of application and political lobbying which can, and often does, have the effect of inverting the relationship between funding and research. Instead of obtaining funding in order to continue their research, scientists now too often find themselves undertaking research in order to continue obtaining their funding, which, in turn, places them on another treadmill: that of having to continually produce results, in the form of a never-ending stream of published papers, in order to justify all the money they have been given and thus be given more.

Quite predictably, this has also brought about a massive expansion in the scientific publishing industry. For in order to accommodate the demands of so many scientists who have to get their results published in order to justify their funding, dozens of new scientific journals are founded every year. Today, indeed, it is estimated that there are more than 30,000 such journals worldwide, which collectively published over 2.2 million scientific papers in 2018, the latest year for which I have figures, most of which, statistically, can only have achieved a very limited readership. Indeed, there is a widespread joke within the scientific community (indicative of an equally widespread cynicism) that the vast majority of all published papers only ever have three readers: the authors themselves, the editor of the publishing journal, and the referee appointed to conduct peer review.

Amusing as this may be, what is far less amusing, however, is the ever-growing cost of this publishing juggernaut, which, like the cost of scientific research itself, is met either by students in the form of tuition fees or by taxpayers, whether through grants to universities or through research funding, these two sources of finance roughly corresponding to the two principal ways in which scientific journals make their money.

The first and more traditional of these is by charging annual subscriptions, both for online access to individual papers and for the annual bound editions which still accumulate on many university library shelves. Given the level of growth in the industry, however, it was inevitable that at some point the cost to universities of further expansion based on this model would simply have become prohibitive. Most new journals (those founded during the last couple of decades or so) have therefore adopted an Open Access (OA) model, in which the published papers are free to read online, with the costs being covered by the authors. The idea that this makes them ‘free’, however, is something of a wilful misconception. For knowing that they are going to have to pay to have their research published in this way, most scientists now include the cost of publishing in their applications for research funding, with the result that the costs are simply charged to the public purse.

Indeed, it is the fact that no one who cannot pass the cost of these journals on is ever directly charged for them that permits their seemingly infinite expansion. For if the only payments journals received came from those who actually consumed their contents, and if these consumers only paid for what they actually consumed (whether this be in the form of single online papers or the quarterly paperback editions to which I myself used to subscribe), then not only would there be far less money flowing into the scientific publishing industry but, as a consequence, there would also have to be far fewer journals, which, in turn, would mean that far fewer papers were published.

The problems to which this lack of any commercial restraint gives rise, however, are not confined merely to this ever-increasing drain on public funds. For by providing what is, in effect, a virtually limitless publishing capacity, publishers have not only ceased to perform the filtering function they once provided (ensuring, instead, that just about every scientist can now get their work published, no matter how mediocre or entirely meritless it may be) but have also largely abdicated their responsibility for maintaining standards, thereby allowing scientists to increasingly get away with some fairly unscientific practices.

Probably the most widespread of these is something called p-hacking, which is the selective analysis and reporting of results in order to support a theory which the raw data as a whole would not support. Another common form of data manipulation is something called HARKing, or Hypothesizing After the Results are Known, which may sound fairly innocuous, but which actually means that the hypothesis in question, coming at the end of the process rather than constituting its starting point, is never really subjected to any serious scrutiny.
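To make the mechanism concrete, here is a minimal, purely illustrative sketch (the ‘data’ are random noise, and nothing in it is drawn from any real study) of how testing many outcomes and reporting only the ‘significant’ ones can manufacture findings where none exist:

```python
# Illustrative only: how testing many noise variables and reporting
# only the 'significant' ones (p-hacking) manufactures findings.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_outcomes = 30, 20

# Two groups drawn from the SAME distribution: there is no real effect.
group_a = rng.normal(size=(n_subjects, n_outcomes))
group_b = rng.normal(size=(n_subjects, n_outcomes))

p_values = [stats.ttest_ind(group_a[:, i], group_b[:, i]).pvalue
            for i in range(n_outcomes)]

# The 'hacked' report: keep only the outcomes that cross p < 0.05.
significant = [(i, round(p, 3)) for i, p in enumerate(p_values) if p < 0.05]
print("'Significant' outcomes found by chance alone:", significant)
# With 20 independent tests at the 5% level, the chance of at least one
# false positive is 1 - 0.95**20, i.e. roughly 64%.
```

Run enough comparisons and something will cross the conventional p < 0.05 threshold by chance alone; report only that, and a publishable ‘result’ has been conjured out of nothing.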

What both of these practices thus do is allow scientists to publish results which they wouldn’t otherwise be able to obtain, thereby making the incentive for going down this road fairly obvious. For if one needs to publish something within a particular time frame in order to maintain one’s funding, but hasn’t yet discovered anything of note in one’s current line of research, then the temptation to resort to some sort of statistical sleight of hand may well be overwhelming, especially given the fact that, because both of these practices involve some fairly complex statistical techniques, they can also be fairly hard to detect without detailed analysis of both the data and methodology employed. 

Indeed, the sophistication of some of these techniques, along with the fact that their application at the desk-top level has only recently been made possible by increases in personal computing power, has led some commentators to speculate that, in some cases, the abuse of these techniques may well have been unwitting, the suggestion being that in exploring the potential of new analytical tools, some scientists may have crossed the line inadvertently. And, in some cases, it’s perfectly possible that this is how these practices began. It is also perfectly possible, however, that having discovered these easy and convenient ways of obtaining results, many scientists then talked themselves into believing that while technically unscientific and invalid, they weren’t really doing any harm, especially as no one was likely to read the resulting paper anyway.

The problem, of course, is that once one has created an environment in which the need for scientific integrity is no longer felt to be absolute, it opens the door to other, far more obviously intentional and less ‘innocent’ forms of abuse. The simplest and most blatant of these is that to which Tsuyoshi Miyakawa, editor-in-chief of the journal ‘Molecular Brain’, drew the public’s attention earlier this year, in an article in his own journal in which he describes how, of the 180 papers submitted to the journal during the previous two years, he had to recommend that 41 of them be ‘revised before review’, his principal request being that the authors submit their raw data for appraisal. He then reports that:

‘among those 41 manuscripts, 21 were withdrawn without providing raw data, indicating that requiring raw data drove away more than half of the manuscripts. I rejected 19 out of the remaining 20 manuscripts because of insufficient raw data. Thus, more than 97% of the 41 manuscripts did not present the raw data supporting their results when requested by an editor, suggesting a possibility that the raw data did not exist from the beginning, at least in some portion of these cases.’

Given that 40 withdrawn or rejected manuscripts out of 180 represents some 22% of all the manuscripts submitted to ‘Molecular Brain’ during this two-year period, should this be happening at all of the 30,000 scientific journals currently published around the world, it would mean that up to 620,000 papers could be being submitted to and rejected by journals each year on the grounds that they are entirely spurious. The suspicion, however, is that they are not all rejected. For not only is it to be doubted whether all journal editors are quite as scrupulous as Tsuyoshi Miyakawa, it is also hard to believe that so many scientists would attempt this kind of scam unless they thought they could get away with it, which suggests that, at least some of the time, they do.
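For what it is worth, the order of magnitude behind that figure can be reconstructed on two admittedly crude assumptions, neither of which is given in the editorial: that every journal receives manuscripts at roughly the same rate as ‘Molecular Brain’ (about ninety a year), and that the same proportion fail to produce their raw data when asked:

\[
30{,}000 \ \text{journals} \times \frac{180 \ \text{manuscripts}}{2 \ \text{years}} \times \frac{41}{180} \approx 6.2 \times 10^{5} \ \text{manuscripts per year}
\]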

The question, of course, is how widespread these various types of scientific fraud are. And the answer, unfortunately, is that it is almost impossible to tell. For if all attempts at fraud were known, then, presumably, they would all be stopped. One clear indication of their increase, however, is the growing problem of scientific irreproducibility, wherein other scientists are not able to reproduce the results reported in a scientific paper by following the methodology laid out in the paper itself: the clearest possible sign that there is something wrong with the underlying science.

Again, it is difficult to estimate the overall extent of the problem in that it varies from field to field. What should not come as a surprise, however (but is shocking nevertheless), are the fields in which it would appear to be most prevalent. For they are not the fields which common prejudice would lead one to expect (those in the social sciences, for instance) but rather those which are not only the most competitive, being awash with money, but in which we naively expect a higher level of integrity to be maintained, the most notable being cancer research. Yet according to one analysis by F. Prinz et al., published in 2011, only around 20% to 25% of published studies in cancer research could be validated or reproduced, while another analysis by Begley and Ellis in 2012 put the figure as low as 11%.

If you find these figures as shocking as I did when I first stumbled upon them, then, like me, you will also probably be asking how such a systemic failure to maintain scientific integrity is possible. For surely there has got to be someone who checks the validity of a scientific paper before it is published: if not the journal’s editor, then whoever is appointed to conduct peer review. What you have to remember, however, is not just that referees are traditionally unpaid (their lack of reward or inducement supposedly ensuring their impartiality) but that the current system of peer review was instituted at a time when science was still largely a gentlemanly pursuit, when the only income universities brought in came from teaching, and when scientists largely conducted their research in between the lectures and tutorials for which they received their salaries. Receiving no remuneration for their research as such, not only were they under no pressure to publish their results until they were ready (or, indeed, until they had something worth publishing) but they also had absolutely no reason to commit fraud, which, in turn, meant that their colleagues were quite happy to act as unpaid referees, secure in the knowledge that all this would actually entail was a hopefully enlightening read and a quick check for obvious errors that might otherwise embarrass the author and journal alike.

Of course, now that the situation has changed, publishers could start paying their reviewers and demanding a more rigorous appraisal from them. Not only would this lead to the rejection of more papers, however (for which the publishers would still have to pay the reviewers’ fees), but even if reviewers only spent an extra couple of days going through an author’s data and methodology, the additional cost could be the final straw for those who have to foot the bill, thereby pushing the whole industry over the edge.

More to the point, a lack of critical depth and rigour in the reviews carried out by referees is not the only problem with the current system of peer review. There are also problems of both scale and anonymity. For 2.2 million scientific papers published every year do not just require 2.2 million referees, but 2.2 million suitably qualified and impartial referees, who, in principle, should not know the authors of the papers they are reviewing: something which it is hard enough to achieve even in the most heavily populated areas of science, but is made even more difficult by the fact that, in order to stand out in any given field, most scientists today quite naturally gravitate towards some sort of niche specialism, in which they can make a name for themselves as one of only a handful of experts. What this also means, however, is that, sometimes, it can actually be difficult for a publisher to find a suitably qualified referee at all, let alone one who does not know the author of the paper to be reviewed. For niche specialisms create niche worlds, in which the participants regularly attend and speak at the same conferences, making anonymity almost impossible.

Worse still, publishers are in the business of publishing. If within a certain niche specialism there are rivals and competitors, the last thing they want to do, therefore, is send out a paper to be reviewed by a referee they know is going to be hostile. The inevitable result is that networks of like-minded scientists are formed, who are known to be sympathetic to each other’s ideas and who, through the intermediation of a common publisher, regularly review each other’s papers, bringing the whole concept of impartial and independent peer review into question.

What makes this whole system even more insidious, however, is the fact that, even when they are acting corruptly, it is extremely unlikely that those doing so actually see themselves as corrupt. Whether they are manipulating data in order to get the results they want or rubber-stamping the paper of a colleague whose views they share, the likelihood is that they simply see it as normal: as the way in which science is conducted these days. For this is not the corruption of individual scientists; it is the corruption of the culture of science as a whole, which makes it all the more difficult to reform. For if wrong-doers do not believe that they are doing anything wrong, it is very hard to get them to change their ways, especially if changing their ways would be to their detriment and might eventually involve them in trying to amend the ways of others, thereby earning for themselves a reputation as trouble-makers and further disadvantaging their careers.

Indeed, once corruption of this type has taken hold of an institution, it is almost impossible to eradicate, especially when the institution in question is almost completely opaque to outsiders: a condition which, in itself, tends to foster corruption. For having only a limited understanding of how the institution works, those looking in from the outside are not only rendered more or less incapable of imposing reform from without, but can be far more easily manipulated into giving the institution their support. What’s more, due to this general level of ignorance, any support given will be largely under the control of the institution itself. For in order to determine how this support should be allocated, those providing it have very little choice but to co-opt or seek the advice of institution members, thereby opening the door to even greater corruption.

Not that the real-world relationship between science and government is actually quite this one-sided. For as in any relationship of patronage, the patron always has a great deal of leverage over the patronised, especially where the patronage is largely financial in nature and where those receiving it are entirely dependent upon it, as is the case with respect to nearly all non-commercial science throughout the west. While scientists may still have a lot of influence, especially in advising government as to where their money should be spent, the need to keep the funding flowing creates a far more symbiotic relationship, in which scientists are not only obliged to subordinate many of their own scientifically prompted aims to the government’s more strategic agenda, but have frequently shown an enthusiastic willingness to be of service to government in ways that are not always particularly good for them and which have further corrupting effects.

One of the most pernicious of these has been the tendency, in recent years, for scientists to push science beyond its traditional role of merely explaining how the universe works and to turn it, instead, into a tool for predicting the future. This they have done through the almost ubiquitous application of computer models or simulations, the increased influence of which, especially in the development of government policy, has been allowed to go unchecked largely because neither governments nor the public, nor most scientists, themselves, fully understand either the proper place of such models or the potentially catastrophic effects of their misuse.

To understand these dangers, however, one must first understand what computer simulations actually are and how their application differs from the application of standard scientific method. And one of the best ways I have found of explaining this difference is to start by comparing what might be called the relative ‘directionality’ of the two approaches. For all methodologically rooted disciplines follow what is largely a step-by-step process which has, as a consequence, a directional flow. In the case of standard scientific method, this process starts with observations and measurements, which are then analysed to reveal patterns or anomalies. Hypotheses are then formulated to explain these patterns or anomalies, and experiments designed to test the hypotheses. Those hypotheses which do not fall at the first hurdle but need some modification are then refined on an iterative basis, with further experiments designed to test each refinement, until a stable theory is finally reached.

The process of building a computer model, however, is very different. In fact, it flows in completely the opposite direction, in that it actually starts with a theory. It then turns this theory into a set of algorithms, which it then uses to predict future observations and measurements under different conditions. Modifications are then made to the algorithms on an iterative basis to improve their predictive accuracy.
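By way of a deliberately toy illustration of this reversed flow (the ‘theory’ here, simple exponential decay, and every number in it are invented purely for this sketch):

```python
# Toy sketch of the theory-first flow described above: the theory comes first,
# is encoded as an algorithm, and is then used to predict observations.
import numpy as np

DECAY_CONSTANT = 0.3   # a constant supplied by the theory, not by the data

def predict(initial_value, times):
    """The theory, N(t) = N0 * exp(-k*t), turned into a predictive algorithm."""
    return initial_value * np.exp(-DECAY_CONSTANT * times)

times = np.array([1.0, 2.0, 3.0, 4.0])
predicted = predict(100.0, times)
observed = np.array([74.0, 55.0, 41.0, 30.0])   # invented measurements

print("predicted:", np.round(predicted, 1))
print("observed: ", observed)
print("residuals:", np.round(predicted - observed, 1))
# If the residuals were judged too large, the next step would be to modify the
# algorithm (or its parameters) and try again: note that the observations only
# enter at the end, to check and tune predictions derived from the theory.
```

Contrast this with the standard method described a moment ago, in which the measurements would have come first and the decay law would have been the hypothesis formulated to explain them.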

This reversal of directionality is absolutely crucial. For it means that the soundness of any computer model largely depends on the soundness of the underlying theory, the old adage of ‘rubbish in, rubbish out’ saying it all. As a result, the most reliable computer models are usually to be found in areas of applied science in which the underlying theories and principles have long been established. Good examples include fields like fluid dynamics, of which I have some personal if second-hand knowledge, having once had a friend who gained his Ph.D. in mathematics modelling turbulence in gas pipes. Then, of course, there are the long-standing applications of computer modelling in both engineering and construction, where it is an accepted principle that it is far better to run a computer simulation to find out whether a building, built to a certain design, will withstand an earthquake of a specified magnitude, than it is to actually build it and find out the hard way.

Even here, however, one must sound a note of caution. For even when basing one’s computer model on a theory of long standing, one never knows when exceptions may be discovered. This is because scientific theories are essentially constructs of the imagination designed to explain the inner workings of the observable universe and are not, themselves, observable. What this means, therefore, as Sir Karl Popper explained in ‘The Logic of Scientific Discovery’, is that no scientific theory can ever actually be proven. For no matter how well established a theory may be, there is always the possibility that someone, someday, will discover some new piece of evidence inconsistent with it, thereby proving it false. Indeed, as Popper also famously argued, the possibility of falsification, or falsifiability, is actually a condition of a theory being scientific, in that if it cannot be falsified (if one cannot say what piece of empirical evidence would prove it false), then it is simply not a scientific theory.

What’s more, history shows us that even theories which have been held true for centuries, and which have had, in their time, a high level of predictive accuracy, can eventually be proven false. A prime example of this is Sir Isaac Newton’s theory of gravity, in which gravity was conceived as an attractive force pulling bodies together, a bit like magnetism. Indeed, many people still think of it in this way today. More to the point, the mathematics based on this way of conceiving of gravity accurately predicted most of the observable universe for more than two hundred years. In fact, the only observation it did not correctly predict was the orbit of the planet Mercury, which remained an unexplained anomaly until the beginning of the 20th century, when Albert Einstein produced a new mathematical formulation based on a completely different concept: one in which gravity was now conceived, not as an attractive force, but as a warping of space due to the mass of the bodies within it, the calculated effects of which (based on relative mass and proximity) far more accurately approximate the orbit of the tiny planet Mercury around its massively larger star.

That the mathematics derived from Newton’s concept can still be used to accurately predict the movements of the other planets has, of course, led some people to mistakenly assume that, in these unexceptional cases, Newton’s theory still holds true. However, this is clearly a misconception. For either Newton’s theory is correct in every instance, or Einstein’s is. And since Newton’s theory has been found to be false in at least one instance, the betting is on Einstein, though even here, of course, our acceptance of Einstein’s theory is still only provisional. For, in time, it too may be proven false. For such is the nature of science.

To avoid this kind of confusion, however, it may be helpful to consider yet another, even clearer example of a long-established theory being overturned and replaced by a new one, the old theory, in this case, being one which no one today would mistakenly believe was still true. For the theory I have in mind is that which was the foundation of what is now generally known as phlogistic chemistry, which held that those forms of matter which lose mass when heated do so because they give off a substance called phlogiston.

Because we now know (or think we know) that no such substance exists, this, of course, seems laughable. But phlogistic chemistry was actually very successful in its time and held sway for well over a century: a respectable lifespan for any scientific paradigm. Indeed, the only thing that phlogistic chemistry could not explain was why certain forms of matter, i.e. metals, gained mass when heated, which we now know (or think we know) is the result of oxidation, making it somewhat inevitable, therefore, that it was the discovery of oxygen, by the French chemist Antoine Lavoisier, which eventually brought phlogistic chemistry to an end.

Or, at least, this is the account given in most potted histories of science. It is, however, a total misrepresentation of what actually happened. For in that the gas we now know as oxygen was first isolated by the English scientist Joseph Priestley, it is questionable whether Lavoisier can be said to have discovered it at all, especially as he was actually shown how to isolate the gas by Priestley, himself, at a gathering of French scientists in Paris in 1774. It was just that Priestley did not call it ‘oxygen’. He called it ‘dephlogisticated air’ and continued to practice the science of phlogistic chemistry for decades after Lavoisier coined the new name.

So why is Lavoisier credited with the discovery, and what did he actually do other than give Priestley’s dephlogisticated air a new name which ultimately proved to be just as mistaken? I say this because the word ‘oxygen’ is derived from the Greek words oxys, meaning ‘sharp’, and genes, meaning ‘to create’, and was chosen by Lavoisier because he thought the gas had an important role in the formation of acids, which turned out to be false. In fact, had Lavoisier ended his study of ‘oxygen’ at this point, it is likely that his name would have gone down as little more than a footnote in history and that we’d have ended up calling the second most abundant constituent of the earth’s atmosphere something else entirely.

It was while he was studying some of the other properties of this incorrectly named gas, however, especially its role in combustion and the roasting of metals to produce powders or calxes, that he had his ‘eureka’ moment. For noting (as others had done before him) that the powders which resulted from the calcination process were heavier than the metals with which the process started, and reasoning that it could not be the application of heat alone which caused this weight gain, since ‘heat’, itself, had no mass, he hypothesised that the calcined metals had to be drawing something else out of the atmosphere, with Priestley’s gas, now his own ‘oxygen’, being the prime contender: a hypothesis which was then given even more traction when he learned how to isolate yet another new gas, one first produced in 1766 by yet another English scientist, Henry Cavendish. For in studying the properties of this second gas, he discovered, quite remarkably, that, on combustion, small amounts of water were produced, suggesting that, as in the case of the calcined metals, combustion actually caused this second gas (which he duly called ‘hydrogen’, or ‘creator of water’) to combine with something else, with oxygen, again, being the most likely suspect.

Not, of course, that he had any way of proving this. For I repeat once again that scientific theories cannot be proven. All one can do is accumulate supportive evidence and disprove alternative hypotheses, which, indeed, is what Lavoisier spent the next two decades doing, right up to the day of his execution on the guillotine in 1794, which is surely one of the most shocking and unjust ends to befall one of the world’s greatest scientists. For while he may not have discovered either of the two gases to which he gave names, what he did was something far more fundamental and far-reaching. For he gave the world the first two building blocks of a whole new chemistry: one in which the entire material universe would come to be seen as assembled out of a finite number of elemental constituents, chemically bonded in different combinations to produce different substances. And it was this that was his real achievement. Not the discovery of any particular gas. For, in the strictest sense, he never discovered anything at all. What he did was create a whole new way of conceiving of the material world, much as Einstein would later do, altering our conceptual framework in a way that also has implications for the nature of science itself. For instead of progressing in the manner of a steady, incremental accumulation of knowledge (which is how science is so often represented, especially by scientists themselves), it would appear from the examples of both Einstein and Lavoisier that, occasionally, a science will completely renew itself, throwing away the old and starting again on the basis of a whole new paradigm.

Even more astonishingly, historical evidence would suggest that such paradigm shifts, as Thomas Kuhn called them, are far more commonplace than one might imagine, their periodic occurrence being made almost inevitable by the unprovable nature of scientific theories combined with our own stubborn reluctance to abandon established theories, no matter how full of holes they may be, until someone has come up with something better: the inevitable result being that, within any given field of science, problems tend to build up until, eventually, the dam bursts.

In his seminal work, ‘The Structure of Scientific Revolutions’, Kuhn, in fact, describes how this usually comes about, outlining the typical lifecycle of a scientific theory from its inception, through its maturity, to its eventual demise. Unsurprisingly, he describes how the introduction of a new theory is almost invariably met by fierce resistance from the existing scientific establishment, which, by dint of simply being established, usually has a lifetime of investment in the previous theory. Indeed, we have already seen an instance of this in the case of Joseph Priestley and other diehard phlogistic chemists, who resisted Lavoisier’s new-fangled ideas long after it was reasonable to do so. What this also means is that the early adopters of any new theory tend to be younger scientists who have yet to make a reputation for themselves, have nothing to lose, and are excited by the prospect that, by adopting these revolutionary new ideas, they may be the ones to finally solve the many outstanding problems which their science has accumulated over the years.

As take-up of the new paradigm increases, however, it eventually begins to raise as many questions as it answers. For reality is invariably richer and more complicated than we initially conceive it to be, such that the more we study it, the more questions it poses. And while some of these questions may lead to new discoveries (thereby raising the level of excitement still further), a body of trickier, more stubborn questions inevitably starts to accumulate, creating problems which can begin to seem almost as intractable as those which beset the old paradigm. In fact, it is not unusual for a new paradigm to reinstate problems which the old paradigm had actually solved. With no new theory as yet on the horizon, however, a whole new generation of scientists now finds itself in much the same position as their predecessors, having to conjure up additional, supplementary or subordinate theories in order to explain the anomalies and exceptions for which the main theory cannot account.

The problem with this, however, is that every supplementary theory that is needed to bolster the main theory effectively weakens the paradigm, which, in most cases, will have been conceived and initially embraced because it promised to make everything simpler. Now the whole thing has become a complete mess, with ad hoc fixes all over the place, making the entire science ripe for someone to finally come along and say: ‘You know what, we’ve been looking at this in completely the wrong way. Instead of thinking of it like this, we should think of it like that.’ And thus a new paradigm is born. And the whole cycle starts all over again.

Needless to say, most scientists don’t much like this model of how science works. They much prefer the paradigm in which science is seen as a steady accumulation of knowledge, to which each contribution is equally valid and of equal value. For what is most disturbing about Thomas Kuhn’s revolutionary vision to most scientists is not just the idea that any scientist could wake up tomorrow morning to find their whole life’s work invalidated (rendering them as ridiculous and irrelevant as Joseph Priestley) but that science might actually demand of them something more than a mere journeyman’s contribution to a collective effort: that it might, indeed, demand of some (those who are to be deemed great) an individual leap of genius of the kind Immanuel Kant described in the ‘Critique of Judgement’, where ‘genius’ is defined not as mere cleverness (however extraordinary such cleverness may be) but as the possession of precisely this rare ability to get others to see the world in a new and different way, whether this be in the visual arts, philosophy of the type which Kant himself wrote, or, indeed, science.

In defence of their preferred paradigm, therefore, most scientists will almost certainly argue that, while such revolutionary paradigm shifts may have happened in the past (the historical evidence for their occurrence, from Lavoisier to Einstein, being undeniable), because science proceeds by eliminating false theories, it follows that their frequency will naturally decline over time as more and more false theories are removed until, eventually, they cease altogether, a point we may already have reached.

This argument, however, is based on a belief in what is known as convergence (between our scientific or theoretical conception of the universe and the reality of the universe as it exists in itself) and contains two main flaws. The first is the assumption that, as we replace old, falsified theories with new theories, these new theories will necessarily be ‘true’ in the sense of corresponding to reality. Because all scientific theories are constructs of the imagination, however, there is actually no good reason to believe this, in that we could simply go on continually replacing one false theory with another, which, in time, also turns out to be false.

The second flaw in the argument, however, is even more significant. For even if the above assumption were correct and we gradually replaced all false theories with ‘true’ ones, such that eventually we ended up with a perfect correspondence between our theoretical conception of the universe and the reality of the universe as it exists in itself, we could never know this was the case. For in that it is only through our theoretical conception of the universe that we are able to apprehend it, the one thing we cannot do is step outside of ourselves and compare that theoretical conception with the ‘real’ thing to see whether they correspond. Indeed, the only indication we could have that correspondence had been reached would be if scientists suddenly ran out of questions to ask. And even then we wouldn’t know whether we’d reached correspondence, or whether we’d simply arrived at a set of theories which were perfectly consistent with all the empirical evidence.

To those unfamiliar with these concepts, this distinction may, of course, seem somewhat strained. If a theory is consistent with all the empirical evidence, how could it not correspond to reality? As an exercise in clarification, therefore, ask yourself whether it is possible for two competing scientific theories to both be fully consistent with all the currently available empirical evidence. Assuming that you answer this question in the affirmative, now ask yourself whether it’s possible for both of these theories to be true in the sense of correspondence. If the answer to this question is ‘No’, then this means that at least one of the two theories, both of which are fully consistent with all the empirical evidence, cannot logically correspond to reality. And if it’s possible for one of the theories to be fully consistent with all the empirical evidence and still not correspond to reality, then it follows that it is possible for both theories to fall into this category. Indeed, it’s possible for all our scientific theories to be fully consistent with all the empirical evidence and yet for none of them to correspond to the way the universe actually is in itself.

What this teaches us, however, is not that science is somehow defective, but rather the importance, not just of knowing which criterion of ‘truth’ (‘consistency’ or ‘correspondence’) is applicable in any particular context, but of ensuring that where only the weaker criterion of consistency applies, as in the case of a scientific theory, the theory in question is properly grounded in those basic elements of science, namely observation and measurement, to which the stronger criterion of correspondence is applicable. For while we may never be able to know whether our scientific theories correspond to reality, we can certainly find out whether our measurements do. And while this may be stating the obvious, sometimes the obvious needs to be stated: that the integrity of scientific theories depends on their being grounded in empirical evidence. For it is this that prevents them from simply coming adrift and floating away on flights of imaginative fancy, which is precisely what can happen in the case of computer simulations.

This is because, in building such simulations, scientists have a tendency to make three fundamental errors, two of which we have more or less already covered. The first of these is the mistaken belief that, if a theory accurately predicts future events then it must be true, even though there have been numerous scientific theories throughout history with a high degree of predictive accuracy which have ultimately turned out to be false. The second mistake is then to take the ‘truth’ of these theories to mean ‘correspondence’, which then opens the door to two further conceptual errors. For not only does the belief in a theory’s correspondence to reality remove any threat that the theory might one day be overturned in yet another scientific revolution – thereby reducing the level of caution with which scientists might otherwise regard the reliability of any computer model based on it – but it actually elevates the theory to the status of something absolute and immutable, thereby reducing the perceived need to validate it empirically.

Indeed, we see this in the way in which many computer models are developed, in a continuous process which starts by running the model against historical data and adjusting the algorithm until its output approximately matches the data record. It’s a bit like running ‘Goal Seek’ in Excel: you know what answers you want, so you just keep tweaking the computational engine until you get them. The trouble with this, however, is that it also effectively refines the theoretical construct upon which the algorithm is based. And while every effort is usually made to empirically validate these changes to the underlying theory, such that they are not just random and have some basis in the real world, the verification process (the matching of the model’s output to real-world data) takes precedence, such that once data congruence is achieved, the validity of the model, as an accurate representation of the world, is seen as less important. Indeed, the mere fact that the output now corresponds to the historical record is taken as evidence that the modified model is correct.
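As a purely illustrative sketch of that ‘Goal Seek’ style of tuning (the toy model, its ‘fudge factor’ and the ‘historical’ numbers below are all invented, and stand for no real model):

```python
# Illustrative only: tune a model's free 'fudge factor' until its hindcast
# matches the historical record, with scipy playing the role of Excel's Goal Seek.
import numpy as np
from scipy.optimize import minimize_scalar

historical = np.array([1.0, 1.4, 2.1, 2.9, 4.2])   # invented 'observations'

def model_output(fudge_factor, steps=5):
    """A toy model: each step grows the previous value by a tunable factor."""
    values = [1.0]
    for _ in range(steps - 1):
        values.append(values[-1] * (1 + fudge_factor))
    return np.array(values)

def mismatch(fudge_factor):
    """Verification metric: how far the hindcast is from the record."""
    return np.sum((model_output(fudge_factor) - historical) ** 2)

result = minimize_scalar(mismatch, bounds=(0.0, 1.0), method="bounded")
print(f"tuned fudge factor: {result.x:.3f}")
# The loop above only ever asks 'does the output match the data?' (verification);
# at no point does it ask whether the tuned factor reflects anything real in
# the system being modelled (validation).
```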

This tendency to implicitly favour verification over validation is further emphasised in large multiscale simulations, where the outputs from smaller, lower-level models are fed into larger, higher-level models as mathematical parameters, which are regularly tweaked without validation in order to ensure that the outputs from the top-level model correspond to the real world. This process, known as parameterization, is especially prevalent in cases where the lower-level systems being modelled are inherently chaotic and are only statistically determinate at the macro level, thereby seeming to justify the lack of validation.
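Again purely by way of illustration (the model and every number in it are invented, and bear no relation to any real simulation), a parameterization in miniature might look like this:

```python
# Toy parameterization: the effect of a chaotic, unresolved small-scale process
# is replaced in the top-level model by a single tunable coefficient, which is
# then adjusted so that the top-level output matches the 'historical' record.
import numpy as np

def top_level_model(steps, forcing, mixing_coeff):
    """Coarse model: `mixing_coeff` stands in for all the small-scale
    dynamics the model cannot resolve."""
    state = np.zeros(steps)
    for t in range(1, steps):
        state[t] = state[t - 1] + forcing - mixing_coeff * state[t - 1]
    return state

steps, forcing = 50, 0.03          # invented constant forcing
observed_endpoint = 0.9            # invented 'historical' value to be matched

# Sweep the parameter until the top-level output reproduces the record...
candidates = np.linspace(0.005, 0.2, 400)
errors = [abs(top_level_model(steps, forcing, c)[-1] - observed_endpoint)
          for c in candidates]
tuned = candidates[int(np.argmin(errors))]
print(f"tuned mixing coefficient: {tuned:.4f}")
# ...but nothing in this procedure checks whether `tuned` bears any relation
# to the small-scale process it is standing in for: verification without
# validation, in the sense used above.
```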

Nor are these the only ways in which computer simulations tend towards artifice rather than real-world representation. For in a manner very similar to Thomas Kuhn’s description of how supplementary, ad hoc theories are used to explain exceptions to a main theory during the mature phase of a scientific paradigm’s lifecycle, so too supplementary, ad hoc programs are often written into computer models simply in order to ‘make the model work’, whether or not these supplementary programs have any real-world correlate. The result, as described by Eric Winsberg in his essay ‘Computer Simulations in Science’, is that the outputs from computer models:

typically depend not just on theory but on many other model ingredients and resources as well, including parameterizations (discussed above), numerical solution methods, mathematical tricks, approximations and idealizations, outright fictions, ad hoc assumptions… and perhaps most importantly, the blood, sweat, and tears of much trial and error.

Indeed, some large, multiscale computer simulations take years (even decades) to develop, and are less built than grown organically, which can give rise to its own set of problems, especially in a university setting, where numerous generations of post-graduate students may have worked on the simulation, each making their own modifications without always adequately documenting them, such that eventually it becomes very difficult to say how the model actually works, or even what it represents, rendering the idea that it could somehow correspond to reality not just laughable but conceptually confused.

Indeed, we have seen an example of this very recently in the case of the epidemiological computer model developed by Professor Neil Ferguson and his team at Imperial College London, which predicted that over half a million people in the UK could die of Covid-19, thereby forcing the government to impose draconian lockdown measures. Not only has this prediction turned out to be massively wide of the mark, however, but it has since been discovered that the sixteen-year-old code from which it was generated has been so frequently altered during its lifetime that large parts of it are more or less unintelligible to any computer programmer trying to work out what they do. In short, the whole program has effectively become a ‘black box’: a mystical engine for issuing predictions, some of which turn out to be correct, though nobody knows how or why.

This is not just bad science; it is no longer science at all. It’s more like the Oracle at Delphi, which, of course, is exactly what governments really want from science: not the pure and disinterested kind of science which merely seeks to describe and explain how the universe works, but a shaman’s tool which they can use to take the uncertainty and political exposure out of decision-making. For this, of course, is how it has always been. In the past, rulers consulted priests who read the auguries for them. Now they consult their scientific advisers: a modern alternative which has the added benefit of allowing them to say that they are simply following scientific advice, thus absolving them of any responsibility for anything that might go wrong should the auguries turn out to be false.

Thus, while it is likely that a number of individual scientists will eventually be made scapegoats for the catastrophic economic consequences which will undoubtedly follow from the mishandling of the coronavirus pandemic by governments around the world, the biggest casualty resulting from this folly is almost certain to be science itself. For while science, in the abstract, may not be responsible for the fact that so many of those who practice it don’t really understand it (and, not knowing what they do, are therefore willing to mangle and corrupt it for their own self-advancement), it is doubtful whether those whose lives are devastated by its misuse will make such a fine distinction. Seeing only scientists to blame, they will likely blame science too, thus almost certainly undermining one of the most important pillars upon which our civilization has been built, much to the detriment of us all.