The Last Hurrah! A farewell to the Science Blog, and the perils of data diving…

Greetings all,

It is on a sad note that Roger and I announce this will be our last real science blog for the foreseeable future. We are rather tied up these days with many other projects and have decided to call it a day, as many other higher-profile blogs cover the same ground (and probably do it better: see the side bar for examples). We have had a good run though, over the last five years, and as most blogs last less than 6 months we have done pretty well and hope we have made a useful contribution. A couple of our blogs actually got developed into published academic papers, and in any case we certainly had some fun on the way, and so achieved our objectives. Hopefully we have given readers some interesting food for thought with our rants on this and that over the years.

The pursuit of high-quality science and critical thinking that drove us to create this blog is still something we both feel strongly about, but on the less positive side, it seems we still have a long way to go. We both come from newer university backgrounds, but our experiences suggest that the move to mass higher education (HE) has not had the desired effect of raising scientific and artistic literacy in the population overall, and seems instead to have created an even more polarized educational environment. Today HE has become more of a commercially focused enterprise, and science has become the catch-all phrase for any sort of inquiry that will help market a programme/course/text or product. I would place a bet that the vast majority of new graduates in my discipline have not been taught the common logical fallacies, and that more than 60% of newly qualified health care professionals in practice don’t really understand what P-values represent (see Lies, Damned Lies and Statistics if you have any nagging doubts yourself – you may be surprised). Likewise, in science education many newly qualified teachers still don’t seem to understand some fundamental scientific concepts.

Another example of the increasing indifference I see in scientific inquiry came to me recently when I was reviewing a set of grant applications (something I am asked to do from time to time). The ideas for the projects submitted were all excellent, and reflected innovative and practical areas needing investigation that could all provide really useful health/social outcomes. However, the approaches and methods selected to investigate the phenomena chosen seemed to indicate that the researchers were generally unconcerned with the best way to design their studies. Primarily, most of the researchers whose applications I reviewed did not seem to differentiate whether an inductive or deductive approach would be best to answer their particular question of interest. Several had adopted inductive methods to answer a deductive question (or the converse), and some posed a deductive hypothesis and then proposed an inductive study to answer it. Several had included both inductive and deductive investigations in one study, with a mass of statistical tests in a catch-all “let’s find out everything at once” approach.

Designing complex and multi-outcome studies is certainly possible and can produce some excellent scientific work when done well, but it is a tricky business at the best of times. As anyone who has undertaken scientific work will attest, the more complexity you add to your study, the more likely you are to introduce confounding variables and the less certainty you will have in your outcomes. In the worst cases this represents the fundamental research problem of trying to find your hypothesis in your data (also known as data diving, data fishing, data dredging, data snooping, or equation fitting).

Inductive vs. Deductive Research and Hypothesis Development

We have covered inductive thinking and the problem of induction before on the blog (our prize is still there to be claimed!). However, it is worth briefly revisiting why it is useful to differentiate inductive and deductive approaches in scientific thinking, and how they relate to hypothesis development. If we consider that a hypothesis is a proposed explanation of a phenomenon, it should only really come into play when we have established an explanation that we want to test, not when we are still coming up with multiple possible explanations. The following summary of the hypothetico-deductive model of scientific method may help when thinking about categorizing a line of inquiry and its questions:

  1. Inductive Inquiry: Observations lead to pattern recognition, which leads to the generation of explanatory theory/theories, resulting in hypotheses (plural)
  2. Deductive Inquiry: A selected theory (singular) to be tested leads to hypothesis generation, which leads to testing/observation, which results in confirmation/refutation of the theory

They both work together in a cycle of scientific inquiry, but in inductive inquiry you don’t have the hypothesis at the start (otherwise why bother – you would move straight to the deductive/testing part).

The Problems with Data Diving

Data diving/dredging seems to be becoming increasingly popular in research, probably because we are developing ever larger databases and can mine them with powerful computers that process complex statistical algorithms quickly and efficiently. It involves testing huge numbers of hypotheses about a single data set by exhaustively searching for combinations of variables that might show a correlation. The problem with it is that in proper statistical analysis we test a hypothesis with evidence (data) that was not used in constructing the hypothesis. This is critical because every data set contains some patterns that are entirely due to chance. In his book Bad Science, Ben Goldacre gives a nice example: if you randomly machine-gun the side of a barn and then go and inspect it afterwards, you might start to notice patterns in the holes that would make you think there was some sort of effect there, and you could construct a hypothesis to explain it. For example, possibly there was a micro-gravitational field at the top left corner of the barn making the bullets cluster there. Of course, there could have been, but if the hypothesis is not tested on a different data set from the same population, it is impossible to determine whether the patterns found are just chance patterns. Another example: tossing a coin five times, with a result of 3 heads and 2 tails, might lead one to hypothesize that the coin favours heads by 3/5 to 2/5. As this hypothesis has been tested on the existing data set, it is confirmed! In reality, of course, the confirmation is meaningless, because the same five tosses both suggested the hypothesis and were used to test it. There are other issues with data diving too, and there is a very good Wikipedia article on it here which gives many excellent examples of the main problems.
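
For anyone who would like to see the effect for themselves, here is a rough sketch (our own illustration, not taken from Goldacre or the Wikipedia article) that extends the coin example above. It assumes a made-up scenario: 200 perfectly fair coins, each tossed 20 times, with every coin "tested" for bias at the usual 0.05 level using an exact binomial test, and the flagged coins then re-tossed to provide fresh data. The function names, the number of coins and tosses, and the random seed are all arbitrary choices for illustration.

```python
import random
from math import comb


def two_sided_p(heads, tosses):
    """Exact two-sided binomial p-value for the hypothesis that the coin is fair."""
    deviation = abs(heads - tosses / 2)
    # Count every outcome at least as far from 50:50 as the one observed.
    extreme = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if abs(k - tosses / 2) >= deviation
    )
    return extreme / 2 ** tosses


def toss_fair_coin(tosses, rng):
    """Number of heads in a run of tosses of a genuinely fair coin."""
    return sum(rng.random() < 0.5 for _ in range(tosses))


def dredge(n_coins=200, tosses=20, alpha=0.05, seed=1):
    """Test many fair coins for 'bias', then retest the flagged ones on fresh data.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Step 1: the data dive. Every coin is tested on the same data that
    # suggested the hypothesis that it might be biased.
    first_run = [toss_fair_coin(tosses, rng) for _ in range(n_coins)]
    flagged = [
        i for i, heads in enumerate(first_run)
        if two_sided_p(heads, tosses) < alpha
    ]
    # Step 2: the honest check. Only the flagged coins are tossed again,
    # and the hypothesis is tested on the new tosses alone.
    replicated = [
        i for i in flagged
        if two_sided_p(toss_fair_coin(tosses, rng), tosses) < alpha
    ]
    return len(flagged), len(replicated)


if __name__ == "__main__":
    flagged, replicated = dredge()
    print(f"'Biased' coins found by dredging: {flagged}")
    print(f"Still 'biased' on fresh tosses:   {replicated}")
```

With these settings each fair coin has roughly a 4% chance of being flagged, so a trawl through 200 of them would be expected to turn up around eight "biased" coins by chance alone, and most of those should fail the retest. Note also that two_sided_p(3, 5) comes out at exactly 1.0: three heads in five tosses is no evidence of bias at all.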

Farewell for Now…

So, it will be interesting to see how scientific thinking and practice develop over the next decade or so. Our sense is that scientific progress will continue apace and we will eventually see the demise of postmodern nominalist and victimary thinking, as the generation of ever-expanding descriptive theory has to be matched by practical outcomes and application. Science is in essence a useful way of understanding the world in that it helps us successfully predict and control events, rather than simply describe them. The volume of data on the internet is expanding at a phenomenal rate. The size of the digital universe, in terms of data being generated, is forecast by IDC to grow to a staggering 44 zettabytes by 2020, and so it will become even more important to be able to discriminate the information that is actually of practical value to us. New and creative developments in scientific methodology and information technologies will accelerate this progress. However, we are not so sure of the future of the established university down the road in an age where knowledge and expertise are so widely disseminated.

We may of course feel the need to post the odd rant or rave on science (or other associated matters) as the mood takes us in future, but for now we are signing off. So that’s it from us, no more typo-infested and grammatically erroneous missives for now! We wish all those who have read and contributed to the blog over the years the very best of luck.

Onwards and upwards.


Roger and Bernie

February 2014


Goldacre, B. (2008). Bad Science. London: Fourth Estate.