The dreaded topic of statistics has both confounded and fascinated me. As a scientist, I enjoyed research and discovery but tolerated the ‘number-crunching’ side of it as a necessary evil. Statistics is a subject I avoided back in my high-school days, and I have since had to come to terms with it. In fact, I still remember absolutely loathing it, but as with so many of life’s ironies, it came back to haunt me. In scientific research, if you are ‘fair dinkum’ about your work, you need to get acquainted with different statistical tests, what they tell us, and how to translate them into something meaningful.

During my PhD candidacy, I had a prominent statistics professor who was both feared and respected by students and faculty for his candour, his bluntness and his militant adherence to scientific rigour and discipline in health sciences research. I still draw on his example whenever I consider the analysis output of a project I’m preparing for publication. I will not bore you with statistics-speak in this article. Instead, I will try to put into ordinary, everyday language what the main statistics mean when we write articles for any audience, whether our scientific peers or the general public.

In medical research, unless you are able to identify every single individual with the condition you are studying, everywhere in the world, obtain their consent to join your study, and collect every piece of information you need (including information you don’t know you need but suspect you might), your research is essentially sample-based.

As a researcher, once you have decided which medical condition you wish to study and what new knowledge is needed, you then need to decide whom you will study. You may be able to draw from a wide range of subsets of the population with the condition, and your choice will depend on the research question.

A major limitation of sample-based research is that the individuals you choose to study (collectively, your study sample) could have characteristics that differ widely from those of samples studied by other researchers, and could therefore generate different results. This variability, that is, the differences in measurements across different samples drawn from the same general population, is the root of a problem known as ‘sampling error’. The key question, then, is: how do I select a study sample that minimizes ‘sampling error’?
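To make this sample-to-sample variability concrete, here is a minimal Python sketch that draws five random samples of 50 people from one simulated population and prints each sample’s mean. The population (10,000 normally distributed systolic blood pressure readings) is invented purely for illustration, and the function name sample_means is hypothetical.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated simple random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, sample_size))
            for _ in range(n_samples)]

# Hypothetical population of 10,000 systolic blood pressure readings (mmHg),
# simulated from a normal distribution purely for illustration.
pop_rng = random.Random(42)
population = [pop_rng.gauss(130, 15) for _ in range(10_000)]

means = sample_means(population, sample_size=50, n_samples=5)
print(f"Population mean: {statistics.fmean(population):.1f}")
for i, m in enumerate(means, start=1):
    print(f"Sample {i} mean:    {m:.1f}")
```

Every sample comes from the same population, yet each gives a slightly different mean. That spread is sampling error in miniature.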

When a researcher chooses a sample to study, she can use a number of schemes to select participants. She can impose any number of restrictions by gender, ethnicity, age, geographical location and so on. But it is important to realize that the only purpose of studying a sample is to learn about the larger population it represents. So we must decide at the start of the research which relationships we want to identify between patient characteristics and the medical condition we’re studying. If our sample is distorted in any way and is not representative of the larger population, our results will be distorted too.

We must therefore choose a sample that gives the clearest view of the population we want to study. Random sampling tends to be the least biased selection process: it applies a scheme that gives each eligible individual the same chance of being selected for the study. However, it is not foolproof, and even the best random sampling scheme can deal us a ‘bad hand’, meaning that what we see in the sample does not reflect the general population. How do we decide whether we have been dealt a ‘bad hand’?
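A small simulation shows how a fair random draw can still produce a ‘bad hand’. The sketch below assumes a hypothetical register of 1,000 eligible patients, exactly half of them women, draws 10,000 simple random samples of 20, and counts how often a sample turns out to be 70% or more female. The register, the sample size and the 70% cut-off are all illustrative assumptions, not figures from any real study.

```python
import random

def simple_random_sample(eligible, k, rng):
    """Select k individuals so that every eligible person has the same chance."""
    return rng.sample(eligible, k)

def bad_hand_rate(eligible, k, n_draws, is_unrepresentative, seed=1):
    """Fraction of random samples that fail a representativeness check."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(n_draws):
        if is_unrepresentative(simple_random_sample(eligible, k, rng)):
            bad += 1
    return bad / n_draws

# Hypothetical register: 500 women ("F") and 500 men ("M").
eligible = ["F"] * 500 + ["M"] * 500

# Call a sample of 20 a "bad hand" if 14 or more (70%+) are women.
rate = bad_hand_rate(eligible, k=20, n_draws=10_000,
                     is_unrepresentative=lambda s: s.count("F") >= 14)
print(f"Share of samples that are 70%+ female: {rate:.1%}")
```

Nothing is wrong with the selection scheme here, yet a few percent of samples come out noticeably lopsided purely by chance.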

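The p-value helps us answer this question, and it is a measure we see in almost every piece of research. It is often misread, though. The p-value is not the probability that our sample is unrepresentative, nor the probability that our finding is due to chance. Put simply, it is the probability of seeing a relationship at least as strong as the one we found between certain characteristics of the sample and the medical condition we are studying, if in fact no such relationship existed in the population and only the play of chance were at work. A small p-value tells us that chance alone would rarely produce a result like ours, so we are less inclined to believe that our study sample simply misled us. In statistics-speak, the threshold we compare the p-value against is called alpha, and it is the probability of a type I error: declaring a relationship real when it is not.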
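One intuitive way to compute a p-value is a permutation test. We assume there is no relationship between group membership and the outcome, repeatedly shuffle the group labels, and count how often chance alone produces a difference in means at least as large as the one observed. This is a minimal sketch: the measurements (an outcome in hypothetical exposed and unexposed patients) are invented, and a permutation test is only one of many ways a p-value can be obtained.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: the group labels are unrelated to the outcome, so
    shuffled labels should often give differences as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to numerator and denominator counts the observed split itself
    # and keeps the estimate from ever being exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical outcome measurements, invented for illustration only.
exposed = [6.1, 5.8, 6.4, 6.9, 5.9, 6.6, 6.2, 7.0]
unexposed = [5.2, 5.6, 5.0, 5.9, 5.4, 5.1, 5.7, 5.3]

p = permutation_p_value(exposed, unexposed)
print(f"p = {p:.4f}")
```

A tiny p here means that shuffled labels almost never reproduce a gap this large. It does not mean there is a tiny chance the finding is wrong.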

The threshold most studies use for deciding that a finding is (statistically) significant is 5%, i.e. p<0.05 is considered significant. Most statistical software reports the p-value automatically, along with other measures of the relationship, which we will deal with in another article. The p<0.05 cut-off is quite arbitrary, more a tradition that goes back to the days of Sir Ronald Fisher (1890-1962) than a rule of nature. A researcher can and should set this threshold herself during the design stage, taking into account the number of statistical tests she intends to carry out and what she plans to do with her findings, i.e. whether to apply them to medical practice or to use them as a clue for a further search in a larger sample to confirm these initial findings.
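The number of tests matters because every extra test is another chance for a false positive. The sketch below computes the chance of at least one false positive across several tests (the family-wise error rate) under the simplifying assumption that the tests are independent and no true relationship exists. It also computes the Bonferroni-adjusted per-test threshold, which is one common, conservative way of setting the threshold at the design stage and certainly not the only one.

```python
def family_wise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests
    when there is truly no relationship anywhere."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

for m in (1, 5, 20):
    print(f"{m:>2} tests: chance of a false positive at p<0.05 = "
          f"{family_wise_error_rate(0.05, m):.1%}, "
          f"Bonferroni per-test threshold = {bonferroni_threshold(0.05, m):.4f}")
```

With 20 independent tests at p<0.05, the chance of at least one spurious ‘significant’ result is about 64%. Dividing alpha by 20 brings it back below 5%.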

P-values are only one element of what the sample tells us, and by no means the most important. A p-value simply gives us a gauge of how easily chance alone could explain what we see in our study sample. Its interpretation also depends on whether we followed the study protocol we set out at the start of the study.

Overall, we must be careful to interpret the estimates our analysis gives us in light of what we set out to look for, assuming we did not decide mid-way through the study to shift course because of how interesting the data looked. If we did, our p-value would in fact be uninterpretable. Having chosen what to test after seeing the data, we have given chance many opportunities to produce a striking result, so the p-value no longer means what it claims to and is essentially meaningless. This is why study rigour and discipline are paramount to the validity of research findings.
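The cost of shifting course can be simulated. In this sketch, each hypothetical study measures 10 outcomes, none of which truly differ between two groups of 30. We compare an honest protocol, where the first outcome was pre-specified, with a ‘shift course’ approach that reports whichever outcome looks most interesting. For simplicity it assumes independent, normally distributed outcomes and uses a z-test with known standard deviation. The study counts and sizes are arbitrary choices for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist

def z_test_p(a, b, sigma=1.0):
    """Two-sided z-test for a difference in means, assuming a known sigma."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate(n_studies=1000, n_outcomes=10, n_per_group=30, seed=7):
    rng = random.Random(seed)
    planned_hits = 0
    cherry_picked_hits = 0
    for _ in range(n_studies):
        # No true difference on any outcome: both groups drawn from N(0, 1).
        p_values = []
        for _ in range(n_outcomes):
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(z_test_p(a, b))
        # Honest protocol: only the pre-specified first outcome counts.
        planned_hits += int(p_values[0] < 0.05)
        # Shifting course: report whichever outcome looks most "interesting".
        cherry_picked_hits += int(min(p_values) < 0.05)
    return planned_hits / n_studies, cherry_picked_hits / n_studies

planned, picked = simulate()
print(f"False positives, pre-specified outcome: {planned:.1%}")
print(f"False positives, best-looking outcome:  {picked:.1%}")
```

The pre-specified analysis produces false positives close to the intended 5%. Picking the best-looking of 10 outcomes produces them roughly eight times as often, even though the nominal threshold never changed.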

We will deal with the other aspects of data analysis that are relevant to research in upcoming articles.