Why are there so many tests?

If you need to run statistical tests for your thesis or your first paper, you might feel overwhelmed by the sheer number of tests available. Why are there so many? How do I choose one, and how can I be sure I have selected the right test? And when reading papers, you might ask yourself whether the choice of test is justified. After all, statistical software will produce an output even if you run a test inappropriately.

The selection of a test is much easier than one may think. Yes, there are many statistical tests. However, each test can only be applied under specific conditions. It’s like selecting a screwdriver: there are many different types (slotted, Torx, Phillips, hexagonal or Allen, Tri-Wing, etc.), and each type comes in different sizes. So which one do you choose? The answer is easy: it depends on your screw.

It’s the same with statistical testing: It depends on your data and the questions that you want to answer.

Key questions

The important questions for selecting a statistical test are:

• How many groups do I want to compare?
• What data did I measure (data type)?
• How were the data obtained (e.g. are they paired)?
• Are my data distributed in a certain way (e.g. normally distributed)?
• Further information may be relevant for some tests
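To show how mechanical the process is, the questions above can be written down as a small decision function. This is a hypothetical sketch: the name `select_test` and its parameters are ours, not the GSS software’s, and it only covers the two-group branch that the height example later in this post walks through. Any other combination deliberately raises an error instead of guessing, and the parametric branch is simplified to a single answer.

```python
def select_test(n_groups: int, paired: bool, normal: bool,
                data_type: str, similar_distributions: bool) -> str:
    """Hypothetical sketch of one branch of a test-selection flowchart.

    Only the paths discussed in the height example are encoded; every
    other combination raises NotImplementedError rather than guessing.
    """
    # Steps 1 and 2: number of groups and pairing.
    if n_groups != 2 or paired:
        raise NotImplementedError("only two unpaired groups are sketched here")
    # Step 4: data type (only metric data are covered in this sketch).
    if data_type != "metric":
        raise NotImplementedError("only metric data are sketched here")
    # Step 3: normally distributed data lead towards parametric tests
    # (simplified: further questions may follow in a real flowchart).
    if normal:
        return "two-sample t-test"
    # Step 5: nonnormal (or unknown) data with similarly shaped distributions.
    if similar_distributions:
        return "Mann-Whitney U test"
    # Unsure about similarity: fall back to a test with fewer assumptions.
    return "Mood's median test"


print(select_test(2, paired=False, normal=False,
                  data_type="metric", similar_distributions=True))
```

Each argument answers one question from the list, and the same answers always lead to the same test, which is exactly why you cannot "shop around" for a convenient result.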

You will be astonished to see that in many cases, for a particular type of data and question, there is just one test that fits. So you cannot simply try out different tests until you get the result you want. Statistical software will probably produce a result for any test when you enter your data; it is your responsibility to make sure that the test matches your data and the question you want to answer.

Let’s look at an example, and you’ll see that test selection is not really difficult. You will find more information on data types, data distribution and other important questions in future videos. For now, just note that test selection is a simple step-by-step process:

A simple example

Imagine we want to know whether there is a difference in height between girls and boys in primary school. We have measured the height of 100 children and now want to select a suitable statistical test. Our data is shown in the figure on the right.

To find a suitable test, you could, for example, consult websites such as Wikipedia, which have tables listing the most common tests and the conditions under which they can be applied. Here we’ll select the test in the GSS software, because it lets us simply follow a flowchart (see figure below).

Step 1: The first question is “How many groups do we have?” We have two groups: girls and boys.

Step 2: The next question is about paired data. We’ll discuss paired data in more detail in another post. Briefly, pairing is about whether observations are related to each other: for example, if you observe the same patient before and after treatment, these two observations are paired. Here we don’t have paired data, because each child belongs to only one of the two groups.

Step 3: The next question is about the distribution of the data. Why is the distribution of data important? Some tests are built on the assumption of normally distributed data, and when that assumption holds, these tests are very sensitive in detecting differences. They are often referred to as “parametric tests”. But if you don’t know how your data is distributed (or if it is not normally distributed), choose “nonnormal data”. The choice of tests is then restricted to so-called “non-parametric tests”, which are usually only slightly less sensitive than tests that assume a normal distribution. To keep things simple, we haven’t looked at the data distribution yet, so let’s be on the safe side and choose “nonnormal data”.

Step 4: Now we come to a very important question: what kind of data have we measured? This matters because each test has been designed for a particular data type. There are “metric data”, such as 3.7 cm, 8.0 kg or -14.3°C. Then there are “nominal data”: classes that have a name but no particular order, such as smoker and non-smoker, or green, red and blue. Each subject belongs either to one class or to another; either I’m a smoker or I am not. And then there are “ordinal data”. These are also named classes, but they have an order, such as “small”, “medium”, “large”. In our example, we measured the height of children, so we have metric data.
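The three data types can be illustrated with a small, hypothetical helper (`DataType` and `data_type` are our own names, not part of any library). It also shows a subtle point: whether named classes have an order cannot be read off the values themselves, so the caller has to supply that order for ordinal data.

```python
from enum import Enum
from numbers import Real


class DataType(Enum):
    METRIC = "metric"    # measured numbers, e.g. 3.7 cm, 8.0 kg, -14.3 °C
    NOMINAL = "nominal"  # named classes without order, e.g. smoker / non-smoker
    ORDINAL = "ordinal"  # named classes with order, e.g. small < medium < large


def data_type(values, category_order=None) -> DataType:
    """Illustrative helper: classify a column of observations.

    Assumption: numeric values are treated as metric. Categories stored as
    numeric codes (e.g. 1 = smoker) would be misclassified, so a real
    analysis still requires knowing how the data were recorded.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        return DataType.METRIC
    # Named classes: ordinal only if the caller states their order.
    if category_order is not None:
        return DataType.ORDINAL
    return DataType.NOMINAL


print(data_type([121.5, 118.0, 130.2]))                           # made-up heights in cm
print(data_type(["smoker", "non-smoker"]))
print(data_type(["small", "large"], ["small", "medium", "large"]))
```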

Step 5: Now we come to the last question in this flowchart: is the distribution of the data similar for girls and boys? This question is not relevant for all tests; it is specific to this part of the flowchart. We could look at the distributions, but we wouldn’t expect the distribution of height in girls and boys to have a very different shape, nor would we expect the variability of height to differ between boys and girls. Anything else would be rather odd: imagine that all girls of a given age had almost the same height, while the height of boys varied widely from small to large. So let’s assume similarly distributed data for now. The flowchart then leads us to the Mann-Whitney U test: for our question and data, it is the appropriate test.
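To see why the Mann-Whitney U test needs no normality assumption, it helps to look at what it computes: it pools both groups, replaces every value by its rank, and checks whether one group’s ranks are systematically higher. The sketch below is a minimal standard-library illustration, not the GSS implementation; in practice you would use a statistics package (for example SciPy’s `scipy.stats.mannwhitneyu`). The p-value uses the large-sample normal approximation without tie or continuity corrections, which is an assumption of this sketch and makes p only approximate for small or heavily tied samples.

```python
import math


def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, approximate two-sided p) for two independent groups."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    r1 = sum(ranks[:n1])              # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2       # U statistic of the first group
    u = min(u1, n1 * n2 - u1)         # the smaller of the two U values
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, p


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

Because only ranks enter the calculation, the actual shape of the height distribution does not matter, which is what makes the test non-parametric. With group sizes like those in the example, the normal approximation is usually adequate.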

If we were unsure whether the data distributions of girls and boys were similar and wanted to be on the safe side, we could have chosen Mood’s median test instead. If we had looked at the data and found that they were indeed normally distributed, we would have moved up in the flowchart towards parametric tests such as the two-sample t-test.
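Mood’s median test makes even fewer assumptions than the Mann-Whitney U test: it only asks how many values in each group lie above the pooled median, and compares these counts with a chi-square test. The sketch below is a minimal standard-library version for two groups, written for illustration only; it counts values equal to the pooled median as “at or below” and applies no continuity correction. Both are choices of this sketch, so results may differ slightly from statistics packages (SciPy, for instance, offers `scipy.stats.median_test` with more options).

```python
import math
import statistics


def moods_median_test(x, y):
    """Return (chi-square statistic, p) for Mood's median test, two groups."""
    grand = statistics.median(list(x) + list(y))  # pooled median
    # 2x2 table: rows = above / at-or-below the pooled median, columns = groups.
    table = [
        [sum(v > grand for v in x), sum(v > grand for v in y)],
        [sum(v <= grand for v in x), sum(v <= grand for v in y)],
    ]
    n = len(x) + len(y)
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row or 0 in col:
        raise ValueError("test undefined: an empty row or column in the table")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n  # count expected if medians are equal
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


print(moods_median_test([1, 2, 3, 4], [5, 6, 7, 8]))
```

Reducing every value to “above” or “not above” the median throws away more information than ranking does, which is why this safer choice is usually somewhat less sensitive.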

Conclusion

In summary, each test is linked to specific conditions, and choosing a test is a simple step-by-step process. Whenever you are unsure whether a condition is fulfilled, choose a test that does not rely on that assumption. You may end up with a slightly less sensitive test, but at least the result will not be invalid.