Understanding data types
When you carry out statistical analyses, knowing your data type is very important, because many statistical methods and tests were designed for one specific data type. However, there is a lot of confusion about data types, because statistical textbooks often use different terms for the same type of data. Here is an overview:
- Metric variables:
  - Continuous variables (decimal numbers, e.g. 1.22 m, 17.3 °C)
  - Discontinuous variables (= ‘meristic variables’ or ‘discrete variables’, e.g. no. of eggs in a nest: 3, counts per area: 11 plants per m²)
- Non-metric variables (= ‘categorical data’ or ‘attribute data’):
  - Nominal variables (categories without any order, e.g. blue, red, green or female, male)
  - Ordinal variables (= ‘ranks’, i.e. categories with an order, e.g. small, medium, large)
In principle we have two groups of data: metric and non-metric. Metric data are measurements such as 27.3 cm, 14.5 °C or 70 kg. Counts of objects, for example the number of plants per area or the number of eggs in a bird’s nest, are also metric data. There is, however, a small difference between weight and egg numbers. The number of eggs is a discontinuous (discrete) variable, because you won’t count 1.7 eggs in a nest, but only 1, 2 or 3 eggs. In contrast, weight is continuous, because it can be 69.7 kg or 71.3 kg. The difference between continuous and discontinuous variables is, however, not extremely important, since most statistical methods for metric data can deal with both.
More important is the difference between metric and non-metric data, because the two are very different. When you have non-metric data, you count how often an observation is made. For example, you count how many patients have recovered and how many have not. Or you count the number of people with and without a bachelor’s degree. Or the number of smokers vs. non-smokers. What you record for each patient is the category to which she or he belongs, e.g. smoker or non-smoker. Let’s look at an example to illustrate the difference from metric data:
| Metric data | | Non-metric data | |
|---|---|---|---|
| Individual 1: | 65.5 kg | Individual 1: | Non-smoker |
| Individual 2: | 71.4 kg | Individual 2: | Non-smoker |
| Individual 3: | 59.3 kg | Individual 3: | Non-smoker |
When we summarise these data, you will see that metric and non-metric data are also summarised differently. Metric data can be summarised using the mean, which describes the typical weight (the median or another measure of the average could be used instead), and the standard deviation, which describes the variability. Finally, we should add the sample size to show how robust the data are.
| Metric data | | Non-metric data | |
|---|---|---|---|
| Mean: | 67.3 kg | N: | 10 |
| Standard deviation (SD): | 8.3 kg | Smokers (incidence): | 2 |
| N: | 10 | | |
Non-metric data can be summarised by the sample size N and by how often each attribute was counted. There is no point in providing a mean value or a standard deviation.
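The two kinds of summary can be sketched in a few lines of Python. This is only an illustration using the standard library: it uses the three individuals listed in the first table, so the results differ from the summary table, which is based on N = 10. The function names `summarise_metric` and `summarise_non_metric` are made up for this example.

```python
from collections import Counter
from statistics import mean, stdev

# Raw data for the three individuals shown in the first table.
weights_kg = [65.5, 71.4, 59.3]                              # metric
smoking_status = ["Non-smoker", "Non-smoker", "Non-smoker"]  # non-metric


def summarise_metric(values):
    """Return mean, sample standard deviation and sample size."""
    return {"mean": mean(values), "sd": stdev(values), "n": len(values)}


def summarise_non_metric(labels):
    """Return the sample size and how often each category was observed."""
    return {"n": len(labels), "counts": dict(Counter(labels))}


metric_summary = summarise_metric(weights_kg)
non_metric_summary = summarise_non_metric(smoking_status)

print(metric_summary)
print(non_metric_summary)
```

Note that the non-metric summary contains only counts: calling `mean` on the list of labels would fail, which mirrors the point that a mean is not meaningful for attributes.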
So now it’s clear that non-metric data are conceptually different from metric data. But what about the difference between nominal and ordinal data? Nominal data are counts of characteristics or attributes that have no particular order, such as colours (blue, green, red) or gender (male, female). The attribute ‘smoker’ vs. ‘non-smoker’ is also nominal. “Nominal” simply means that the classes have a name. Ordinal data are data with a hierarchy, such as small, medium and large, or the degrees bachelor, master and PhD.
So what is an easy way to determine what kind of data you have? It’s quite simple: you only have to ask yourself “What did I measure in each subject? What were my raw data?” If the answer is a number, then you have metric data. If the answer is a characteristic or an attribute such as “red/green/blue”, “small/medium/large” or “dead/alive”, then you have non-metric data.
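The distinction between nominal and ordinal categories can also be expressed in code. The following is a minimal sketch using Python’s standard `enum` module; the class names `Colour` and `Size` and the observed values are invented for illustration. The integers attached to the sizes only encode their rank, not a measured quantity.

```python
from enum import Enum, IntEnum


class Colour(Enum):
    """Nominal: named categories without any order."""
    BLUE = "blue"
    RED = "red"
    GREEN = "green"


class Size(IntEnum):
    """Ordinal: categories with an order (small < medium < large)."""
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


# Hypothetical observations, one category per subject.
observed_sizes = [Size.LARGE, Size.SMALL, Size.MEDIUM, Size.SMALL]

# Ordinal categories can be ranked and compared ...
ranked = sorted(observed_sizes)
largest = max(observed_sizes)

# ... whereas nominal categories only support equality checks.
same_colour = Colour.BLUE == Colour.BLUE
try:
    Colour.BLUE < Colour.RED
    colours_are_ordered = True
except TypeError:
    colours_are_ordered = False

print(ranked, largest, same_colour, colours_are_ordered)
```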
Conclusions
The important data types are metric data and the two non-metric data types, nominal and ordinal. Sometimes binary variables, for example “true” and “false” or “female” and “male”, are additionally distinguished as a separate data type. However, binary data are in principle nominal data, so there is no real need to add this complexity to the classification scheme. In some textbooks you will also find the following classification scheme:
- Measurement data (= metric data)
  - Continuous data
  - Discontinuous data
- Ranked variables (= ordinal data)
- Attributes (= nominal data)
Here metric data are called measurement variables, ranked variables are ordinal variables and attributes are nominal variables. So in principle this classification scheme is very similar to the scheme shown above.
To conclude, whenever you are uncertain about the data type, just ask the following question: What did I measure in each individual? If the answer is a number, you have metric data. If the answer is an attribute, you have non-metric data. In that case, ask yourself: Is there an order in the list of attributes? If there is, you have ordinal data; otherwise you have nominal data.
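The two questions can be written down as a small decision function. This is a sketch only: `classify_data_type` and its `has_order` parameter are hypothetical names, and whether the attributes are ordered has to be supplied by you, because it cannot be read off the labels. The function would also misclassify categories that are stored as numeric codes (e.g. 1 for smoker, 0 for non-smoker), so it should be applied to the raw data as they were actually recorded.

```python
from numbers import Real


def classify_data_type(raw_values, has_order=False):
    """Apply the two questions above to the raw values of one variable.

    raw_values: what was recorded for each individual.
    has_order:  True if the attributes have a natural order
                (only relevant when the values are not numbers).
    """
    if not raw_values:
        raise ValueError("need at least one observation")

    # Question 1: is the answer a number? (bool is excluded because
    # True/False is an attribute, not a measurement.)
    is_number = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in raw_values
    )
    if is_number:
        return "metric"

    # Question 2: is there an order in the list of attributes?
    return "ordinal" if has_order else "nominal"


print(classify_data_type([65.5, 71.4, 59.3]))
print(classify_data_type(["small", "medium", "large"], has_order=True))
print(classify_data_type(["red", "green", "blue"]))
```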