Concordia College - Moorhead, Minnesota |  research@cord.edu

Classifying Data

Any time we are recording data, we need to think about what type of data we are collecting. The type of data we collect determines how we should model, interpret, and display the data. In this module, we will discuss the different types of data we might collect. For information on what kinds of graphs work best with each type of data, see the Graphs Module.

Numeric Data

We will discuss three types of numeric data: 1) counts, 2) continuous data, and 3) percents. Each type behaves differently and lends itself to different types of graphs.

Count (Discrete) Data

Counts are exactly what the name suggests. They are very common; examples include the number of food vacuoles formed by Tetrahymena and the number of cases of West Nile virus reported in Arizona. Counts are restricted to non-negative whole numbers (0, 1, 2, and so on), which means that no intermediate values are possible. From the previous examples, we can see that we cannot count half of a food vacuole; either one is formed or one is not. Likewise, we cannot report half a case of West Nile virus; a person either has the virus or does not.
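
The whole-number restriction can be checked in code. The following is a minimal sketch, assuming observations are recorded in a Python list; the function name is_count and the sample values are hypothetical and chosen only for illustration.

```python
def is_count(value):
    """Return True if value is a valid count: a non-negative whole number."""
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return value >= 0
    # A float such as 3.0 still represents a whole number; 2.5 does not
    if isinstance(value, float):
        return value >= 0 and value.is_integer()
    return False


# Hypothetical numbers of food vacuoles recorded for five cells
vacuoles = [3, 0, 7, 2.5, -1]

# Keep only the entries that could really be counts
valid = [v for v in vacuoles if is_count(v)]
print(valid)  # [3, 0, 7]
```

The entries 2.5 and -1 are rejected because half a vacuole and a negative number of vacuoles cannot be observed.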

Continuous Data

Continuous data, on the other hand, are not restricted to certain specified values the way counts are. The difference between any two continuous data points can be expressed in arbitrarily small units. Continuous data can take many forms, such as the heights of students at Concordia College or the serum cholesterol levels of patients at Sanford.

Since fractional values are possible in continuous data, the precision of the recorded measurements depends on the sensitivity of the instrument used. For example, time measurements are often rounded to the nearest second or hundredth of a second, depending on the sensitivity of the clock used. Similarly, weight measurements are often rounded to the nearest kilogram or tenth of a gram, depending on the type of scale used.
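
To illustrate how instrument sensitivity limits what is recorded, here is a small sketch using Python's built-in round function. The measurement values and variable names are hypothetical; the point is only that the same underlying quantity is recorded differently by a coarse and a fine instrument.

```python
# Hypothetical "true" elapsed time in seconds
elapsed = 12.3456

# A wall clock that resolves whole seconds
time_nearest_second = round(elapsed)

# A stopwatch that resolves hundredths of a second
time_nearest_hundredth = round(elapsed, 2)

# Hypothetical "true" mass in grams
mass_g = 70264.38

# A bathroom scale that resolves whole kilograms
mass_nearest_kg = round(mass_g / 1000)

# A laboratory balance that resolves tenths of a gram
mass_nearest_tenth_g = round(mass_g, 1)

print(time_nearest_second, time_nearest_hundredth)
print(mass_nearest_kg, mass_nearest_tenth_g)
```

In each pair, both recorded values describe the same quantity; the finer instrument simply keeps more of the fractional part.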

Percent Data

The final type of numeric data we are likely to collect is percent data. Percents are proportions expressed on a per-100 scale. For example, say we wanted to know what percent of patients in a study experienced unwanted effects from a drug in a clinical trial. We would set up a fraction with the number of patients who experienced unwanted effects on the top (numerator) and the total number of patients in the study on the bottom (denominator). We would then convert that fraction to a proportion (by dividing the numerator by the denominator) and then to a percent (by multiplying by 100).

% of patients with unwanted effects = (no. of patients with unwanted effects / total no. of patients)×100
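
The formula above can be sketched in a few lines of Python. The function name to_percent and the patient numbers are hypothetical and serve only as a worked example.

```python
def to_percent(part, total):
    """Convert a part-of-total fraction to a percent."""
    if total <= 0:
        raise ValueError("total must be a positive number")
    # Step 1: divide numerator by denominator to get a proportion
    proportion = part / total
    # Step 2: multiply by 100 to express the proportion per 100
    return proportion * 100


# Hypothetical trial: 30 of 240 patients experienced unwanted effects
patients_with_effects = 30
total_patients = 240

result = to_percent(patients_with_effects, total_patients)
print(result)  # 12.5
```

Here the proportion is 30 / 240 = 0.125, so 12.5% of the patients in this made-up trial experienced unwanted effects.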

Categorical Data

Now that we have discussed the three types of numeric data, we turn to data that are not treated as numbers. A very simple type of data is categorical (nominal) data. Categorical data are labels rather than quantities, so arithmetic is usually not performed on them. For example, we would not perform calculations on a data set of hair colors or phone numbers, even though phone numbers are written with digits.
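
Although arithmetic on the labels themselves is not meaningful, the number of observations in each category can still be tallied. The sketch below uses Counter from Python's standard library; the list of hair colors is hypothetical.

```python
from collections import Counter

# Hypothetical hair colors recorded for seven students
hair_colors = ["brown", "blond", "brown", "black", "red", "brown", "black"]

# Tally how many observations fall into each category
tally = Counter(hair_colors)

print(tally["brown"])        # 3
print(tally.most_common(1))  # [('brown', 3)]
```

Averaging the labels would make no sense, but the tally for each category is itself a count of the kind described earlier in this module.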

Reference

Pagano M and Gauvreau K (2000). Principles of Biostatistics, 2nd edition. Duxbury Press. ISBN 0534229026.