Introduction
Every kind of research involves data. The data is collected and analyzed in order
to arrive at a meaningful conclusion about the focus of the study or
research.
Data analysis is usually
easier when we have a good understanding of the right statistical tool or technique
or test to use. Before we dive deeper, let's refresh our knowledge of some
basic data-related concepts.
Basic Concepts
Data: Data refers to a collection of facts and figures. It could be in the
form of numbers, text, images or sound, among others.
Primary data: This refers to first-hand data collected directly from the
source, subject or sample of a study. The subject or sample could be human
beings, plants, animals or other things. Primary data could be obtained through
observation, surveys, questionnaires, and so on.
Secondary data: This refers to data obtained from already existing data that
was originally collected for some other purpose, for example, data obtained
from existing records, journals, books or other publications.
Qualitative data: This has to do with qualities. It is data represented in
categories or as strings of text. In other words, it is usually categorical in
nature. Examples are nominal data (such as names of students, names of
countries, colours of shoes, gender of farmers, and so on) and ordinal data
(such as agreement with a statement, taste of food, rank in class, and so on).
Quantitative data: This has to do with quantity. It is data represented by
numbers, in either discrete or continuous form. Discrete data are obtained
through counting (for example, the number of students in a class, the number of
fruits in a basket, the number of technologies adopted by farmers, and so on).
Continuous data, on the other hand, are obtained through measurement (for
example, the height of students, the weight of an animal, and so on).
One of the basic things to
consider in choosing an appropriate statistical tool or technique or test for
the analysis of collected data is the level of measurement of the variable
under consideration.
Levels or Scales of Measurement
The four different levels
of measurement are nominal, ordinal, interval and ratio.
Nominal: This is the first level of measurement. Data or variables at the
nominal level of measurement have categories that are mainly for naming
purposes. Hence, the order of the categories does not matter because there is
no ranking attached. Even if numbers are assigned to the categories, the
numbers are merely for identification and naming, not for ranking. For example,
in a colour variable, if the “Blue” category is assigned a value of 1, “Red” a
value of 2, and “Yellow” a value of 3, the values 1, 2 and 3 are just for
identification.
Ordinal: This is the second level of measurement. Data or variables at the
ordinal level of measurement have categories or groups just like nominal data,
but with the additional characteristic that the categories have a strict order
or ranking. In other words, how the categories are ordered is important and
meaningful. For example, in a variable on "food satisfaction" with the
categories "not satisfied", "slightly satisfied" and "very satisfied", the
ordering of the categories matters because it moves from the negative to the
positive end of the scale. Other examples are the level of adoption (with
“High”, “Moderate” and “Low” categories) and agreement with a statement (with
“Strongly Disagree”, “Disagree”, “Undecided”, “Agree” and “Strongly Agree”
categories). Whenever values are assigned to the categories of ordinal
variables, the rankings of the values are important.
Interval: This is the third level of measurement. Data or variables at the
interval level of measurement are usually in numeric form. That is, they are
represented by numbers. However, the numbers do not have a true or absolute
zero as a starting point, so they can take values below zero. An example of an
interval variable is temperature in degrees Celsius: a value of zero (0)
degrees does not indicate that there is no temperature, because there are
still temperature values (like -273.15 degrees Celsius) that are less than 0.
Therefore, zero (0) is not the starting point for temperature. Other examples
of interval variables are calendar or clock time and IQ.
Ratio: This is the fourth and highest level of measurement. Data or variables
at the ratio level of measurement are represented by numbers, and they have a
true zero as a starting point. The "true zero" means that there is a total
absence of the variable of interest whenever its value is 0. Hence, they cannot
have values below zero (0), because at zero the property being measured does
not exist at all. Examples of ratio variables are age, length, height, weight
and scores, among others.
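One way to see the difference between the interval and ratio levels is to check whether a ratio of two values survives a change of unit. The short Python sketch below uses made-up values and two small conversion helpers (names chosen here for illustration) to show that "twice as much" is meaningful for weight but not for temperature in degrees Celsius.

```python
# Hypothetical values, chosen only for illustration.

def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def kg_to_pounds(kg):
    """Convert a weight from kilograms to pounds."""
    return kg * 2.20462


# Interval level: 20 degrees is not "twice as hot" as 10 degrees,
# because the ratio changes when the unit changes.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio level: 20 kg is twice 10 kg, whatever the unit.
ratio_kg = 20 / 10
ratio_pounds = kg_to_pounds(20) / kg_to_pounds(10)

print("Temperature ratios:", ratio_celsius, ratio_fahrenheit)
print("Weight ratios:", ratio_kg, ratio_pounds)
```

The temperature ratio shifts from 2 to 1.36 under conversion, while the weight ratio stays at 2, which is what the true zero of a ratio variable guarantees.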
Choice of Appropriate Statistical Tool
There are different kinds of descriptive and inferential statistical tools
that we can use in data analysis.
Descriptive
Descriptive statistical
tools provide a summary of the data. Examples are frequency count, percentage,
mean, standard deviation, mode, minimum, maximum, and so on.
- In describing data or
variables that are at nominal level of measurement, you can consider using frequency
count, percentage and mode.
- In describing data or variables that are at the ordinal level of
measurement, you can consider using frequency count and percentage. You can
also generate mean values, which can be used to arrive at an overall ranking of
the items under the variable. For example, suppose we want to analyze the food
satisfaction of students on a 3-point rating scale of “Not satisfied (1)”,
“Slightly satisfied (2)” and “Very satisfied (3)” for ten food items. The food
item with the highest mean is the one that respondents, on average, derive the
most satisfaction from.
- In describing data or variables that are at the interval or ratio levels of
measurement, you can consider the use of mean, standard deviation, minimum and
maximum. Checking the minimum and maximum against the range of values you
expect will help you to easily detect if there are outliers in your data set.
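The three cases above can be sketched with Python's standard library. The data below (shoe colours, food ratings and ages) is invented for illustration, and the variable names are assumptions, not part of any particular package.

```python
from collections import Counter
import statistics

# Hypothetical survey data, for illustration only.
shoe_colours = ["Blue", "Red", "Blue", "Yellow", "Blue", "Red"]  # nominal
food_ratings = {                                                 # ordinal, 1-3 scale
    "Rice": [3, 2, 3, 3],
    "Beans": [1, 2, 2, 3],
}
ages = [23, 35, 41, 29, 35, 52]                                  # ratio

# Nominal: frequency count, percentage and mode.
counts = Counter(shoe_colours)
percentages = {colour: 100 * n / len(shoe_colours) for colour, n in counts.items()}
modal_colour = statistics.mode(shoe_colours)

# Ordinal: mean rating per item, then rank the items by mean.
mean_ratings = {food: statistics.mean(r) for food, r in food_ratings.items()}
ranked_foods = sorted(mean_ratings, key=mean_ratings.get, reverse=True)

# Interval/ratio: mean, standard deviation, minimum and maximum.
summary = {
    "mean": statistics.mean(ages),
    "sd": statistics.stdev(ages),  # sample standard deviation
    "min": min(ages),
    "max": max(ages),
}

print(counts, percentages, modal_colour)
print(mean_ratings, ranked_foods)
print(summary)
```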
Inferential
Inferential statistical
tools can be used to evaluate or analyze the relationships that exist between
or among variables in the data set. They are useful for hypothesis testing.
Some inferential statistical tools, such as regression, can be used to
determine the predictor or independent variables that influence the outcome or
dependent variable, and to make some forecast.
Chi-square:
In analyzing the relationship between a dependent variable that is
categorical (that is, at the nominal or ordinal level) and an independent
variable that is also categorical (that is, at the nominal or ordinal level),
you can consider the use of chi-square. For example, you can use chi-square to
test for the relationship or association between gender and level of academic
performance of students.
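As a rough illustration of what the chi-square test computes, the sketch below works out the Pearson chi-square statistic for a made-up 2x2 table of gender against performance level. The counts and the function name are hypothetical, no continuity correction is applied, and in practice a statistics package would also report the p-value.

```python
# Hypothetical observed counts (rows: gender, columns: performance level).
observed = [
    [30, 20],  # male:   high, low
    [20, 30],  # female: high, low
]


def chi_square_statistic(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


chi2, df = chi_square_statistic(observed)

# 3.841 is the 5% critical value of the chi-square distribution with 1 degree
# of freedom, so this check only applies to a 2x2 table.
significant = df == 1 and chi2 > 3.841
print(chi2, df, significant)
```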
Independent Sample T-test:
Independent sample t-test is used to compare the mean values of two groups
and determine if there is a statistically significant difference between them,
or whether the difference could have occurred by chance.
In analyzing the difference or relationship between a dependent variable that
is discrete or continuous (that is, at the interval or ratio level) and an
independent variable that is categorical (that is, at the nominal or ordinal
level) with only two categories or groups, you can consider the use of
independent sample t-test. For example, let's assume that the “marital status”
variable has two categories (Single- 1 and Married- 2). With independent sample
t-test, we can test for a significant relationship between the marital status
of the farmers and their technology adoption score. In other words,
independent sample t-test will compare the mean adoption score for single
farmers with the mean adoption score for married farmers and determine if the
difference between them is statistically significant or not.
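The comparison described above can be sketched as follows. The adoption scores are invented, the function name is an assumption, and the sketch uses the classic Student's form of the test, which assumes the two groups have equal variances.

```python
import math
import statistics

# Hypothetical technology adoption scores, for illustration only.
single = [4, 5, 6, 5, 5]
married = [7, 8, 6, 7, 7]


def independent_t(a, b):
    """Student's t statistic (equal variances assumed) and degrees of freedom."""
    n_a, n_b = len(a), len(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / std_error
    return t, n_a + n_b - 2


t_stat, df = independent_t(single, married)
print(t_stat, df)
```

With 8 degrees of freedom, the two-tailed 5% critical value of t is 2.306, so a statistic whose absolute value exceeds that would be judged statistically significant at that level.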
Paired Sample T-test
Paired sample t-test (also known as dependent t-test or dependent sample
t-test) is used to compare the mean values obtained for the same variable on
the same sample at two different times, and determine if there is a
statistically significant difference between them.
In analyzing the difference or relationship between the values of a dependent
variable that is discrete or continuous (that is, at the interval or ratio
level) at two different periods or times, you can consider the use of paired
sample t-test. For example, if we want to test for a significant difference
between the technology adoption score of farmers before they were exposed to
training and their technology adoption score after they were exposed to
training, we can use paired sample t-test.
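A minimal sketch of the before-and-after comparison: the paired test works on the difference within each pair, then asks whether the mean difference is far from zero. The scores and names below are hypothetical.

```python
import math
import statistics

# Hypothetical adoption scores for the same five farmers, for illustration only.
before_training = [4, 5, 6, 5, 5]
after_training = [6, 6, 8, 6, 7]


def paired_t(first, second):
    """Paired sample t statistic and degrees of freedom."""
    differences = [s - f for f, s in zip(first, second)]
    n = len(differences)
    # Standard error of the mean difference.
    std_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / std_error, n - 1


t_stat, df = paired_t(before_training, after_training)
print(t_stat, df)
```

With 4 degrees of freedom, the two-tailed 5% critical value of t is 2.776.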
Analysis of Variance (ANOVA):
In analyzing the relationship between a dependent variable that is discrete
or continuous (that is, at the interval or ratio level) and an independent
variable that is categorical (that is, at the nominal or ordinal level) with
three or more categories or groups, you can consider the use of Analysis of
Variance (ANOVA). In other words, it compares the mean values of three or more
groups and determines if a statistically significant difference exists among
them. For example, let's assume that the “education” variable has three
categories (“Primary education”- 1, “Secondary education”- 2 and “Tertiary
education”- 3). We can test for a significant difference or relationship
between education and the technology adoption score of farmers with the use of
ANOVA.
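The idea behind one-way ANOVA can be sketched by comparing the variation between group means with the variation inside the groups. The scores and the function name below are hypothetical.

```python
import statistics

# Hypothetical adoption scores by education level, for illustration only.
scores_by_education = {
    "Primary": [4, 5, 6],
    "Secondary": [6, 7, 8],
    "Tertiary": [8, 9, 10],
}


def one_way_anova(samples):
    """Return the F statistic with its between- and within-group degrees of freedom."""
    all_values = [x for sample in samples for x in sample]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (statistics.mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - statistics.mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova(list(scores_by_education.values()))
print(f_stat, df_between, df_within)
```

For 2 and 6 degrees of freedom, the 5% critical value of F is about 5.14, so a larger F suggests that at least one group mean differs from the others.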
Pearson Product Moment Correlation (PPMC):
In analyzing the relationship between a dependent variable that is discrete
or continuous (that is, at the interval or ratio level) and an independent
variable that is also discrete or continuous (that is, at the interval or ratio
level), you can consider the use of Pearson Product Moment Correlation (PPMC).
For example, we can test for the relationship between age and the technology
adoption score of farmers with the use of PPMC.
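As an illustration, the correlation coefficient r can be computed directly from its definition. The ages and scores below are made up, and `pearson_r` is a name chosen for this sketch.

```python
import math

# Hypothetical data for five farmers, for illustration only.
ages = [25, 30, 35, 40, 45]
adoption_scores = [3, 4, 6, 7, 10]


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(ages, adoption_scores)
print(round(r, 4))
```

The coefficient always lies between -1 and 1; values near 1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little or no linear relationship.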
Simple Linear Regression:
In analyzing the relationship between a dependent variable that is discrete
or continuous (that is, at the interval or ratio level) and only one
independent variable that is also discrete or continuous (that is, at the
interval or ratio level), with the aim of determining how the independent
variable influences the dependent variable and generating a model, you can
consider the use of simple linear regression.
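The "model" in simple linear regression is a straight line, score = intercept + slope x age in the sketch below. The data and function name are hypothetical, and the fit uses ordinary least squares.

```python
# Hypothetical data for five farmers, for illustration only.
ages = [25, 30, 35, 40, 45]
adoption_scores = [3, 4, 6, 7, 10]


def fit_simple_regression(x, y):
    """Least-squares intercept and slope for the line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_regression(ages, adoption_scores)

# Use the fitted line to forecast the score of a 50-year-old farmer.
predicted_score = intercept + slope * 50
print(intercept, slope, predicted_score)
```

Here the slope says how much the adoption score is expected to change for each additional year of age.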
Multiple Linear Regression:
In analyzing the relationship between a dependent variable that is discrete
or continuous (that is, at the interval or ratio level) and two or more
independent variables that are also discrete or continuous (that is, at the
interval or ratio level), with the aim of determining how the independent
variables influence the dependent variable and generating a model, you can
consider the use of multiple linear regression.
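A minimal sketch of multiple linear regression, assuming two predictors (age and years of education, both hypothetical). It fits the coefficients by ordinary least squares through the normal equations; the data is constructed so that score = 1 + 0.2 x age + 0.5 x education exactly, which makes the result easy to check. Real analyses would normally rely on a statistics package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y_i for d, y_i in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical (age, years of education) pairs and adoption scores.
predictors = [(25, 6), (30, 12), (35, 6), (40, 16), (45, 12), (50, 6)]
adoption_scores = [9, 13, 11, 17, 16, 14]

coefficients = fit_multiple_regression(predictors, adoption_scores)
print([round(c, 6) for c in coefficients])
```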
Binary Logit Regression:
In analyzing the
relationship between a dependent variable that is categorical (that is, at
nominal or ordinal level) with only two categories or groups, and independent
variables that are at any level of measurement (that is, nominal or ordinal or
interval or ratio level) with the aim of determining how the independent
variables influence the dependent variable, and generating a model, you can
consider the use of binary logit regression.
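To give a feel for the kind of model binary logit regression produces, the sketch below applies the logistic function to a linear predictor. The intercept and coefficient are invented for illustration rather than estimated from data; fitting them is the job of the statistics package.

```python
import math

# Hypothetical fitted values, for illustration only: the outcome is
# "adopted the technology" (1) versus "did not adopt" (0).
INTERCEPT = -4.0
COEF_AGE = 0.1


def logistic(z):
    """Map any real number to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))


def predicted_probability(age):
    """Predicted probability of adoption for a farmer of the given age."""
    return logistic(INTERCEPT + COEF_AGE * age)


# Each extra year of age multiplies the odds of adoption by exp(coefficient).
odds_ratio = math.exp(COEF_AGE)

print(predicted_probability(40), predicted_probability(50), odds_ratio)
```

With these assumed values, a 40-year-old farmer has a predicted probability of exactly 0.5, and the probability rises with age because the coefficient is positive.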
Multinomial Logit Regression or Ranked Logit Regression or Ordered Probit Regression:
In analyzing the
relationship between a dependent variable that is categorical (that is, at
nominal or ordinal level) with three or more categories or groups, and
independent variables that are at any level of measurement (that is, nominal or
ordinal or interval or ratio level) with the aim of determining how the
independent variables influence the dependent variable, and generating a model,
you can consider the use of multinomial logit regression or ranked logit
regression or ordered probit regression.
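Choose ranked (ordered) logit regression or ordered probit regression
specifically when the dependent variable is at the ordinal level of
measurement, where the ranking matters. Multinomial logit regression suits a
dependent variable at the nominal level, where the categories have no ranking.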
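The choices described in this section can be summarised as a small lookup function. This is only a sketch of the rules stated in this article; the function and parameter names are invented here, `groups` means the number of categories of whichever variable is categorical, and the paired sample t-test is left out because it compares one variable at two times rather than two variables.

```python
CATEGORICAL = {"nominal", "ordinal"}
NUMERIC = {"interval", "ratio"}


def suggest_tool(dependent, independent, groups=None):
    """Suggest an inferential tool from the two levels of measurement."""
    if dependent in CATEGORICAL and independent in CATEGORICAL:
        return "Chi-square"
    if dependent in NUMERIC and independent in NUMERIC:
        return "PPMC or linear regression"
    if groups is None or groups < 2:
        raise ValueError("groups (2 or more) is needed for a categorical variable")
    if dependent in NUMERIC and independent in CATEGORICAL:
        return "Independent sample t-test" if groups == 2 else "ANOVA"
    if dependent in CATEGORICAL and independent in NUMERIC:
        if groups == 2:
            return "Binary logit regression"
        if dependent == "ordinal":
            return "Ranked logit or ordered probit regression"
        return "Multinomial logit regression"
    raise ValueError("unknown level of measurement")


print(suggest_tool("ratio", "nominal", groups=2))
print(suggest_tool("ordinal", "ratio", groups=5))
```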
Conclusion
In this article, we have learned how to choose an appropriate statistical
tool or technique for analysis by considering the level or scale of measurement
of the data collected.
This understanding will help to make your work faster when you eventually
begin analysing your data using any data analysis software (such as SPSS, SAS
or Excel, among others).
That's it.