How to Choose an Appropriate Statistical Tool for Research Data Analysis

Introduction

Every research project involves data. The data is collected and analyzed in order to arrive at a meaningful conclusion about the focus of the study.

Data analysis is usually easier when we have a good understanding of the right statistical tool, technique or test to use. Before we dive deeper, let's refresh our knowledge of some basic data-related concepts.

Basic Concepts

Data: Data refers to a collection of facts and figures. It could be in the form of numbers, texts, images, sound, among others.

Primary data: This refers to first-hand data collected directly from the source, subject or sample of a study. The subject or sample of the study could be human beings, plants, animals or other things. Primary data can be obtained through observation, surveys, the use of questionnaires, and so on.

Secondary data: This refers to data obtained from existing sources that were originally collected for some other purpose. For example, data obtained from existing records, journals, books or other publications.

Qualitative data: This has to do with qualities. These are data that are represented in categories or as strings of text. In other words, they are usually categorical in nature. Examples are nominal data (such as names of students, names of countries, colours of shoes, gender of farmers, and so on) and ordinal data (such as agreement to a statement, taste of food, rank in class, and so on).

Quantitative data: This has to do with quantity. These are data that are represented by numbers. They can be in discrete form or continuous form. Discrete data are obtained through counting (for example, the number of students in a class, the number of fruits in a basket, the number of technologies adopted by farmers, and so on). Continuous data, by contrast, are obtained through measurement (for example, the height of students, the weight of an animal, and so on).

One of the basic things to consider in choosing an appropriate statistical tool or technique or test for the analysis of collected data is the level of measurement of the variable under consideration.

Levels or Scales of Measurement

The four different levels of measurement are nominal, ordinal, interval and ratio.

Nominal: This is the first level of measurement. Data or variables at the nominal level of measurement have categories that are mainly for naming purposes. Hence, the way the categories are ordered does not matter because there is no ranking attached. Even if values or numbers are assigned to each of the categories, the numbers are merely for identification and naming, and not for ranking. For example, in a colour variable, if the “Blue” category is assigned a value of 1, “Red” is assigned a value of 2, and “Yellow” is assigned a value of 3, the values 1, 2 and 3 are just for identification.

Ordinal: This is the second level of measurement. Data or variables at the ordinal level of measurement have categories or groups just like nominal data, but with the additional characteristic that the categories have a strict order or ranking. In other words, how the categories are ordered is important and meaningful. For example, in a variable on “food satisfaction” with the categories “Not satisfied”, “Slightly satisfied” and “Very satisfied”, the ordering of the categories matters because it moves from the negative to the positive end of the scale. Other examples are the level of adoption (with “High”, “Moderate” and “Low” categories) and agreement to a statement (with “Strongly Disagree”, “Disagree”, “Undecided”, “Agree” and “Strongly Agree” categories). Whenever values are assigned to the categories of ordinal variables, the rankings of the values are important.

Interval: This is the third level of measurement. Data or variables at the interval level of measurement are usually in numeric form. That is, they are represented by numbers. However, the numbers do not have a true or absolute zero as the starting point. Hence, they can have values below zero, such as negative values. An example of an interval variable is temperature in degrees centigrade. A value of zero (0) degrees centigrade does not indicate that there is no temperature, because we still have temperature values (down to -273.15 degrees centigrade) that are less than 0. Therefore, zero (0) is not the starting point for temperature on this scale. Other examples of interval variables are time (as read from a clock or calendar) and IQ.

Ratio: This is the fourth and the highest level of measurement. Data or variables at the ratio level of measurement are represented by numbers, and they have a true zero as the starting point. The "true zero" means that there is a total absence of the variable of interest whenever its value is 0. Hence, they cannot have values below zero (0) because at the point of zero, such properties do not yet exist. Examples of ratio variables are age, length, height, weight, scores, among others.
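To see why the true zero matters, here is a small Python sketch. The two temperature readings are invented for illustration. It suggests that dividing one centigrade reading by another gives a misleading ratio, while the same readings on the kelvin scale (which does have a true zero) give a meaningful one. Differences, on the other hand, are meaningful on both scales.

```python
# Interval vs ratio: a ratio is only meaningful when zero means "none".

def celsius_to_kelvin(celsius):
    """Convert degrees centigrade (interval scale) to kelvin (ratio scale)."""
    return celsius + 273.15

# Hypothetical readings in degrees centigrade
cool, warm = 10.0, 20.0

# Dividing interval-level values: looks like "twice as hot", but is not
naive_ratio = warm / cool

# Dividing ratio-level values: the meaningful comparison
true_ratio = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

# Differences agree on both scales (up to floating-point rounding)
diff_c = warm - cool
diff_k = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)

print(naive_ratio)            # 2.0
print(round(true_ratio, 3))   # 1.035
```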

Choice of Appropriate Statistical Tool

There are different kinds of descriptive and inferential statistical tools that we can use in data analysis.

Descriptive

Descriptive statistical tools provide a summary of the data. Examples are frequency count, percentage, mean, standard deviation, mode, minimum, maximum, and so on.

- In describing data or variables that are at nominal level of measurement, you can consider using frequency count, percentage and mode.

- In describing data or variables that are at ordinal level of measurement, you can consider using frequency count and percentage. You can also generate the mean values, which can be used to arrive at the overall ranking of the items under the variable. For example, suppose we want to analyze the food satisfaction of students on a 3-point rating scale of “Not satisfied (1)”, “Slightly satisfied (2)” and “Very satisfied (3)” for ten food items. The food item with the highest mean is the one from which respondents, on average, derive the most satisfaction.

- In describing data or variables that are at interval or ratio levels of measurement, you can consider the use of mean, standard deviation, minimum and maximum. Checking the minimum and maximum will also help you to easily detect if there are outliers in your data set.
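The three cases above can be sketched in Python using only the standard library. The data values and the helper name frequency_table are made up for illustration; in practice the same summaries would come from your own data set or your statistical package.

```python
from collections import Counter
import statistics

# Hypothetical responses, invented for illustration
shoe_colours = ["Blue", "Red", "Blue", "Yellow", "Blue", "Red"]  # nominal
satisfaction = [1, 2, 3, 3, 2, 3]                                # ordinal (1-3 scale)
heights_cm = [150.0, 160.0, 170.0, 180.0]                        # ratio

def frequency_table(values):
    """Return {category: (count, percentage)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}

# Nominal: frequency count, percentage and mode
colour_table = frequency_table(shoe_colours)
colour_mode = statistics.mode(shoe_colours)

# Ordinal: frequency count and percentage, plus the mean for ranking items
satisfaction_table = frequency_table(satisfaction)
satisfaction_mean = statistics.mean(satisfaction)

# Interval or ratio: mean, standard deviation, minimum and maximum
height_summary = {
    "mean": statistics.mean(heights_cm),
    "sd": statistics.stdev(heights_cm),   # sample standard deviation
    "min": min(heights_cm),
    "max": max(heights_cm),
}

print(colour_table)
print(colour_mode)
print(round(satisfaction_mean, 2))
print(height_summary)
```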

Inferential

Inferential statistical tools can be used to evaluate or analyze the relationships that exist between or among variables in the data set. They are useful for hypothesis testing. Some inferential statistical tools, such as regression, can be used to determine the predictor (independent) variables that influence the outcome (dependent) variable, and to make forecasts.

Chi-square:

In analyzing the relationship between a dependent variable that is categorical (that is, at nominal or ordinal level) and an independent variable that is also categorical (that is, at nominal or ordinal level), you can consider the use of Chi-square. For example, you can use chi-square to test for the relationship or association between gender and level of academic performance of students.
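As a rough illustration, the chi-square statistic for the gender and academic performance example can be computed by hand in Python. The counts in the table are invented, the function name chi_square_statistic is only a label for this sketch, and no continuity correction is applied. The closed-form p-value used at the end holds only when there is one degree of freedom (a 2 x 2 table); for larger tables you would rely on a statistical package or a chi-square table.

```python
import math

def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Hypothetical counts: rows = gender (male, female),
# columns = academic performance (high, low)
table = [[30, 20],
         [20, 30]]

chi2, df = chi_square_statistic(table)

# Valid for df == 1 only: P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

print(chi2, df, round(p_value, 4))
```

A p-value below the chosen significance level (commonly 0.05) would suggest an association between the two variables.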

Independent Sample T-test:

The independent sample t-test is used to compare the mean values of two groups and determine whether there is a statistically significant difference between them, or whether the difference could have occurred by chance.

In analyzing the difference or relationship between a dependent variable that is discrete or continuous (that is, at interval or ratio level) and an independent variable that is categorical (that is, nominal or ordinal level) with only two categories or groups, you can consider the use of independent sample t-test. For example, let's assume that the “marital status” variable has two categories (Single = 1 and Married = 2). With the use of independent sample t-test, we can test whether there is a significant relationship between the marital status of the farmers and their technology adoption score. In other words, the independent sample t-test will compare the mean adoption score of single farmers with the mean adoption score of married farmers and determine whether the difference between them is statistically significant.
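The marital status example can be sketched in Python as follows. The adoption scores are invented, and this sketch assumes the classic pooled-variance form of the test (equal variances in both groups). It returns only the t statistic and the degrees of freedom; the p-value would normally come from a t table or from your statistical package.

```python
import math
import statistics

def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df

# Hypothetical technology adoption scores
single_scores = [4, 5, 6, 5]
married_scores = [7, 8, 9, 8]

t_stat, df = independent_t(single_scores, married_scores)
print(round(t_stat, 3), df)
```

The sign of the statistic only reflects which group was listed first; a large absolute value relative to the critical value for the given degrees of freedom points to a significant difference.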

Paired Sample T-test:

Paired sample t-test (also known as dependent t-test or dependent sample t-test) is used to compare the mean values obtained for the same variable or sample but at different times, and determine if there is a statistically significant difference between them.

In analyzing the difference or relationship between the values of a dependent variable that is discrete or continuous (that is, at interval or ratio level) at two different periods or times, you can consider the use of paired sample t-test. For example, if we want to test for a significant difference between the technology adoption score of farmers before they were exposed to training and their technology adoption score after they were exposed to training, we can use paired sample t-test.
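A minimal Python sketch of the before-and-after training example is shown below. The scores are invented, and each position in the two lists is assumed to belong to the same farmer. As with the previous test, only the t statistic and degrees of freedom are computed here.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and degrees of freedom for paired (before/after) measurements."""
    # Work on the per-subject differences rather than the raw scores
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical adoption scores for the same five farmers
before_training = [4, 5, 6, 5, 7]
after_training = [6, 7, 7, 8, 9]

t_stat, df = paired_t(before_training, after_training)
print(round(t_stat, 3), df)
```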

Analysis of Variance (ANOVA):

In analyzing the relationship between a dependent variable that is discrete or continuous (that is, at interval or ratio level) and an independent variable that is categorical (that is, nominal or ordinal level) with three or more categories or groups, you can consider the use of Analysis of Variance (ANOVA). In other words, it compares the mean values of three or more groups and determines whether a statistically significant difference exists among them. For example, let's assume that the “education” variable has three categories (“Primary education” = 1, “Secondary education” = 2 and “Tertiary education” = 3). With the use of ANOVA, we can test for a significant difference or relationship between education and the technology adoption score of farmers.
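The education example can be sketched as a one-way ANOVA in Python. The scores are invented and the function name one_way_anova is only a label for this sketch. It computes the F statistic as the ratio of the between-group mean square to the within-group mean square; judging significance still requires an F table or a statistical package.

```python
import statistics

def one_way_anova(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of the individual values around their own group mean
    ss_within = sum(
        (value - statistics.mean(group)) ** 2
        for group in groups
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical adoption scores by level of education
primary = [4, 5, 6]
secondary = [6, 7, 8]
tertiary = [8, 9, 10]

f_stat, df_between, df_within = one_way_anova([primary, secondary, tertiary])
print(f_stat, df_between, df_within)
```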

Pearson Product Moment Correlation (PPMC):

In analyzing the relationship between a dependent variable that is discrete or continuous (that is, at interval or ratio level) and an independent variable that is also discrete or continuous (that is, interval or ratio level), you can consider the use of Pearson Product Moment Correlation (PPMC). For example, we can test for the relationship between age and the technology adoption score of farmers, with the use of PPMC.
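For the age and adoption score example, the correlation coefficient can be computed directly from its definition. The data below are invented for illustration. The coefficient ranges from -1 to +1, with values near 0 indicating little or no linear relationship.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson product moment correlation coefficient between x and y."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    # Sum of cross-products and sums of squares of the deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ages (years) and technology adoption scores of five farmers
ages = [20, 30, 40, 50, 60]
adoption_scores = [2, 4, 5, 4, 5]

r = pearson_r(ages, adoption_scores)
print(round(r, 4))
```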

Simple Linear Regression:

In analyzing the relationship between a dependent variable that is discrete or continuous (that is, at interval or ratio level) and only one independent variable that is also discrete or continuous (that is, interval or ratio level) with the aim of determining how the independent variable influences the dependent variable and generating a model, you can consider the use of simple linear regression.
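As an illustration, a simple linear regression can be fitted by ordinary least squares in a few lines of Python. The ages and adoption scores are invented, and the helper names are only labels for this sketch. The resulting model has the form: score = intercept + slope x age.

```python
import statistics

def fit_simple_regression(x, y):
    """Least-squares intercept and slope for the model y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical ages (independent) and adoption scores (dependent)
ages = [20, 30, 40, 50, 60]
adoption_scores = [2, 4, 5, 4, 5]

intercept, slope = fit_simple_regression(ages, adoption_scores)

# Use the fitted model to forecast the score of a 45-year-old farmer
predicted = intercept + slope * 45

print(round(intercept, 2), round(slope, 2), round(predicted, 2))
```

Here the slope indicates how much the adoption score is expected to change for each additional year of age, under the assumption that the relationship is linear.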

Multiple Linear Regression:

In analyzing the relationship between a dependent variable that is discrete or continuous (that is, at interval or ratio level) and two or more independent variables that are also discrete or continuous (that is, interval or ratio level) with the aim of determining how the independent variables influence the dependent variable, and generating a model, you can consider the use of multiple linear regression.
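The sketch below fits a multiple linear regression with two independent variables in plain Python by solving the least-squares normal equations. The data are invented (the dependent values were generated exactly as 1 + 2 x first variable + 3 x second variable, so the fit should recover those coefficients), and the helper names are hypothetical. A statistical package would normally do this for you and also report standard errors and significance levels.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_multiple_regression(rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data: each row holds two independent variables for one farmer
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
outcome = [9, 8, 19, 18, 26]

coefficients = fit_multiple_regression(predictors, outcome)
print([round(c, 3) for c in coefficients])
```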

Binary Logit Regression:

In analyzing the relationship between a dependent variable that is categorical (that is, at nominal or ordinal level) with only two categories or groups, and independent variables that are at any level of measurement (that is, nominal or ordinal or interval or ratio level) with the aim of determining how the independent variables influence the dependent variable, and generating a model, you can consider the use of binary logit regression.
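To give a feel for what a binary logit model does, here is a minimal sketch with a single independent variable, fitted by simple gradient ascent on the log-likelihood. The data, the variable names and the fitting settings (learning rate, number of iterations) are all assumptions made for illustration; statistical packages use more refined estimation routines and report far more detail.

```python
import math

def sigmoid(z):
    """Logistic function: maps any number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_binary_logit(x, y, learning_rate=0.1, iterations=5000):
    """Estimate intercept and slope of P(y = 1) = sigmoid(intercept + slope * x)."""
    intercept, slope = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        # Difference between observed outcome and current predicted probability
        errors = [yi - sigmoid(intercept + slope * xi) for xi, yi in zip(x, y)]
        # Step in the direction that increases the average log-likelihood
        intercept += learning_rate * sum(errors) / n
        slope += learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return intercept, slope

# Hypothetical data: years of education and whether a farmer adopted
# a technology (1 = adopted, 0 = did not adopt)
years_of_education = [1, 2, 3, 4, 5, 6]
adopted = [0, 0, 1, 0, 1, 1]

intercept, slope = fit_binary_logit(years_of_education, adopted)

# Predicted probability of adoption at the lowest and highest values
prob_low = sigmoid(intercept + slope * 1)
prob_high = sigmoid(intercept + slope * 6)

print(round(prob_low, 3), round(prob_high, 3))
```

A positive slope suggests that the probability of falling in the category coded 1 rises as the independent variable increases.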

Multinomial Logit Regression or Ranked Logit Regression or Ordered Probit Regression:

In analyzing the relationship between a dependent variable that is categorical (that is, at nominal or ordinal level) with three or more categories or groups, and independent variables that are at any level of measurement (that is, nominal or ordinal or interval or ratio level) with the aim of determining how the independent variables influence the dependent variable, and generating a model, you can consider the use of multinomial logit regression or ranked logit regression or ordered probit regression.

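Multinomial logit regression suits a dependent variable at nominal level, where the categories have no ranking. You can specifically choose either ranked logit regression (often called ordered logit regression) or ordered probit regression if the dependent variable is at ordinal level of measurement, where the ranking matters.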
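The selection rules described in this section can be summarised as a small Python helper. This is only a sketch of the rules as stated above; the function name suggest_tool and its parameters are hypothetical, and it does not check the other assumptions (such as sample size or distribution) that each test carries.

```python
CATEGORICAL = {"nominal", "ordinal"}
NUMERIC = {"interval", "ratio"}

def suggest_tool(dependent, independent=None, *, independent_groups=None,
                 dependent_categories=None, paired=False,
                 build_model=False, n_predictors=1):
    """Suggest an inferential tool from the levels of measurement.

    dependent / independent: "nominal", "ordinal", "interval" or "ratio".
    independent_groups: number of categories of a categorical independent variable.
    dependent_categories: number of categories of a categorical dependent variable.
    paired: True when the same variable is measured at two different times.
    build_model: True when the aim is to generate a predictive model.
    """
    if dependent in NUMERIC and paired:
        return "paired sample t-test"

    if dependent in CATEGORICAL:
        if not build_model:
            if independent in CATEGORICAL:
                return "chi-square"
            return "no suggestion"
        if dependent_categories == 2:
            return "binary logit regression"
        if dependent_categories is not None and dependent_categories >= 3:
            if dependent == "ordinal":
                return "ranked logit or ordered probit regression"
            return "multinomial logit regression"
        return "no suggestion"

    if dependent in NUMERIC:
        if independent in CATEGORICAL:
            if independent_groups == 2:
                return "independent sample t-test"
            if independent_groups is not None and independent_groups >= 3:
                return "ANOVA"
            return "no suggestion"
        if independent in NUMERIC:
            if not build_model:
                return "Pearson product moment correlation"
            if n_predictors == 1:
                return "simple linear regression"
            return "multiple linear regression"

    return "no suggestion"

# Example: adoption score (ratio) compared across three education levels (ordinal)
print(suggest_tool("ratio", "ordinal", independent_groups=3))
```

For instance, calling suggest_tool("nominal", "nominal") returns "chi-square", which matches the gender and academic performance example discussed earlier.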

Conclusion

In this article, we have learned how to choose an appropriate statistical tool or technique for analysis by considering the level or scale of measurement of the data collected.

This understanding will help to make your work faster when you eventually begin analysing your data using any data analysis software (such as SPSS, SAS, Excel, among others).

That's it.

Happy learning.

SuperbImpact Blog
