Before taking you on the journey of learning statistics, let's make some sense of data. 🤔
The word "data" is actually plural. Data contain information about individuals (also called units), and the characteristics recorded for those individuals are called variables. The values that variables take are the data. Since variables can be categorical or quantitative, data can also be divided into categorical and quantitative. 📦
When a variable takes values that are attributes, we call the variable categorical and the data categorical. Examples include the colors of cars and the names of states, districts, or countries. The values for car colors may range from white to black, covering any color you might see on the street. With categorical data, it makes sense to group the values and compare the groups.
When we measure a characteristic that results in numerical values, we are dealing with a quantitative variable and, in turn, quantitative data. Examples include the number of days, the price of a product, and the age of an individual. Quantitative data are divided further into two types: discrete and continuous.
Recall from algebra class that discrete numbers are countable values, such as whole numbers, while continuous numbers can fall anywhere within an interval. Price, weight, and age are continuous because they can take any value in an interval. When the data are numbers, it makes sense to find an average.
Variables can be measured at different levels: nominal, ordinal, interval, and ratio. Qualitative variables are nominal or ordinal. The difference between the two is that ordinal data have a natural order, but nominal data do not. For example, the satisfaction level of customers can be ranked from most to least satisfied. Interval and ratio levels both apply to quantitative data: at the interval level, differences between values are meaningful but there is no meaningful zero, whereas at the ratio level zero means that none of the quantity is present, so ratios are meaningful too.
Variables change from one individual to another, and data can also change over time. If we ask different people the same question, we'll get different answers. Statistical tools help us notice relationships and patterns among individuals. This variability is what makes the study of statistics interesting. ⭐
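To make the contrast concrete, here is a minimal Python sketch of how the two kinds of data are usually summarized: counts for categorical data and an average for quantitative data. The car colors and ages are invented sample values used only for illustration, and the variable names are assumptions, not part of the course material.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample data, invented for illustration only.
car_colors = ["white", "black", "red", "white", "blue", "white", "black"]
ages = [16, 17, 16, 18, 17]

# Categorical data: group the values and count each category.
color_counts = Counter(car_colors)
print(color_counts.most_common())

# Quantitative data: the values are numbers, so an average is meaningful.
average_age = mean(ages)
print(average_age)
```

Note that averaging the colors would make no sense, while counting how often each age occurs is still possible. The type of the variable decides which summaries are meaningful.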
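The short Python sketch below illustrates three of these levels under stated assumptions. The satisfaction responses, temperatures, and prices are hypothetical values, and temperature in degrees Celsius is used here as a standard textbook example of interval-level data rather than one taken from this guide.

```python
# Ordinal: categories have a natural order, so they can be ranked.
# The scale and the responses below are hypothetical.
SATISFACTION_ORDER = [
    "very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied",
]
responses = ["satisfied", "neutral", "very satisfied", "dissatisfied", "satisfied"]
rank = {level: position for position, level in enumerate(SATISFACTION_ORDER)}
sorted_responses = sorted(responses, key=lambda level: rank[level])
print(sorted_responses)


def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Interval: differences are meaningful, but 0 degrees Celsius does not mean
# "no temperature", so the ratio changes when the unit changes.
cool_c, warm_c = 10.0, 20.0
ratio_c = warm_c / cool_c
ratio_f = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)
print(ratio_c, ratio_f)  # the two ratios disagree

# Ratio: a price of 0 means no cost at all, so "twice as expensive"
# holds no matter which currency unit is used.
price_a, price_b = 10.0, 20.0
price_ratio = price_b / price_a
print(price_ratio)
```

The two temperature ratios differ (2.0 in Celsius, 1.36 in Fahrenheit), which is why "twice as hot" is not a meaningful statement at the interval level.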
Individuals
Variable
Data
Categorical Variable
Quantitative Variable
Distribution
Earlier, we established that variables refer to characteristics that change from one individual to another: age group, dominant hand, height, you name it! In statistics, one of the ways variables can be classified is as categorical or quantitative. Let's build upon the definitions we introduced earlier.
Categorical variables are variables whose values place individuals into categories or groups. These values are labels rather than meaningful numbers. Some categorical variables have no natural order (nominal), while others can be ranked (ordinal). Examples: gender, race, and marital status. 🫵
Quantitative variables are variables that can be measured or counted and have a numerical value. These variables can be either continuous or discrete. Continuous quantitative variables can take on any value within a given range, such as height or weight. Discrete quantitative variables can only take on certain values, such as the number of children in a household or the number of times a person has been hospitalized. 🔢
It is important to correctly identify the type of variables in a study because different statistical techniques are appropriate for analyzing data from different types of variables. For example, t-tests are commonly used to analyze data from continuous quantitative variables, while chi-square tests are commonly used to analyze data from categorical variables. Don't worry about the tests for now! We'll talk more about them later in Units 6 to 9 of this course.
Still confused? Here's a list of categorical variables:
Gender (male or female)
Race (white, black, Hispanic, etc.)
Marital status (single, married, etc.)
Employment status (employed, unemployed, self-employed, etc.)
Education level (high school, associate's degree, bachelor's degree, etc.)
Political party (Republican, Democrat, Independent, etc.)
Religion (Christian, Muslim, Hindu, etc.)
Eye color (blue, brown, green, etc.)
Hair color (blonde, brunette, red, etc.)
Birthplace (United States, Canada, Mexico, etc.)
What about quantitative variables? Here's a list of some of them:
Age (8, 16, 34, etc.)
Height (180 cm, 5'2", 2 meters, etc.)
Weight
Income
Body mass index (BMI)
Blood pressure
Heart rate
Hours of sleep (a controversial one for teens)
Distance traveled
Number of siblings
Let's dive even deeper by looking at an example to see how well we can distinguish between the two types of variables and data. 😀
Transportation Safety
The chart shows the number of job-related injuries in each of the transportation industries in 1998.
Industry         Number of injuries
Railroad         4520
Intercity bus    5100
Subway           6850
Trucking         7144
Airline          9950
1. What are the variables that we are studying?
Looking at the table, we can see that we have two variables: type of industry and number of injuries.
2. Categorize each variable as quantitative or qualitative.
The type of industry is a qualitative variable, as its values are the names of transportation industries. The number of job-related injuries, on the other hand, is quantitative, as its values are numbers.
3. Categorize each quantitative variable as discrete or continuous.
The number of job-related injuries is discrete, because injuries are counted in whole numbers.
4. Identify the level of measurement for each variable.
The type of industry is nominal, and the number of job-related injuries is at the ratio level.
5. The railroad is shown as the safest transportation industry. Does that mean railroads have fewer accidents than the other industries? Explain.
This question asks you to think about what the numbers really mean. The railroads do show the fewest job-related injuries; however, there may be other things to consider. For example, railroads may employ fewer people than the other transportation industries in the study, so a raw count alone does not tell us the injury rate.
6. From the information given, comment on the relationship between the variables.
We can see that the railroads have the fewest job-related injuries. In contrast, the airline industry has the most job-related injuries (more than twice those of the railroad industry). The numbers of job-related injuries in the subway and trucking industries are fairly comparable.
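The comparisons above can be checked with a few lines of Python. This is a minimal sketch that uses only the five counts from the table; the dictionary and variable names are illustrative choices.

```python
# Job-related injuries by transportation industry in 1998 (from the table).
injuries = {
    "Railroad": 4520,
    "Intercity bus": 5100,
    "Subway": 6850,
    "Trucking": 7144,
    "Airline": 9950,
}

# Industries with the fewest and the most injuries.
fewest = min(injuries, key=injuries.get)
most = max(injuries, key=injuries.get)
print(fewest, most)

# How many times larger is the highest count than the lowest?
ratio = injuries[most] / injuries[fewest]
print(f"{most} has {ratio:.2f} times the injuries of {fewest}")

# How close are subway and trucking, relative to the subway count?
gap = abs(injuries["Trucking"] - injuries["Subway"])
relative_gap = gap / injuries["Subway"]
print(f"Subway and trucking differ by {gap} injuries ({relative_gap:.1%})")
```

The ratio comes out slightly above 2, which matches "more than twice", and the gap between subway and trucking is only a few percent. These are still raw counts, so the caution from question 5 about industry size applies here too.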
Bottom line: always look closely at the data to see what lies behind the numbers, how the variables are related, and how the values compare to each other.
🎥 Watch: AP Stats - Unit 1 Streams