If there were a holy trinity of AP study sites, Quizlet would most certainly be in it. Its easy-to-use interface, combined with its multi-purpose functionality, helps students of all different learning styles in endless subject areas. However, it can sometimes be challenging to find the best vocab sets.
Fiveable’s AP Stats teachers & students have compiled the best Quizlet study decks for each unit. The AP Stats exam is very concept-heavy, so make sure you take the time to learn these terms.
Unit 1 includes the roots of statistics, and it is very important to get these concepts down and memorized. In this unit, you will distinguish between categorical and quantitative data, describe and compare distributions, and begin to learn about normal distributions.
Key Terms:
Categorical Variable – Record which of several groups an individual belongs to
Quantitative Variable – Takes numerical values for which calculations such as averaging or sorting make sense
Describing a Distribution (SOCS) MUST include:
Shape – uniform/skew/peaks in context (symmetric, skewed left/right, unimodal/bimodal)
Outliers – mention outliers in context
Center – mean or median in context
Spread – range/standard deviation/IQR in context
Bar Graphs, Two-way Tables, Histograms, Stemplots, Dotplots, Box plots – All methods of representing Data. It is good to know how to read information off of each of these.
5 Number Summary – Minimum, Quartile 1, Median, Quartile 3, Maximum
The Empirical Rule – Tells us that if a distribution is approximately normal, then about 68% of the values fall within 1 standard deviation of the mean, about 95% of the values fall within 2 standard deviations of the mean, and about 99.7% (almost all) of the values fall within 3 standard deviations of the mean.
Z-Score – z = (x - x̄) / s
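The Unit 1 calculations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quiz scores are made up, and the quartile method used by Python's statistics module may give slightly different Q1 and Q3 values than your calculator.

```python
import statistics
from statistics import NormalDist

# Hypothetical quiz scores, made up for illustration
scores = [62, 70, 71, 75, 78, 80, 83, 85, 90, 96]

mean = statistics.mean(scores)
s = statistics.stdev(scores)  # sample standard deviation

# 5 Number Summary: quantiles(n=4) returns Q1, median, Q3
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
five_number = (min(scores), q1, median, q3, max(scores))

def z_score(x, mean, s):
    """Number of standard deviations x lies above or below the mean."""
    return (x - mean) / s

z_top = z_score(96, mean, s)

# Empirical Rule check on a standard normal curve:
# proportion of values within 1 standard deviation of the mean
within_1 = NormalDist().cdf(1) - NormalDist().cdf(-1)

print(five_number, round(z_top, 2), round(within_1, 4))
```

A positive z-score means the value is above the mean, and a negative z-score means it is below.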
Unit 2 is an expansion of Unit 1. It builds on the relationships between two categorical or quantitative variables and how to argue about the strength between the two. This unit includes a lot of set interpretations of the different components of an LSRL which are very important to remember.
Key Terms:
Scatterplots – A way to organize data. On the x-axis is the explanatory (independent) variable and on the y-axis is the response (dependent) variable.
Form – linear or curved
Direction – positive or negative correlation
Strength – depends on the correlation coefficient; could be weak, moderate, or strong.
LSRL – ŷ=a+bx where x denotes (context) and ŷ denotes predicted (context)
Correlation Coefficient (r) – The correlation coefficient shows the degree to which there is a linear correlation between the two variables, that is, how close the points are to forming a line. The closer r is to 1 or -1, the stronger the relationship.
Slope (b) – There is a predicted increase/decrease of ______ (slope in unit of y variable) for every 1 (unit of x variable).
Y Intercept – The predicted value of (y in context) is _____ when (x value in context) is 0 (units in context).
Coefficient of Determination (r^2) – ____% of the variation in (y in context) is explained by its linear relationship with (x in context).
Residuals – The difference between the actual data and the value predicted by a linear regression model, or y-ŷ. The ideal pattern is random scatter above and below the LSRL (where the residuals=0). A positive residual means the model underestimated the true value while a negative residual means the model overestimated the true value.
Extrapolation – The use of a regression model to make predictions outside of the domain of the given data. The farther outside this domain you go, the less reliable your predictions will be.
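To see how the slope, y intercept, r, r^2, and residuals fit together, here is a minimal sketch that computes them by hand. The data set (hours studied and exam score) is hypothetical, and the variable names are chosen only for illustration.

```python
import math

# Hypothetical data: hours studied (x) and exam score (y)
x = [1, 2, 3, 4, 5]
y = [65, 70, 74, 81, 85]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross products around the means
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx                   # slope of the LSRL
a = y_bar - b * x_bar           # y intercept of the LSRL
r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
r_squared = r ** 2              # coefficient of determination

# Residual = actual y minus predicted y
predicted = [a + b * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

print(round(a, 2), round(b, 2), round(r, 4), round(r_squared, 4))
```

For this made-up data, the slope would be read as a predicted increase of about 5.1 points in exam score for every 1 additional hour studied.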
Unit 3 discusses sampling methods and ways of collecting data that can be used to represent a population. It is filled with vocabulary that is essential in future units.
Key Terms:
Experiment – Deliberately imposes a treatment on individuals in order to observe the response. A well-designed experiment can provide evidence of causation.
🔭Observational Study – Observe individuals and measure variables of interest but don’t attempt to influence the responses. These studies look for association between variables because in a study, no treatment is imposed.
👀Confounding – Occurs when two variables are associated in a way that their effects on the response variable cannot be deciphered from each other individually.
Bias – A statistical study is biased if it is likely to consistently underestimate or overestimate the value you are looking for.
Simple Random Sample (SRS) – Chooses a sample of size n in such a way that every group of n individuals in the population has an equal chance of being selected as the sample.
Stratified Random Sample – Divides the population into groups of similar individuals (strata), chooses an SRS from each stratum, and combines the SRSs into one overall sample. This reduces variability in the data and gives more precise results.
Cluster Sample – The population is divided into groups, called clusters, and an SRS of clusters is taken. All individuals in the selected clusters are sampled.
Systematic Random Sample – Sample members from a population selected according to a random starting number and a fixed periodic interval.
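The four sampling methods can be compared side by side in a short sketch. Everything here is hypothetical: a population of 40 students labeled 0 to 39, with 10 students per grade and homerooms of 5 serving as clusters.

```python
import random

rng = random.Random(0)  # seeded so the sketch is repeatable

# Hypothetical population: 40 students, 10 in each of grades 9-12
population = list(range(40))
grade_of = {student: 9 + student // 10 for student in population}

# Simple random sample: every group of 8 students is equally likely
srs = rng.sample(population, 8)

# Stratified random sample: an SRS of 2 from each grade (stratum)
stratified = []
for grade in (9, 10, 11, 12):
    stratum = [st for st in population if grade_of[st] == grade]
    stratified.extend(rng.sample(stratum, 2))

# Cluster sample: pick 2 homerooms at random, then take everyone in them
clusters = [population[i:i + 5] for i in range(0, 40, 5)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [st for cluster in chosen_clusters for st in cluster]

# Systematic random sample: random start, then every 5th student
k = 5
start = rng.randrange(k)
systematic = population[start::k]

print(srs, stratified, cluster_sample, systematic, sep="\n")
```

Note the difference between the stratified and cluster samples: the stratified sample takes a few students from every group, while the cluster sample takes every student from a few groups.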
Every day, we see coincidences striking enough that we wonder how likely they are to happen again. This is where probability comes in: the proportion of times an outcome would occur in a very large number of repetitions. Unit 4 is all about probability and is very calculation-heavy.
As of 12/22/21, this deck is private.
Key Terms:
Independent Events – The outcome of one event doesn't influence the outcome of another event.
P(A|B) = P(A)
P(A and B) =P(A)*P(B)
Disjoint/Mutually Exclusive Events – Cannot occur at the same time and have no outcomes in common.
P(A and B) = 0
P(A or B) = P(A) + P(B)
Conditional Probability – Probability of one event under the condition that another event is known.
P(A|B)= P(A and B) / P (B) – The probability of A given B = Probability of A and B / Probability of B
Discrete Random Variable – Takes a fixed set of possible values with gaps between them (often whole-number counts).
Mean (Expected) Value – Summation(xi*pi)
Standard Deviation – sqrt(summation((xi - mean of x)^2 * pi)). You cannot add standard deviations, only variances (and only for independent random variables).
Continuous Random Variable – Can take any value in an interval on the number line (can include decimals).
Law of Large Numbers – Says that in many repetitions of the same chance process, the simulated probability gets closer to the true probability as more trials are run.
Binomial Setting – Arises when we perform a fixed number of independent trials of the same chance process and count the number of times that a particular outcome, called a success, occurs. The probability of success is p on every trial, and the probability of failure is q = 1 - p. When sampling without replacement, check the 10% condition.
⭕️ Geometric Setting – Arises when we perform independent trials of the same chance process and record the number of trials it takes to get one success. On each trial, the probability p of success must be the same. The number of trials Y that it takes to get one success in a geometric setting is a geometric random variable.
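The probability formulas in this unit can be checked with a small sketch. The two-way counts, the probability table, and the function names binomial_pmf and geometric_pmf are all assumptions made for illustration.

```python
import math

# Conditional probability from hypothetical counts:
# 100 students, 40 play a sport (B), 25 play a sport and are seniors (A and B)
p_b = 40 / 100
p_a_and_b = 25 / 100
p_a_given_b = p_a_and_b / p_b  # P(A|B) = P(A and B) / P(B)

# Hypothetical discrete random variable: values and their probabilities
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

mean_x = sum(x * p for x, p in zip(values, probs))
var_x = sum((x - mean_x) ** 2 * p for x, p in zip(values, probs))
sd_x = math.sqrt(var_x)

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(the first success happens on trial k)."""
    return (1 - p) ** (k - 1) * p

print(round(p_a_given_b, 3), round(mean_x, 2), round(sd_x, 2))
print(round(binomial_pmf(3, 10, 0.3), 4), geometric_pmf(3, 0.5))
```

Notice that the variance is computed first and the square root is taken last, which is why variances can be combined but standard deviations cannot.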
Unit 5 is an introduction to significance tests, which are covered in later units. It introduces statistics, bias, the CLT, and population parameters.
Key Terms:
Sampling Distribution – The distribution of a statistic's values across ALL possible samples of a given size from the population.
A statistic is used to estimate a parameter.
Parameters vs Statistics – Mean (𝝁, x̅); Standard Deviation (σ, s); Proportions (p, p̂).
Large Counts Condition – The numbers of successes and failures are each at least 10.
Central Limit Theorem – States that if n (the sample size) is large (n≥30 is the usual rule of thumb), the sampling distribution of the sample mean is approximately normal, whatever the shape of the population. The larger n is, the closer to normal the sampling distribution is.
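A quick simulation can make the Central Limit Theorem more concrete. This is a rough sketch, not a proof: it assumes a hypothetical right-skewed population (exponential with mean 1 and standard deviation 1) and approximates the sampling distribution with 2,000 random samples instead of all possible samples.

```python
import math
import random
import statistics

rng = random.Random(1)  # seeded so the sketch is repeatable

def sample_mean(n):
    """Mean of one random sample of size n from the skewed population."""
    return statistics.mean([rng.expovariate(1.0) for _ in range(n)])

n = 40
means = [sample_mean(n) for _ in range(2000)]

center = statistics.mean(means)   # should be close to the population mean, 1
spread = statistics.stdev(means)  # should be close to sigma / sqrt(n)
theory_spread = 1 / math.sqrt(n)

print(round(center, 3), round(spread, 3), round(theory_spread, 3))
```

Plotting a histogram of means would show a roughly symmetric, bell-shaped distribution even though the population itself is strongly skewed.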
Unit 6 is the beginning of significance tests, in which you are expected to check conditions, construct and interpret confidence intervals, and calculate a p-value. This unit covers inference about population parameters for categorical data.
Key Terms:
Confidence Interval – An interval of numbers based on our sample proportion that gives us a range where we can expect to find the true population proportion.
In repeated sampling, I am __% confident that the true population proportion (context) falls within this interval.
Significance Test – Estimates the probability of obtaining a sample result at least as extreme as ours, from the sampling distribution for our sample size, when we assume that the given population proportion is correct.
Large Counts Condition – The numbers of successes and failures are each at least 10 (np≥10 and n(1-p)≥10). Meeting this condition justifies using a normal approximation for the sampling distribution.
Random Condition – Reduces any bias that may be caused from taking a bad sample. When answering inference questions, it is always essential to make note that our sample was random. Without a random sample, our findings cannot be generalized to a population, meaning our scope of inference is inaccurate.
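10% Condition – When sampling without replacement, check that the population in question is at least 10 times as large as our sample so that we can treat the observations as independent.
Margin of Error – The "buffer zone" of our confidence interval: margin of error = (z*)(standard error), so the interval is point estimate ± (z*)(standard error).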
Null Hypothesis – The hypothesis based on our claim that was given in the problem (p=___).
Alternative Hypothesis – The hypothesis that the claim in our null is not true (p<___, p>___ or p≠___).
Type I (𝞪) Error – When we reject our Ho when Ho is actually true.
Type II (𝞫) Error – When we fail to reject our Ho when Ho is actually false.
Power – The probability that our test correctly rejects a false Ho; power = 1 - P(Type II Error).
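Here is a hedged sketch of a one-proportion z interval and z test that ties the terms above together. The sample (120 successes out of 200) and the null value p = 0.5 are hypothetical, and the test is assumed to be one-sided (Ha: p > 0.5).

```python
import math
from statistics import NormalDist

# Hypothetical sample: 120 successes in a random sample of 200
n = 200
successes = 120
p_hat = successes / n

# Large Counts Condition for the interval (uses p_hat)
large_counts_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10

# 95% confidence interval: point estimate +/- margin of error
z_star = NormalDist().inv_cdf(0.975)
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z_star * standard_error
interval = (p_hat - margin_of_error, p_hat + margin_of_error)

# Significance test of Ho: p = 0.5 against Ha: p > 0.5
# The standard deviation here uses the null value p0, not p_hat
p0 = 0.5
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)

alpha = 0.05
reject_null = p_value < alpha

print(interval, round(z, 3), round(p_value, 4), reject_null)
```

Because the p-value is below 𝞪 for this made-up sample, we would reject Ho and conclude there is convincing evidence that the true proportion is greater than 0.5.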
Unit 7 is very similar to Unit 6, but instead of dealing with proportions (p), we are dealing with means (μ). Many of the concepts overlap, but it is important to note that we cannot write about proportions when we are running an inference procedure for quantitative data.
Key Terms:
Confidence Interval – An interval of numbers based on our sample mean that gives us a range where we can expect to find the true population mean.
In repeated sampling, I am __% confident that the true population mean (context) falls within this interval.
In repeated sampling, I am ___% confident that the true difference in population means (context) falls within this interval.
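Central Limit Theorem – States that if n (the sample size) is large (n≥30 is the usual rule of thumb), the sampling distribution of the sample mean is approximately normal, whatever the shape of the population. The larger n is, the closer to normal the sampling distribution is.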
Random Condition – Reduces any bias that may be caused from taking a bad sample.
10% Condition – When sampling without replacement, check that the population in question is at least 10 times as large as our sample so that we can treat the observations as independent.
Null Hypothesis – The hypothesis based on our claim that was given in the problem (μ=___).
Alternative Hypothesis – The hypothesis that the claim in our null is not true (μ<___, μ>___ or μ≠___).
p<𝞪 – Since p<𝞪, we reject our Ho. We have convincing evidence at the 𝞪 level that (Ha in context).
Degrees of Freedom – n-1
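The pieces of a one-sample t test can be sketched as follows. The battery lifetimes and the null value of 10 hours are hypothetical. Python's standard library has no t distribution, so this sketch stops at the test statistic; the p-value would still come from a t table or calculator using the degrees of freedom shown.

```python
import math
import statistics

# Hypothetical random sample of 8 battery lifetimes (hours)
sample = [9.8, 10.2, 10.4, 9.9, 10.6, 10.1, 10.3, 10.7]
mu_0 = 10.0  # null hypothesis value: Ho is mu = 10

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation

df = n - 1                          # degrees of freedom
standard_error = s / math.sqrt(n)
t = (x_bar - mu_0) / standard_error

print(df, round(x_bar, 2), round(standard_error, 4), round(t, 3))
```

With a sample this small, the normality of the population (or at least the absence of strong skew and outliers in the sample) would need to be checked before trusting the result.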
Chi-squared significance tests operate differently than proportion and mean significance tests. All χ² distributions are skewed right. The degree of skewness depends on the degrees of freedom.
Best Quizlet Deck:
AP Statistics Chapter 11: Inference for Distributions of Categorical Data
Unfortunately, this deck has been deleted by its author. (12/22/21)
Key Terms:
Goodness of Fit Test – Must have random sampling. All expected counts must be at least 5.
Degrees of Freedom – categories - 1
Null Hypothesis – The claimed distribution of ___ is correct.
Alternative Hypothesis – The claimed distribution of ___ is not correct.
Test for Independence – Checks the association between two variables in a single population.
Expected Counts – (row total)(column total) / (table total)
Null Hypothesis – There is no association between ___ and ___.
Alternative Hypothesis – There is an association between ___ and ___.
Test for Homogeneity – Checks the distribution of a single variable in several populations to see if these populations are similar with respect to the variable.
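The expected count formula and the chi-square statistic can be sketched for a small two-way table. The observed counts are hypothetical, and the degrees of freedom formula for a two-way table, (rows - 1)(columns - 1), is standard but not stated in the list above.

```python
# Hypothetical two-way table of observed counts
# rows = grade level, columns = preferred study method
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Expected count = (row total)(column total) / (table total)
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)

# Degrees of freedom for a two-way table
df = (len(observed) - 1) * (len(col_totals) - 1)

# Condition check: all expected counts at least 5
expected_counts_ok = all(e >= 5 for row in expected for e in row)

print(expected, chi_square, df, expected_counts_ok)
```

The same expected count and chi-square calculations are used for both the test for independence and the test for homogeneity; only the hypotheses and the way the data were collected differ.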
There is variability in slopes as well. In Unit 9, you will learn how to perform significance tests and construct confidence intervals for the slope of a regression line.
Key Terms:
LSRL – ŷ=a+bx where x denotes (context) and ŷ denotes predicted (context)
Degrees of Freedom – n-2
Confidence Interval – b ± t*(SE_b)
In repeated sampling, I am __% confident that the true slope falls within this interval.
True Slope – 𝞫
Null Hypothesis – 𝞫 = 0; changing x does nothing to y
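As a rough sketch, the standard error of the slope, the t statistic, and a confidence interval can be computed by hand. The data set is hypothetical, and the critical value t* = 3.182 is an assumption taken from a standard t table for 95% confidence with 3 degrees of freedom.

```python
import math

# Hypothetical data: hours studied (x) and exam score (y)
x = [1, 2, 3, 4, 5]
y = [65, 70, 74, 81, 85]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx          # sample slope
a = y_bar - b * x_bar  # sample y intercept

# Standard deviation of the residuals, using n - 2 degrees of freedom
df = n - 2
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_resid = math.sqrt(sse / df)

# Standard error of the slope and the t statistic for Ho: true slope = 0
se_b = s_resid / math.sqrt(sxx)
t = (b - 0) / se_b

# 95% confidence interval for the true slope: b +/- t*(SE_b)
t_star = 3.182  # assumed value from a t table, df = 3
interval = (b - t_star * se_b, b + t_star * se_b)

print(round(se_b, 4), round(t, 2), interval)
```

Since 0 is not inside the interval for this made-up data, the interval agrees with rejecting the null hypothesis that the true slope is 0.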
Hopefully, these decks can help you study for your tests and ultimately, the AP exam. The best feature about Quizlet is the option to play games and use the flashcards wherever you are. When you are studying, you can always duplicate a deck and customize it to your own needs.
As long as you review these flashcards at least once a day in the few days before your test, you should be good to go. Make sure to take advantage of starring flashcards you struggle with! Before a test, it's great to quickly look over the starred ones and then feel more confident about them.
You got this! Good luck studying.🍀