Course Resources
Resources for teaching our AP® Statistics curriculum.
- Lesson Flow - timing and flow of class, using our lesson materials
- Pacing Guide - pacing our units, with daily or block schedules
- CED Alignment Guide - aligning our lessons to the AP® Statistics Course and Exam Description
Teaching Resources
Resources for teaching with Skew The Script.
- Discussion Norms - our model discussion norms for the classroom
- Letter to Parents - letter to share with parents about our nonpartisan approach
- Teaching Math on Civic Topics - tips for teaching math lessons that cover civic topics
Lesson Notes
Lesson-specific insights from the creators of this lesson.
In the year 2000, the World Health Organization declared that measles had been eliminated from the United States. But in 2025, the U.S. had its highest number of measles cases since 1991, with 93% of those cases among individuals who were unvaccinated or who had an unknown vaccination status. Why do some people not get the measles vaccine? One potential reason: In 1998, British doctor Andrew Wakefield published a paper that linked the measles vaccine with autism. His study set in motion a long-lasting wave of vaccine skepticism. In this lesson, students analyze Wakefield’s original paper as they explore whether there is evidence that vaccines cause autism.
- Distinguish poor study design from statistical error
- Describe Type I and Type II errors and their consequences
- Interpret power and determine factors that influence the power of a study
Before proceeding: Familiarize yourself with the lesson materials linked above (e.g. handout, handout key, slides, video). Then, for additional background and teaching tips from the lesson creators, check out the sections below.
- There has been no well-designed study that has established a causal link between vaccines and autism. However, rather than beginning the lesson with that final conclusion, it’s more powerful to allow students to openly analyze the strengths and weaknesses of Wakefield’s study. This provides them the opportunity to apply principles of experimental design and reach the conclusion themselves that Wakefield’s study falls short of showing a causal relationship.
- This lesson provides an opportunity to distinguish between statistical error and design flaws. Type I and Type II errors arise from random variation in well-designed studies, whereas poor study design introduces bias or confounding. Emphasize that statistical inference addresses uncertainty due to chance – not problems caused by flawed study design. Adjusting a significance level or creating wider confidence intervals cannot correct for the bias or confounding introduced by poor study design.
- The discussion of power extends the ideas of hypothesis testing beyond statistical significance and into study design. High power can be achieved through strong study design (e.g. increasing the sample size) or poor study design (e.g. artificially raising the significance level, which also raises the probability of a Type I error). Distinguishing between these methods for increasing power can help students better internalize the distinction between poor study design and statistical error.
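For teachers who want to show these two levers numerically, the following is a minimal sketch rather than part of the lesson materials. It assumes an upper-tailed z-test for a mean with a known standard deviation, which the lesson does not specify; the function name `power_one_sided_z` and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_one_sided_z(effect, sigma, n, alpha):
    """Power of an upper-tailed z-test when the true mean exceeds the
    null value by `effect` (assumed setup: known sigma, normal sampling
    distribution of the sample mean)."""
    critical_z = STANDARD_NORMAL.inv_cdf(1 - alpha)  # rejection cutoff under H0
    shift = effect * sqrt(n) / sigma                 # true effect in standard-error units
    return 1 - STANDARD_NORMAL.cdf(critical_z - shift)


if __name__ == "__main__":
    # Hypothetical inputs: a true effect of 0.5 standard deviations.
    for n in (10, 25, 50):
        for alpha in (0.01, 0.05, 0.10):
            power = power_one_sided_z(0.5, 1.0, n, alpha)
            print(f"n={n:>2}  alpha={alpha:.2f}  power={power:.3f}")
```

Raising either n or α increases the computed power, but only the larger α does so by accepting more Type I errors.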
First, download this lesson's Handout Key and read through its Discussion Question section. Then, check out our model discussion norms and the additional background notes below.
- Some students may point to the relatively small sample size of the University of Tokyo study as evidence that it was more likely to produce a Type I error. It can be helpful to clarify that the probability of a Type I error is determined by α, not by sample size. The stronger concern in this study is the large number of outcomes that were tested (the problem of multiple tests).
- Consider asking students to contrast the University of Tokyo study with the later Duke study. The Duke researchers focused on a single primary outcome. Testing one outcome is less likely to result in a Type I error than testing many outcomes at once.
- The higher likelihood of committing a Type I error when testing multiple outcomes is known as the problem of multiple tests. Although the problem of multiple tests is not explicitly listed in the AP course standards, it’s an important extension of the concept of Type I error. For a light-hearted visual representation of the problem of multiple tests, consider sharing this xkcd cartoon with students.
- Wakefield’s study serves as an effective example of how flaws in experimental design can limit the conclusions that can be drawn from data. The lack of comparison groups, random assignment, replication, and controls provides a useful review of the principles of experimental design.
- The contrast between the Wakefield study and the later oxytocin studies helps illustrate the difference between poor study design and statistical error. Wakefield’s study suffers from design flaws, whereas the Tokyo study provides an example of how a well-designed study can still produce a statistical error due to chance.
- The progression from the Tokyo study to the larger Duke replication study highlights the role of replication in scientific research. Repeated studies provide additional evidence and help researchers distinguish genuine effects from unusual results that arise due to sampling variability.
- The problem of multiple tests can be derived from the Type I error rate and the rules of probability. Imagine researchers are conducting 10 independent hypothesis tests, each with a Type I error rate of α = 0.05, and suppose every null hypothesis is true. In this scenario, the probability of making at least one Type I error is: P(at least one Type I error) = 1 − P(no Type I errors) = 1 − (0.95)^10 ≈ 0.401. This means there is approximately a 40% chance of observing at least one statistically significant result purely by chance. This is the problem of multiple tests.
- Type I and Type II errors are not equally controllable. Researchers directly control the probability of a Type I error by choosing the significance level α before conducting the study. In contrast, the probability of a Type II error depends on several factors, including sample size, effect size, sampling variability, and α. Because researchers cannot simply choose a desired Type II error rate, they often focus on increasing a study’s power during the design stage, for example by choosing a larger target sample size.
- In an earlier lesson on confidence intervals, we explored how to calculate the sample size needed to achieve a desired margin of error. Although it’s not covered in the AP Statistics standards, an analogous calculation is possible for hypothesis testing. Specifically, researchers often calculate the sample size needed to achieve a desired level of power. These calculations, known as power calculations, inform the target sample sizes that researchers set when designing studies.
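The multiple-tests arithmetic and the idea of a power calculation described above can be sketched in a few lines. This is an illustration, not part of the lesson materials: `familywise_error_rate` and `required_sample_size` are hypothetical names, the tests are assumed to be independent with every null hypothesis true, and the sample-size formula assumes an upper-tailed z-test for a mean with a known standard deviation.

```python
from math import ceil
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def familywise_error_rate(alpha, num_tests):
    """P(at least one Type I error) across independent tests,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** num_tests


def required_sample_size(effect, sigma, alpha, power):
    """Smallest n giving at least `power` for an upper-tailed z-test
    (assumed setup: known sigma, true mean exceeds the null by `effect`)."""
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha)  # cutoff for rejecting H0
    z_power = STANDARD_NORMAL.inv_cdf(power)      # quantile for the target power
    return ceil(((z_alpha + z_power) * sigma / effect) ** 2)


if __name__ == "__main__":
    # The example from the notes: 10 tests, each at alpha = 0.05.
    print(f"{familywise_error_rate(0.05, 10):.3f}")
    # Hypothetical design target: detect a 0.5-SD effect with 80% power.
    print(required_sample_size(effect=0.5, sigma=1.0, alpha=0.05, power=0.80))
```

With α = 0.05 and 10 independent tests, the first function reproduces the roughly 40% figure above. The second function shows why a power calculation is a design-stage tool: the target power and the smallest effect worth detecting are chosen first, and the sample size follows from them.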
Student Supports
Lesson-specific resources to support all learners.
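- Type I and Type II errors can be difficult to distinguish because both involve incorrect conclusions. It can be helpful to encourage students to begin by determining whether the null hypothesis is true or false. Once this is established, only one type of error is possible: if the null hypothesis is true, the only possible error is rejecting it (Type I); if it is false, the only possible error is failing to reject it (Type II).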
- When interpreting power, encourage students to use the phrase “in a world where,” as in “In a world where the alternative hypothesis is true, there is a ___% probability of correctly rejecting the null.” This intuitive language helps students internalize the meaning of power.
- Vocabulary used in the context of the lesson may include words that are unfamiliar or have several meanings. In particular, the following mathematical terms may need clarification or a definition provided:
  - Random assignment
  - Replication
  - Direct control
  - Bias
  - Confounding
  - Sample size
  - Effect size
- In addition, the following contextual terms may need clarification or a definition provided:
  - Measles
  - Vaccine and vaccine skepticism
  - Autism spectrum disorder
  - Oxytocin
  - Eye contact
- During class discussion, it’s important to emphasize the distinction between the terms “sampling error” and “sampling bias.” These terms cannot be used interchangeably in statistical contexts. Sampling errors occur when well-designed studies obtain unusual samples by chance. By contrast, sampling bias doesn’t occur by chance – it’s a flaw that is baked into poor study designs.
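The distinction between sampling error and sampling bias can also be shown with a short simulation. This is a sketch under invented assumptions, not data from the lesson: a population in which 30% of people have some trait, and a flawed design in which people with the trait are twice as likely to be sampled. The names `simulate_estimates`, `TRUE_PROPORTION`, and `OVERSAMPLING_FACTOR` are hypothetical.

```python
import random
from statistics import mean, stdev

TRUE_PROPORTION = 0.30     # assumed population proportion (made up)
OVERSAMPLING_FACTOR = 2.0  # assumed: people with the trait are twice as likely to be sampled


def simulate_estimates(n, reps, biased, seed=0):
    """Return `reps` sample proportions, each from a sample of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    weight = OVERSAMPLING_FACTOR if biased else 1.0
    # Chance that any one sampled person has the trait under this design.
    p_draw = weight * TRUE_PROPORTION / (weight * TRUE_PROPORTION + (1 - TRUE_PROPORTION))
    return [sum(rng.random() < p_draw for _ in range(n)) / n for _ in range(reps)]


if __name__ == "__main__":
    for biased in (False, True):
        for n in (50, 500):
            estimates = simulate_estimates(n, reps=1000, biased=biased)
            label = "biased design" if biased else "random sample"
            print(f"{label}, n={n}: mean={mean(estimates):.3f}, sd={stdev(estimates):.3f}")
```

In the random samples, the estimates scatter around 0.30 and the scatter shrinks as n grows; that scatter is sampling error. In the biased design, the estimates center near 0.46 regardless of n, so a larger sample does not repair the flaw.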