Lesson 1.A.5 - Boxplots, Outliers, & Resistance
Key Question: How much money will you really make?
Content: Five-Number Summary & Boxplots | Changing Units | Outliers & Resistance
Alignment: CED Topics 1.7.C - 1.8
Video
Course Resources
Resources for teaching our AP® Statistics curriculum.
- Lesson Flow - timing and flow of class, using our lesson materials
- Pacing Guide - pacing our units, with daily or block schedules
- CED Alignment Guide - aligning our lessons to the AP® Statistics Course and Exam Description
Teaching Resources
Resources for teaching with Skew The Script.
- Discussion Norms - our model discussion norms for the classroom
- Letter to Parents - letter to share with parents about our nonpartisan approach
- Teaching Math on Civic Topics - tips for teaching math lessons that cover civic topics
Lesson Notes
Lesson-specific insights from the creators of this lesson.
In this lesson, students learn the right questions to ask when determining how much money they’ll really get paid by future employers. In doing so, they walk away with a deeper understanding of measures of center, boxplots, and resistance to outliers – along with how easy it can be to mislead with data.
- Describe the effect of changing units on summary statistics
- Determine if a data set contains outliers
- Graph and interpret boxplots
- Describe the effect of outliers and skew on summary statistics
Before proceeding: Familiarize yourself with the lesson materials linked above (e.g. handout, handout key, slides, video). Then, for additional background and teaching tips from the lesson creators, check out the sections below.
- When your class arrives at the Key Question (How much money will you really make?), take a moment to surface students’ initial thoughts. In particular, prompt them to think about what questions they’d ask the employer about the statement: “The typical salary at our company is more than $65,000 per year.” In brainstorming questions, students will start to develop some intuition for how this figure could be misleading.
- Consider making the initial exploration of the effect of transformations on the simple data set (values 1 & 3) completely student-led, in small groups. As students perform addition and multiplication on the small data set and see how the mean and range transform, they can propose general rules for the effect of transformations on data sets.
- Our salary data set has one outlier, and it’s a high outlier. The lesson explores the effect of this high outlier on different summary statistics. For completeness, it’s also helpful to show students the effect that a low outlier would have on the same summary statistics (e.g. how a low outlier test score would pull down the average test score).
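For the two-value exploration (values 1 & 3) described above, a short script can confirm the rules students propose. This is a minimal sketch: the helper name `center_and_spread` and the constants (+5, ×2, ×-2) are illustrative choices, not part of the lesson materials. The negative multiplier is included as an extra case to show that center and spread can respond differently.

```python
from statistics import mean


def center_and_spread(values):
    """Return (mean, range) for a list of numbers."""
    return mean(values), max(values) - min(values)


data = [1, 3]  # the two-value data set named in the lesson notes

# The constants below (+5, x2, x-2) are illustrative choices, not from the lesson.
shifted = [x + 5 for x in data]
scaled = [x * 2 for x in data]
flipped = [x * -2 for x in data]

for label, values in [("original", data), ("add 5", shifted),
                      ("times 2", scaled), ("times -2", flipped)]:
    m, r = center_and_spread(values)
    print(f"{label:>9}: mean = {m}, range = {r}")
```

Adding 5 moves the mean from 2 to 7 and leaves the range at 2. Multiplying by 2 doubles both the mean and the range to 4. Multiplying by -2 gives a mean of -4 but a range of 4, since spread cannot be negative.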
First, download this lesson's Handout Key and read through its Discussion Question section. Then, check out our model discussion norms and the additional background notes below.
- As suggested in the Discussion Question’s hint, having students manually calculate the different summary statistics before and after the boss’s raise helps them build intuition for why some measures are resistant and others aren’t. The key point to emphasize is that the resistant measures (median, IQR) are based on central positions in the ordered data set and, therefore, are not as affected by extreme values. The non-resistant measures (mean, range, standard deviation) all use the values of the most extreme data points (the min and max) in their calculations. Hence, these measures are more sensitive to changes in those extreme values.
- An optional extension to the Discussion Question: What other contexts can you think of that would produce datasets with outliers or extreme skewness? How could different reported summary statistics be misleading in these contexts?
- The 13 company salaries are a simulated data set, created for pedagogical purposes. However, the high outlier and right-skewed shape are indicative of real income data, which tends to be right-skewed and to contain high outliers. To surface this, ask students: “Do you believe the mean salary would paint a misleading picture at most companies, or just this one? Why?”
- The exchange rate between US Dollars and Canadian Dollars has hovered between 1.2 and 1.45 Canadian Dollars per US Dollar for several years. The value of 1.3 was used for the lesson because it’s within that range and makes calculations a bit simpler.
- Although it’s rarely done, data values can also be multiplied or divided by a negative scalar. In these cases, the measures of center are multiplied (or divided) by the scalar itself, and the measures of spread are multiplied (or divided) by the absolute value of the scalar.
- Boxplots are especially useful for larger data sets, in which a dotplot or histogram could be visually overwhelming. Because boxplots directly display certain measures of center (median) and spread (IQR), students should use those measures when describing a boxplot (rather than, for example, trying to visually estimate the mean or standard deviation).
- Although their visual simplicity is useful in many respects, boxplots lack visual information on two important aspects of quantitative distributions: sample size (the number of data values) and peaks (e.g. unimodal, bimodal, uniform). In particular, if peaks or modes are of interest, encourage students to use dotplots or histograms.
- In AP Statistics, outliers are visualized in boxplots with dots or asterisks. However, in other contexts, boxplots are graphed such that the whiskers extend all the way to the minimum and maximum – regardless of whether or not those values are outliers. It’s worth mentioning to students that both conventions for graphing boxplots are popular in the broader field.
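The before-and-after comparison behind the Discussion Question, along with the outlier check that decides which points a boxplot draws separately, can be sketched as follows. Several things here are assumptions. The salaries are hypothetical values in thousands of dollars, not the lesson's 13-salary data set, and the raise amount is invented. Quartiles are taken as the medians of the lower and upper halves. The 1.5 × IQR rule is assumed as the outlier test, which is the usual AP Statistics convention.

```python
from statistics import mean, median, stdev


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    When n is odd, the middle value is left out of both halves.
    Assumes at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])


def summary(values):
    """Collect the center and spread measures discussed in the lesson."""
    q1, q3 = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": stdev(values),
    }


def outliers(values):
    """Values beyond 1.5 * IQR from the quartiles (assumed outlier rule)."""
    q1, q3 = quartiles(values)
    fence = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - fence or v > q3 + fence]


# Hypothetical salaries (thousands of dollars); the last value is the boss.
before = [30, 32, 35, 35, 38, 40, 42, 45, 45, 48, 50, 55, 200]
after = before[:-1] + [400]  # invented raise: boss goes from 200 to 400

for label, data in (("before", before), ("after", after)):
    stats = {name: round(value, 1) for name, value in summary(data).items()}
    print(label, stats, "outliers:", outliers(data))
```

With these made-up numbers, the median (42) and IQR (14) are identical before and after the raise, while the mean, range, and standard deviation all increase. The boss's salary is flagged as the only outlier in both cases.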
Student Supports
Lesson-specific resources to support all learners.
- A simple visual can be helpful for reinforcing the fact that addition and subtraction don’t change measures of spread. Hold up a ruler horizontally. Ask students to imagine shifting every point on the ruler to the right by the same amount. Then, to see this in action, move the whole ruler to the right. Ask: “Did the spread or distance between the ends of the ruler change?” The answer is no. This helps students visualize addition and subtraction as a consistent shift of every data value, so that the spread between data values doesn’t change.
- The following visual can help demonstrate how to interpret the shape of boxplots:
- Vocabulary used in the context of the lesson may include words that are unfamiliar or have several meanings. In particular, the following mathematical terms may need clarification or a definition:
  - Outlier
  - Boxplot
  - Summary statistics
  - Resistance
- In addition, the following contextual terms may need clarification or a definition:
  - Salary
  - Currency
- Drawing connections between mathematical and non-mathematical uses of the term “resistance” can help students internalize its mathematical definition. For example, a teacher (Mr. Median) who does not succumb to their temptation to eat the donuts in the teachers’ lounge is “resistant” to the donuts. Their colleague (Mr. Mean) who eats 12 of them is not resistant.
- Measures of center and spread that are resistant to outliers and skew are also sometimes called “robust” measures.