Lesson 1.B.4 - Experimental Design

Key Question: What causes gender gaps in STEM fields?

Content: Extraneous Variables | Principles of Experiments | Placebos & Blinding

Alignment: CED Topic 1.13.A

Video

Student Items

Handout: pdf, doc

Mastery Check: link

Teacher Items

Handout Key: pdf, doc

Mastery Check Key: link

Slide Deck: pdf, ppt

Course Resources

Resources for teaching our AP® Statistics curriculum.

  • Lesson Flow - timing and flow of class, using our lesson materials
  • Pacing Guide - pacing our units, with daily or block schedules
  • CED Alignment Guide - aligning our lessons to the AP® Statistics Course and Exam Description

Teaching Resources

Resources for teaching with Skew The Script.

Lesson Notes

Lesson-specific insights from the creators of this lesson.


What causes gender gaps in STEM? Researchers from Yale devised a brilliantly simple experiment to see if hiring discrimination is a part of the equation. They printed up 127 identical copies of a fake lab job application and sent the copies to 127 different science faculty members. Here’s the twist: about half of the faculty were sent applications with the name “Jennifer” on them. The rest had “John.” In this lesson, students analyze the study’s design and discover whether researchers found a difference in the faculty evaluations, based solely on the name.

Learning Targets
  • Describe extraneous variables and confounding
  • Describe the principles of experimental design
  • Describe the impact of placebos and blinding

Before proceeding: Familiarize yourself with the lesson materials linked above (e.g. handout, handout key, slides, video). Then, for additional background and teaching tips from the lesson creators, check out the sections below.


  • It’s important to start the lesson by explicitly naming several potential confounding factors at play (e.g. number of applicants, applicant qualifications, types of labs). That way, students can directly connect the experimental design choices in the Yale study to controlling/balancing these confounders.
  • It’s worth noting that, when analyzing gender gaps in STEM, most of the data collected in the field measures gender as binary (male or female). To represent the full STEM workforce, there needs to be more data collected on STEM professionals who don’t identify by this binary. However, even if the data we analyze in this lesson is incomplete, it’s still important to analyze trends among the data we already have.
  • Just as it was important in the sampling lessons to draw a distinction between bias and variability, it’s also important in this lesson to draw a distinction between confounding and variability. In particular, emphasize that random assignment reduces confounding, not variation, while replication reduces variation, not confounding.
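
The distinction between confounding and variability in the last tip can be shown with a small simulation. This is a minimal sketch, not part of the lesson materials: the "leniency" variable, the group sizes, and the function names are all hypothetical, chosen only to illustrate the idea.

```python
import random
import statistics


def leniency_gap(n_per_group, rng):
    """Run one random assignment and return the gap in mean leniency between groups."""
    # Hypothetical extraneous variable: how leniently each evaluator tends to rate.
    leniency = [rng.gauss(0, 1) for _ in range(2 * n_per_group)]
    rng.shuffle(leniency)  # random assignment: chance alone decides who gets which name
    john_group = leniency[:n_per_group]
    jennifer_group = leniency[n_per_group:]
    return statistics.fmean(john_group) - statistics.fmean(jennifer_group)


def summarize(n_per_group, repeats=2000, seed=1):
    """Repeat the assignment many times; return the center and spread of the gaps."""
    rng = random.Random(seed)
    gaps = [leniency_gap(n_per_group, rng) for _ in range(repeats)]
    return statistics.fmean(gaps), statistics.stdev(gaps)


small_center, small_spread = summarize(n_per_group=20)
large_center, large_spread = summarize(n_per_group=200)

print(f"20 per group:  average gap {small_center:+.3f}, spread {small_spread:.3f}")
print(f"200 per group: average gap {large_center:+.3f}, spread {large_spread:.3f}")
```

In both cases the average gap should sit near zero, which is what it means for random assignment to remove systematic imbalance (confounding). The spread of the gaps should be noticeably smaller with 200 evaluators per group than with 20, which is the role replication plays.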

First, download this lesson's Handout Key and read through its Discussion Question section. Then, check out our model discussion norms and the additional background notes below.

  • This question provides an excellent opportunity to loop back to the topic of generalizability. Because the Yale study measures gender gaps in a specific context (science faculty evaluating written lab manager job applications), its results may not fully generalize to other parts of the hiring process (e.g. interviews) or other STEM positions (e.g. non-research STEM positions).
  • Hence, the information we learn from the Yale study does not, on its own, provide conclusive evidence about whether there are other factors that influence gender gaps in STEM. So, a wide variety of opinions on this question are consistent with the study results and can be expected in class conversation.
  • Although the differences measured in the study are statistically significant (so large that they’re unlikely to happen by chance alone), students may also wonder if they are practically important (large enough to have a felt impact in the context of hiring). Generally, the 0.8 to 0.9 point differences on the 7-point scales can be considered practically important. Especially in a competitive job market, gaining almost a full additional point out of 7 could be a substantial advantage.
  • Full citation for the Yale study: Moss-Racusin, C., Dovidio, J., et al. “Science faculty’s subtle gender biases favor male students.” PNAS October 9, 2012 109 (41) 16474-16479; https://doi.org/10.1073/pnas.1211286109
  • The Yale study is an example of an audit study. Audit studies are field experiments in which identical artifacts (e.g. application materials) are sent to evaluators, with only one attribute changed (e.g. the applicant name). Differences in results can then be attributed to the changed attribute. Audit studies are often used in the context of testing for discrimination. The most famous audit study is the 2004 study by Marianne Bertrand and Sendhil Mullainathan, in which the researchers sent identical resumes to employers, but changed the name to be either a White-sounding name or a Black-sounding name (based on birth certificate records). The researchers found a statistically significant difference in the rate of callbacks received by each name.
  • The use of the Yale study in our curriculum was inspired by a conversation with the authors of Advanced High School Statistics, a textbook that also uses the experiment as an example.
  • The Yale study is an example of a completely randomized design experiment. In the next lesson (1.B.5), we’ll discuss other types of experimental designs and formally define the completely randomized design. However, the focus of this lesson is on the principles of experimental design that apply to all types of experiments.
  • Although experiments unlock the ability to make causal inferences, they can also be expensive and difficult to implement. Because of the cost, researchers often have to consider tradeoffs between causal inference and generalizability. For instance, it’s relatively easy and inexpensive to gather observational data about gender gaps across many STEM fields and positions. However, this observational data doesn’t allow for causal conclusions. By contrast, the Yale experiment allows for causal conclusions. However, its results can’t be generalized to unstudied parts of the hiring process (e.g. interviews), other types of STEM positions (e.g. non-research STEM positions), or other STEM fields (the researchers only recruited biology, chemistry, and physics professors to participate in the study).
  • Quasi-experiments or natural experiments describe specific scenarios in which “as good as random” variation is naturally found in observational data. In these specific scenarios, researchers can use advanced methods to draw causal conclusions from observational data. In fact, Harvard researchers used a quasi-experiment to determine that Skew The Script had a positive causal impact on AP Statistics test scores in Texas. That said, these methods are beyond the scope of AP Statistics. In AP Statistics, causal conclusions can only be made from experiments. They cannot be made from observational data.
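
For teachers who want to make "unlikely to happen by chance alone" from the notes above concrete, a permutation test is one way to sketch it. The ratings below are HYPOTHETICAL numbers invented for illustration; they are not the Yale study's data, and this is not necessarily the analysis the researchers ran.

```python
import random
import statistics

# HYPOTHETICAL ratings on a 7-point scale, invented for illustration only.
# These are NOT the data from the Yale study.
john_ratings = [5, 4, 5, 4, 4, 5, 3, 5, 4, 4]
jennifer_ratings = [3, 4, 3, 3, 4, 2, 3, 4, 3, 3]

observed_gap = statistics.fmean(john_ratings) - statistics.fmean(jennifer_ratings)


def shuffled_gap(ratings, n_first, rng):
    """Reassign the name labels at random and recompute the gap in mean rating."""
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    return statistics.fmean(shuffled[:n_first]) - statistics.fmean(shuffled[n_first:])


rng = random.Random(2012)
all_ratings = john_ratings + jennifer_ratings
repeats = 10_000

# Count how often chance alone produces a gap at least as large as the observed one.
# The small tolerance guards against floating-point ties.
extreme = sum(
    abs(shuffled_gap(all_ratings, len(john_ratings), rng)) >= abs(observed_gap) - 1e-9
    for _ in range(repeats)
)
p_value = extreme / repeats

print(f"Observed gap in mean rating: {observed_gap:.2f}")
print(f"Share of random relabelings with a gap at least that large: {p_value:.4f}")
```

If the name made no difference, shuffling the labels should produce gaps as large as the observed one fairly often. A very small share suggests the observed gap is unlikely to be due to chance alone, which is the sense of "statistically significant" used in the notes.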

Student Supports

Lesson-specific resources to support all learners.

  • For discussions of extraneous variables and confounding, it can be helpful to provide a simpler example before jumping to the more complex hiring example in the lesson. For instance, the ice cream example from Ionica Smeets can be helpful. In that example, one extraneous variable that may affect drownings is the strength of currents in the sea. Because the behavior of the sea isn’t associated with ice cream sales, it doesn’t confound the relationship between ice cream sales and drownings. An example of a confounder is outdoor temperature. Warmer temperatures are associated with more drownings (more people swim in warmer weather) and with more ice cream sales. So, temperature confounds the relationship between ice cream sales and drownings.
  • Visual diagrams can also be helpful for distinguishing confounders from other types of extraneous variables:
    [Diagram panels: Extraneous | Confounder]

  • Using the diagrams above can also be helpful for providing a visual understanding of the purposes of random assignment and replication. Random assignment breaks the association (arrow) between the confounder and the explanatory variable, so that it’s no longer confounding. Replication reduces the impact of the extraneous variable (minimizes the arrow between the extraneous and response variable), making the relationship between the explanatory and response variables easier to see in isolation.
  • Vocabulary used in the context of the lesson may include words that are unfamiliar or have several meanings. In particular, the following mathematical terms may need clarification or a definition provided:
    • Bias
    • Extraneous variable
    • Statistically significant
    • Replication
  • In addition, the following contextual terms may need clarification or a definition provided:
    • Resume
    • Hiring bias
  • The term “replication” has two meanings in the context of experiments. The most common meaning is that results of experiments should be similar if the experiment is repeated with a different sample. However, as shown in the lesson, the term is also used to represent the idea that multiple experimental units are used in experiments (i.e. the studies use a large sample size). It’s helpful to flag for students that the word can be used both ways.
  • This lesson provides an excellent opportunity to loop back in several key vocabulary terms introduced earlier in the unit: confounding, experimental units, explanatory variable, and response variable. When using these terms, it can help to pause and have students surface their own phrasing for what each term means.
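
The ice cream example in the first support above can also be explored with simulated data. This is a rough sketch under made-up assumptions: the temperature range, the coefficients, and the "drowning index" are hypothetical and chosen only so that temperature drives both variables while sea currents affect drownings alone.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(7)
days = []
for _ in range(5000):
    temperature = rng.uniform(10, 35)  # confounder (hypothetical degrees Celsius)
    current = rng.gauss(0, 1)          # extraneous: affects drownings only
    ice_cream = 10 * temperature + rng.gauss(0, 20)
    drowning_index = 0.5 * temperature + current + rng.gauss(0, 1)
    days.append((temperature, current, ice_cream, drowning_index))

# Across all days, ice cream sales and drownings move together.
overall = pearson([d[2] for d in days], [d[3] for d in days])

# Holding temperature roughly fixed, the association mostly disappears.
mild_days = [d for d in days if 24 <= d[0] <= 26]
within_band = pearson([d[2] for d in mild_days], [d[3] for d in mild_days])

# Sea currents are not associated with ice cream sales, so they do not confound.
current_vs_ice_cream = pearson([d[1] for d in days], [d[2] for d in days])

print(f"All days: r = {overall:.2f}")
print(f"Days between 24 and 26 degrees: r = {within_band:.2f}")
print(f"Currents vs. ice cream sales: r = {current_vs_ice_cream:.2f}")
```

Students can compare the three correlations: a strong association overall, a much weaker one once temperature is held roughly constant, and essentially none between currents and ice cream sales. That contrast mirrors the difference between a confounder and an extraneous variable that is not a confounder.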