The Report of Stereotype Threat's Demise Has Been Greatly Exaggerated
And why I'm feeling a certain way about it
Today I’m feeling some way about our field of social psychology. This past weekend, we had a terrific disciplinary conference (Society for Personality and Social Psychology, SPSP) in which I felt connected and surrounded by a community dedicated to understanding some of the most important problems of our time, and to bringing our tools to bear to investigate and intervene in order to address these problems in the world.
My book, Cultures of Growth, was awarded the SPSP Book Prize for the Promotion of Social and Personality Science. There, I re-connected with Michael Inzlicht who won the Carol and Ed Diener Award in Social Psychology for, in part, his work on stereotype threat—as was described in his introduction at the awards ceremony. It was lovely to reconnect with Mickey (I contributed a chapter to his well-cited book, Stereotype Threat) and my husband shared with Mickey a video of him receiving his award since his spouse couldn’t attend but wanted to see him win the award.
Then, on Friday of the conference, my former PhD student, Katie Kroeper, published an extensive (9 studies, 8500+ participants) paper in Science Advances that offers a solution to one of the biggest challenges we’ve had in stereotype threat and social identity threat research: how to measure people’s experiences of threat in a reliable way that can account for how identity threat shows up contextually, by identity group, and with regard to the different kinds of identity-based concerns that different groups have in different situations. In that paper (on which I’m a co-author), we show that across gender, race, sexuality, age, weight, political orientation, religion, mental health status, and citizenship status, the experience of identity threat is situational, multifaceted, and nearly universal.
Finally, today is publication day for my former PhD advisor, Claude Steele’s new book, Churn, following his internationally successful Whistling Vivaldi, focused on people’s experiences of stereotype threat. I’ve been so excited about the release of this book because I know the world is going to benefit from Claude’s most recent thinking on how to bridge the identity-related tensions that divide us in this current moment.
So, it felt like a bit of a gut punch to open my email this morning and see that Dominic Packer and Jay Van Bavel decided to republish an older column that Mickey Inzlicht wrote about his views on stereotype threat in their Substack newsletter, The Power of Us (with more than 10K subscribers), without a rebuttal or opportunity to respond by anyone conducting stereotype threat research. The post (not to mention its title, “The Downfall of Stereotype Threat”) felt salacious, unnecessary, and also unfair to the science and to the scientists who have spent (and continue to spend) their careers doing research on stereotype threat and social identity threat: doing the hard work of understanding how it operates, in what situations, and how it shapes people’s experiences and outcomes.
So, I’d like to set the record straight about how I (as someone who has studied these phenomena for 20+ years) see stereotype threat and social identity threat:
Stereotype threat is real and virtually universal. Every social group is impugned by some set of negative stereotypes, and the people who are part of these groups have experienced concerns that they will be seen through the lens of those stereotypes and judged on the basis of them. This is the experience of stereotype threat and it is real.
Stereotype threat is not underperformance. As many scholars have consistently written (here, here, here, here, and here as just a few examples), one (persistent) misunderstanding some in the field have had is to confuse the experience of stereotype threat and social identity threat with the outcomes that these experiences predict. When I am concerned that someone is going to think I’m bad at math because I’m a woman, I feel that concern psychologically—my emotional state is heightened, I get tight, I don’t want people thinking that about me—but if you give me a math test right there in the moment, I may or may not underperform on it. In fact, if it’s an easy test, I may even do really well because I’m so motivated to show you I’m not bad at math (disproving the stereotype; see here and here).
Stereotype threat is one form of social identity threat. For the last 15 years, evidence has amassed in the literature that stereotype threat is just one form of social identity threat. Social identity threat is the concern that one may be devalued or mistreated based on one or more of their social group memberships. And, there are many forms of social identity threat. As Katie’s recently-published Science Advances paper says,
“…We might worry about being negatively stereotyped [stereotype threat; (4)] or question whether we truly belong [belonging uncertainty; (5)]. At other times, we may fear others will deny core aspects of our identity [identity denial and identity erasure; (6–8)] or we may feel pressured to hide parts of ourselves to succeed or stay safe [fairness and physical safety concerns; (9–11)].”
Stereotype threat (and the broader social identity threat) has always been theorized and empirically demonstrated to be a situational experience. Stereotype threat comes online when we are exposed to situational cues that make our group memberships and the stereotypes attached to them salient in a setting. We do not always, in all contexts, experience stereotype threat. We experience it in environments where we are doing something (e.g., having an interracial interaction) for which we could be negatively stereotyped (as a racist if I’m perceived as White, or as an angry Latina, if I’m perceived that way). The situational nature of stereotype threat means that it doesn’t show up in all situations, and thus not in all studies that have attempted to assess it. Indeed, in order to create a fair test of stereotype threat, scientists must understand the experience of people with specific identities, the situations that make those identities salient, and the concerns of people with those salient identities. This is hard for scientists to do, and it takes a lot of deep, careful pilot work before one can be certain that one’s study context is a good test of the stereotype threat hypothesis.
This also means that stereotype threat is contextual and historical, and the outcomes it influences can shift based on the nature of those stereotypes and the situational cues that make those stereotypes salient at a particular time and place. For example, in a meta-analytic registered report accepted for publication, it appears that the stereotypes impugning women’s math abilities have shifted and improved over time (a good thing!), but this also means that the ability of researchers to find math test performance effects in stereotype threat studies has lessened. However, that same study shows that, sadly, this decline does not hold for stereotypes impugning Black people’s intellectual abilities: the paper’s results show that the stereotype threat race effects do seem to persist over time.
What about the non-replications? For all of the above reasons, it’s perhaps unsurprising that some empirical tests of the relationship between stereotype threat and test performance have not replicated. Sometimes, it’s clear why that might happen: the tests are with groups that are not invested in their group identity, or the situation has not made the stereotypes attached to their group identity salient. If you ask me about how I feel about being a woman when we’re out having a fun, social dinner with friends, I’ll say “Great, I love being a woman!” and if you ask me in a room full of male physicists, I’ll say “Hmmm, that’s a complicated question—why are we talking about this?” Or, as I mentioned before, the tests are too easy and people under threat overperform because they are motivated to disprove the stereotypes about their group. In other studies, it seems the scientists have confused the experience of stereotype threat (e.g., identity threat concerns about being devalued, disrespected, or mistreated, the emotional experiences tied to threat, etc.) with the outcomes that stereotype threat may predict (e.g., performance). In other words, there are good theory-consistent reasons that some tests might not replicate.
Researchers in this area have never said that stereotype threat will always lead to impaired performance. It only does so in a specified set of circumstances: (1) when the experience and concerns about being viewed, judged, or mistreated because of negative group-based stereotypes are activated; (2) when the test is high stakes—meaning it has consequences for one’s identity because either one’s future is on the line (“I’ve got to do well on this test in order to get into college”) or because one’s identity is on the line (“I’ve got to do well on this test so that people don’t think my group is dumb.”); and (3) when the test is really hard and pushes the limits of one’s ability (disrupting working memory and impairing performance). Many studies that have examined stereotype threat have not done the extra work required to make sure that the testing situation meets these theorized conditions.
I also want to note that where Mickey seems to see some failures to replicate the test performance effects in the lab (using manipulations from the 1990s) as evidence that stereotype threat research is not real, the cultural landscape has changed, and society has taken the stereotype threat literature seriously—creating programs, schools, and family systems and practices that rebut societal stereotypes and give young people strategies and ways to reframe and shift those stereotypes. This indeed means that the phenomenon may not be as easily bottled in the lab using the past manipulations. When theories succeed, their original manifestations can change. There is a deep irony in using non-replications of 1990s-era laboratory studies to declare stereotype threat dead. Over the past three decades, the research on stereotype threat has not simply sat in academic journals—it has entered and changed the culture.
In science, non-replication is the BEGINNING of an inquiry, not the END. While Mickey and other critics have taken some non-replications of the original stereotype threat performance studies (conducted in the 1980s and 1990s) and concluded that this means that stereotype threat is not “real,” other scientists see non-replication as a challenge to understand more deeply. If anything, I think there is now consensus among many psychologists and scientists across fields that failures to replicate can be even more informative, prompting us to think about why the studies don’t replicate and to deepen theory and evidence. When something fails to replicate, we can ask whether it was a good test of the hypothesis. Were the psychological factors necessary for the phenomenon present? Were the outcomes assessed appropriate to the theory and validly measured? There are lots of reasons some studies don’t replicate original studies, and in good science, these non-replications are used to examine the theory, update it, and inspire new directions in the work.
I think it’s important to acknowledge that there were some real limitations of the early stereotype threat literature (e.g., small sample sizes that were common practice at the time). The replication movement illuminated the many problems with these practices and the stereotype threat and social identity threat studies published since have taken these concerns seriously and addressed them with larger sample sizes, preregistered reports, and other methodological and analytic improvements. Indeed, perhaps the highest-powered, largest-sample test of the link between stereotype threat and test performance among Black students is in progress now. I am interested to see what that study will show.
There is a lot of evidence supporting the existence (and impact) of stereotype threat. I won’t give an exhaustive review here (many papers have done that well), and that’s not the point of this post. Motivated readers can find this work online. However, to my mind, one of the most powerful empirical examinations of the experiences of stereotype threat is the most recent Science Advances paper. A big challenge we’ve had in stereotype threat and social identity threat research (and practice) is assessing people’s experiences of threat: how it shows up contextually, by identity group, and with regard to the different kinds of identity-based concerns that different groups have in different situations. Without a good, standardized way to assess threat, researchers talk past each other (and fail to replicate each other’s work). This paper validates a measure that can be used to assess people’s identity threat concerns, and I believe it will advance the field and help us understand how to create more equitable learning and working environments. The new research reveals the identity threat experiences and identity-relevant concerns among more than 8500 people based on their group memberships (as varied as gender, race, sexuality, age, weight, political orientation, religion, mental health status, and citizenship status), demonstrating that the experience of identity threat is situational, multifaceted, and nearly universal.
There are also many real-world field studies and interventions that sit on the shoulders of stereotype threat theory and use it to motivate their methods, hypotheses, and results. For instance, in separate work by scholars Mesmin Destin and Christina Bauer, identity-reframing interventions create spaces for people to refute negative, stereotypic representations of their group and instead help people view their identity as a strength and source of agency and motivation. These interventions—premised on the idea that negative group-based stereotypes otherwise undermine people’s agency, motivation, and experience—produce gains in motivation, persistence, and performance while reducing inequalities. We don’t get these kinds of intervention studies and programs without stereotype threat theory.
The experience of stereotype threat (and the broader social identity threat) is, in itself, an important phenomenon to understand and study. These experiences shape whether women go into (and persist in) STEM or instead pursue a field where their group is better represented; they shape whether we engage in intergroup dialogue and friendship, whether we are willing to listen to each other’s opinions and learn from each other; and yes, sometimes, whether we perform to our potential.
These experiences are felt now more than ever. I’ve been struck recently by many friends who don’t seem to have an outwardly-visible stigmatized identity (eminent white male scholars in social psychology), who, because of the war in Gaza, have been feeling stereotype threat experiences and concerns related to their Jewish identity—sharing with me a renewed appreciation and understanding of stereotype threat because of their experiences. To me, this shows how relevant and universal the experience of stereotype threat is, especially today.
In conclusion, it’s my belief that the report of stereotype threat’s demise is greatly exaggerated.
Replications and new methods and practices have improved the field and addressed early limitations of the work. The experience of stereotype threat is not confined to lab experiments and performance outcomes—it is experienced by virtually everyone across identities.
Everyone is entitled to their own opinions, but when it comes to science, it’s important to understand and present as much of the full picture of the literature as possible so that our opinions are as informed as possible. Sadly, this was not possible with The Power of Us’s newsletter that featured Dominic, Jay, and Mickey’s opinions, so I was motivated to share my thoughts here.
With gratitude,
Mary



"Stereotype threat is being at risk of *confirming*, as self-characteristic, a negative stereotype of one's group." Steele and Aronson, 1995. First line of abstract. Emphasis added.
Contrast with "Every social group is impugned by some set of negative stereotypes, and the people who are part of these groups have experienced concerns that they will be seen through the lens of those stereotypes and judged on the basis of them. This is the experience of stereotype threat and it is real."
Stereotype threat, as per the original definition (which requires not "concerns with being judged by stereotypes" but, rather, actually behaviorally confirming the stereotype), has been so repeatedly disconfirmed in pre-registered studies with women and math that it is reasonable to consider it resoundingly falsified pending someone producing strong, clear, replicable evidence that *behavioral confirmation* of negative stereotypes results from being threatened by them. I hear there is a big multi-lab RRR in the works for stereotype threat and race. If you are interested in a friendly bet on whether the simple average of all performance effects is greater or less than r = .10, I'll take the "less than" side.
Of course, anyone is welcome to change the definition and study something that fits under the new definition. But anyone who does so is no longer talking about the original phenomenon. It would be better to give a new phenomenon a new name so as not to confuse it with the original. Else one leaves oneself open to the charge of attempting to smuggle in (and maintain the rhetoric around) the original (falsified) phenomenon (behavioral confirmation) by using the same term, and of moving the goalposts.
Hi Mickey,
Let me challenge your claims here. We need to distinguish between what the theory says and the findings that captured the field's imagination. Findings that captured the field's imagination are not the theory and we need to accurately describe the theory and its development.
Since I was around at the beginning of the development of the theory, I can confirm that Mary is correct that the theory as it was first proposed was not about test performance, but rather was about school performance and dropping out of school. Test performance wasn't even on the radar of the theory at the beginning, but you don't need to take my word for that. All you have to do is look back to Claude Steele's 1992 article in the Atlantic Monthly, "Race and the Schooling of Black Americans." This article was Claude's first published writing on the theory and it clearly is about school achievement and not at all about test performance. The nascent idea of stereotype threat was also clearly expressed in this article, but the term stereotype threat wouldn't emerge for several years, until shortly before Steele & Aronson was published in 1995. (In the original submission of Steele & Aronson and what would become Spencer, Steele, & Quinn, the two articles were submitted and reviewed as a package and the term was stereotype vulnerability. In the review process the articles were separated, and in the resubmission of the now two distinct articles Claude, Josh, and I decided to change the term to stereotype threat.) The idea, however, was clearly in the 1992 article, where Claude describes the idea as double devaluation. He writes, "Like anyone, blacks risk devaluation for a particular incompetence, such as a failed test or a flubbed pronunciation. But they further risk that such performances will confirm the broader, racial inferiority they are suspected of. Thus, from the first grade through graduate school, blacks have the extra fear that in the eyes of those around them their full humanity could fall with a poor answer or a mistaken stroke of the pen." And what was this psychological phenomenon proposed to affect in this article? Not performance on standardized tests, but rather school achievement.
Two other things to note about this 1992 article are 1) it already argues the core ideas behind the Supreme Court amicus brief you mentioned, and 2) it already points to intervention studies as a core test of the theory. It does so all without yet considering the implications of stereotype threat for test performance.
As one of the authors of the amicus briefs for the Supreme Court on stereotype threat, I can say it is a mischaracterization of those briefs to say they are based on Steele & Aronson and Spencer, Steele, & Quinn. Instead the briefs were based primarily on the Psychological Science paper that Greg Walton and I wrote in 2009. In that paper we drew on the prediction of a later performance from an earlier performance and noted that if we let an earlier performance predict a later performance, members of stereotyped groups did worse than members of non-stereotyped groups. Claude described this phenomenon, which we called in the lab the parallel lines phenomenon, in the 1992 Atlantic Monthly article this way: "From elementary school to graduate school, something depresses black achievement at every level of preparation, even the highest. Generally, of course, the better prepared achieve better than the less prepared, and this is about as true for blacks as for whites. But given any level of school preparation (as measured by tests and earlier grades), blacks somehow achieve less in subsequent schooling than whites (that is, have poorer grades, have lower graduation rates, and take longer to graduate), no matter how strong that preparation is."
What Greg and I did in our paper was build on this reasoning, which was already present in Claude's first writing, and argue that this something that was depressing Black achievement (both on standardized tests and performance in school) was stereotype threat. In a meta-analysis of both experiments and interventions in school settings we found the typical parallel lines phenomenon that Claude highlighted in the 1992 Atlantic Monthly article when stereotype threat was high, but we found a reversed set of parallel lines, with Blacks performing better than Whites at every level of previous performance, when stereotype threat was low. It was this meta-analysis, critically based on both laboratory experiments *and interventions*, that formed the basis of the amicus briefs for the Supreme Court.
And Mickey, while you may contend that interventions to lower stereotype threat are not critical tests of the theory, Claude has from his first writing in the 1992 Atlantic Monthly maintained that they are a critical test of his theorizing. In fact, they are integral to the theorizing. Again, I quote from the 1992 Atlantic Monthly article: "If racial vulnerability undermines black school achievement, as I have argued, then this achievement should improve significantly if schooling is made "wise"--that is, made to see value and promise in black students and to act accordingly.
And yet, although racial vulnerability at school may undermine black achievement, so many other factors seem to contribute--from the debilitations of poverty to the alleged dysfunctions of black American culture--that one might expect "wiseness" in the classroom to be of little help. Fortunately, we have considerable evidence to the contrary. Wise schooling may indeed be the missing key to the schoolhouse door."
In this passage Claude is setting out the hypothesis that if "racial vulnerability" (i.e., what we would later call stereotype threat) is what is undermining Black achievement, then "wise" schooling will restore Black achievement. From the beginning this was a critical hypothesis in the theorizing and it was put to the test as early as the experiments. As we were developing the experiments we were also developing and testing a "wise" intervention and the data from that intervention was included in the paper that Greg and I wrote on which the amicus briefs were based.
Mickey, I do agree that the experiments captured the imagination of the field, but I disagree that experiments were tests of the theorizing but the interventions were not. Clearly the theorizing was developed from Claude's contention that "wise" interventions provide important evidence that the idea that would become stereotype threat was undermining Black achievement. This contention was central to the theorizing from the beginning. For you to ignore the evidence of these interventions and contend they are irrelevant to the theorizing is to misunderstand the contentions of the theory and to ignore important evidence. It is my view that the success of these interventions, which are clearly based on the theorizing, always has been and continues to be some of the most compelling evidence for the importance of the phenomenon of stereotype threat.
Having said all of that, I also want to make clear what Katie Kroeper and Mary and our colleagues and I have said in our recent paper in Science Advances. The phenomenon of social identity threat (of which stereotype threat is one specific type) is important in its own right. The concern that you will be stereotyped by others is real and has real implications.
The relation of that phenomenon to test performance, however, needs to be understood more fully. The recent registered report by Stoevenbelt and colleagues in which the findings by Johns, Schmader, and Martens were not replicated is an important piece of evidence. I was a reviewer of that paper (I signed my review so that is not news to the authors), and I recommended publication of these findings. At the same time, I have also seen a preregistered analysis of these findings, which I believe will be published with the article, that suggests that the interpretation of these data may not be so simple. If the sampling is done in the same way as in the original study, as Toni Schmader argued before the replication attempt that it should be, the results are much less clear. In addition, there are other important registered replications, including one examining racial differences, for which we should wait to see the results. Let me ask you, Mickey: if that registered replication finds evidence that the effect replicates, will you retract your statement that stereotype threat doesn't exist and isn't real? I do agree very much with your original post that when science is operating well the evidence provides correction, and I know you well enough that I trust that you will follow the evidence. I too will be open to all the evidence, as I believe I have been to the Stoevenbelt replication. In time we will sort this out, but I thought that when you originally published that Substack piece, and in this rewrite of your original post, you rushed to a judgment before all the evidence was in and inappropriately ignored the evidence from the interventions. That is still my view, but I trust in time we can get on the same page.