In ‘Repeat spreaders and election delegitimization’ paper, CIP-led research team analyzes comprehensive dataset of misinformation tweets from 2020 U.S. elections

Jun 13, 2022

In a new paper published June 13 in the Journal of Quantitative Description: Digital Media, a research team led by the University of Washington’s Center for an Informed Public details an expansive collection of tweets they hope will help researchers interested in examining the broad scope of misinformation circulated during the months before and after the 2020 U.S. election, including Twitter accounts that repeatedly spread election-related misinformation. 

In “Repeat spreaders and election delegitimization: A comprehensive dataset of misinformation tweets from the 2020 U.S. election,” researchers examine a uniquely curated dataset of misinformation, disinformation, and rumors spreading on Twitter about the 2020 U.S. election, an approach that, according to the paper’s abstract, leverages “real-time reports collected from September through November 2020 to develop a comprehensive dataset of tweets connected to 456 distinct misinformation stories from the 2020 U.S. election.”

Of those misinformation stories in the ElectionMisinfo2020 dataset, 307 sowed doubt in the legitimacy of the election, the researchers found. 

“By relying on real-time incidents and streaming data, we generate a curated dataset that not only provides more granularity than a large collection based on a finite number of search terms, but also an improved opportunity for generalization compared to a small set of case studies,” they wrote. “Though the emphasis is on misleading content, not all of the tweets linked to a misinformation story are false: some are questions, opinions, corrections, or factual content that nonetheless contributes to misperceptions. Along with a detailed description of the data, this paper provides an analysis of a critical subset of election-delegitimizing misinformation in terms of size, content, temporal diffusion, and partisanship.”

The paper was co-authored by Ian Kennedy, a UW Sociology doctoral student who recently graduated; Morgan Wack, a UW Political Science doctoral student; Andrew Beers, a UW Human Centered Design & Engineering doctoral student; Joseph S. Schafer, a CIP undergraduate research assistant and incoming HCDE doctoral student; Isabella Garcia-Camargo, a former Stanford Internet Observatory research analyst; CIP co-founder Emma S. Spiro, a UW Information School associate professor; and CIP faculty director and co-founder Kate Starbird, an HCDE associate professor.

The tweet collection and analysis work was done as part of the nonpartisan Election Integrity Partnership, a consortium of researchers from Stanford Internet Observatory, the University of Washington’s Center for an Informed Public, Graphika, and the Atlantic Council’s DFRLab. While that group published a final report, “Misinformation and the 2020 Election,” in March 2021, the data collection efforts have spurred additional research, including the analysis work associated with the “Repeat spreaders and election delegitimization” paper in the Journal of Quantitative Description: Digital Media.

According to Kennedy, who will be starting a postdoctoral fellowship at Rice University, the paper “addresses simple but important questions about misinformation spread during the 2020 election like who repeatedly spread it? Can we quantify partisanship?” 

In the paper, the researchers “label key ideological clusters of accounts within interaction networks, describe common misinformation narratives, and identify those accounts which repeatedly spread misinformation.” Additionally, they document the asymmetry of misinformation spread about the 2020 election. They found that accounts associated with support for President Biden shared stories in the ElectionMisinfo2020 dataset far less than accounts supporting his Republican opponent, Donald Trump. 

The researchers, who include a list of the top-35 so-called “repeat spreader” accounts on Twitter (see pages 29-30), continue: “That asymmetry remained among the accounts who were repeatedly influential in the spread of misleading content that sowed doubt in the election: all but two of the top 100 ‘repeat spreader’ accounts were supporters of then-President Trump.” These findings, they write, “support the implementation and enforcement of ‘strike rules’ on social media platforms, directly addressing the outsized role of repeat spreaders.”

The curated ElectionMisinfo2020 dataset originated in a larger, more general dataset of election-related tweets the CIP collected using Twitter’s Streaming API for several months leading up to the election that November, a collection effort that continues at the CIP. This dataset included generic terms such as “vote,” “election,” “poll,” “ballot,” and “mail-in,” but also terms more specific to allegations of fraud and voter suppression. Additional emergent terms were added over time, including certain hashtags that were areas of focus for elections-related conversations online. In all, the CIP tracked more than 160 keywords, spread across numerous collectors (collection instances) to limit the impact of rate limits (around 50 tweets per second for each collector). For the period of interest, between Sept. 1, 2020 and Dec. 15, 2020, the ElectionMisinfo2020 dataset contains approximately 1.04 billion tweets.
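
One way to picture splitting tracked keywords across several collectors is the minimal Python sketch below. It is an illustration only: the article does not say how the CIP divided its keywords, so the round-robin scheme, the function name assign_keywords, and the collector count are all assumptions.

```python
def assign_keywords(keywords, n_collectors):
    """Distribute keywords round-robin across n_collectors lists.

    Hypothetical helper: the article does not describe the CIP's actual
    assignment scheme, so round-robin is an assumption for illustration.
    """
    if n_collectors < 1:
        raise ValueError("n_collectors must be at least 1")
    shards = [[] for _ in range(n_collectors)]
    for i, keyword in enumerate(keywords):
        # Keyword i goes to collector i mod n_collectors.
        shards[i % n_collectors].append(keyword)
    return shards


# The five generic terms quoted above, split across two hypothetical collectors.
generic_terms = ["vote", "election", "poll", "ballot", "mail-in"]
shards = assign_keywords(generic_terms, 2)
```

Each collector then tracks only its own share of the terms, so a per-collector rate limit applies to a fraction of the overall stream. In practice the tweet volume behind each keyword matters more than the keyword count, so an even split by count is only a starting point.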

The curated dataset of tweets related to misinformation stories, according to the co-authors, allows researchers to “carefully examine the factors that enable certain stories to ‘take off’ while others fade into online obscurity. Second, by exposing the pathways that specific misinformation stories take to national notoriety, we hope this database will also serve to provide insight into the characteristics of the accounts who amplify these stories along this chain.” 

And while the ElectionMisinfo2020 database allows for additional examination of the most active and influential accounts, “less is known about the accounts with less prominent profiles who often serve as a crucial link to repeat spreaders — a compelling question for future research.”  

The research at the center of “Repeat spreaders and election delegitimization” was funded by National Science Foundation grants 1715078, 1749815, and 2120496, the John S. and James L. Knight Foundation, the William and Flora Hewlett Foundation, Craig Newmark Philanthropies, The Omidyar Group, and the Eunice Kennedy Shriver National Institute of Child Health and Human Development.


Citation: Kennedy, I., Wack, M., Beers, A., Schafer, J. S., Garcia-Camargo, I., Spiro, E. S., & Starbird, K. (2022). Repeat Spreaders and Election Delegitimization: A Comprehensive Dataset of Misinformation Tweets from the 2020 U.S. Election. Journal of Quantitative Description: Digital Media, 2. https://doi.org/10.51685/jqd.2022.013

Image at top: A visualization of a coengagement projection graph of election discourse (Figure 2 in the paper) where two accounts share a tie if at least 10,000 accounts retweet both accounts at least once.
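
The tie rule in the caption above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's code: the function name coengagement_edges, the (retweeter, retweeted account) input format, and the toy data are hypothetical, and only the threshold of 10,000 shared retweeters comes from the caption.

```python
from collections import defaultdict
from itertools import combinations


def coengagement_edges(retweets, min_shared=10_000):
    """Return {(account_a, account_b): shared_retweeters} for qualifying pairs.

    retweets is an iterable of (retweeter, retweeted_account) pairs; this
    input format is an assumption made for the sketch.
    """
    # Accounts each retweeter amplified at least once; a set ignores repeats.
    amplified = defaultdict(set)
    for retweeter, account in retweets:
        amplified[retweeter].add(account)

    # Count, for every pair of accounts, how many retweeters amplified both.
    shared = defaultdict(int)
    for accounts in amplified.values():
        for a, b in combinations(sorted(accounts), 2):
            shared[(a, b)] += 1

    # Keep only the pairs that meet the threshold.
    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Toy data with a lowered threshold so the result is easy to inspect.
toy_retweets = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"), ("u2", "A"),  # the repeat retweet counts once
    ("u3", "A"), ("u3", "C"),
]
toy_edges = coengagement_edges(toy_retweets, min_shared=2)
```

On the toy data, accounts A and B share two retweeters and so receive a tie, while A and C share only one and do not. With the default threshold of 10,000, no tie would be drawn on a dataset this small.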
