GSPM researchers use social media data to predict 2020 primary outcomes

Media Credit: Akash Pamarthy | Photographer

Michael Cornfield is the director of the Public Echoes of Rhetoric in America Project, which uses social media reactions to politicians to predict the presidential election results.

Updated: Feb. 27, 2020 at 11:08 a.m.

A team of researchers is using social media data to predict the winners of the 2020 Democratic presidential primary contests with what they describe as the first model of its kind.

Members of the Public Echoes of Rhetoric in America (PEORIA) Project, which focuses on how the public responds to the messages employed by politicians, started releasing projections earlier this month as residents of Iowa and New Hampshire went to the polls in the first contests for the Democratic presidential nomination. Michael Cornfield, the project’s director, said the team’s model will be updated as more data becomes available throughout the election season.

Cornfield said Lara Brown, the director of the Graduate School of Political Management, first proposed creating the model.

“She’s very interested in building a predictive model that incorporates some of the data that we collect,” he said.

He said the researchers examined several possible social media metrics to factor into their model but ultimately decided to focus on a candidate’s mentions on Twitter, a social media network that members of the PEORIA Project have studied in previous projects.

He said the report on the upcoming South Carolina primary incorporates data from the New Hampshire primary and Nevada caucuses, which took place Feb. 11 and Feb. 22, respectively, to improve the predictive ability of the model. Cornfield said researchers chose not to use results from the Iowa caucuses because of technical issues with an app used to collect the results, which necessitated a recount.

Cornfield said the researchers were not surprised that U.S. Sen. Bernie Sanders, I-Vt., has won the early contests but added that his team was not expecting the margin of victory he achieved.

“We predicted that Sanders would win – most people did,” he said. “It’s the margin that has altered the political universe.”

The model incorrectly predicted that former Vice President Joe Biden would win the Iowa caucuses but correctly forecast that Sanders would win in New Hampshire.

Cornfield said he expects Sanders to do well on Super Tuesday, the first Tuesday in March, on which 14 states hold primaries and caucuses, based on the model’s indicators and previous results. He said the model’s usefulness in predicting contests beyond that date is limited because it “remains to be seen” who will drop out of the race after the South Carolina primary Saturday.

The model’s South Carolina report projects Sanders to win the primary.

He said he is “reluctant” to make a prediction about who will win the nomination because Sanders, the frontrunner, must convince those who historically have not turned out to vote to continue to show up.

“It’s the fact that he was able to draw on so many young people and on Hispanic voters,” Cornfield said of Sanders’ results so far. “What we still don’t know is how many of those people would not have voted had he not been in the race.”

Meagan O’Neill, the lead research scientist for the project, said the model is the first one to predict individual state primary results by using “targeted” Twitter data, allowing the researchers to attempt to make predictions without the use of any polling data.

O’Neill said the model uses a candidate’s share of Twitter mentions relative to the party, their share of cash on hand based on the latest available data from the Federal Election Commission, endorsements and, where appropriate, previous primary results.

She said she is responsible for transcribing information from the FEC’s website, FiveThirtyEight’s endorsement tracker and Crimson Hexagon, an artificial intelligence-powered platform that allows researchers to analyze audiences, brand perception and trends, according to its website.

O’Neill said she, Cornfield, Brown and Todd Belt, the director of the political management program, are responsible for writing and releasing the reports themselves.

She added that researchers will update the model after it makes prediction errors by taking into account the reasons for the errors, so the model will serve as a better predictor in the future.

O’Neill said the team will have to build a different model for the general election after Democrats select a candidate to compete against President Donald Trump because there are different “fundamental” variables for a general election compared to primaries.

Belt, the political management program director, declined to comment, deferring to Cornfield.

Ciara Regan contributed reporting.

This post has been updated to reflect the following correction:
An earlier version of this post misreported the name of the project responsible for the election model. The name has been fixed. We regret this error.
