From two assassination attempts to the incumbent president’s abrupt departure from the race, extraordinary unpredictability and uncertainty have defined the 2024 presidential election cycle. Such volatility raises a pressing question: How can we accurately predict voter behavior and electoral outcomes for the upcoming presidential election?
While election forecasting is far from a perfect science, it offers a data-driven means of navigating the flurry of information and political chaos that ensues during the course of a campaign season. We have formulated three separate forecasts to predict the outcome of this year’s presidential race, each utilizing different factors or methods.
We would like to extend our gratitude to the teaching team of GOV 1347: Election Analytics – Professor Ryan Enos, Teaching Fellow Matthew Dardett, and Course Assistants Yusuf Mian and Ethan Jasny – for their help in informing and preparing these forecasts.
Forecasting 101
To start, there are several theoretical concepts shared by all three models. We aim to briefly explain them in this section.
First, fundamentals are generally defined as election-impacting factors that lie beyond the control of the candidates themselves. They are typically known, or are relatively easy to predict, far in advance of the election, and do not include unpredictable events such as natural disasters or pandemics. Examples include the state of the economy, incumbency, and demographics. Election forecasters often account for fundamentals in their models, whether in the form of a single predictor like Quarter 2 GDP growth in Alan Abramowitz’s Time for Change model, or as the bulk of predictors as in Allan Lichtman’s 13 Keys to the White House. Each of us incorporates fundamentals into our models in slightly different ways. We take our economic data from the Federal Reserve Bank of St. Louis and the Bureau of Economic Analysis.
Public opinion also appears throughout our models. From horse-race polling to approval ratings, surveys of the American people can be essential to understanding democratic discourse, as they measure voters’ attitudes and opinions. To account for public opinion, our models utilize aggregate polling data from FiveThirtyEight and Gallup.
Alex’s Partisan Swing Model
In forecasting the 2024 Presidential Election, I created two separate models: one to forecast the national popular vote, and another to predict the Electoral College. Both of my forecasts rely primarily on fundamentals, polling, and lagged vote share. My models also include “partisan swing,” which measures the change in party identification between the year in which the election takes place and either the year preceding the election or the previous election. This predictor is motivated by an underlying theory of increasing partisanship and polarization in the electorate.
My national popular vote model uses the Least Absolute Shrinkage and Selection Operator (LASSO), a regression technique that offers a data-driven approach to feature selection. I used LASSO to select from a long list of candidate predictors, including incumbency, economic measures, partisanship, polling, and lagged vote share. It retained eight predictors as important in reducing the model’s error: the percent of the country identifying as independent; the change in party identification for either party from the year preceding the election; vote share from the previous election; and five different weeks of polling, including the week just before the election.
My Electoral College model takes in fewer predictive variables. I offer two separate models for predicting the Electoral College: one for states with significant polling aggregate data on FiveThirtyEight, and one for those without. Predictive variables for both models include state-level lagged vote share for the two prior elections, whether or not the candidate is a member of the incumbent party, national Quarter 2 GDP growth, average state-level unemployment, the change in partisan identification for either party from the last election, and state fixed effects. For states with polling aggregates, the mean and latest polling averages are included as well.
Methodology
Both my national popular vote and Electoral College models are fit with Ordinary Least Squares (OLS), the standard method for linear regression. As mentioned above, my national popular vote model first uses LASSO for data-driven feature selection. LASSO retains the most important predictive features fed into it and shrinks the coefficients of the rest to zero. This reduces model complexity and helps guard against overfitting, in which a model fits the training data too closely to generalize to new elections. Both models perform well on out-of-sample cross-validation tests.
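To make the selection step concrete, here is a minimal from-scratch sketch of LASSO by coordinate descent. It is not the code behind this forecast: the predictor names are hypothetical, the data are synthetic and deliberately orthogonal so the result can be checked by hand, and the penalty of 0.5 is an arbitrary choice for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; anything inside the band becomes exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def standardize(column):
    """Center a column and scale it to unit variance."""
    mean = sum(column) / len(column)
    sd = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - mean) / sd for v in column]


def lasso(columns, y, penalty, sweeps=100):
    """Coordinate descent for (1/2n) * sum(residual^2) + penalty * sum(|beta|).

    Assumes every column has already been standardized.
    """
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]
    beta = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Covariance of column j with the residual, adding back its own fit.
            rho = sum(c * (r + beta[j] * c) for c, r in zip(col, residual)) / n
            updated = soft_threshold(rho, penalty)
            step = updated - beta[j]
            residual = [r - step * c for r, c in zip(residual, col)]
            beta[j] = updated
    return beta


# Synthetic, deliberately orthogonal toy predictors (NOT real election data).
# The names are hypothetical labels chosen for illustration only.
names = ["poll_final_week", "lagged_vote_share", "q2_gdp_growth", "unemployment"]
raw = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, -1, 1, 1, -1, -1, 1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
columns = [standardize(col) for col in raw]

# Toy outcome built from two strong effects and two weak ones.
true_effects = [3.0, -2.0, 0.2, 0.1]
y = [50 + sum(b * col[i] for b, col in zip(true_effects, columns)) for i in range(8)]

beta = lasso(columns, y, penalty=0.5)
selected = [name for name, b in zip(names, beta) if b != 0.0]
print(selected)  # ['poll_final_week', 'lagged_vote_share']
```

The two weak predictors are dropped because their association with the outcome falls below the penalty, while the two strong ones survive with shrunken coefficients. In practice, the surviving predictors would then be passed to an ordinary OLS fit, and a library implementation such as scikit-learn's `Lasso` would likely replace the hand-written loop.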
Source code and more information on my data can be viewed on my GitHub page.
Predictions
My models predict that Vice President Harris will win the national popular vote by less than 1 percentage point, approximately 49% to 48%, and the Electoral College by two electoral votes, 270-268. Harris is forecast to win Wisconsin, Michigan, and Pennsylvania, but lose the remaining swing states.
Kaitlyn’s “Back to Basics” Model
I created two models based primarily on economic indicators and polling to predict the national two-party popular vote and the Electoral College.
To start, I built a model to predict the incumbent party candidate’s share of the national two-party popular vote. This election, the economy is top of mind: Americans tend to vote with their pocketbooks. “It’s the economy, stupid,” after all. To address the importance of the economy, I include Quarter 2 GDP growth from the election year in my national model. I also incorporate national polling averages from October, weighted by the number of weeks left before the election. While there are ample reasons to be skeptical of public opinion polling, research shows that horse-race polls tend to converge on actual electoral results closer to Election Day. In addition, I include a partisan identification variable based on data from the Pew Research Center and Gallup. Partisan affiliation is often considered to be a reliable predictor of voter behavior in modern elections. Lastly, I integrate a dummy variable for incumbency status that indicates whether the incumbent party candidate was also the sitting president. This variable accounts for the longstanding theory that incumbent politicians have a structural “advantage” over their challengers.
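One way to read the polling weights is sketched below. The exact weighting scheme is not spelled out here, so this example assumes each weekly average is weighted by the inverse of the weeks remaining, which gives later polls more influence. The function name and the October readings are hypothetical.

```python
def weighted_poll_average(weekly_polls):
    """Average weekly poll readings, weighting each by 1 / weeks_left.

    weekly_polls: list of (weeks_left, poll_pct) pairs with weeks_left >= 1.
    The inverse-weeks weighting is an assumption made for this sketch.
    """
    weights = [1.0 / weeks_left for weeks_left, _ in weekly_polls]
    weighted_sum = sum(w * pct for w, (_, pct) in zip(weights, weekly_polls))
    return weighted_sum / sum(weights)


# Hypothetical October readings: (weeks left before the election, poll average).
october = [(4, 48.0), (3, 48.5), (2, 49.0), (1, 49.5)]
print(round(weighted_poll_average(october), 2))  # 49.04
```

The simple unweighted mean of these four readings would be 48.75, so the weighting pulls the estimate toward the most recent poll.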
For my state model, I predict the Democratic candidate’s two-party vote share to inform a prediction of the Electoral College. The model is broadly similar to the national one: I include weighted state-level polling averages from October and incumbent party status. Unlike the national model, it also includes the Democratic candidate’s state-level lagged vote share. Using this model, I predict the outcomes of 13 state races, namely those designated as “Toss Up” or “Likely Democrat/Republican” by expert forecasters such as the Cook Political Report.
Methodology
Both of my models are multivariate OLS regression models validated by assessments of both in-sample and out-of-sample performance. The national model is trained on historical data from 1968-2016 to predict the incumbent party candidate’s share of the national two-party popular vote, while the state model is trained on data from 2000-2020 to predict the Democratic candidate’s state-level two-party vote share.
My GitHub contains the full methodology and source code for my models.
Predictions
I predict Harris to win the national popular vote by a margin of approximately 52% to 48%, as well as capture the Electoral College by 319 electoral votes to former President Trump’s 219 electoral votes. Additionally, I predict that Harris will win all 7 swing states by narrow margins.
Avi’s “Why Not Both?” Model
In election prediction modeling, a common debate centers on balancing the influence of fundamental factors, such as economic indicators, with polling data. Polls are often the most influential component of a model’s predictions, as they are the only source of real-time insight into a candidate’s public support. However, as seen in past elections, relying on polls alone can be misleading. Recognizing this tension, my goal was to develop a model that objectively balances fundamental factors and polling.
Methodology
To achieve this, I used super learning — or model ensembling — which combines the predictions of three distinct OLS models into one comprehensive model. The first model focused on just fundamental factors, incorporating basic economic indicators and past election results. The second model relied entirely on polling data, including the support levels of Democratic and Republican candidates one week before the election and polling trends over the final two months leading up to the election. The third model was a combined model that incorporated fundamentals, polling data, and additional, more granular economic indicators.
To determine how much weight each model would receive in my final prediction, I used a technique called Leave-One-Out Cross-Validation. In this approach, I left out one election year at a time (excluding 2020) and trained the models on data from the remaining years from 1980 to 2016. I then used the excluded year to assess each model’s predictive accuracy. By repeating this process for each election and swing state, I was able to fine-tune the weights applied to each model to maximize accuracy. Finally, with these weights established, I used data from 2024 to generate my predictions.
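The procedure can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the code behind this forecast: the rows are synthetic toy numbers, the predictor names `gdp` and `poll` are hypothetical, the third model simply combines the two toy predictors, and the weights are chosen by a coarse grid search, which may differ from how the actual weights were tuned.

```python
def fit_ols(rows, features):
    """Least-squares fit of y on an intercept plus the named features.

    Solves the normal equations by Gauss-Jordan elimination and returns
    a function that predicts y for a new row.
    """
    X = [[1.0] + [row[f] for f in features] for row in rows]
    y = [row["y"] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    coef = [A[i][k] / A[i][i] for i in range(k)]
    return lambda row: coef[0] + sum(c * row[f] for c, f in zip(coef[1:], features))


def loo_predictions(rows, feature_sets):
    """Predict each row from models trained on every other row."""
    preds = []
    for i, held_out in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        preds.append([fit_ols(train, fs)(held_out) for fs in feature_sets])
    return preds


def blended_mse(weights, preds, actual):
    """Mean squared error of the weighted blend of the base predictions."""
    blend = [sum(w * p for w, p in zip(weights, row)) for row in preds]
    return sum((b - a) ** 2 for b, a in zip(blend, actual)) / len(actual)


def best_weights(preds, actual, steps=20):
    """Grid-search non-negative weights that sum to 1 (three base models)."""
    grid = [(a / steps, b / steps, (steps - a - b) / steps)
            for a in range(steps + 1) for b in range(steps + 1 - a)]
    return min(grid, key=lambda w: blended_mse(w, preds, actual))


# Synthetic toy rows (NOT real election data); one row per past election.
rows = [
    {"gdp": 1.2, "poll": 48.0, "y": 49.1}, {"gdp": -0.5, "poll": 45.5, "y": 46.0},
    {"gdp": 3.1, "poll": 52.0, "y": 53.2}, {"gdp": 2.0, "poll": 50.5, "y": 50.1},
    {"gdp": 0.4, "poll": 47.0, "y": 48.3}, {"gdp": 2.6, "poll": 49.0, "y": 51.0},
    {"gdp": -1.1, "poll": 46.5, "y": 45.2}, {"gdp": 1.8, "poll": 51.0, "y": 50.8},
]
# Fundamentals-only, polls-only, and combined feature sets.
feature_sets = [["gdp"], ["poll"], ["gdp", "poll"]]

preds = loo_predictions(rows, feature_sets)
actual = [row["y"] for row in rows]
weights = best_weights(preds, actual)
print(weights)
```

Because the grid includes the cases where a single model gets all the weight, the blended out-of-sample error can never be worse than that of the best individual model on the held-out rows.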
For this model, I focused on predicting outcomes in the swing states where electoral votes are truly competitive this November: Arizona, Pennsylvania, Georgia, Wisconsin, Michigan, and Nevada. For all other states, I derived the predicted margins from the 2020 results, allowing the model to focus its predictive accuracy on the swing states.
You can read more about my methodology on my personal GitHub page.
Predictions
My model forecasts a victory for Harris, with wins in Nevada, Pennsylvania, Michigan, and Wisconsin, but losses in Arizona, Georgia, and North Carolina, resulting in a final Electoral College tally of 276 to 262. I believe the so-called “blue wall” will hold for the Democratic Party once again, though the predicted margins are tight, especially in Pennsylvania, where the margin is less than 0.5 points. I expect the final tally to be even closer than this prediction suggests.
Ultimately, this election is a toss-up, and no model can predict the outcome with more certainty than a slightly weighted coin flip. However, if I had to choose, I would agree with my model that Harris is the more likely victor.
Conclusion
Overall, our models indicate that Harris will win the 2024 presidential election. While how she gets to 270 electoral votes differs from model to model, all three of us predict that the blue wall states – including Pennsylvania – will swing for Democrats, although the margins remain tight. Kaitlyn predicts a Democratic sweep across all seven contested swing states, while Alex and Avi predict that Trump will hold on to his narrow leads in Arizona, North Carolina, and Georgia. Our models also diverge on the fate of Nevada, with Avi and Kaitlyn predicting a win for Harris, while Alex predicts that Trump will take the state.
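The three Electoral College tallies can be reproduced with simple arithmetic. The sketch below uses the 2024 electoral vote counts for the seven swing states and assumes the remaining states split 226 to 219 in favor of Democrats. That baseline is an assumption inferred from the totals reported above, not a figure stated by any of the three models.

```python
# 2024 electoral votes for the seven swing states.
SWING_EV = {
    "Arizona": 11, "Georgia": 16, "Michigan": 15, "Nevada": 6,
    "North Carolina": 16, "Pennsylvania": 19, "Wisconsin": 10,
}
# Assumption: electoral votes outside the swing states, by party.
SAFE_DEM, SAFE_REP = 226, 219


def tally(harris_states):
    """Return (Harris, Trump) electoral votes given Harris's swing-state wins."""
    dem = SAFE_DEM + sum(SWING_EV[state] for state in harris_states)
    return dem, 538 - dem


forecasts = {
    "Alex": ["Wisconsin", "Michigan", "Pennsylvania"],
    "Kaitlyn": list(SWING_EV),
    "Avi": ["Nevada", "Pennsylvania", "Michigan", "Wisconsin"],
}
for name, states in forecasts.items():
    print(name, tally(states))
# Alex (270, 268)
# Kaitlyn (319, 219)
# Avi (276, 262)
```

Nevada's six electoral votes account for the entire gap between Alex's 270 and Avi's 276.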
It is not a groundbreaking idea that this election will be a close one, but the disagreement among our models certainly provides further support for it. We look forward to seeing how our predictions hold up to the actual electoral outcomes on Nov. 5, 2024.