AI Job Predictions Unreliable as 3 Models Disagree on Risks

A new study reveals the very AI models predicting job displacement can't agree on which jobs they are coming for, disagreeing up to 25% of the time.

The world’s leading artificial intelligence models produce varied and often conflicting predictions about which jobs are most exposed to automation, a new study has found, raising questions about the reliability of AI-driven economic forecasting. The working paper, published by the National Bureau of Economic Research (NBER), documents disagreement among three top models on job risk and highlights the uncertainty in a field that has become a pressing concern for policymakers and workers alike.

“I personally would not rely on just one measure to say, ‘Oh, I should change my job,’ or ‘I should change my kid’s major,’” said Michelle Yin of Northwestern University, one of the study's authors. The research suggests that while AI is being used to predict its own impact, the results are far from consistent, and the authors urge caution against taking these forecasts at face value.

The study, co-authored by Yin, Hoa Vu of Northwestern University, and Claudia Persico of American University, examined the job exposure rankings from three major AI models: OpenAI’s ChatGPT-5, Google DeepMind’s Gemini 2.5, and Anthropic’s Claude 4.5. The rankings diverged on individual occupations. For example, Claude rated accountants as highly vulnerable to AI, whereas Gemini assigned them a much lower risk. The models also disagreed on the vulnerability of roles like advertising managers and chief executives.

The findings present a challenge for investors and companies relying on AI-generated "exposure scores" for strategic workforce planning. With ChatGPT and Gemini disagreeing about a quarter of the time, the study suggests that the current generation of AI may be reflecting existing biases in adoption rather than providing a clear-eyed view of future disruption.

Models in Disagreement

The core of the research involved feeding the AI models tasks from the Labor Department's database to see which ones they could perform. The economists found that the level of agreement between the models was surprisingly low. While ChatGPT and Gemini were the most aligned, they still offered different assessments on a significant portion of occupations.
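To make the idea of model agreement concrete, here is a minimal sketch of how a pairwise disagreement rate could be computed. It is not the study's method: the model names, occupations, and labels below are hypothetical, and the binary "highly exposed" label is an assumption made for illustration.

```python
from itertools import combinations

# Hypothetical labels, not data from the study:
# 1 = the model rates the occupation as highly exposed, 0 = it does not.
ratings = {
    "model_a": {"accountant": 1, "advertising manager": 1, "chief executive": 0, "roofer": 0},
    "model_b": {"accountant": 0, "advertising manager": 1, "chief executive": 0, "roofer": 0},
    "model_c": {"accountant": 1, "advertising manager": 0, "chief executive": 1, "roofer": 0},
}


def disagreement_rate(a, b):
    """Share of occupations rated by both models that receive different labels."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("the two models have no occupations in common")
    return sum(a[occ] != b[occ] for occ in shared) / len(shared)


# Compare every pair of models.
pairwise = {
    (m1, m2): disagreement_rate(ratings[m1], ratings[m2])
    for m1, m2 in combinations(sorted(ratings), 2)
}

for pair, rate in pairwise.items():
    print(pair, f"{rate:.0%}")
```

In this toy data, the first two models differ on one occupation in four, which is the kind of figure a "disagree about a quarter of the time" summary would describe.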

This divergence is critical because these exposure scores are increasingly used in consultancy white papers, research notes, and policy reports to guide decisions on workforce training and support. The study posits that some of these differences may stem from the models' training data; early adopters in fields like financial analysis generate more AI-related data, which could in turn lead the models to rank those professions as more exposed.

Implications for Workforce Planning

The unreliability of these AI-generated scores matters for anyone who acts on them. Policymakers designing support systems for displaced workers and educational institutions advising students on "AI-proof" careers may be operating with flawed data. The economists suggest that instead of relying on a single AI model, researchers should look across a variety of models and be more transparent about the uncertainty of the predictions.
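One simple way to follow that advice is to report each occupation's score as an average across models together with the range the models span. The sketch below assumes exposure scores on a 0 to 1 scale; the scale, the model names, and every number are hypothetical and are not taken from the paper.

```python
from statistics import mean

# Hypothetical exposure scores on a 0-1 scale (illustrative only).
scores = {
    "accountant": {"model_a": 0.9, "model_b": 0.4, "model_c": 0.7},
    "roofer": {"model_a": 0.1, "model_b": 0.1, "model_c": 0.2},
}


def summarize(per_model):
    """Return the cross-model mean plus the lowest and highest score."""
    values = list(per_model.values())
    low, high = min(values), max(values)
    return {
        "mean": round(mean(values), 2),
        "low": low,
        "high": high,
        "spread": round(high - low, 2),  # a wide spread signals disagreement
    }


summary = {occupation: summarize(s) for occupation, s in scores.items()}

for occupation, stats in summary.items():
    print(occupation, stats)
```

A wide spread, as for the accountant row in this made-up example, would flag an occupation where a single model's score should not be read on its own.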

For investors, the study serves as a caution against making sector-wide bets based on simplistic AI-takeover narratives. The lack of consensus among AI models indicates that the real-world impact on labor markets will be more nuanced and harder to predict than many reports imply. The true exposure of any given job depends less on a model's theoretical capabilities and more on how AI is actually implemented across the economy, and measuring that requires more robust surveying and human-in-the-loop analysis.

This article is for informational purposes only and does not constitute investment advice.