September 16, 2025
The Science of When Good AI Will Go Bad
Summary of webinar by Professor Neil Johnson, George Washington University
As AI systems become increasingly integrated into our daily workflows—from committee meetings to financial decisions—a critical question emerges: how can we predict when reliable AI output suddenly turns unreliable? Professor Neil Johnson's recent research reveals a mathematical framework for understanding exactly when and why AI systems reach their "tipping points," transitioning from helpful to harmful output.
At the heart of every transformer-based AI system (GPT, Claude, Gemini) lies what Johnson describes as a "compass needle"—a mathematical direction that guides the AI's predictions. This internal compass develops as the system processes information, balancing what it has learned during training against the current input context.
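The webinar did not reduce this "compass needle" to code, but one way to picture it, assuming the needle corresponds to the direction of a transformer's attention-weighted context vector, is a toy single-head attention (all names below are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def compass_direction(query, keys, values):
    """Toy single-head attention. The context vector is an
    attention-weighted sum of value vectors; its unit-normalized
    direction plays the role of the 'compass needle' here."""
    scores = keys @ query / np.sqrt(len(query))  # scaled dot-product scores
    weights = softmax(scores)                    # attention over the context tokens
    context = weights @ values                   # blend of training-shaped values
    return context / np.linalg.norm(context)     # keep only the direction

# Example: a query attending over 5 tokens in an 8-dimensional space
rng = np.random.default_rng(0)
needle = compass_direction(rng.normal(size=8),
                           rng.normal(size=(5, 8)),
                           rng.normal(size=(5, 8)))
print(needle)  # a unit vector: where the needle currently points
```

The needle here is just a unit vector; in Johnson's framing what matters is where it points, which shifts as training knowledge and the current context pull on it.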
Johnson's team has developed methods to track where this compass points throughout an AI's reasoning process. The key insight: the compass needle turns before the output turns bad. This means tipping points are mathematically predictable and potentially preventable.
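Johnson kept implementation details offstage, so the following is only a sketch of the monitoring idea: track how far the needle has rotated from its initial heading and flag the first step at which the rotation crosses a threshold. The 30-degree cutoff is an arbitrary illustration, not a figure from the research.

```python
import numpy as np

def first_tipping_step(directions, angle_threshold_deg=30.0):
    """Return the first step at which the needle has rotated past a
    threshold relative to its initial heading, or None if it never
    does. `directions` is a sequence of unit vectors, one per step."""
    baseline = directions[0]
    for step, d in enumerate(directions[1:], start=1):
        cos = float(np.clip(np.dot(baseline, d), -1.0, 1.0))
        if np.degrees(np.arccos(cos)) > angle_threshold_deg:
            return step  # the needle has turned; output has not yet gone bad
    return None

# Example: a needle drifting steadily in the plane trips the alarm early
ts = np.linspace(0.0, 1.2, 12)
needles = [np.array([np.cos(t), np.sin(t)]) for t in ts]
print(first_tipping_step(needles))  # flags at step 5 of 11
```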
The financial impact of unreliable AI output is staggering: Johnson cites an estimated $67 billion in damages from AI tipping points in the past year alone.
The challenge isn't that AI produces obviously wrong answers—it's that the output looks and sounds plausible while being factually incorrect or potentially damaging.
Today's AI safety measures are largely "post-fact"—they try to catch bad output after it's already been generated. Johnson compares this to "trying to stop a plane crash after it's crashed." By the time current guardrails detect problematic output, users may have already acted on incorrect information.
Johnson's research offers a proactive alternative: monitor the AI's internal "compass needle" and intervene the moment it begins to turn, before unreliable output is ever generated. A sketch of what such a guardrail could look like follows.
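In this sketch, `model.start`, `model.generate_step`, and `model.needle_direction` are hypothetical hooks standing in for internals the webinar did not specify:

```python
import numpy as np

def guarded_generate(model, prompt, max_steps=50, angle_threshold_deg=30.0):
    """Generate token by token, but stop the moment the internal
    compass needle rotates past a threshold. The model hooks used
    here are hypothetical stand-ins, not a real API."""
    state = model.start(prompt)
    baseline = model.needle_direction(state)  # unit vector at the outset
    tokens = []
    for _ in range(max_steps):
        state, token = model.generate_step(state)
        cos = float(np.clip(np.dot(baseline, model.needle_direction(state)),
                            -1.0, 1.0))
        if np.degrees(np.arccos(cos)) > angle_threshold_deg:
            # Intervene before the bad output reaches the user:
            # halt, warn, or hand off to a human reviewer.
            return tokens, "halted: predicted tipping point"
        tokens.append(token)
    return tokens, "completed"
```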
This is particularly relevant for "agentic AI"—systems that work autonomously on complex tasks without constant human prompting. As Johnson explains, agentic AI acts like a sophisticated personal assistant that can synthesize documents, schedule meetings, and generate strategic recommendations while you're "off at the dentist."
The legal and liability questions surrounding AI use in institutions remain largely unresolved. When a university administrator uses AI for committee work or a bank deploys AI for customer service, who bears responsibility when the system provides incorrect information? Johnson argues that understanding tipping points provides a framework for due diligence that organizations desperately need.
The mathematical framework draws from physics, particularly theories about how systems coagulate and develop cohesion. While Johnson avoided overwhelming his audience with mathematical details, he emphasized that these tipping points are deterministic—they follow predictable patterns that can be modeled and anticipated.
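For a flavor of what a deterministic tipping point looks like in coagulation theory, consider a classical textbook case (an analogy only, not Johnson's model): in the Smoluchowski coagulation equation with a multiplicative kernel, the cluster-size second moment blows up at a finite, exactly predictable time.

```latex
% Classical coagulation analogy (illustrative only, not Johnson's model).
% Smoluchowski equation, multiplicative kernel K(i,j) = ij,
% monodisperse start c_1(0) = 1. The second moment
% M_2(t) = \sum_k k^2 c_k(t) then satisfies
\[
  \frac{dM_2}{dt} = M_2^2, \qquad M_2(0) = 1,
\]
% whose solution
\[
  M_2(t) = \frac{1}{1 - t}
\]
% diverges at the finite gelation time t_g = 1: a tipping point that is
% fully deterministic and known in advance.
```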
Johnson's work represents a shift from reactive to predictive AI safety. Rather than simply filtering output after generation, future systems could monitor their internal state and provide real-time transparency about their confidence and reliability.
As one audience member noted, current AI systems increasingly train on AI-generated content, creating potential feedback loops. Johnson acknowledges this concern but argues that institutions have always operated with some degree of self-referential bias—the key is understanding and accounting for these biases in the mathematical models.
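That concern has a well-known toy demonstration (not from the webinar, but consistent with the worry): repeatedly refitting a simple model to samples of its own output tends to collapse the distribution over generations.

```python
import numpy as np

# Toy model-collapse experiment: fit a Gaussian, sample from the fit,
# refit on those samples, and repeat. The fitted spread tends to shrink
# across generations, a minimal analogue of systems training on their
# own AI-generated content.
rng = np.random.default_rng(42)
mu, sigma = 0.0, 1.0
for generation in range(40):
    samples = rng.normal(mu, sigma, size=20)   # "AI-generated" data
    mu, sigma = samples.mean(), samples.std()  # refit on own output
    if generation % 10 == 9:
        print(f"generation {generation + 1}: sigma = {sigma:.3f}")
```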
Johnson draws an analogy to commercial aviation: most people trust flying not because they understand aerodynamics, but because they understand that someone has ensured lift exceeds gravity. Similarly, AI adoption requires what he calls "the paper plane of AI"—simple, demonstrable proof that the fundamental forces pushing AI toward good outcomes outweigh those pushing toward bad ones.
This webinar was hosted by Atom Grants, which develops AI tools for research funding discovery while implementing evaluation systems to catch AI failures before they reach users.