
Can AI “Scheme”? (Nope.) | AI Reality Check

Watch on YouTube

Tags: ai safety · large language models · ai agents · technical architecture · media literacy · autonomous systems · ai limitations

Cal Newport debunks a viral Guardian article claiming AI chatbots are increasingly "scheming" and evading safeguards—revealing that the supposed rise in malicious AI behavior is actually just social media chatter about OpenClaw, an open-source tool that lets untrained users build AI agents without safety guardrails. Newport explains the fundamental technical reason why current AI agents fail: LLMs don't plan or scheme; they generate text that resembles a plan, making them inherently unreliable for autonomous action regardless of intent.

Key takeaways
  • The alarming "AI scheming" headline was journalistic malpractice: the study tracked tweets complaining about AI, not actual AI rebellion—the spike coincided with OpenClaw's January 2026 launch, not a shift in AI behavior.
  • LLMs generate the next word one token at a time based on statistical patterns in training data, not by reasoning through goals and restrictions like humans do—they write coherent-sounding stories, not rigorous plans.
  • LLM-based agents are dangerous not because they're malicious, but because using a story as a plan is a fundamental architectural mismatch—the AI writes something that sounds reasonable but lacks goal-checking or constraint-verification.
  • Coding agents work well because they operate in the best-case scenario: restricted action spaces, well-documented examples online, and external verification (compilation, test suites) that aren't available for other domains.
  • Current AI agents have no intentions, no memory, and no ability to "scheme"—they blindly execute unreliable plans; better autonomous AI requires fundamentally different architectures (explicit planning engines, not LLMs).
  • To build AI systems that safely execute multi-step tasks, organizations need specialized AI technology for different contexts, rather than hoping a single oversized LLM can handle everything.
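The token-by-token generation described in the takeaways can be sketched with a toy model. This is purely illustrative: a hand-written bigram table stands in for an LLM's learned distribution, and all names here are invented for the sketch, not part of any real model API.

```python
import random

# Toy illustration of next-token generation: the model picks each word
# from a conditional distribution P(next | previous) learned from data.
# A tiny bigram table stands in for that distribution here.
BIGRAMS = {
    "the": {"agent": 0.6, "plan": 0.4},
    "agent": {"will": 1.0},
    "will": {"delete": 0.5, "verify": 0.5},
}

def next_token(prev, table, rng):
    """Sample one token from the conditional distribution for `prev`."""
    dist = table.get(prev)
    if not dist:
        return None  # no continuation known: generation stops
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

def generate(start, table, max_len=5, seed=0):
    """Emit tokens one at a time; no goal, no lookahead, no checking."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < max_len:
        tok = next_token(out[-1], table, rng)
        if tok is None:
            break
        out.append(tok)
    return out

print(generate("the", BIGRAMS))
```

Note that nothing in the loop reasons about goals or restrictions; each step only asks "what word tends to come next?", which is the episode's point about stories versus plans.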
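The coding-agent best case (restricted action space plus external verification) can be sketched as a generate-then-verify loop. Everything below is a stand-in under stated assumptions: `fake_llm` and the hard-coded candidate snippets replace a real model call, and the "test suite" is two inline assertions.

```python
# Sketch of why coding agents are the best case: the model's output is
# still just text, but code text can be checked by machinery the model
# cannot fool, namely actually running a test suite.

CANDIDATES = [
    "def add(a, b):\n    return a - b\n",  # plausible-sounding but wrong
    "def add(a, b):\n    return a + b\n",  # passes the tests
]

TESTS = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

def fake_llm(attempt):
    """Stand-in for a model call; returns a candidate per attempt."""
    return CANDIDATES[min(attempt, len(CANDIDATES) - 1)]

def verified(code, tests):
    """External verification: run the code and its tests for real."""
    try:
        exec(code + tests, {})  # throwaway namespace
        return True
    except Exception:
        return False

def code_agent(max_attempts=3):
    """Keep generating until the external check passes, or give up."""
    for attempt in range(max_attempts):
        code = fake_llm(attempt)
        if verified(code, TESTS):
            return code
    return None
```

The design point is that the reliability comes from `verified`, not from the generator; in domains without a compiler or test suite, that external check simply does not exist.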
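By contrast, an explicit planning engine of the kind the episode attributes to systems like Cicero searches a state space and only returns action sequences it has checked actually reach the goal. A minimal sketch, using a toy numeric state space and illustrative names:

```python
from collections import deque

def plan(start, goal, actions):
    """Breadth-first search over states: every returned plan has been
    verified, step by step, to transform `start` into `goal`."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path  # provably reaches the goal
        for name, step in actions.items():
            nxt = step(state)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None  # no plan exists, and the planner says so

# Toy action space: increment or double an integer.
ACTIONS = {"inc": lambda s: s + 1, "double": lambda s: s * 2}

print(plan(2, 9, ACTIONS))  # → ['double', 'double', 'inc']
```

Unlike next-token generation, this architecture has the goal-checking and constraint-verification the takeaways say LLM-written "plans" lack: a returned plan is guaranteed correct, and an impossible goal yields `None` rather than a confident-sounding story.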

Mentioned (4)

X.com "Examples of covert pursuit of misaligned goals flagged by human users on X.com" ▶ 3:40
ChatGPT "It'll send a prompt to the LLM saying here's the situation, just like you would do with ChatGPT" ▶ 8:09
Claude "Researchers say Claude 4 Opus can conceal intentions and take actions to preserve its own existence" ▶ 13:13
Meta Research Cicero "Meta Research Cicero, which can play the board game Diplomacy at a high level, has an explicit pl..." ▶ 18:28