Vibe Check: GPT-5.4—OpenAI is Back
Every hosts a live vibe check on GPT-5.4, OpenAI's newly released model, and declares that "OpenAI is back" after months of trailing competitors such as Claude in agentic programming. The hosts and guests, all builders of production AI tools, share hands-on testing across coding, design, planning, and agent workflows. Their verdict: GPT-5.4 has become a legitimate daily driver for many of them, though it has notable quirks, such as occasionally "lying" in certain contexts. The discussion also covers how OpenAI's velocity (releasing the Codex Desktop App, 5.3, and 5.4 in rapid succession) has shifted the competitive landscape: some team members now split their usage 50/50 between OpenAI and Claude instead of heavily favoring Claude.
Key takeaways
- GPT-5.4 excels at planning and code review, with surprisingly human-like reasoning in its thinking traces, which makes it particularly useful for breaking down complex tasks before execution.
- OpenAI's rapid iteration across the Codex and GPT lines suggests it has internalized feedback about agentic programming, narrowing Claude's lead in the space.
- GPT-5.4 trades precision for friendliness: it is more pleasant to work with than Codex, but it occasionally over-engineers solutions, leaks implementation details into the UI, and hallucinates details, so it requires more human oversight than Claude Opus.
- The model shows language-specific biases: it generates TypeScript-style defensive code in Ruby contexts and struggles with Ruby on Rails conventions, though 5.4 improved significantly over 5.3 Codex on Ruby tasks.
- Gemini 3.1 leads in design aesthetics and speed, with real-time code compilation and bold design choices, making it the preferred option for initial design mockups, even though all major models are viable for coding.
- Modularity and human-readable code remain critical in AI-era software: models perform better on well-organized files, and humans retain trust in systems they can inspect, a key factor for production adoption.
- For OpenClaw deployments, GPT-5.4 has emerged as a viable main model at half the cost of Opus, though it requires more careful prompting to prevent hallucinations and excessive scope creep.
Recommendations (10)
"They released a Codex desktop app which I'm using as my daily driver for coding right now and I think it's really good"
Every · ▶ 7:40
"LFG is a command I have in my compound engineering plugin"
Live Vibe Check · ▶ 17:58
"I love Claude Opus like Claude in general. I use a lot of Ruby and Ruby on Rails and it's just the best at that"
Live Vibe Check · ▶ 5:02