Vibe Check: GPT-5.4—OpenAI is Back

Hosted by Every | 12 products mentioned
Watch on YouTube
Tags: gpt-5.4, ai models, code review, agentic programming, ai coding tools, design generation, model comparison

Every hosts a live vibe check on GPT-5.4, OpenAI's newly released model, declaring "OpenAI is back" after months of trailing behind competitors like Claude in agentic programming. The hosts and guests—builders of production AI tools—share hands-on testing across coding, design, planning, and agent workflows, revealing that GPT-5.4 has become a legitimate daily driver for many, though with notable quirks like occasional "lying" in certain contexts. The discussion covers how OpenAI's velocity (releasing Codex Desktop App, 5.3, and 5.4 in rapid succession) has shifted the competitive landscape, with some team members now splitting usage 50/50 between OpenAI and Claude instead of heavily favoring Claude.

Key takeaways
  • GPT-5.4 excels at planning and code review with surprisingly human-like reasoning in its thinking traces, making it particularly useful for breaking down complex tasks before execution.
  • OpenAI's speed of iteration and execution velocity across the Codex and GPT lines demonstrates they've internalized feedback about agentic programming, narrowing the gap with Claude's dominance in the space.
  • GPT-5.4 trades precision for friendliness—it's more pleasant to work with than Codex but occasionally over-engineers solutions, leaks implementation details into UI, and hallucinates details, requiring more human oversight than Claude Opus.
  • The model shows language-specific biases: it generates TypeScript-style defensive code in Ruby contexts and struggles with Ruby on Rails conventions, though 5.4 improved significantly over 5.3 Codex on Ruby tasks.
  • Gemini 3.1 leads in design aesthetics and speed, with real-time code compilation and bold design choices, making it the preferred option for initial design mockups despite all major models being viable for coding.
  • Modularity and human-readable code remain critical in AI-era software, as models perform better on well-organized files and humans retain trust in systems they can inspect—a key factor for production adoption.
  • For OpenClaw deployments, GPT-5.4 has emerged as a viable main model at half the cost of Opus, though it requires more careful prompting to prevent hallucinations and excessive scope creep.

Recommendations (10)

"They released a Codex desktop app which I'm using as my daily driver for coding right now and I think it's really good"

Every · ▶ 7:40

"LFG is a command I have in my compound engineering plugin"

Live Vibe Check · ▶ 17:58

Gemini
Gemini uses

"I think Google is still leading design which is interesting. So yeah for design I use Google"

Live Vibe Check · ▶ 38:42

Five Guys
Five Guys recommends

"It's so good. You got to try it out. Five Guys."

Every · ▶ 0:18

GPT-5.4
GPT-5.4 recommends

"There is a new model out. It is GPT 5.4. It's really good."

Every · ▶ 1:32

OpenClaw
OpenClaw uses

"We threw a lot of testing at it over the last week or two, from everything from coding to using it in our OpenClaws"

Every · ▶ 2:09

Codex
Codex uses

"The new models are like really important for me at least the Codex series and 5.4"

Live Vibe Check · ▶ 4:25

Claude Opus

"I love Claude Opus like Claude in general. I use a lot of Ruby and Ruby on Rails and it's just the best at that"

Live Vibe Check · ▶ 5:02

Ruby on Rails

"I use a lot of Ruby and Ruby on Rails and it's just the best at that"

Live Vibe Check · ▶ 5:16

Slack
Slack uses

"It like automatically does lowercase messages in Slack for example without being told"

Every · ▶ 14:34

Mentioned (2)

In-N-Out
"I'm in San Francisco and normally I would go to In-N-Out for this" ▶ 0:54
Figma
"It then took a screenshot of the Figma file" ▶ 15:19