State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490

104 products mentioned

Topics: large language models, AI competition (US vs China), scaling laws, inference optimization, post-training, reinforcement learning, open-source AI, data quality for training

Lex Fridman discusses the state of AI in early 2026 with machine learning researchers Sebastian Raschka and Nathan Lambert, examining the competitive landscape between US and Chinese AI labs, the evolution of LLM architectures, and the multiple dimensions of scaling laws. The conversation unpacks how, despite fundamental architectural similarities to GPT-2, modern LLMs achieve dramatic capability improvements through advances in post-training, inference scaling, data quality, and systems optimization rather than through architectural breakthroughs.

Key takeaways
  • The competitive AI landscape is characterized by resource constraints and organizational execution rather than proprietary technological access, as ideas flow freely between labs through researcher mobility.
  • Scaling laws remain robust across pre-training, post-training, and inference dimensions, though the most attractive gains in 2026 come from inference-time scaling and reinforcement learning rather than simply training larger models.
  • Tool-use capabilities—enabling LLMs to call APIs, search the web, and execute code—represent a major unlock that is still underutilized in open-source models, and they require containerization for safe deployment.
  • Data quality and curation matter more than raw data quantity, with techniques like synthetic data generation, OCR of PDFs, and strategic source mixing proving more efficient than simply scaling token counts.
  • Chinese open-weight models like DeepSeek are gaining adoption not primarily through superior performance but through unrestricted licensing, cost efficiency, and local deployment options, posing strategic challenges to US API-based business models.
  • Mixture of Experts (MoE) architectures and attention mechanism refinements like multi-head latent attention and group query attention enable efficient scaling without proportional compute increases, driving recent open-source model improvements.
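
The last takeaway can be made concrete: in grouped-query attention (GQA), several query heads share a single key/value head, shrinking the K/V projections and the inference-time KV cache by the head-sharing factor without changing the output shape. A minimal NumPy sketch with toy dimensions and weight matrices (an illustration, not any particular model's implementation):

```python
import numpy as np

def grouped_query_attention(x, Wq, Wk, Wv, n_heads, n_kv_heads):
    """Toy grouped-query attention: n_heads query heads share
    n_kv_heads key/value heads (requires n_heads % n_kv_heads == 0)."""
    seq, d = x.shape
    hd = d // n_heads                       # per-head dimension
    q = (x @ Wq).reshape(seq, n_heads, hd)
    k = (x @ Wk).reshape(seq, n_kv_heads, hd)
    v = (x @ Wv).reshape(seq, n_kv_heads, hd)
    group = n_heads // n_kv_heads           # query heads per KV head
    k = np.repeat(k, group, axis=1)         # share each KV head across its group
    v = np.repeat(v, group, axis=1)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(hd)
    w = np.exp(scores - scores.max(-1, keepdims=True))  # stable softmax
    w /= w.sum(-1, keepdims=True)
    out = np.einsum("hqk,khd->qhd", w, v)
    return out.reshape(seq, n_heads * hd)
```

With `n_heads=4` and `n_kv_heads=2`, the K/V projections (and the KV cache at inference time) are half the size of standard multi-head attention, which is the efficiency lever the takeaway refers to.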

Recommendations (27)

ChatGPT
ChatGPT uses

"This is why I like the ChatGPT app, because it gives the AI a home on your computer where you can focus on it, rather than just being another tab in my mess of internet options."

Nathan Lambert · ▶ 5:39

Nvidia
Nvidia uses

"Even back when I was a grad student, I was in a lab doing biophysical simulations, molecular dynamics, and we had a Tesla GPU back then just for the computations. It was about 15 years ago now."

Sebastian Raschka · ▶ 4:00:49

Build a Large Language Model from Scratch

"First is Build a Large Language Model from Scratch and Build a Reasoning Model from Scratch. I truly believe in the machine learning world, the best way to learn and understand something is to buil..."

Lex Fridman · ▶ 0:53

VS Code
VS Code uses

"So, I use the Codeium plugin for VS Code."

Sebastian Raschka · ▶ 21:59

ChatGPT Pro

"I will regularly have like five Pro queries going simultaneously, each looking for one specific paper or feedback on an equation or something."

Nathan Lambert · ▶ 15:32

Claude Code

"And then for code and any sort of philosophical discussion, I use Claude Opus 4.5. Also always with extended thinking."

Nathan Lambert · ▶ 17:05

Grok
Grok uses

"And then sometimes use Grok for real-time information or finding something on AI Twitter that I knew I saw and I need to dig up."

Nathan Lambert · ▶ 17:20

Codeium
Codeium uses

"So, I use the Codeium plugin for VS Code. You know, it's very convenient. It's just like a plugin, and then it's a chat interface that has access to your repository."

Sebastian Raschka · ▶ 21:59

Cursor
Cursor uses

"I use basically half-and-half Cursor and Claude Code, because they're fundamentally different experiences and both are useful."

Lex Fridman · ▶ 21:46

Perplexity
Perplexity uses

"I should say, going to Perplexity here, Sebastian Raschka is a machine learning researcher and author known for several influential books."

Lex Fridman · ▶ 24:05

ChatGPT
ChatGPT recommends

"So I suggested, 'Hey, let's try ChatGPT.' We copied the text into ChatGPT, and it fixed them. Instead of two hours going from link to link fixing that, it made that type of work much more seamless."

Sebastian Raschka · ▶ 1:33:29

Qwen 3 uses

"I can give you also a hands-on example. I was training the Qwen 3 base model with RLVR on MATH-500. The base model had an accuracy of about 15%. Just 50 steps, like in a few minutes with RLVR, the ..."

Sebastian Raschka · ▶ 1:44:16

MATH-500 uses

"I was training the Qwen 3 base model with RLVR on MATH-500. The base model had an accuracy of about 15%."

Sebastian Raschka · ▶ 1:44:20
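
The RLVR setup Raschka describes needs no learned reward model: the reward is computed by programmatically checking the model's final answer against a known reference. A minimal sketch of such a verifier (the `####` answer delimiter is a hypothetical convention for this sketch; MATH-500 graders typically parse `\boxed{}` answers instead):

```python
def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Binary verifiable reward: 1.0 if the model's final answer
    matches the reference exactly, else 0.0."""
    # Treat the text after the last "####" marker as the final answer
    # (an assumed convention for this illustration).
    answer = completion.rsplit("####", 1)[-1].strip()
    return 1.0 if answer == gold_answer.strip() else 0.0
```

This scalar replaces a reward-model score in the policy-gradient update, which is part of why a base model can improve on a verifiable benchmark within a few dozen RL steps, as in the example above.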

OLMo
OLMo uses

"What I would recommend doing, or what I also do, is if I want to understand, for example, how OLMo is implemented, I would look at the weights in the model hub, the config file, and then you can se..."

Sebastian Raschka · ▶ 2:01:42

OLMo 3 uses

"Sometimes it takes me a day. With OLMo 3, the challenge was RoPE for the position embeddings. They had a YaRN extension and there was some custom scaling there, and I couldn't quite match these thi..."

Sebastian Raschka · ▶ 2:02:21
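
The RoPE mechanism Raschka mentions is small enough to sketch: each pair of channels in a query or key vector is rotated by a position-dependent angle, and extensions like YaRN rescale the underlying frequencies to stretch the usable context (the YaRN rescaling itself is omitted here). A toy NumPy version, not OLMo 3's actual code:

```python
import numpy as np

def rope(x, base=10000.0):
    """Minimal rotary position embedding: rotate each channel pair of
    x (shape (seq, dim), dim even) by a position-dependent angle."""
    seq, dim = x.shape
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,) frequencies
    angles = np.outer(np.arange(seq), inv_freq)        # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                    # channel pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                 # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because RoPE is a pure rotation, it preserves vector norms and leaves position 0 unchanged, two properties that are handy sanity checks when matching a reimplementation against released weights.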

Zelda uses

"Sometimes for pastime I play video games, like I like- Video games with puzzles, like Zelda and Metroid."

Sebastian Raschka · ▶ 2:11:30

Metroid uses

"Sometimes for pastime I play video games, like I like- Video games with puzzles, like Zelda and Metroid."

Sebastian Raschka · ▶ 2:11:34

Season of the Witch recommends

"I need to get him a copy of Season of the Witch, which is a history of SF from 1960 to 1985, which goes through the hippie revolution, like all the gays taking over the city and that culture emergi..."

Nathan Lambert · ▶ 2:28:05

Exa
Exa uses

"I don't know, Exa is my preferred search provider, but somebody else might care for a different search startup."

Nathan Lambert · ▶ 2:37:20

Claude Code

"I try Claude Code on the web every three to six months, which is just prompting a model to make an update to some GitHub repository that I have"

Sebastian Raschka · ▶ 2:37:55

"The Recursive Language Model paper, that is one of the papers that tries to kind of address the long context thing"

Sebastian Raschka · ▶ 2:46:45

Cursor Composer

"I should say I use Composer a lot because one of the benefits it has is that it's fast"

Sebastian Raschka · ▶ 3:39:38

Tesla GPU uses

"We had a Tesla GPU back then just for the computations. It was about 15 years ago now"

Sebastian Raschka · ▶ 4:00:53

"I used that feature before, and I always feel bad because it does that every day, and I rarely check it out"

Sebastian Raschka · ▶ 3:21:02

Gemini
Gemini uses

"Gemini 3 is a fantastic model, and I still use it. It's just kind of differentiation is lower."

Nathan Lambert · ▶ 4:52

Mentioned (77)

Amazon Trainium
Amazon Trainium "Amazon is making Trainium" ▶ 4:02:30
Grok 4 Heavy "Although when Grok 4 came out, the Grok 4 SuperGrok Heavy, which was like their pro variant was a..." ▶ 17:31
Reinforcement Learning from Human Feedback "Nathan is the post-training lead at the Allen Institute for AI, author of the definitive book on ..." ▶ 1:21
DeepSeek R1
DeepSeek R1 "This happened about a year ago in January 2025, when the open-weight Chinese company DeepSeek rel..." ▶ 2:05
Claude Opus 4.5
Claude Opus 4.5 "The hype over Anthropic's Claude Opus 4.5 model has been absolutely insane, which is just... I me..." ▶ 4:08
Z.ai GLM models "The likes of Z.ai with their GLM models, Minimax's models, Kimi Moonshot, especially in the last ..." ▶ 5:56
Minimax "The likes of Z.ai with their GLM models, Minimax's models, Kimi Moonshot, especially in the last ..." ▶ 5:56
Kimi Moonshot "The likes of Z.ai with their GLM models, Minimax's models, Kimi Moonshot, especially in the last ..." ▶ 5:56
ChatGPT memory feature
ChatGPT memory feature "ChatGPT has a memory feature, right? And so you may have a subscription and you use it for person..." ▶ 10:02
GPT-5
GPT-5 "Personally, I have very mixed reviews of GPT-5, but it must have saved them so much money with th..." ▶ 11:26
Deep Research "Like Deep Research, Sora, o1 thinking models—all these definitional things have come from OpenAI." ▶ 13:13
Sora
Sora "Like Deep Research, Sora, o1 thinking models—all these definitional things have come from OpenAI." ▶ 13:13
o1 thinking models
o1 thinking models "Like Deep Research, Sora, o1 thinking models—all these definitional things have come from OpenAI." ▶ 13:13
Google TPUs
Google TPUs "Largely because the margin on NVIDIA chips is insane, and Google can develop everything from top ..." ▶ 12:45
Nvidia
Nvidia "Largely because the margin on NVIDIA chips is insane, and Google can develop everything from top ..." ▶ 12:45
Hugging Face
Hugging Face "On my blog, we scrape Hugging Face so we keep download numbers for every dataset and model over t..." ▶ 28:02
Mistral AI
Mistral AI "Let's throw in Mistral AI, Gemma..." ▶ 29:01
Gemma
Gemma "Let's throw in Mistral AI, Gemma..." ▶ 29:01
gpt-oss-120b "gpt-oss, the open weight model by OpenAI... gpt-oss-120b is actually a very strong model and does..." ▶ 29:05
NVIDIA Nemotron 3 "Actually, NVIDIA had a really cool one, Nemotron 3." ▶ 29:05
Qwen "Qwen might be the one— Oh, yeah. Qwen was the obvious name I was gonna say." ▶ 29:12
GPT-2 "When I was writing about OpenAI's open model release, they were like, 'Don't forget about GPT-2,'..." ▶ 29:26
SmolLM "Hugging Face has SmolLM, which is very popular." ▶ 30:07
OpenRouter
OpenRouter "With OpenRouter, it's easy to look at multi-model things. You can run DeepSeek on Perplexity." ▶ 20:25
Substack
Substack "For example, if you read a Substack article, I could maybe ask an LLM to give me opinions on that..." ▶ 1:21:04
Bing Sydney "I would love to have tried Bing Sydney. Did that have more voice? Because it would so often go of..." ▶ 1:24:07
GPT-4o
GPT-4o "There was a lot of backlash last year with GPT-4o getting removed, and I've personally never used..." ▶ 1:24:35
TikTok
TikTok "We see this with TikTok. You open it... I don't use TikTok, but supposedly in five minutes the al..." ▶ 1:25:05
Anthropic
Anthropic "A lot of researchers at these companies are so well-motivated, and definitely Anthropic and OpenA..." ▶ 1:26:50
OpenAI
OpenAI "A lot of researchers at these companies are so well-motivated, and definitely Anthropic and OpenA..." ▶ 1:26:50
Spotify
Spotify "my wife the other day—she has a podcast for book discussions, a book club, and she was transferri..." ▶ 1:33:10
Claude Code
Claude Code "For me personally, since we're talking about coding, and you mentioned debugging... the source of..." ▶ 1:33:58
Constitutional AI "That's the older term for it coined in Anthropic's Constitutional AI paper." ▶ 1:41:14
MMLU "even something simpler like MMLU, which is a multiple-choice benchmark. If you just change the fo..." ▶ 1:46:50
OpenAI o1 "I think you can kind of take this in order. I think you could view it as what made o1, which is t..." ▶ 1:47:43
GRPO "If we look at the GRPO equation, this one is famous for this because essentially the reward given..." ▶ 1:48:49
Scale-RL "I think there's a seminal paper from a Meta internship. It's called something like 'The Art of Sc..." ▶ 1:57:37
Hugging Face Transformers "When you code these from scratch, you can take an existing model from the Hugging Face Transforme..." ▶ 2:00:16
SGLang "even Transformers, the library, is not used in production. People use SGLang or vLLM, and it adds..." ▶ 2:01:07
vLLM
vLLM "even Transformers, the library, is not used in production. People use SGLang or vLLM, and it adds..." ▶ 2:01:11
RoPE "With OLMo 3, the challenge was RoPE for the position embeddings. They had a YaRN extension and th..." ▶ 2:02:25
YaRN "They had a YaRN extension and there was some custom scaling there, and I couldn't quite match the..." ▶ 2:02:29
Direct Preference Optimization "the famous paper, Direct Preference Optimization, which is a much simpler way of solving the prob..." ▶ 2:10:12
LoRA "For the character training thing, I think this research is built on fine-tuning about 7 billion p..." ▶ 2:13:43
Claude
Claude "But if you go from a small university with no compute and find something that Claude struggles wi..." ▶ 2:14:33
Stable Diffusion "And listeners may know diffusion models from image generation, like Stable Diffusion popularized it." ▶ 2:29:48
GANs "There was a paper on generating images. Back then, people used GANs, Generative Adversarial Netwo..." ▶ 2:29:56
BERT "It's kind of similar to the BERT models by Google. Like, when you go back to the original transfo..." ▶ 2:30:23
Gemini Diffusion "But there was an announcement by Google, a site where they said they are launching Gemini Diffusi..." ▶ 2:32:32
Gemini Nano 2 "they put it into context of their Gemini Nano 2 model" ▶ 2:32:40
Apple Foundation Models "Like what Apple tried to do with the Apple Foundation models, putting them on the phone, where th..." ▶ 2:42:25
GPT-5.2 Pro "If you think about GPT-5.2 Pro taking an hour, it's like, what if your training run has a sample ..." ▶ 1:52:18
DeepSeek-V3.2 "DeepSeek-V3.2, where they had a sparse attention mechanism where they have essentially a very eff..." ▶ 2:48:56
World Models "There was a paper by Meta, a paper called World Models. So where they basically apply the concept..." ▶ 2:52:03
CASP "There is a competition called CASP, I think, where they do protein structure prediction" ▶ 2:52:51
AlphaFold
AlphaFold "AlphaFold, when it came out, it crushed this benchmark" ▶ 2:53:15
RTX "There's some work in this area like RTX, I think it was a few years ago, where people are startin..." ▶ 2:55:45
AI2027 report "I don't know if you like the originally titled AI 2027 report. They focus more on code and research ..." ▶ 3:01:14
Harmonic "I think there are startups—maybe Harmonic is one—where they're going all in on language models pl..." ▶ 3:14:23
Lean
Lean "language models plus Lean for math" ▶ 3:14:27
Slack
Slack "You want to add a new tab in Slack that you want to use, and I think AI will be able to do that p..." ▶ 3:09:02
Microsoft Word
Microsoft Word "take something like Slack or Microsoft Word. I think if organizations allow it, AI could very eas..." ▶ 3:08:45
Chrome
Chrome "If you look at the browser, Chrome. If I wanted to add a feature, if I wanted to have tabs as opp..." ▶ 3:10:09
Reflection AI "We hear about Reflection AI, where they say their two billion dollar fundraise is dedicated to bu..." ▶ 3:52:52
Black Forest Labs "They're signing licensing deals with Black Forest Labs, which is an image generation company" ▶ 3:43:38
Midjourney
Midjourney "signing licensing deals with Black Forest Labs, which is an image generation company, or Midjourney" ▶ 3:43:42
Groq
Groq "We are starting to see some types of consolidation with Groq for $20 billion" ▶ 3:36:43
Scale AI
Scale AI "Scale AI for almost $30 billion and countless other deals like this" ▶ 3:36:46
Perplexity
Perplexity "I think there will be some other multi-billion dollar acquisitions, like Perplexity" ▶ 3:38:34
Vera Rubin
Vera Rubin "That's why part of what Vera Rubin is- where they have a new chip with no high-bandwidth memory, ..." ▶ 4:01:51
CUDA
CUDA "The moat of NVIDIA is probably not just the GPU. It's more like the CUDA ecosystem, and that has ..." ▶ 4:00:42
AlexNet "I think it only happened because you could purchase those GPUs." ▶ 4:06:26
Transformer "The word 'transformer' could still be known. I would guess that deep learning is definitely still..." ▶ 4:12:08