State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490

104 products mentioned

Topics: large language models, AI competition (US vs China), scaling laws, inference optimization, post-training, reinforcement learning, open-source AI, data quality for training

Lex Fridman discusses the state of AI in early 2026 with machine learning researchers Sebastian Raschka and Nathan Lambert, examining the competitive landscape between US and Chinese AI labs, the evolution of LLM architectures, and the multiple dimensions of scaling laws. The conversation unpacks how, despite fundamental architectural similarities to GPT-2, modern LLMs achieve dramatic capability improvements through advances in post-training, inference scaling, data quality, and systems optimization rather than through architectural breakthroughs.

Key takeaways
  • The competitive AI landscape is characterized by resource constraints and organizational execution rather than proprietary technological access, as ideas flow freely between labs through researcher mobility.
  • Scaling laws remain robust across pre-training, post-training, and inference dimensions, though the most attractive gains in 2026 come from inference-time scaling and reinforcement learning rather than simply training larger models.
  • Tool-use capabilities—enabling LLMs to call APIs, search the web, and execute code—represent a major unlock that is still underutilized in open-source models, and they require containerization for safe deployment.
  • Data quality and curation matter more than raw data quantity, with techniques like synthetic data generation, OCR of PDFs, and strategic source mixing proving more efficient than simply scaling token counts.
  • Chinese open-weight models like DeepSeek are gaining adoption not primarily through superior performance but through unrestricted licensing, cost efficiency, and local deployment options, posing strategic challenges to US API-based business models.
  • Mixture of Experts (MoE) architectures and attention mechanism refinements like multi-head latent attention and group query attention enable efficient scaling without proportional compute increases, driving recent open-source model improvements.
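
The last takeaway can be made concrete: in grouped-query attention (GQA), several query heads share a single key/value head, shrinking the K/V projections and the inference-time KV cache by the head-sharing factor without changing the output shape. A minimal NumPy sketch with toy dimensions and weight matrices (an illustration, not any particular model's implementation):

```python
import numpy as np

def grouped_query_attention(x, Wq, Wk, Wv, n_heads, n_kv_heads):
    """Toy grouped-query attention: n_heads query heads share
    n_kv_heads key/value heads (requires n_heads % n_kv_heads == 0)."""
    seq, d = x.shape
    hd = d // n_heads                       # per-head dimension
    q = (x @ Wq).reshape(seq, n_heads, hd)
    k = (x @ Wk).reshape(seq, n_kv_heads, hd)
    v = (x @ Wv).reshape(seq, n_kv_heads, hd)
    group = n_heads // n_kv_heads           # query heads per KV head
    k = np.repeat(k, group, axis=1)         # share each KV head across its group
    v = np.repeat(v, group, axis=1)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(hd)
    w = np.exp(scores - scores.max(-1, keepdims=True))  # stable softmax
    w /= w.sum(-1, keepdims=True)
    out = np.einsum("hqk,khd->qhd", w, v)
    return out.reshape(seq, n_heads * hd)
```

With `n_heads=4` and `n_kv_heads=2`, the K/V projections (and the KV cache at inference time) are half the size of standard multi-head attention, which is the efficiency lever the takeaway refers to.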

Recommendations (27)

ChatGPT
ChatGPT uses

"This is why I like the ChatGPT app, because it gives the AI a home on your computer where you can focus on it, rather than just being another tab in my mess of internet options."

Nathan Lambert · ▶ 5:39

Nvidia
Nvidia uses

"Even back when I was a grad student, I was in a lab doing biophysical simulations, molecular dynamics, and we had a Tesla GPU back then just for the computations. It was about 15 years ago now."

Sebastian Raschka · ▶ 4:00:49

Build a Large Language Model from Scratch

"First is Build a Large Language Model from Scratch and Build a Reasoning Model from Scratch. I truly believe in the machine learning world, the best way to learn and understand something is to buil..."

Lex Fridman · ▶ 0:53

VS Code
VS Code uses

"So, I use the Codeium plugin for VS Code."

Sebastian Raschka · ▶ 21:59

ChatGPT Pro

"I will regularly have like five Pro queries going simultaneously, each looking for one specific paper or feedback on an equation or something."

Nathan Lambert · ▶ 15:32

Claude Code

"And then for code and any sort of philosophical discussion, I use Claude Opus 4.5. Also always with extended thinking."

Nathan Lambert · ▶ 17:05

Grok
Grok uses

"And then sometimes use Grok for real-time information or finding something on AI Twitter that I knew I saw and I need to dig up."

Nathan Lambert · ▶ 17:20

Codeium
Codeium uses

"So, I use the Codeium plugin for VS Code. You know, it's very convenient. It's just like a plugin, and then it's a chat interface that has access to your repository."

Sebastian Raschka · ▶ 21:59

Cursor
Cursor uses

"I use basically half-and-half Cursor and Claude Code, because they're fundamentally different experiences and both are useful."

Lex Fridman · ▶ 21:46

Perplexity
Perplexity uses

"I should say, going to Perplexity here, Sebastian Raschka is a machine learning researcher and author known for several influential books."

Lex Fridman · ▶ 24:05

ChatGPT
ChatGPT recommends

"So I suggested, 'Hey, let's try ChatGPT.' We copied the text into ChatGPT, and it fixed them. Instead of two hours going from link to link fixing that, it made that type of work much more seamless."

Sebastian Raschka · ▶ 1:33:29

Qwen 3 uses

"I can give you also a hands-on example. I was training the Qwen 3 base model with RLVR on MATH-500. The base model had an accuracy of about 15%. Just 50 steps, like in a few minutes with RLVR, the ..."

Sebastian Raschka · ▶ 1:44:16

MATH-500 uses

"I was training the Qwen 3 base model with RLVR on MATH-500. The base model had an accuracy of about 15%."

Sebastian Raschka · ▶ 1:44:20
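
The RLVR setup Raschka describes needs no learned reward model: the reward is computed by programmatically checking the model's final answer against a known reference. A minimal sketch of such a verifier (the `####` answer delimiter is a hypothetical convention for this sketch; MATH-500 graders typically parse `\boxed{}` answers instead):

```python
def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Binary verifiable reward: 1.0 if the model's final answer
    matches the reference exactly, else 0.0."""
    # Treat the text after the last "####" marker as the final answer
    # (an assumed convention for this illustration).
    answer = completion.rsplit("####", 1)[-1].strip()
    return 1.0 if answer == gold_answer.strip() else 0.0
```

This scalar replaces a reward-model score in the policy-gradient update, which is part of why a base model can improve on a verifiable benchmark within a few dozen RL steps, as in the example above.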

OLMo
OLMo uses

"What I would recommend doing, or what I also do, is if I want to understand, for example, how OLMo is implemented, I would look at the weights in the model hub, the config file, and then you can se..."

Sebastian Raschka · ▶ 2:01:42

OLMo 3 uses

"Sometimes it takes me a day. With OLMo 3, the challenge was RoPE for the position embeddings. They had a YaRN extension and there was some custom scaling there, and I couldn't quite match these thi..."

Sebastian Raschka · ▶ 2:02:21
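
The RoPE mechanism Raschka mentions is small enough to sketch: each pair of channels in a query or key vector is rotated by a position-dependent angle, and extensions like YaRN rescale the underlying frequencies to stretch the usable context (the YaRN rescaling itself is omitted here). A toy NumPy version, not OLMo 3's actual code:

```python
import numpy as np

def rope(x, base=10000.0):
    """Minimal rotary position embedding: rotate each channel pair of
    x (shape (seq, dim), dim even) by a position-dependent angle."""
    seq, dim = x.shape
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,) frequencies
    angles = np.outer(np.arange(seq), inv_freq)        # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                    # channel pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                 # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because RoPE is a pure rotation, it preserves vector norms and leaves position 0 unchanged, two properties that are handy sanity checks when matching a reimplementation against released weights.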

Zelda uses

"Sometimes for pastime I play video games, like I like- Video games with puzzles, like Zelda and Metroid."

Sebastian Raschka · ▶ 2:11:30

Metroid uses

"Sometimes for pastime I play video games, like I like- Video games with puzzles, like Zelda and Metroid."

Sebastian Raschka · ▶ 2:11:34

Season of the Witch recommends

"I need to get him a copy of Season of the Witch, which is a history of SF from 1960 to 1985, which goes through the hippie revolution, like all the gays taking over the city and that culture emergi..."

Nathan Lambert · ▶ 2:28:05

Exa
Exa uses

"I don't know, Exa is my preferred search provider, but somebody else might care for a different search startup."

Nathan Lambert · ▶ 2:37:20

Claude Code

"I try Claude Code on the web every three to six months, which is just prompting a model to make an update to some GitHub repository that I have"

Sebastian Raschka · ▶ 2:37:55

"The Recursive Language Model paper, that is one of the papers that tries to kind of address the long context thing"

Sebastian Raschka · ▶ 2:46:45

Cursor Composer

"I should say I use Composer a lot because one of the benefits it has is that it's fast"

Sebastian Raschka · ▶ 3:39:38

Tesla GPU uses

"We had a Tesla GPU back then just for the computations. It was about 15 years ago now"

Sebastian Raschka · ▶ 4:00:53

"I used that feature before, and I always feel bad because it does that every day, and I rarely check it out"

Sebastian Raschka · ▶ 3:21:02

Gemini
Gemini uses

"Gemini 3 is a fantastic model, and I still use it. It's just kind of differentiation is lower."

Nathan Lambert · ▶ 4:52

Mentioned (77)

Amazon Trainium
Amazon Trainium "Amazon is making Trainium" ▶ 4:02:30
Grok 4 Heavy "Although when Grok 4 came out, the Grok 4 SuperGrok Heavy, which was like their pro variant was a..." ▶ 17:31
Reinforcement Learning from Human Feedback "Nathan is the post-training lead at the Allen Institute for AI, author of the definitive book on ..." ▶ 1:21
DeepSeek R1
DeepSeek R1 "This happened about a year ago in January 2025, when the open-weight Chinese company DeepSeek rel..." ▶ 2:05
Claude Opus 4.5
Claude Opus 4.5 "The hype over Anthropic's Claude Opus 4.5 model has been absolutely insane, which is just... I me..." ▶ 4:08
Z.ai GLM models "The likes of Z.ai with their GLM models, Minimax's models, Kimi Moonshot, especially in the last ..." ▶ 5:56
Minimax "The likes of Z.ai with their GLM models, Minimax's models, Kimi Moonshot, especially in the last ..." ▶ 5:56
Kimi Moonshot "The likes of Z.ai with their GLM models, Minimax's models, Kimi Moonshot, especially in the last ..." ▶ 5:56
ChatGPT memory feature
ChatGPT memory feature "ChatGPT has a memory feature, right? And so you may have a subscription and you use it for person..." ▶ 10:02
GPT-5
GPT-5 "Personally, I have very mixed reviews of GPT-5, but it must have saved them so much money with th..." ▶ 11:26
Deep Research "Like Deep Research, Sora, o1 thinking models—all these definitional things have come from OpenAI." ▶ 13:13
Sora
Sora "Like Deep Research, Sora, o1 thinking models—all these definitional things have come from OpenAI." ▶ 13:13
o1 thinking models
o1 thinking models "Like Deep Research, Sora, o1 thinking models—all these definitional things have come from OpenAI." ▶ 13:13
Google TPUs
Google TPUs "Largely because the margin on NVIDIA chips is insane, and Google can develop everything from top ..." ▶ 12:45
Nvidia
Nvidia "Largely because the margin on NVIDIA chips is insane, and Google can develop everything from top ..." ▶ 12:45
Hugging Face
Hugging Face "On my blog, we scrape Hugging Face so we keep download numbers for every dataset and model over t..." ▶ 28:02
Mistral AI
Mistral AI "Let's throw in Mistral AI, Gemma..." ▶ 29:01
Gemma
Gemma "Let's throw in Mistral AI, Gemma..." ▶ 29:01
gpt-oss-120b "gpt-oss, the open weight model by OpenAI... gpt-oss-120b is actually a very strong model and does..." ▶ 29:05
NVIDIA Nemotron 3 "Actually, NVIDIA had a really cool one, Nemotron 3." ▶ 29:05
Qwen "Qwen might be the one— Oh, yeah. Qwen was the obvious name I was gonna say." ▶ 29:12
GPT-2 "When I was writing about OpenAI's open model release, they were like, 'Don't forget about GPT-2,'..." ▶ 29:26
SmolLM "Hugging Face has SmolLM, which is very popular." ▶ 30:07
OpenRouter
OpenRouter "With OpenRouter, it's easy to look at multi-model things. You can run DeepSeek on Perplexity." ▶ 20:25
Substack
Substack "For example, if you read a Substack article, I could maybe ask an LLM to give me opinions on that..." ▶ 1:21:04
Bing Sydney "I would love to have tried Bing Sydney. Did that have more voice? Because it would so often go of..." ▶ 1:24:07
GPT-4o
GPT-4o "There was a lot of backlash last year with GPT-4o getting removed, and I've personally never used..." ▶ 1:24:35
TikTok
TikTok "We see this with TikTok. You open it... I don't use TikTok, but supposedly in five minutes the al..." ▶ 1:25:05
Anthropic
Anthropic "A lot of researchers at these companies are so well-motivated, and definitely Anthropic and OpenA..." ▶ 1:26:50
OpenAI
OpenAI "A lot of researchers at these companies are so well-motivated, and definitely Anthropic and OpenA..." ▶ 1:26:50
Spotify
Spotify "my wife the other day—she has a podcast for book discussions, a book club, and she was transferri..." ▶ 1:33:10
Claude Code
Claude Code "For me personally, since we're talking about coding, and you mentioned debugging... the source of..." ▶ 1:33:58
Constitutional AI "That's the older term for it coined in Anthropic's Constitutional AI paper." ▶ 1:41:14
MMLU "even something simpler like MMLU, which is a multiple-choice benchmark. If you just change the fo..." ▶ 1:46:50
OpenAI o1 "I think you can kind of take this in order. I think you could view it as what made o1, which is t..." ▶ 1:47:43
GRPO "If we look at the GRPO equation, this one is famous for this because essentially the reward given..." ▶ 1:48:49
Scale-RL "I think there's a seminal paper from a Meta internship. It's called something like 'The Art of Sc..." ▶ 1:57:37
Hugging Face Transformers "When you code these from scratch, you can take an existing model from the Hugging Face Transforme..." ▶ 2:00:16
SGLang "even Transformers, the library, is not used in production. People use SGLang or vLLM, and it adds..." ▶ 2:01:07
vLLM
vLLM "even Transformers, the library, is not used in production. People use SGLang or vLLM, and it adds..." ▶ 2:01:11
RoPE "With OLMo 3, the challenge was RoPE for the position embeddings. They had a YaRN extension and th..." ▶ 2:02:25
YaRN "They had a YaRN extension and there was some custom scaling there, and I couldn't quite match the..." ▶ 2:02:29
Direct Preference Optimization "the famous paper, Direct Preference Optimization, which is a much simpler way of solving the prob..." ▶ 2:10:12
LoRA "For the character training thing, I think this research is built on fine-tuning about 7 billion p..." ▶ 2:13:43
Claude
Claude "But if you go from a small university with no compute and find something that Claude struggles wi..." ▶ 2:14:33
Stable Diffusion "And listeners may know diffusion models from image generation, like Stable Diffusion popularized it." ▶ 2:29:48
GANs "There was a paper on generating images. Back then, people used GANs, Generative Adversarial Netwo..." ▶ 2:29:56
BERT "It's kind of similar to the BERT models by Google. Like, when you go back to the original transfo..." ▶ 2:30:23
Gemini Diffusion "But there was an announcement by Google, a site where they said they are launching Gemini Diffusi..." ▶ 2:32:32
Gemini Nano 2 "they put it into context of their Gemini Nano 2 model" ▶ 2:32:40
Apple Foundation Models "Like what Apple tried to do with the Apple Foundation models, putting them on the phone, where th..." ▶ 2:42:25
GPT-5.2 Pro "If you think about GPT-5.2 Pro taking an hour, it's like, what if your training run has a sample ..." ▶ 1:52:18
DeepSeek-V3.2 "DeepSeek-V3.2, where they had a sparse attention mechanism where they have essentially a very eff..." ▶ 2:48:56
World Models "There was a paper by Meta, a paper called World Models. So where they basically apply the concept..." ▶ 2:52:03
CASP "There is a competition called CASP, I think, where they do protein structure prediction" ▶ 2:52:51
AlphaFold
AlphaFold "AlphaFold, when it came out, it crushed this benchmark" ▶ 2:53:15
RTX "There's some work in this area like RTX, I think it was a few years ago, where people are startin..." ▶ 2:55:45
AI2027 report "I don't know if you like the originally titled AI 2027 report. They focus more on code and research ..." ▶ 3:01:14
Harmonic "I think there are startups—maybe Harmonic is one—where they're going all in on language models pl..." ▶ 3:14:23
Lean
Lean "language models plus Lean for math" ▶ 3:14:27
Slack
Slack "You want to add a new tab in Slack that you want to use, and I think AI will be able to do that p..." ▶ 3:09:02
Microsoft Word
Microsoft Word "take something like Slack or Microsoft Word. I think if organizations allow it, AI could very eas..." ▶ 3:08:45
Chrome
Chrome "If you look at the browser, Chrome. If I wanted to add a feature, if I wanted to have tabs as opp..." ▶ 3:10:09
Reflection AI "We hear about Reflection AI, where they say their two billion dollar fundraise is dedicated to bu..." ▶ 3:52:52
Black Forest Labs "They're signing licensing deals with Black Forest Labs, which is an image generation company" ▶ 3:43:38
Midjourney
Midjourney "signing licensing deals with Black Forest Labs, which is an image generation company, or Midjourney" ▶ 3:43:42
Groq
Groq "We are starting to see some types of consolidation with Groq for $20 billion" ▶ 3:36:43
Scale AI
Scale AI "Scale AI for almost $30 billion and countless other deals like this" ▶ 3:36:46
Perplexity
Perplexity "I think there will be some other multi-billion dollar acquisitions, like Perplexity" ▶ 3:38:34
Vera Rubin
Vera Rubin "That's why part of what Vera Rubin is- where they have a new chip with no high-bandwidth memory, ..." ▶ 4:01:51
CUDA
CUDA "The moat of NVIDIA is probably not just the GPU. It's more like the CUDA ecosystem, and that has ..." ▶ 4:00:42
AlexNet "I think it only happened because you could purchase those GPUs." ▶ 4:06:26
Transformer "The word 'transformer' could still be known. I would guess that deep learning is definitely still..." ▶ 4:12:08