Sebastian Raschka
77 recommendations
"ChatGPT has a memory feature, right? And so you may have a subscription and you use it for personal stuff, but I don't know if you want to use that same thing at work."
"Let's throw in Mistral AI, Gemma..."
"Let's throw in Mistral AI, Gemma..."
"gpt-oss, the open weight model by OpenAI... gpt-oss-120b is actually a very strong model and does some things that other models don't do very well."
"Actually, NVIDIA had a really cool one, Nemotron 3."
"For example, if you read a Substack article, I could maybe ask an LLM to give me opinions on that, but I wouldn't even know what to ask."
"I would love to have tried Bing Sydney. Did that have more voice? Because it would so often go off the rails, which is historically obviously a scary way—like telling a reporter to leave his wife—is a crazy model to potentially put in general adoption."
"There was a lot of backlash last year with GPT-4o getting removed, and I've personally never used the model, but I've talked to people at OpenAI where they get emails from users that might be detecting subtle differences in the deployments in the middle of the night."
"We see this with TikTok. You open it... I don't use TikTok, but supposedly in five minutes the algorithm gets you. It's locked in."
"A lot of researchers at these companies are so well-motivated, and definitely Anthropic and OpenAI culturally want to do good for the world."
"A lot of researchers at these companies are so well-motivated, and definitely Anthropic and OpenAI culturally want to do good for the world."
"my wife the other day—she has a podcast for book discussions, a book club, and she was transferring the show notes from Spotify to YouTube, and then the links somehow broke."
"my wife the other day—she has a podcast for book discussions, a book club, and she was transferring the show notes from Spotify to YouTube, and then the links somehow broke."
"even something simpler like MMLU, which is a multiple-choice benchmark. If you just change the format slightly, like, I don't know, if you use a dot instead of a parenthesis or something like that, the model accuracy will vastly differ."
"When you code these from scratch, you can take an existing model from the Hugging Face Transformers library. The library is great, but if you want to learn about LLMs, it's not the best place to start because the code is so complex to fit so many use cases."
"even Transformers, the library, is not used in production. People use SGLang or vLLM, and it adds another layer of complexity."
"even Transformers, the library, is not used in production. People use SGLang or vLLM, and it adds another layer of complexity."
"And then you start, let's say, with your GPT-2 model and add these things."
"With OLMo 3, the challenge was RoPE for the position embeddings. They had a YaRN extension and there was some custom scaling there, and I couldn't quite match these things."
"They had a YaRN extension and there was some custom scaling there, and I couldn't quite match these things."
"For the character training thing, I think this research is built on fine-tuning about 7 billion parameter models with LoRA, which is essentially only fine-tuning a small subset of the weights of the model."
"And listeners may know diffusion models from image generation, like Stable Diffusion popularized it."
"There was a paper on generating images. Back then, people used GANs, Generative Adversarial Networks."
"It's kind of similar to the BERT models by Google. Like, when you go back to the original transformer, they were the encoder and the decoder."
"But there was an announcement by Google, a site where they said they are launching Gemini Diffusion, and they put it into context of their Gemini Nano 2 model, and they said basically: for the same quality on most benchmarks, we can generate things much faster."
"they put it into context of their Gemini Nano 2 model"
"Like what Apple tried to do with the Apple Foundation models, putting them on the phone, where they learn from experience."
"One thing people still use is LoRA adapters. These are basically, instead of updating the whole weight matrix, there are two smaller weight matrices"
"With Nemotron 3, they found a good ratio of how many attention layers do you need for the global information compared to having these compressed states"
"DeepSeek-V3.2, where they had a sparse attention mechanism where they have essentially a very efficient, small, lightweight indexer"
"There was a paper by Meta, a paper called World Models. So where they basically apply the concept of world models to LLMs again"
"There is a competition called CASP, I think, where they do protein structure prediction"
"AlphaFold, when it came out, it crushed this benchmark"
"There's some work in this area like RTX, I think it was a few years ago, where people are starting to do that"
"I think when I was at Hugging Face, I was trying to get this to happen, but it was too early. It's like these open robotic models on Hugging Face"
"I don't know if you like the originally titled AI27 report. They focus more on code and research taste, so the target there is the superhuman coder"
"I think there are startups—maybe Harmonic is one—where they're going all in on language models plus Lean for math"
"language models plus Lean for math"
"We talked about Memory, which saves across chats. Its first implementation is kind of odd, where it'll mention my dog's name or something in a chat"
"You want to add a new tab in Slack that you want to use, and I think AI will be able to do that pretty well"
"take something like Slack or Microsoft Word. I think if organizations allow it, AI could very easily implement features end-to-end"
"We hear about Reflection AI, where they say their two billion dollar fundraise is dedicated to building US open models"
"They're signing licensing deals with Black Forest Labs, which is an image generation company"
"signing licensing deals with Black Forest Labs, which is an image generation company, or Midjourney"
"We are starting to see some types of consolidation with Groq for $20 billion"
"Scale AI for almost $30 billion and countless other deals like this"
"I think there will be some other multi-billion dollar acquisitions, like Perplexity"
"That's why part of what Vera Rubin is- where they have a new chip with no high-bandwidth memory, which is one of the most expensive pieces"
"Like, Google obviously can make TPUs"
"Amazon is making Trainium"
"The moat of NVIDIA is probably not just the GPU. It's more like the CUDA ecosystem, and that has evolved over two decades"
"We should say that the AI27 report kinda predicts one of the things it does from a narrative perspective is that there will be a lot of centralization."
"That's supposed to be the point of the Groq acquisition."
"Like, Google obviously can make TPUs."
"But I do still think that eventually, something like ChatGPT would have happened and a build-out like this would have happened, but it probably would not have been as fast."
"I think it only happened because you could purchase those GPUs."
"The word 'transformer' could still be known. I would guess that deep learning is definitely still known, but the transformer might be evolved away from in 100 years with AGI researchers everywhere."