Hacker News: ModelForge's submissions
1. A Researcher's Field Guide to Non-Standard LLM Architectures (sebastianraschka.com)
3 points by ModelForge 4 days ago
2. Explanation of Gated DeltaNet (Qwen3-Next and Kimi Linear) (github.com/rasbt)
3 points by ModelForge 5 days ago
3. The Core Components of Modern LLMs and the Models Beyond Transformers [video] (youtube.com)
3 points by ModelForge 12 days ago
4. Popular Attention Alternatives: GQA, MLA, SWA (sebastianraschka.com)
4 points by ModelForge 24 days ago
5. Multi-Head Latent Attention (sebastianraschka.com)
4 points by ModelForge 26 days ago
6. Thinking Machines Lab Co-Founder Departs for Meta (wsj.com)
7 points by ModelForge 28 days ago
7. OpenAI's internal Slack messages could cost it billions in copyright suit (sherwood.news)
8 points by ModelForge 29 days ago | 1 comment
8. LLM Evaluation from Scratch: Multiple Choice, Verifiers, Leaderboards, LLM Judge (sebastianraschka.com)
4 points by ModelForge 34 days ago
9. Gemma 3 270M re-implemented in pure PyTorch for local tinkering (github.com/rasbt)
417 points by ModelForge 80 days ago | 57 comments
10. GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2 (sebastianraschka.com)
490 points by ModelForge 3 months ago | 97 comments
11. LLM Research Papers: The 2024 List (sebastianraschka.com)
5 points by ModelForge 10 months ago
12. Scaling Test-Time Compute with Open LLM Models (huggingface.co)
3 points by ModelForge 10 months ago
