Hacker News
CAD: Disaggregating Core Attention for Efficient Long-Context LLM Training (hao-ai-lab.github.io)
6 points by ginda307 5 days ago | past | discuss
CausalWan-Moe Preview: Applying Self-Forcing Distillation to Wan2.2 (hao-ai-lab.github.io)
3 points by wlsaidhi 35 days ago | past
Disaggregated Inference: 18 Months Later (hao-ai-lab.github.io)
1 point by ginda307 50 days ago | past
FastWan: Generating a 5-Second Video in 5 Seconds via Sparse Distillation (hao-ai-lab.github.io)
12 points by wlsaidhi 4 months ago | past
Fast Video Generation with Sliding Tile Attention (hao-ai-lab.github.io)
12 points by zhisbug 10 months ago | past | 2 comments
Reasoning Without Hesitating: Efficient CoT Through Certainty Probing (hao-ai-lab.github.io)
20 points by ginda307 10 months ago | past | 5 comments
Efficient LLM Scheduling by Learning to Rank (hao-ai-lab.github.io)
2 points by zhisbug 11 months ago | past | 1 comment
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving (hao-ai-lab.github.io)
2 points by zhisbug on June 24, 2024 | past | 1 comment
Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x (hao-ai-lab.github.io)
461 points by zhisbug on May 8, 2024 | past | 98 comments
Transforming LLMs into parallel decoders boosts inference speed by up to 3.5x (hao-ai-lab.github.io)
7 points by snyhlxde on May 7, 2024 | past
Throughput Is Not All You Need: Maxing Goodput in LLM Serving via Disaggregation (hao-ai-lab.github.io)
5 points by zhisbug on March 18, 2024 | past | 1 comment
