Hacker News
CAD: Disaggregating Core Attention for Efficient Long-Context LLM Training (hao-ai-lab.github.io)
6 points by ginda307 5 days ago | past | discuss
CausalWan-Moe Preview: Applying Self-Forcing Distillation to Wan2.2 (hao-ai-lab.github.io)
3 points by wlsaidhi 35 days ago | past
Disaggregated Inference: 18 Months Later (hao-ai-lab.github.io)
1 point by ginda307 50 days ago | past
FastWan: Generating a 5-Second Video in 5 Seconds via Sparse Distillation (hao-ai-lab.github.io)
12 points by wlsaidhi 4 months ago | past
Fast Video Generation with Sliding Tile Attention (hao-ai-lab.github.io)
12 points by zhisbug 10 months ago | past | 2 comments
Reasoning Without Hesitating: Efficient CoT Through Certainty Probing (hao-ai-lab.github.io)
20 points by ginda307 10 months ago | past | 5 comments
Efficient LLM Scheduling by Learning to Rank (hao-ai-lab.github.io)
2 points by zhisbug 11 months ago | past | 1 comment
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving (hao-ai-lab.github.io)
2 points by zhisbug on June 24, 2024 | past | 1 comment
Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x (hao-ai-lab.github.io)
461 points by zhisbug on May 8, 2024 | past | 98 comments
Transforming LLMs into parallel decoders boosts inference speed by up to 3.5x (hao-ai-lab.github.io)
7 points by snyhlxde on May 7, 2024 | past
Throughput Is Not All You Need: Maxing Goodput in LLM Serving via Disaggregation (hao-ai-lab.github.io)
5 points by zhisbug on March 18, 2024 | past | 1 comment
