Hacker News
CAD: Disaggregating Core Attention for Efficient Long-Context LLM Training (hao-ai-lab.github.io)
6 points by ginda307 5 days ago | discuss
CausalWan-MoE Preview: Applying Self-Forcing Distillation to Wan2.2 (hao-ai-lab.github.io)
3 points by wlsaidhi 35 days ago
Disaggregated Inference: 18 Months Later (hao-ai-lab.github.io)
1 point by ginda307 50 days ago
FastWan: Generating a 5-Second Video in 5 Seconds via Sparse Distillation (hao-ai-lab.github.io)
12 points by wlsaidhi 4 months ago
Fast Video Generation with Sliding Tile Attention (hao-ai-lab.github.io)
12 points by zhisbug 10 months ago | 2 comments
Reasoning Without Hesitating: Efficient CoT Through Certainty Probing (hao-ai-lab.github.io)
20 points by ginda307 10 months ago | 5 comments
Efficient LLM Scheduling by Learning to Rank (hao-ai-lab.github.io)
2 points by zhisbug 11 months ago | 1 comment
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving (hao-ai-lab.github.io)
2 points by zhisbug on June 24, 2024 | 1 comment
Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x (hao-ai-lab.github.io)
461 points by zhisbug on May 8, 2024 | 98 comments
Transforming LLMs into parallel decoders boosts inference speed by up to 3.5x (hao-ai-lab.github.io)
7 points by snyhlxde on May 7, 2024
Throughput Is Not All You Need: Maxing Goodput in LLM Serving via Disaggregation (hao-ai-lab.github.io)
5 points by zhisbug on March 18, 2024 | 1 comment