Hacker News | crowwork's comments

The goal of the project is to bring open ABI and FFI for machine learning systems.

- Stable, minimal C ABI designed for kernels, DSLs, and runtime extensibility.
- Zero-copy interop across PyTorch, JAX, and CuPy using the DLPack protocol.
- Compact value and calling convention covering common data types for ultra-low-overhead ML applications.
- Multi-language support out of the box: Python, C++, and Rust (with a path toward more languages).
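The zero-copy DLPack interop mentioned above can be sketched with NumPy alone, since NumPy implements both the producer side (`__dlpack__`) and the consumer side (`from_dlpack`) of the protocol; the same pattern carries over to PyTorch, JAX, and CuPy, which expose equivalent entry points:

```python
import numpy as np

# Producer: a NumPy array that exposes the DLPack protocol via __dlpack__.
a = np.arange(6, dtype=np.float32)

# Consumer: reconstruct a view through DLPack. No copy is made;
# both arrays share the same underlying buffer.
b = np.from_dlpack(a)

a[0] = 42.0
assert b[0] == 42.0  # the mutation is visible through the DLPack view
```

Because only a small capsule describing the buffer (pointer, shape, strides, dtype, device) crosses the boundary, the exchange cost stays constant regardless of tensor size.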


Scale LLM serving with programmable cross-engine serving patterns, all in a few lines of Python.


XGrammar is an open-source library for efficient, flexible, and portable structured generation. It brings a 2x-10x speedup to grammar-guided (JSON and CFG) LLM serving.


It comes with the ability to do full structured generation with JSON Schema.
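The core idea behind grammar-guided generation can be sketched independently of XGrammar's actual API (the toy vocabulary and helper names below are illustrative, not the library's interface): at each decoding step, tokens the grammar disallows have their logits masked to negative infinity, so the sampler can only pick grammar-valid continuations:

```python
import math

def mask_logits(logits, allowed_ids):
    # Tokens outside the grammar's allowed set get -inf,
    # so softmax assigns them zero probability.
    return [x if i in allowed_ids else -math.inf
            for i, x in enumerate(logits)]

def greedy_pick(logits):
    # Greedy decoding: choose the highest-scoring token.
    return max(range(len(logits)), key=lambda i: logits[i])

# Toy vocabulary: 0='{', 1='}', 2='"key"', 3=':', 4='"val"'
# A JSON-object grammar permits only '{' as the first token.
logits = [0.1, 2.0, 1.5, 0.3, 0.7]
masked = mask_logits(logits, allowed_ids={0})
assert greedy_pick(masked) == 0  # forced to open the object
```

The speedup claims come from computing these masks efficiently; a naive per-step scan over the whole vocabulary, as above, is exactly the overhead such libraries are designed to avoid.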

There is also an in-browser demo: https://chat.webllm.ai/


It runs Qwen2 on iPhone at 26 tok/sec, with an OpenAI-style Swift API.


A 2B model running at 20 tok/sec on iPhone; nice potential for future applications.


It runs Phi-2 on a Samsung S23 at pretty decent speed in the Google Chrome browser.

An LLM in the browser, on a phone.


You can also try out the Vulkan backend, which we know should work on Windows, although speed might be slower than ROCm.


Yes, it works out of the box, and the blog contains a prebuilt Python package that you can try out.


There is also Vulkan support, which should be more universal (also covered in the post); for example, the post shows running an LLM on a Steam Deck APU.

