I used llama.cpp to run the BakLLaVA model on my M1 chip and have it describe what the camera sees!
It's pretty easy. Let me tell you!
1. Install llama.cpp
2. Download the model from Hugging Face (GGUF format)
3. Run the script to start a server for the model
4. Run the script with camera capture!
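The capture step above can be sketched in Python. This is a minimal, hypothetical sketch, not the repo's actual script: it assumes the llama.cpp server from step 3 is listening on localhost:8080 and exposes the multimodal `/completion` endpoint, where images are passed base64-encoded in an `image_data` array and referenced in the prompt as `[img-N]`. Actual webcam capture (e.g. via OpenCV) is stubbed out with dummy bytes.

```python
import base64
import json

# Assumed default address of the llama.cpp server started in step 3.
SERVER_URL = "http://localhost:8080/completion"

def build_payload(frame_bytes: bytes, prompt: str = "Describe what you see.") -> str:
    """Build a JSON body for llama.cpp's multimodal /completion endpoint.

    The frame is base64-encoded and referenced in the prompt as [img-1],
    matching the server's image_data convention.
    """
    encoded = base64.b64encode(frame_bytes).decode("ascii")
    body = {
        "prompt": f"USER: [img-1] {prompt}\nASSISTANT:",
        "image_data": [{"data": encoded, "id": 1}],
        "n_predict": 128,    # cap the length of the description
        "temperature": 0.1,  # keep descriptions fairly deterministic
    }
    return json.dumps(body)

if __name__ == "__main__":
    # In the real script a JPEG-encoded webcam frame would go here;
    # a dummy byte string stands in for it.
    payload = build_payload(b"\xff\xd8fake-jpeg-bytes")
    print(payload[:60], "...")
    # POSTing `payload` to SERVER_URL (urllib/requests) returns the
    # model's description of the frame.
```

Looping this over webcam frames gives the "realtime" effect: grab a frame, POST it, print the model's answer, repeat.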
The tweet got 90k views in 10 hours and was liked by Georgi Gerganov (llama.cpp author) and Andrej Karpathy. Got super happy, ahah, my legends liked my tweet :)
I have released a step-by-step guide for it on GitHub.
Let me know what you think!
What potential usage does this open?
https://github.com/Fuzzy-Search/realtime-bakllav
Related links:
Discussion on LLaVA support in llama.cpp: https://github.com/ggerganov/llama.cpp/pull/3436
Model: https://huggingface.co/SkunkworksAI/BakLLaVA-1
X: https://twitter.com/Karmedge/status/1720825128177578434