I'm excited to share Fuji-Web, an open-source AI web agent designed to automate various web tasks.
I had the idea of using a vision LLM to build a web agent back in Nov 2023, when GPT-4V had just been released. I'm proud that the idea has evolved into a state-of-the-art web agent. (You can find benchmarks in the blog post.)
We started this research because we wanted to find out how far we are from having an LLM-based assistant capable of navigating the complex real world. It turns out that if you can narrow down the problem and give clear instructions, we're almost there!
This is impressive, well done. I didn't find it on the roadmap, but do you plan to add support for local LLM models like Llama 3? I think it should be possible to launch a local server from LM Studio or similar and try running it even now, but native support would be much better. Or maybe you tried it and it wasn't good enough?
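For anyone who wants to experiment before any native support lands: LM Studio's local server exposes an OpenAI-compatible chat endpoint on localhost, so a rough stdlib-only sketch looks something like the following. (The port, endpoint path, and "local-model" name are assumptions about a default local setup, not anything from the Fuji-Web repo.)

```python
import json
import urllib.request

# LM Studio's default OpenAI-compatible endpoint (assumed default port 1234)
LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_chat_request(prompt, model="local-model"):
    # OpenAI-style chat payload; LM Studio serves whichever model is loaded,
    # so the model name here is mostly a placeholder
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask_local_llm(prompt):
    # Requires LM Studio running locally with a model loaded
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        LM_STUDIO_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

One caveat: since Fuji-Web is built around a vision LLM working from screenshots, a text-only local model wouldn't be a drop-in replacement; you'd want a local model with vision support.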
Repo: https://github.com/normal-computing/fuji-web
Our blog post: https://blog.normalcomputing.ai/posts/2024-05-22-introducing...