I see comments like "is this a request to bypass sanctions" OR "he's iranian"
Let's remind ourselves of the following:
- First, understand that he didn't choose to be born and raised in Iran.
- Second, people grow up, have families, and become attached to where they're born. It's not easy to just 'pick up and leave'; moving to a new country is expensive and extremely difficult, especially from countries like Iran.
- Third, he's building something he believes in, which is more than can be said for many people in privileged countries who sit around and do nothing.
To me this reads like a plea for help.
He's built something and is showing it to the world. If someone likes it and wants to fund him or get him out of Iran so he can pursue his dreams, with the people who help him benefiting along with him, I'm sure he'll be all for that.
Letting that empathy lead you into violating international sanctions to funnel money to a sanctioned country is decidedly not a good idea.
Requesting that other people sign up for Stripe accounts and send you the money in violation of sanctions is bad. The other person risks extreme legal consequences including prison sentences for deliberately and openly violating sanctions.
The comments are a warning to anyone who might feel compelled to do what is being asked, without realizing the specific request is a serious legal matter that can come with prison terms measured in decades.
> I see comments like "is this a request to bypass sanctions"
Did you read the Gist he posted? It was a direct request for someone to help him bypass sanctions.
Immigrating to a new country isn't something you do on a whim. It takes a very long time (years) and it's not really cheap. It's not a solution to the OP's problem.
The OP wasn't asking for help leaving their country. They were making a specific ask to violate sanctions.
The problem described in this post has nothing to do with LLMs. It has everything to do with work culture and bureaucracy. Rules and laws that don't make sense persist because changing them takes time, energy, and effort; most people in companies have either tried and failed or don't care enough to push for change.
This is one example of a "horseless carriage" AI solution. I've started to question whether we're entering a generation where a lot of the things we do now aren't even necessary.
I'll give you one more example. The whole "Office" stack of ["Word", "Excel", "Powerpoint"] can also go away. But we still use it because change is hard.
Answer me this question: in the near future, if LLMs can traverse massive amounts of data, why would we need to make Excel sheets anymore? Will we as a society keep making spreadsheets because we want the insights they provide, or do we make Excel sheets for the sake of making Excel sheets?
I find the current generation of LLM products to be horseless carriages. Why would you need agents to make spreadsheets when you should just be able to ask the agent for the answers you're looking for from the spreadsheet?
LLMs are not able to replace Excel in their current state. See this simple accounting test: https://accounting.penrose.com/ - errors compound over time. This is the case even with small datasets; for massive corporate datasets it's useless (try asking Gemini to summarise a Google Sheet).
Until there is a fix for this (not clear there ever will be), Excel will be necessary.
Word will probably become a different, more collaborative product. Notion-esque.
PowerPoint... I would love it if it disappeared, but ultimately, if you have to present something, you need to have done the work.
> Answer me this question. In the near future, if LLMs can traverse massive amounts of data, why do we need to make Excel sheets anymore?
A couple of related questions: if airplanes can fly themselves on autopilot, why do we need steering yokes? If I have a dishwasher, why do I still keep sponges and dish soap next to my sink?
The technology is nowhere near being reliable enough that we can eschew traditional means of interacting with data. That doesn't prevent the technology from being massively useful.
> Why would you need agents to make spreadsheets when you should just be able to ask the agent to give you answers you are looking for from the spreadsheet.
Because it seems to be a fundamental property of LLMs that they make things up all the time. It's better to make the LLM a natural-language interface to a formal query language, which will return hard answers with fidelity from the database.
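A minimal sketch of that split, using SQLite as the formal backend. The `translate_to_sql` function here is a hypothetical stand-in for the LLM call (in practice it would prompt a model with the schema); the key point is that only the database produces the final answer.

```python
import sqlite3

# Stand-in for an LLM translating a natural-language question into SQL.
# The lookup table is illustrative; a real system would call a model
# with the database schema in context.
def translate_to_sql(question: str) -> str:
    templates = {
        "total revenue": "SELECT SUM(amount) FROM sales",
        "order count": "SELECT COUNT(*) FROM sales",
    }
    for key, sql in templates.items():
        if key in question.lower():
            return sql
    raise ValueError("no translation found")

def answer(question: str, conn: sqlite3.Connection):
    sql = translate_to_sql(question)        # LLM proposes a query
    return conn.execute(sql).fetchone()[0]  # database returns the hard answer

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(10.0,), (15.5,), (4.5,)])

print(answer("What is the total revenue?", conn))  # 30.0
print(answer("What is the order count?", conn))    # 3
```

Even if the model hallucinates, the worst case is a wrong or failing query, not a fabricated number presented as fact.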
Think of them as an artifact, a snapshot in time. You can sign them and file them away, back them up, and use them to document the intent at that time.
Based on my testing, the larger the model, the better it handles large context.
I tested with an 8B model, a 14B model, and a 32B model.
I wanted them to create structured JSON, and the context was quite large, around 60k tokens.
The 8B model failed miserably despite supporting 128k context; the 14B did better; the 32B one got almost everything correct. However, jumping to a really large model like grok-3-mini, it got it all perfect.
The 8B, 14B, and 32B models I tried were Qwen 3. I disabled thinking on all the models I tested.
Now, for my agent workflows, I use small models for most of the workflow (it works quite nicely) and only use larger models when the problem is harder.
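A rough sketch of that routing idea. The model names and the difficulty heuristic are illustrative assumptions, not a benchmark; a real router might use context length, task type, or a cheap classifier.

```python
# Route easy tasks to a small model and hard ones to a large model.
# Thresholds and model names are made up for illustration.
def estimate_difficulty(task: str, context_tokens: int) -> str:
    # Crude heuristic: very long contexts or structured-output tasks
    # go to the bigger model; everything else stays on the small one.
    if context_tokens > 30_000 or "json" in task.lower():
        return "hard"
    return "easy"

def pick_model(task: str, context_tokens: int) -> str:
    routes = {"easy": "qwen3-14b", "hard": "grok-3-mini"}
    return routes[estimate_difficulty(task, context_tokens)]

print(pick_model("summarise this paragraph", 800))   # qwen3-14b
print(pick_model("emit structured JSON", 60_000))    # grok-3-mini
```

The payoff is cost and latency: most workflow steps never touch the expensive model.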
That is true too. But I found Qwen3 14B with an 8-bit quant fares better than the 32B with a 4-bit quant, both with KV cache at 8-bit. (I enabled thinking; I will try with /nothink.)
LLMs are relatively new technology. I think it's important to recognize the tool for what it is and how it works for you. Everyone is going to get different usage from these tools.
What I personally find is that it's great for helping me solve mundane things. For example, I'm currently working on an agentic system and I'm using LLMs to help me generate Elasticsearch mappings.
There is no part of me that enjoys making JSON mappings. It's not fun, it doesn't engage my curiosity as a programmer, and I'm not going to learn much from generating Elasticsearch mappings over and over again. For problems like this, I'm happy to just let the LLM do the job. I throw some JSON at it, and I've got a prompt that's good enough that it will spit out results deterministically and reliably.
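For anyone unfamiliar with what's being generated: a mapping is just a JSON schema telling Elasticsearch how to index each field. A hypothetical example of the kind of output involved (field names invented for illustration):

```python
import json

# Illustrative Elasticsearch index mapping of the kind an LLM might
# generate from sample documents. Field names are made up.
mapping = {
    "mappings": {
        "properties": {
            "title":      {"type": "text", "analyzer": "standard"},
            "created_at": {"type": "date"},
            "tags":       {"type": "keyword"},
            "embedding":  {"type": "dense_vector", "dims": 768},
        }
    }
}
print(json.dumps(mapping, indent=2))
```

It's exactly the kind of repetitive, low-insight boilerplate where letting a model do the typing costs you nothing.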
However if I'm exploring / coding something new, I may try letting the LLM generate something. Most of the time though in these cases I end up hitting 'Reject All' after I've seen what the LLM produces, then I go about it in my own way, because I can do better.
It all really depends on the problem you are trying to solve. I think for mundane tasks LLMs are just wonderful and help get them out of the way.
If I put myself into the shoes of a beginner programmer, LLMs are amazing. There is so much I could learn from them. Ultimately, what I find is that LLMs will lower the barrier of entry to programming, but they do not remove the need to learn to read, understand, and reason about the code. Beginners will be able to go much further on their own before seeking out help.
If you are more experienced, you will probably also get some benefit, but ultimately you'd probably want to do it your own way, since there is no way LLMs will replace an experienced programmer (not yet, anyway).
I don't think it's wise to completely dismiss LLMs in your workflow; at the same time, I would not rely on them 100% either. Any code generated needs to be reviewed and understood, as the post mentions.
I've been working on solving this with quite a bit of success, and I'll be sharing more on it soon. It involves two systems: the first is the LLM itself, and the second acts as a 'curator' of thoughts, you could say.
It dynamically swaps portions of the context in and out. The system is not based on explicit definitions; it relies on LLMs 'filling the gaps'. It helps the LLM break problems down into small tasks, which eventually aggregate into the full task.
May I suggest - put what you have out there in the world, even if it’s barely more than a couple of prompts. If people see it and improve on it, and it’s a good idea, it’ll get picked up & worked on by others - might even take on a life of its own!
You can see it going from an introduction, to asking me for my name, and then being able to answer questions about a given topic. There is also another example you can see in the thread.
Behind the scenes, the system prompt is being modified dynamically based on the user's request.
All the information about movies is also loaded into context dynamically. I'm also working on a technique to unload stuff from context when the subject of a thread has changed dramatically. Imagine having a long conversation with a friend: along the way you 'context switch' multiple times as time progresses, and you probably don't even remember what you said to your friend 4 years ago.
There is a concept of 'main thread' and 'sub threads' involved as well that I'm exploring.
I will be releasing the code base in the coming months. I need to take this demo further than just a few prompt replies.
- infinity (great for embedding / reranking models, not for LLMs)
My personal feeling is that SGLang and vLLM have issues that make me not want to use them. Sure, they're fast, but there are reliability issues, and you need lots of flags and tinkering to make them work. There's also the problem of 100% CPU usage at idle, which the core contributors say is 'normal' and 'expected'. You can search the respective repositories on this topic if you don't believe me. People have even submitted PRs to solve these issues, which have not been merged. The mindset behind these projects is to get things to 'work', not to focus on polish and ease of use.
TGI, on the other hand, is in a class of its own. You can just feel the polish that went into it. Things tend to 'just work'. It's the only engine I tried that was able to run the model I wanted on the first try. Then I added the flags to make it fit my hardware (like sharding and max prefill tokens). TGI uses FlashInfer by default, which is SOTA when it comes to flash-attention backends.
llama.cpp has the widest model support, but it does not perform as well as TGI / vLLM / SGLang. If you can accept losing some performance (about 30% slower based on my testing), it's great for testing and development purposes, but for production-grade work I would recommend TGI.
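For reference, a typical TGI launch looks something like this. The flag names are from TGI's CLI; the model id, shard count, and token limit are illustrative values you'd tune to your hardware:

```shell
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id Qwen/Qwen2.5-14B-Instruct \
  --num-shard 2 \
  --max-batch-prefill-tokens 8192
```

`--num-shard` handles the tensor-parallel sharding mentioned above, and `--max-batch-prefill-tokens` caps prefill so the model fits in memory.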
This is fantastic work! They look beautiful. I hope you make lots of money from them because I know it's a lot of work. I'll be a customer in the near future for sure.
https://zacksiri.dev - My Blog