I have been playing with OpenAI, Anthropic, and Groq’s APIs in my spare time and...

I have been playing with OpenAI, Anthropic, and Groq’s APIs in my spare time and if someone reading this doesn’t know it, they are doing the same thing and they are so close in implementation that it’s just dumb that they are in any way different.

You pass listing of messages generated by the user or the LLM or the developer to the API, it generates a part of the next message. That part may contain thinking blocks or tool calls (local function calling requested by the LLM). If so, you execute the tool calls and re-send the request. After the LLM has gathered all the info it returns the full message and says I am done. Sometimes the messages may contain content blocks that are not text but things like images, audio, etc.

That’s the API. That’s it. Now there are two improvements that are currently in the works:

1. Automatic local tool calling. This is seriously some sort of afterthought and not how they did it originally but ok, I guess this isn’t obvious to everyone.

2. Not having to send the entire message history back. OpenAI released a new feature where they store the history and you just send the ID of your last message. I can’t find how long they keep the message history. But they still fully support you managing the message history.

So we have an interface that does relatively few things, and that has basically a single sensible way to do it with some variations for flavor. And both OpenAI and Anthropic are engaged in a turf war over whose content block types are better. Just do the right thing and make your stuff compatible already.