
You keep saying that major models have "tool calling built in". And that by giving them context about available APIs, the LLM can "use the API".

But you don't explain, in any of your comments, precisely how an LLM in practice is able to itself invoke an API function. Could you explain how?

A model is typically distributed as a set of parameters, interpreted by an inference framework (such as llama.cpp), and not as a standalone application that understands how to invoke external functions.

So I am very keen to understand how these "major models" would invoke a function in the absence of a chassis container application (like Claude Code, that tells the model, via a prompt prefix, what tokens the model should emit to trigger a function, and which on detection of those tokens invokes the function on the model's behalf - which is not at all the same thing as the model invoking the function itself).

Just a high level explanation of how you are saying it works would be most illuminating.


The LLM's output differentiates between text intended for the user to see and tool usage.

You might be thinking "but I've never seen any sort of metadata in textual output from LLMs, so how does the client/agent know?"

To which I will ask: when you loaded this page in your browser, did you see any HTML tags, CSS, etc? No. But that's only because your browser read the HTML and rendered the page, hiding the markup from you.

Similarly, what the LLM generates looks quite different from what you'll see in typical, interactive usage.

See for example: https://platform.openai.com/docs/guides/function-calling

The LLM might generate something like this for text:

    {
      "content": [
        {
          "type": "text",
          "text": "Hello there!"
        }
      ],
      "role": "assistant",
      "stop_reason": "end_turn"
    }
Or this for a tool call:

    {
      "content": [
        {
          "type": "tool_use",
          "id": "toolu_abc123",
          "name": "get_current_weather",
          "input": {
            "location": "Boston, MA"
          }
        }
      ],
      "role": "assistant",
      "stop_reason": "tool_use"
    }
The schema is enforced much like end-user-visible structured outputs are -- if you're not familiar, many services will let you constrain the model's output to validate against a given schema. See for example:

https://simonwillison.net/2025/Feb/28/llm-schemas/

https://platform.openai.com/docs/guides/structured-outputs
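
What ties it together is the loop in the client/agent: it sends the tool schemas along with the conversation, watches the response for a tool_use stop reason, runs the named function locally, appends the result as a new message, and calls the model again. A rough sketch, assuming Anthropic-style message shapes like the examples above (callModel is a hypothetical HTTP wrapper, not a real SDK call, and exact field names vary by provider):

    const tools = {
      get_current_weather: async ({ location }) => {
        // Stand-in for a real lookup; the "tool" is just a local function the agent can run.
        return { location, temperature_f: 54, conditions: "cloudy" };
      },
    };

    async function runTurn(messages) {
      while (true) {
        // callModel is a placeholder for a POST to the model API, with tool schemas attached.
        const reply = await callModel(messages);
        messages.push({ role: "assistant", content: reply.content });

        if (reply.stop_reason !== "tool_use") {
          return reply; // plain text for the user; nothing to execute
        }

        // The model only *asked* for the tool; this code is what actually invokes it.
        for (const block of reply.content) {
          if (block.type !== "tool_use") continue;
          const result = await tools[block.name](block.input);
          messages.push({
            role: "user",
            content: [{ type: "tool_result", tool_use_id: block.id, content: JSON.stringify(result) }],
          });
        }
        // Loop around: send the tool result back so the model can finish its answer.
      }
    }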


Rooting your LG TV allows you to install homebrew apps. If you have a device that's at least a couple of years old and has not had its firmware updated, try https://rootmy.tv/


Last year Microsoft also released their own Redis-client-compatible key-value store, Garnet: https://github.com/microsoft/garnet


Human recall failure. Probably wanted "seemingly", "apparently", or even "ostensibly", but who's got time for all that when the publish button's right there.


I remember installing the TCP/IP stack from a treasured special floppy onto win 3.11 machines (giving them RIPE addresses and using NT4 as a router to expose the entire network onto the public internet - not a great idea in retrospect, but it was 1994 and everyone seemed so polite).


Meta has just released another in a series of extraordinary models that are already being abused to flood the crawlable web with regurgitated generative uncanny-valley material.

Since Google abandoned their focus on service quality, they are a prime target. The output of these models further hastens Google's decline in utility, and in turn, attractiveness to advertisers. Those ad budgets have to be spent somewhere, though; guess where?


Do you mean Danish audio description in Danish and English subtitles?


All in Danish. It was a form of multi-modal learning. I was pairing the text with the sound and the visual input. I frequently needed to pause and translate the subtitles — often word by word — but I think the combination of the three helped quite a bit, as it paired the sound with the image without going through an intermediary language that I already understood.


It’s not elegant, but the term is evolutionary mismatch.


Perhaps think instead of human skin as a wretched slimy substrate on which poor, determined fungi have to somehow eke an existence.


This approach was what we had to use back in the late 90s. We called it "iframe streaming", or "forever iframe" (and years later, as an industry term emerged, "comet"). It worked surprisingly well, except in cases where a client sat behind a greedily buffering proxy. We would send JavaScript statements that invoked callbacks on the client, rather than just JSON, as this avoided the need to parse data to determine which business logic to use on the client. This has the limitation of being "non-cross domain" (i.e. the web page containing the callback functions has to be served from the same domain as the infinite document).
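
For anyone who never saw the technique in the wild, it looked roughly like this (a sketch, with illustrative names like /stream and onServerEvent rather than anything we actually shipped):

    // Parent page: expose a callback and point a hidden iframe at the never-ending document.
    window.onServerEvent = function (data) {
      console.log("event from server:", data);
    };

    const frame = document.createElement("iframe");
    frame.style.display = "none";
    frame.src = "/stream"; // same-origin; the server never finishes this response
    document.body.appendChild(frame);

    // The server holds /stream open and flushes a chunk per event, e.g.
    //   <script>parent.onServerEvent({"price": 101.5});</script>
    // Each script block executes as it arrives, so the parent's callback fires in near real time.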

To get around buffering proxies, we optionally allowed our clients to use a JSONP long polling approach instead, whereby the client would dynamically generate a form inside a hidden iframe, and POST a request for JSONP data (JSON wrapped in a callback invocation), and the server would return data as soon as any was available. The client would immediately repeat the process to request more data, ad infinitum.

Eventually, the emergence of the XMLHttpRequest object in IE (and subsequently in other browsers) allowed us to implement cleaner long-poll-style methods, holding the connection open until data was available (and automatically reconnecting on error). This was later enhanced with CORS for delivery of data from arbitrary domains. As support for detecting updates to an in-progress response became available (via XMLHttpRequest's "progress" event, which for a long time was horribly buggy in IE), our payloads became infinite streams too (of JavaScript callback invocations). Early versions of this approach also required us to reload the entire page from time to time, as IE's underlying implementations of these browser objects appeared to have memory leaks (that we did not see in Firefox, for example).

When IE8 was released, we allowed clients to optionally use its XDomainRequest object to stream a response instead.

Years later, the much cleaner Server-Sent Events (SSE) and WebSocket options became possible. Intermediate proxy support was initially troublesome, however, and while both of these were our preferred choices from a performance and API perspective, it took several years before we could consider removing support for our earlier approaches. Even today, there are network environments where an infinite sequence of long polls is the only reliable option...

My preference today? The JavaScript fetch API for sending commands, with a simple ack as the response, and an async flow of events over a persistent SSE connection, feeding into a simple JS message bus (implemented using the browser's native event API) for delivery to vanilla JavaScript web components. Simple, clean, and consistent.
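
Roughly, that shape looks like the sketch below (endpoint paths, event names, and the example component are made up for illustration; a plain EventTarget serves as the message bus):

    // Message bus: just a shared EventTarget that components subscribe to.
    const bus = new EventTarget();

    // Events arrive over one persistent SSE connection and are re-dispatched on the bus.
    const source = new EventSource("/events"); // illustrative endpoint
    source.addEventListener("priceUpdate", (e) => {
      bus.dispatchEvent(new CustomEvent("priceUpdate", { detail: JSON.parse(e.data) }));
    });

    // Commands go out via fetch; the response is just an ack, results come back as events.
    async function sendCommand(command) {
      const res = await fetch("/commands", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(command),
      });
      if (!res.ok) throw new Error("command rejected: " + res.status);
    }

    // A vanilla web component just listens on the bus.
    class PriceTicker extends HTMLElement {
      connectedCallback() {
        bus.addEventListener("priceUpdate", (e) => {
          this.textContent = e.detail.symbol + ": " + e.detail.price;
        });
      }
    }
    customElements.define("price-ticker", PriceTicker);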

