firasd's comments | Hacker News

Yeah, I mean, it would be better if REST were the way tools were exposed to LLMs.

I'm just glad it's there as a standardized approach. Right now I can connect an MCP clock to ChatGPT (web and iOS) and to Claude (web, iOS, Claude Code, and the desktop .exe).

It's wild that it's been over three years and these apps don't have a way to check the time without booting up a REPL, relying on an outdated system-prompt note about the current UTC time, or doing a web search that surfaces cached pages. Bill Gates would have added this to ChatGPT by Dec 2022.

You can add my clock [https://mcpclock.firasd.workers.dev/sse] to any of your AI apps right now

(The code is deployed directly from this GitHub repo to a Cloudflare Worker, if you want to check what it does: https://github.com/firasd/mcpclock/blob/main/src/index.ts)
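
If you're curious what a server like this involves, here's a minimal sketch of the same idea in Python using the official MCP SDK's FastMCP helper (the real repo is TypeScript on a Cloudflare Worker; the server name, tool name, and SSE transport below are just illustrative assumptions):

    # clock_server.py -- minimal MCP server that exposes the current time (sketch)
    # Assumes the official MCP Python SDK: pip install "mcp[cli]"
    from datetime import datetime, timezone

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("mcpclock-sketch")  # hypothetical server name

    @mcp.tool()
    def current_time() -> str:
        """Return the current UTC time as an ISO 8601 string."""
        return datetime.now(timezone.utc).isoformat()

    if __name__ == "__main__":
        # SSE transport so chat clients can connect over HTTP; stdio also works locally
        mcp.run(transport="sse")

Once it's running, any MCP-capable client can call current_time instead of guessing from a stale system prompt.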


Documenting this odd behavior where Claude can't seem to output smart quotes at all. As Sonnet notes, the justification is somewhat hard to understand...

But this is the thing I'm pointing out. The idea that the LLM is an oracle, or at least a stable holder of subjective views, is a mistake.

As humans, WE have to explore the latent space of the model. We have to activate neurons. We have to say maybe the puritanism of the left... maybe the puritanism of the right... okay how about...

We are privileged--and doomed--to have to think for ourselves, alas.


This is my attempt to articulate why some recent shifts in AI discourse seem to be degrading the product experience of everyday conversation.

I argue that “sycophancy” has become an overloaded and not very helpful term; almost a fashionable label applied to a wide range of unrelated complaints (tone, feedback depth, conversational flow).

Curious whether this resonates with how you feel or if you disagree

Also see the broader Vibesbench project: https://github.com/firasd/vibesbench/

Vibesbench discord: https://discord.gg/5K4EqWpp


The issue raised here seems mostly semantic, in the sense that the concern is about the mismatch between the standard meaning of a word (sycophant) and its meaning as applied to an issue with LLMs.

It seems to me that the issue it refers to (unwarranted or obsequious praise) is a real problem with modern chatbots. The harms range from minor (annoyance, or running down the wrong path because I didn’t have a good idea to start with) to dangerous (reinforcing paranoia and psychotic thoughts). Do you agree that these are problems, and is there a more useful term or categorization for these issues?


Re: minor outcomes. It really depends on the example, I guess. But if the user types "What if Starbucks focuses on lemonade" and then gets disappointed that the AI didn't yell at them for being off track--what are they expecting exactly? The attempt to satisfy them has led to GPT-5.2-Thinking-style nitpicking.[1] They have to think of the stress-test angles themselves ('can we look up how much of their sales come from non-warm beverages...').

[1] e.g. when I said Ian Malcolm in Jurassic Park is a self-insert, it clarified to me that "Malcolm is less a “self-insert” in the fanfic sense (author imagining himself in the story) and more Crichton’s designated mouthpiece". Completely irrelevant to my point, but it answers as if a bunch of reviewers are gonna quibble with its output.

With regards to mental health issues, of course nobody on Earth (not even the patients with these issues, in their moments of grounded reflection) would say that the AI should agree with their take. But I also think we need to be careful about what's called "ecological validity". Unfortunately I suspect there may be a lot of LARPing in prompts that test for delusions--akin to Hollywood pattern-matching, aesthetic talk, etc.

I think we can all agree that if someone says people are coming after them, the model should not help them build a grand scenario. But sycophancy is not exactly the concern there, is it? It's more about recognizing that this may be a false theory. So it ties into reasoning, contextual fluency (which anti-'sycophancy' tuning may reduce!), and mental-health guardrails.


<< The harms range from minor (annoyance, or running down the wrong path because I didn’t have a good idea to start with) to dangerous (reinforcing paranoia and psychotic thoughts). Do you agree that these are problems, and is there a more useful term or categorization for these issues?

I think that the issue is a little more nuanced. The problems you mentioned are problems of a sort, but the 'solution' in place kneecaps one of the ways LLMs (as offered by various companies) were useful. You mention the reinforcement of bad tendencies, but give no indication of the reinforcement of good ones. In short, I posit that the harms should not outweigh the benefits of augmentation.

Because this is the way it actually does appear to work:

1. Dumb people get dumber.
2. Smart people get smarter.
3. Psychopaths get more psychopathic.

I think there is a way forward here that does not have to include neutering seemingly useful tech.


AI sycophancy is a real issue and having an AI affirm the user in all/most cases has already led to a murder-suicide[0]. If we want AI chatbots to be "reasonable" conversation participants or even something you can bounce ideas off of, they need to not tell you everything you suggest is a good idea and affirm your every insecurity or neurosis.

0. https://www.aljazeera.com/economy/2025/12/11/openai-sued-for...


It's because people are so happy that they learnt a new word that they try to use it with everyone, on every occasion.

What drives me crazy are the emojis and the patronizing tone at the end of conversations.

Before 2022, no one was using that word.


Same with "stochastic parrot", I doubt lots of people knew the word "stochastic" before

Did you actually argue this?

Or did you place about 2-5 paragraphs per heading, with little connection between the ideas?

For example:

> Perhaps what some users are trying to express with concerns about ‘sycophancy’ is that when they paste information, they'd like to see the AI examine various implications rather than provide an affirming summary.

Did you, you personally, find any evidence of this? Or evidence to the opposite? Or is this just a wild guess?

Wait; never mind that, we're already moving on! No need to do anything supportive or similar to bolster it.

> If so, anti-‘sycophancy’ tuning is ironically a counterproductive response and may result in more terse or less fluent responses. Exploring a topic is an inherently dialogic endeavor.

Is it? Evidence? Counter-evidence? Or is this simply a feelpinion so no one can tell you your feelings are wrong? Or wait; that's "vibes" now!

I put it to you that you are stringing together (to an outside observer using AI) a series of words in a consecutive order that feels roughly good but lacks any kind of fundamental/logical basis. I put it to you that if your premise is that AI leads to a robust discussion with a back and forth; the one you had that resulted in "product" was severely lacking in any real challenge to your prompts, suggestions, input or viewpoints. I invite you to show me one shred of dialogue where the AI called you out for lacking substance, credibility, authority, research, due diligence or similar. I strongly suspect you can't.

Given that; do you perhaps consider that might be the problem when people label AI responses as sycophancy?


Well, I do have a chat log somewhere where I say potential energy seems like a fake concept, and GPT and/or Gemini got around to explaining that it can actually be expressed reliably in equations... does that count?

"called you out for lacking substance, credibility, authority, research, due dilligence or similar" seems like level of emotional angst that LLMs don't usually tend to show

Actually, amusingly enough, the Gemini/Verhoeven example in my doc is one where the AIs seem to have a memorably strong opinion.


These Stockfish multipv (multi-principal-variation) outputs show something interesting that contradicts the standard narrative around two of the most famous games in chess.

For more info on 'sham sacrifices', check out https://en.wikipedia.org/wiki/Sacrifice_(chess)#Sham_sacrifi... and https://en.wikipedia.org/wiki/Rudolf_Spielmann
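
If you want to generate this kind of output yourself, here's a rough sketch using python-chess (it assumes a Stockfish binary on your PATH; the position, depth, and MultiPV count are placeholders):

    # multipv_sketch.py -- print Stockfish's top lines for a position (sketch)
    # Assumes: pip install chess, plus a `stockfish` binary on PATH
    import chess
    import chess.engine

    board = chess.Board()  # starting position; paste any FEN you care about instead

    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    # multipv=3 asks the engine to report its three best principal variations
    infos = engine.analyse(board, chess.engine.Limit(depth=25), multipv=3)
    for rank, info in enumerate(infos, start=1):
        score = info["score"].white()           # evaluation from White's point of view
        line = board.variation_san(info["pv"])  # principal variation rendered in SAN
        print(f"PV {rank}: {score} {line}")
    engine.quit()

Each PV comes back with its own evaluation, which is what lets you compare the 'sacrifice' line against the engine's alternatives.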


I’ve found this quick command using standard Linux utilities helpful:

    find . -type f ! -name 'combined_output.txt' ! -name '*.png' ! -name '*.css' -print0 \
      | xargs -0 -I {} sh -c 'echo "Full path: {}"; echo "-------------------"; cat "{}"; printf "\n-------------------\n"' \
      > combined_output.txt


Hi folks--I put this together because I'm always frustrated by the 'digital entropy' of quotes online. It's surprisingly hard to find a 100% faithful transcription of a specific movie scene or a passage from a book. This is my attempt to create a small, high-fidelity record of a memorable scene, putting the film and the book side-by-side.

Something interesting about this film is the meta-context. In 2006, the casting of Robert Downey Jr. and Winona Ryder, both of whose careers had been derailed by personal struggles, lent the film a layer of reality. (The film's tagline was also "everything is not going to be ok"!)

And I had actually watched it during the theatrical window back then--when PKD's author's note appears at the end, mourning his lost friends, I found it emotional.


Thanks. Basically I find it useful to talk to ChatGPT or Claude after pasting this in... by now, in comparison, Google Maps feels cumbersome to use for the Delhi Metro because of all the tappity-tap-tapping required to pull up routes, etc.

Also, as humans we kinda want a sense of the 'lines': how many stations remain before my destination, or what the details of the interchange are--the interchange is basically the 'plot' of the trip, but for routing apps it's a minor detail in the GUI.


Interesting!

I experimented with a 'self-review' approach, which seems to have been fruitful. E.g.: I said Lelu from The Fifth Element has long hair. GPT-4o in chat mode agreed. GPT-4o in self-review mode disagreed (the reviewer was right). The reviewer basically looks over the conversation and appends a note.

Link: https://x.com/firasd/status/1933967537798087102
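
A minimal sketch of what that loop looks like (the model name, prompts, and OpenAI client usage here are illustrative assumptions; the actual experiment is in the linked thread):

    # self_review_sketch.py -- have the model review its own earlier answer (sketch)
    # Assumes: pip install openai, with OPENAI_API_KEY set in the environment
    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o"  # illustrative; any chat model works

    # 1) Ordinary chat turn
    chat = [{"role": "user", "content": "Lelu from The Fifth Element has long hair, right?"}]
    first = client.chat.completions.create(model=MODEL, messages=chat)
    chat.append({"role": "assistant", "content": first.choices[0].message.content})

    # 2) Self-review pass: show the model its own transcript and ask for a correction note
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in chat)
    review = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": "Review the conversation below for factual errors. "
                       "Append a short correction note if anything is wrong; "
                       "otherwise reply 'No issues.'\n\n" + transcript,
        }],
    )
    print(review.choices[0].message.content)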


I wrote this to argue that the GPT-5 rollout backlash wasn't just users hating change, but a rational response to a functional product downgrade.

My thesis is that "personality is utility." The collaborative "vibe" of 4o wasn't a bug; it was a feature that produced better results. OpenAI replaced a transparent creative partner with an opaque, cost-optimized answer engine and was surprised when users revolted.

I've tried to ground this by connecting the emotional Reddit posts to the concrete frustrations of experts who were all complaining about the same loss of agency and transparency, just in different terms. Curious to hear your thoughts.

