As someone who helps run another volunteer tech nonprofit that relies on slack, is there a reason they kicked you off the free nonprofit plan? Asking to have a good backup plan for our community.
> They can get halfway there and then struggle immensely.
Restart the conversation from scratch. As soon as you get something incorrect, begin from the beginning.
It seems to me like any mistake in a messages chain/conversation instantly poisons the output afterwards, even if you try to "correct" it.
So if something was wrong at one point, you need to go back to the initial message, and adjust it to clarify the prompt enough so it doesn't make that same mistake again, and regenerate the conversation from there on.
Chatbot UIs really need better support for conversation branching all around. It's very handy to be able to just right-click on any random message in the conversation in LM Studio and say, "branch from here".
Maybe it's contrarian, maybe it's not, but I don't think Chat UIs are well suited for software engineering/programming at all, we need something completely different. Being able to branch conversations and such would be useful, but probably not for the way I do software. Besides, I'm rarely beyond 3 messages (1 system, 1 user, 1 assistant) in any usage of the chat UIs. Maybe it's more useful to people with different workflows.
I don't see how you'd avoid using chat if you need the bot to work on some bug end-to-end. I usually have many rounds in a chat session, first asking it to identify the overall approach, reviewing and approving that, then one or more rounds for coding, and several more to request edits as needed.
If you only ever ask it for trivial changes that don't require past context to make sense, then chat is indeed overkill. But we already have different UX approaches for that - e.g. some IDEs watch for specially formatted comments to trigger code generation, so you literally just type what you want right there in the editor, exactly where you want the code to go.
Yeah, I'd agree you want to iterate, but I'm not sure the UX of "Log of messages, where some of yours, some are tool calls, others are the assistant" and the workflow of "Add more messages into the log of messages"/"Change existing messages" is the right broad UX for this type of work.
I'm sorry I can't substantiate it more than that, as my own head is still trying to wrap itself around what I think is needed instead. Still, sounds very "fluffy" even when I read it back myself.
It does indeed. What I'm saying is that, for some mysterious reason, none of the first-party chatbot apps do that - ChatGPT, Claude, Gemini all lack this feature.
AI Studio has this, I usually ask it to plan and I do some rounds of refining until the plan covers all my requirements, then I branch this conversation, a branch for each feature, none of the branches get polluted this way.
Can you imagine if Excel worked like this? the formula put out the wrong result, so try again! It's like that scene from The Office where Michael has an accountant "run it again." It's farcical. They have created computers that are bad at math and I will never forgive them.
Also, each try costs money! You're pulling the lever on a god damned slot machine!
I will TRY AGAIN with the same prompt when I start getting a refund for my wasted money and time when the model outputs bullshit, otherwise this is all confirmation and sunk cost bias talking, I'm sure if it.
I mean, why would I imagine that? Who would want that? It's like the argument against legal marijuana, and someone replies "But would you like your pilot to be high when flying?!". Right tool for the right job, clearly when you want 100% certainty then LLMs aren't the tool for that. Just because they're useful for some things don't mean we have to replace everything with them.
> Also, each try costs money!
I guess you're using some paid API? Try a different way then. I mostly use the web UI from OpenAI, or Codex lately, or ran locally with my own agent using local weights, neither is "each try costs money" more than writing data to my SSD is costing me money.
It's not a holy grail some people paint it, and not sure we're across the "productivity threshold" (https://news.ycombinator.com/item?id=44160664) yet, but it's worth trying it out probably before jumping to conclusions. But no one is forcing you either, YMMV and all that.
I thought Claude still has a problem generating the same output for the same input? That you can't just rewind and rerun and get to the same point again.
> I thought Claude still has a problem generating the same output for the same input?
I haven't used Anthropic's models/software in a long time (months, basically forever in AI ecosystem), so don't know exactly how it works now.
But last time I used Claude, you could edit the first message, and then re-generate the assistants next message based on your edit. Most of the LLM interfaces has one or another way of doing this, I can't imagine they got rid of that feature.
What I'm suggesting isn't to use the exact same input (the first message), but rather change it so you remove the chances of something incorrect happening later after that.
Good engineering? You want automated steps to be repeatable so you know your tweak to the previous conversation have the effect you desire. Though using an AI for coding is probably closer in spirit the the art of writing code than the engineering of writing code and art is pretty much unrepeatable by definition.
Fair enough. Use the respective API or Google Gemini which will let you set temperature to zero resulting in deterministic output barring FP errors accumulating when paired with non-standard GPU/TPU configurations. Likely not to differ by much in the vast majority of cases though.
The comment in lines 163 - 172 make some claims that are outright false and/or highly A/S dependent, to the point where I question the validity of this post entirely. While it's possible that an A/S can be pseudo-generated based on lots of training data, each implementation makes very specific design choices: i.e.: Auth0's A/S allows for a notion of "leeway" within the scope of refresh token grant flows to account for network conditions, but other A/S implementations may be far more strict in this regard.
My point being: assuming you have RFCs (which leave A LOT to the imagination) and some OSS implementations to train on, each implementation usually has too many highly specific choices made to safely assume an LLM would be able to cobble something together without an amount of oversight effort approaching simply writing the damned thing yourself.
I am waiting for studies whether we have just an illusion of production or these actually save man hours in the long term in creation of production-level systems.
One way to mitigate the issue is to use tests or specifications and let the AI find a solution to the spec.
A few months ago, solving such a spec riddle could take a while, and most of the time, the solutions that were produced by long run times were worse than the quick solutions. However, recently the models have become significantly better at solving such riddles, making it fun (depending on how well your use case can be put into specs).
In my experience, sonnet 3.7 represented a significant step forward compared to sonnet 3.5 in this discipline, and Gemini 2.5 Pro was even more impressive. Sonnet 4 makes even fewer mistakes, but it is still necessary to guide the AI through sound software engineering practices (obtaining requirements, discovering technical solutions, designing architecture, writing user stories and specifications, and writing code) to achieve good results.
Edit: And there is another trick: Provide good examples to the AI. Recently, I wanted to create an app with the OpenAI Realtime API and at first it failed miserably, but then I added the most important two pages of the documentation and one of the demo projects into my workspace and just like that it worked (even though für my use-case the API calls had to be use quite differently).
That's one thing where I love Golang. I just tell Aider to `/run go doc github.com/some/package`, and it includes the full signatures in the chat history.
It's true: often enough AI struggles to use libraries, and doesn't remember the usage correctly. Simply adding the go doc fixed that often.
This to me is why I think these tools don't have actual understanding, and are instead producing emergent output from pooling an incomprehensibly large set of pattern-recognized data.
> these tools don't have actual understanding, and are instead producing emergent output from pooling an incomprehensibly large set of pattern-recognized data
I mean, bypassing the fact that "actual understanding" doesn't have any consensus about what it is, does it matter if it's "actual understanding" or "kind of understanding", or even "barely understanding", as long as it produces the results you expect?
Then you do that part yourself. You let AI automate the 20/50/80% (*) of work it can, and you now only need to do the remainder manually.
(*) which one of these it is depends on your case. If you're writing a run-of-the-mill Next.js app, AI will automate 80%; if you're doing something highly specific, it'll be closer to 20%.
if you give an LLM a spec with a new language and no examples, it can't write the new language.
until someone does that, I think we've demonstrated that they do not have understanding or abstract thought. they NEED examples in a way humans do not.
Have you tried that? It generally doesn't go so well.
In this example there are several commits where you can see they needed to fix the code because they couldn't get (teach) the LLM to generate the required code.
And there's no memory there, you open a new prompt and it's forgotten everything you said previously.
No, I was not making a critique on its effectiveness at generating usable results. I was responding to what I've seen in several other articles here arguing towards anthropomorphism.
A college hackathon organizer here (and former high school hackathon organizer). This is so useful. Dealing with high school/university administration takes up an incredible amount of time. When it came to reimbursements, my university took months to pay back student organizers who paid out of their own pocket to buy last minute items.
This is so great. As someone who used to run a Hack Club, it's so exciting to see this (would have solved a lot of problems for me back in high school).
Out of curiosity, are there ever plans to expand this beyond high school?
Potentially. We want to start by focusing on high school events, eventually start to provide more general financial services to clubs, and later may consider doing sponsorship for collegiate events.
I'm guessing it must be close. One of the bubbles is "Stuart", and Stuart Scott of ESPN is trending on Twitter quite heavily today after passing of cancer.
Senior at the International Academy here (a public high school in Michigan). One of my classes uses Google Classroom.
I think that Google truly took a "Google Plus" look and morphed that into a website for teachers. I love how it has a clean and easy to use UI. The assignments are on a side bar, announcements are clearly displayed in a "stream". You can easily post a message to your class, and you have access to everyone's email (making it easy to communicate).
Teachers in my school have used three platforms. Moodle, Google Classroom, and Edmodo. Quite frankly, I have come to like Edmodo much more. They're identical to Facebook, but the layout is so much better than the alternatives (in my preference). Edmodo has truly thought everything out.
The Madison Block[1] has a lot to do with the growth startup-wise in Downtown Detroit. Also, there are more and more meetups going on. For example, Detroit Soup[2] is a growing event and calls people to join, pay $5 for soup, and vote on a project that will benefit the city (people present their projects). Even MHacks, the premier college hackathon run by students from the University of Michigan, was held in the Quicken Loans headquarters this spring. I have also been to a few meetups such as one from the Detroit NodeJS group, and the Detroit Drones group- the meetup groups are active and growing.
The city has had its ups and downs, but things are on the rise. Take it from someone who has lived near the city for 17 years. I think there are a lot of assumptions about Detroit, but there are a few things that are true: We have a 14.5% unemployment rate, a post industrial culture (culture of employees expecting to work for the Big Three immediately, instead of a culture of entrepreneurs), a financial problem, and poor Detroit schools (however the schools in suburbs are much better).
I think that the combination of a lot of opportunity, and recent support from people like Dan Gilbert gave Detroit a huge start with the Madison Block, and that is the reason that people look at Detroit and what is happening in the city now.
Hmm, definitely didn't mean to submit it if it was already done in the past. I thought that if it was submitted previously then hacker news would automatically detect that.