
Personally, I have been using beads for a few days on a couple of projects. I also like https://github.com/Dicklesworthstone/beads_viewer which is a nice TUI for beads (with some additional workflow I haven't tried). I have found it's been useful for longer, multi-session implementations. It's easier to get back into the work. I wouldn't go so far as to say it couldn't do the work without it, but so far it seems smoother. These things are hard to measure. I think it's really not that different from how an engineering team would use Jira, just more hierarchical, which helps preserve context, and with prebuilt instructions for how the agent should use it.


The skills that matter most to me are the ones I create myself (with the skill creator skill) that are very specific and proprietary. For instance, a skill on how to write a service in my back-testing framework.

I do also like to make skills for things that are more niche tools, like marimo (a very nice Jupyter replacement). The model probably does know some stuff about it, but not enough, and the agent could find enough online or in Context7, but it would waste a lot of time and context figuring it out every time. So instead I will have a deep-thinking agent do all that research up front and build a skill from it. I might customize it to be more specific to my environment, but it's mostly the condensed research of the agent, so that I don't need to redo it every time.


I think that is pretty true but massively underrates how much faster you can solve the problems you already know how to solve. I do also find it helps me more quickly learn how to solve new problems, but I must still learn how to solve the new problems I have it solve, or things go off the rails.


I actually think the social media factor is the biggest reason... we can now compare ourselves to a much, much larger circle, which makes our relative standing seem much worse. I think relative standing affects our happiness much more than absolute standing does. From a mate-competition standpoint, that actually makes logical sense.


Participating in social media is a choice. I get that it's an unpopular opinion, but you can really keep it to a minimum and still interact with people. Of course, I'm not a teenager anymore, so there might be more pressure for them.


Well, but so are smoking cigarettes, drinking, and gambling. As a society we recognize both that those vices are an individual choice and that some people really, really struggle with them. So again, as a society we try to put at least some disincentives on all of those activities, while not outright forbidding them.


We are all going to need to have personal passwords/safe words we don't reveal to untrusted parties for authentication. Or maybe personal retinal scanners? I think personal auth might be an interesting startup to get ahead of this.


So they’ll call you first with the fake video of your mom.

You'll be suspicious and ask for the pass phrase. The attacker now knows the nature of the protection you set up between you and your mom.

And then comes the real attack on your mom, with a fake "you" describing the system you'd agreed on and claiming you can't remember the word/phrase.

Better is the Terminator-style lie to see if it gets detected.


Mom-in-the-middle attack


So far I am in the skeptic camp on this. I don't see it adding a lot of value to my current Claude Code workflow, which already includes specialized agents and a custom MCP to search indexed MkDocs sites that effectively cover the kinds of things I would include in these skills files. Maybe it winds up being a simpler, more organized way to do some of this, but I am not particularly excited right now.

I also think "skills" is a bad name. I guess it's a reference to the fact that it can run scripts you provide, but the announcement really seems to be more about the hierarchical docs. It's really more like a selective context-loading system than a "skill".


I'm inclined to agree. I've read through the Skill docs and it looks like something I've been doing all along - though I informally referred to it as the "Table of Contents" approach.

Over time I would systematically create separate specialized docs around certain topics and link them in my CLAUDE.md file, but notably without using the "@" symbol, which to my understanding always causes Claude to ingest the linked files, unnecessarily bloating your prompt context.

So my CLAUDE.md file would have a header section like this:

  # Documentation References

  - When adding CSS, refer to: docs/ADDING_CSS.md
  - When adding or incorporating images, refer to: docs/ADDING_IMAGES.md
  - When persisting data for the user, refer to: docs/STORAGE_MANAGER.md
  - When adding logging information, refer to: docs/LOGGER.md

It seems like this is less of a breakthrough and more of an iterative improvement toward formalizing this process from an organizational perspective.


How consistently do you find that Claude Code follows your documentation references? Like you work on a CSS feature and it goes to ADDING_CSS.md? I run into issues where it sometimes skips my imperative instructions.


It's funny you mention this - for a while I was concerned that CC wasn't fetching the appropriate documentation related to the task at hand (coincidentally this was around Aug/Sept when Claude had some serious degradation issues [1]), so I started adding the following to the beginning of each specialized doc file:

  When this documentation is read, please output "** LOGGING DOCS READ **" to the console.

These days I do find that the TOC approach works pretty well though I'll probably swap them over to Skills to see if the official equivalent works better.

[1] https://www.anthropic.com/engineering/a-postmortem-of-three-...


For me, it’s pretty reliable until a chat grows too long and it drifts too far away from the start where it reviewed the TOC


I just tag all the relevant documentation and reference code at the beginning of the session


That's exactly what it is - formalizing and creating a standard induces efficiency. Along with things like AGENTS.md, it's all about standardization.

What bugs me: if we're optimizing for LLM efficiency, we should use structured schemas like JSON. I understand the thinking about Markdown being a happy medium between human and computer understanding, but Markdown parsing is not deterministic across implementations. Highly structured data would be more reliable for programmatic consumption while still being readable.


In general, markdown refers to CommonMark and derivatives now. I’d be surprised if that wasn’t the case here.


> and a custom MCP to search indexed MkDocs sites that effectively cover the kinds of things I would include in these skills files

Search and this document-base pattern are different. In search, the model uses a keyword to retrieve results; here, the model starts from a map of the information and navigates it. This means it could potentially keep context better, because search tools have issues with information fragmentation and not seeing the big picture.


If you've ever worked with Excel + Python, I think this example will drive home the value a bit:

https://github.com/anthropics/skills/blob/main/document-skil...

There are many edge cases when writing / reading Excel files with Python and this nails many of them.
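
For instance, a minimal sketch of one such edge case (my own illustration, not taken from the linked skill, and assuming openpyxl plus a hypothetical report.xlsx): by default openpyxl hands back formula strings rather than the values Excel last calculated, which silently breaks naive "just read the spreadsheet" code.

  # Formula cells vs. cached values: a common trap when reading .xlsx with Python.
  # data_only=True returns the last value Excel calculated (or None if the file
  # was never recalculated in Excel); the default returns the formula string.
  from openpyxl import load_workbook

  wb_formulas = load_workbook("report.xlsx")
  wb_values = load_workbook("report.xlsx", data_only=True)

  print(wb_formulas.active["B2"].value)   # e.g. "=SUM(B1:B10)"
  print(wb_values.active["B2"].value)     # e.g. 42, or None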


I manually select my context* (like a caveman) and clear it often. I feel like I have a bit more control and grounding this way.

*I use a TUI to manage the context.


Which TUI do you use to manage context?


The thing is that I don't use AI to replace things I can do deterministically with code. I use it to replace things I cannot do deterministically with code - often something I would have a person do. People are also fallible and can't be completely trusted to do the thing exactly right. I think it works very well for things that have a human in the loop, like coding agents where someone needs to review changes. For instance, I put an agent in a tool for generating AWS access policies from English descriptions, or answering questions about current access (where the agent has access to tools to see current users, bucket policies, etc.). I don't trust the agent to do it exactly right, so it just proposes the policies and I have to accept or modify them before they are applied, but it's still better than writing them myself. And it's better than having a web interface do it, because that is lacking context.

I think it's a good example of the kind of internal tool the article is talking about. I would not have spent the time to build this without Claude making it much faster to build stand-alone projects, and the English -> policy step would not be possible without LLMs.
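
To make the human-in-the-loop shape concrete, here is a rough sketch of the propose-then-approve flow (generate_policy_draft is a hypothetical stub standing in for the actual agent call; none of this is the real tool):

  import json

  def generate_policy_draft(request: str) -> dict:
      # Placeholder for the LLM/agent call that turns an English request, plus
      # context from tools (current users, buckets, policies), into a draft policy.
      return {"Version": "2012-10-17", "Statement": []}

  def review_and_apply(request: str) -> None:
      draft = generate_policy_draft(request)
      print(json.dumps(draft, indent=2))
      if input("Apply this policy? [y/N] ").strip().lower() != "y":
          print("Discarded; nothing was applied.")
          return
      # Only after explicit human approval would anything actually touch AWS
      # (e.g. via boto3); the agent alone never applies a policy.
      print("Approved; applying...")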


>> The thing is that I don't use AI to replace things I can do deterministically with code. I use it to replace things I cannot do deterministically with code - often something I would have a person do.

Nailed it. And the thing is, you can (and should) still have deterministic guard rails around AI! Things like normalization, data mapping, validations etc. protect against hallucinations and help ensure AI’s output follows your business rules.
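
As a minimal sketch of one such guard rail (the allowlist and policy shape below are assumptions for illustration, not anyone's actual business rules): deterministically reject an LLM-proposed IAM policy that strays outside approved actions or uses wildcards.

  # Deterministic validation of an LLM-proposed IAM policy document.
  ALLOWED_ACTIONS = {"s3:GetObject", "s3:ListBucket"}  # assumed business rule

  def policy_passes_guard_rails(policy: dict) -> bool:
      statements = policy.get("Statement", [])
      if not statements:
          return False
      for stmt in statements:
          actions = stmt.get("Action", [])
          if isinstance(actions, str):
              actions = [actions]
          if any(a == "*" or a not in ALLOWED_ACTIONS for a in actions):
              return False
          if stmt.get("Resource") == "*":
              return False
      return True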


> Things like normalization, data mapping, validations etc. protect against hallucinations

And further downstream: Audit trails, human sign-offs, operations which are reversible or have another workflow for making compensating actions to fix it up.


Or, you could make a tool that generates this stuff deterministically, the exact same way every time. At least in that situation you can audit the tool and see whether it is correct. In your situation you still leave the point of failure on the user - arguably even more so, because they could get complacent with the LLM output and assume it is correct.

In my mind you are trading a function that always evaluates the same way for a given x for one that might not, and that requires oversight.


So how would you implement "generating AWS access policies from English descriptions" using deterministic code, in a way that doesn't require human oversight?


> I think it works very well for things that have a human in the loop, like coding agents where someone needs to review changes

This is the best case for AI. It's not very different from a level 3 autonomous car with the driver in the loop, as opposed to a fully autonomous level 5 vehicle, which probably requires AGI-level AI.

The same applies to medicine, where a limited number of specialists (radiologists/cardiologists/oncologists/etc.) in the loop are assisted by AI for activities that would otherwise take experts too much time to review manually, especially non-obvious early symptom detection (X-ray/ECG/MRI) in the modern practice of evidence-based medicine.


Demand is a function of price. At any given price there is a quantity demanded. To know the price you also need to know the supply function. There is demand for socks, but if no one is willing to supply socks for less than a million dollars, the quantity demanded of socks at that price could be 0.


In most markets it seems like commodity-level items and services, like baggers and socks, would have plenty of supply.


In a free market, commodity-level jobs will only happen at commodity-level wages. But it's not a free market, because of minimum wage laws. If you set the minimum wage too high, the number of baggers may go to zero, not because of supply but because of demand - the price is higher than any grocery store is willing to pay.

You can argue about the need for the minimum wage laws. You can argue about the morality of paying a living wage. But that's a different argument.


I also feel like what gets lost in this is that not everything you are building is a bite-size feature in a large existing project. Sometimes you are adding an entire large subsystem to something relatively greenfield. If you broke that down into features, you would need 20 PRs, and if you wait for review - or even don't wait but have to circle back to integrate lots of requested changes - what might be a couple of weeks of work turns into 2 to 3 months of work. That just does not work unless you are in a massive enterprise that is OK with moving like molasses. Do you wind up with something not as high quality? Probably. But that is just the trade-off with shipping faster.


If you are the only developer who is ever going to work on something, maybe. Even then, I would argue you are more likely to deliver successfully if you cut your work into smaller pieces instead of not delivering anything at all for weeks at a time.

But for the company, having two people capable of working on a system is better than one, and usually you want a team. Which means the code needs to be something your coworkers understand, can read and agree with. Those changes they ask for aren't frivolous: they are an important part of building software collaboratively. And it shouldn't be that much feedback forever: after you have the conversation and you understand and agree with their feedback, the next time you can take that consideration into account when you are first writing the code.

If you want to speed that process up, you can start by pair programming and hashing out disagreements in real time, until you get confident you are mostly on the same page.

Professional programming isn't about closing tickets as fast as possible. It is about delivering business value as a team.


I don't know exactly where AI is going to go, but the fact that I keep seeing this one study of programmer productivity - with something like 16 people with limited experience with Cursor - uncritically quoted over and over assures me that the anti-AI fervor is at least as big of a bubble as the AI bubble itself.


Do you have some other, better study in mind that people should be talking about? My sense is that comparative data is scarce, because leaders who believe in AI tend to believe in it so strongly that they mandate adoption rather than collecting data from organic growth.


Obviously, more studies would be better, and that one study certainly isn't conclusive, but for now it is pretty much what is _available_.


The joy of this particular study (the METR one) is that it maps pretty well to a lot of professional programming work, hence why it gets reported a lot.


To be fair, when you look at most studies that are hemmed and hawed over in the press, they tend to look like that. Well controlled, high n, replicated in independent cohorts - the readership of mass-market media doesn't understand any of this nuance. The bar is merely whatever seems to validate an existing bias, whatever the topic, since the incentives are built around engagement and not scientific truth.


> uncritically quoted over and over assures me that the anti-AI fervor is at least as big of a bubble as the AI bubble itself.

Counterpoint: the AI fanboys and AI companies, with all their insane funding, couldn't come up with a better study and a bigger sample size, because LLMs simply don't help experienced developers.

What follows is that the billion-dollar companies just couldn't produce a better study, either because they tried and didn't like the productivity numbers not being in their favor (very likely), or because they are so sloppy and vibey that they don't know how to run a proper study (I wouldn't be surprised - see ChatGPT's latest features: "study mode", which got its own blog post! - and you know the level is not very high).

Again, until there is a better study, the consensus is that LLMs are a 19% productivity drain for experienced developers, and if they help certain developers, then most likely those developers are not experienced.

How's that for an interpretation?


I never tell anyone they have to use AI tools. You do you. In a few years we will see who is better off.


It has already been a couple of years. What time period should we revisit? And also how would we measure success?


I mean, surely, if and when they demonstrably work, the sceptic can just adopt them, having lost nothing? There seems to be a new one every month anyway, so it’s not like experience from using the one from three years ago is going to be particularly helpful.

There seems to be an attitude, or at least a pretended attitude, amongst the true believers that the heretics are dooming themselves, left behind in a glorious AI future. But the AI coding tools du jour are completely different from the ones a year ago! And in six months they'll be different again!


LLMs have existed for 5-6 years already in some shape or form. How long do I have to wait for Claude to actually do something, and for me to start seeing it in OSS?

- Cause currently what we see in OSS is LLM trash. https://www.reddit.com/r/webdev/comments/1kh72zf/open_source...

- And a large majority of users don't want that copilot trash in their default github experience: https://www.techradar.com/pro/angry-github-users-want-to-dit...

At what point will that trash become gold? 5 more years? And if it doesn't, at what point does trash stay trash?

- When there is a study showing that trash is actually sapping 19% of your performance? https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

- When multiple studies show using it makes you dumb? https://tech.co/news/another-study-ai-making-us-dumb

Cause I am pretty sure NFTs still have people who swear by them and say "just give it time". At what point can we confidently declare that NFTs are useless without the cultist fanbase going hurr durr? What about LLMs?

