you really don't need any of this crap. you just need Claude Code and a CLAUDE.md in the directories where you need to direct it. complicated AI setups are mid curve
It doesn’t require any major improvement to the underlying model. As long as they tinker with system prompts and built-in tools/settings, the coding agent will evolve in unpredictable ways out of my control
That's a rational argument. In practice, what we're actually doing for the most part is managing context, and creating programs to run parts of tasks, so really the system prompts and builtin tools and settings have very little relevance.
i don't understand this mcp/skill distinction? one of the mcps i use indexes the runtime dependency of code modules so that claude can refactor without just blindly grepping.
how would that be a "skill"? just wrap the mcp in a cli?
fwiw this may be a skill issue, pun intended, but i can't seem to get claude to trigger skills, whereas it reaches for mcps more... i wonder if im missing something. I'm plenty productive in claude though.
So MCPs are essentially bundles of skill-type objects. But the server has to tell the model about all of them, and load information about all of them, up front.
So a Skill is just a smaller granularity level of that concept. It's just one of the individual things an MCP can do.
This is about context management at some level. When you need to do a single thing within that full list of potential things, you don't need the instructions about a ton of other unrelated things in the context.
So it's just not that deep. It would be a python script or whatever that the skill calls, which returns the runtime dependencies and gives them back to the LLM so it can refactor without blindly grepping.
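A minimal sketch of what such a script could look like. Everything here is illustrative: the parent comments don't show real code, and this does simple static import analysis for Python, not the bytecode-level runtime indexing the MCP in question does. The point is only the shape: the skill tells the model to run the script, and the model gets back a compact JSON list instead of grep output.

```python
#!/usr/bin/env python3
"""Hypothetical dependency-listing script a skill could call.

Prints a JSON list of the modules a Python file imports, so the
agent reasons over a short list instead of grepping the codebase.
"""
import ast
import json
import sys


def list_imports(path):
    """Parse a Python file and return the sorted set of modules it imports."""
    tree = ast.parse(open(path).read(), filename=path)
    deps = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            deps.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.add(node.module)
    return sorted(deps)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The skill's instructions would tell the model to run:
    #   python list_imports.py <file>
    print(json.dumps(list_imports(sys.argv[1])))
```

Static analysis like this obviously misses dependencies created at runtime, which is exactly the gap the reply below is pointing at.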
no that makes no sense. the skill doesn't do anything by itself; the mcp can be attached to a deterministic oracle that returns correct information.
So in my nano banana image generation skill, it contains a python script that does all the actual work. The skill just knows how to call the python script.
We're attaching tools to the md files. This is at the granular level of how to hammer a nail, how to use a screwdriver, etc. And then the agent, the handyman, has his toolbox of skills to call on depending on what he needs.
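A hedged sketch of what that toolbox looks like on disk (the skill name, script path, and wording are invented for illustration, not taken from the comments above): a skill is a folder whose SKILL.md carries a short description the model sees up front, with the heavy lifting delegated to a script.

```markdown
---
name: runtime-deps
description: Look up the runtime dependencies of a code module before refactoring.
---

# Runtime dependencies

When the user asks to refactor a module, run:

    python scripts/list_deps.py <module-path>

The script prints a JSON list of the module's dependencies.
Use that list instead of grepping the codebase.
```

Until the skill is triggered, roughly only the frontmatter needs to sit in context, which is the context-management point made above.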
let's say i'm in erlang. you gonna include a script to unpack erlang bytecode across all active modules and look through them for a function call? oorrr... have that code running on localhost:4000 so that it's a single invocation away, versus having the llm copypasta the entire script you provided and pray for the best?
But for sure, there are places it makes sense, and there are places it doesn't. I'm arguing to maximally use it in the places that make sense.
People are not doing this. They are leaving everything to the LLM. I am arguing it is better to move everything you can into tools, and have the LLM focus only on the bits that a program doesn't make sense for.
In our experience, a lot of it is feel and dev preference. After talking to quite a few developers, we've found the skill was the easiest to get started with, but we also have a CLI tool and an MCP server too. You can check out the docs if you'd prefer to try those - feedback welcome: https://www.ensue-network.ai/docs#cli-tool
yeah but a skill without the mcp server is just going to be super inefficient at certain things.
again, going back to my example: a skill to build a dependency graph would have to do a complex search. and in some languages the dependency might be hidden by macros/reflection etc, which would obscure a result obtained by grep
how would you do this with a skill, which is just a text file nudging the llm whereas the MCP's server goes out and does things.
that seems token inefficient. why have the llm do a full round trip: load the skill, which contains potentially hundreds of lines of code, then copy and paste the code back into the compiler, when it could just run it?
not that i care too much about small amounts of tokens, but depleting your context rapidly seems bad. what is the positive tradeoff here?
I don't understand. The Skill runs the tools. Where there are problems that a program can handle instead of the LLM, I think we should maximally do that.
That uses fewer tokens. The LLM just calls the script, gets the response, and then uses that to continue reasoning.
This is classic engineer trying to build a business. Indie software is more of a business than it is software. Everyone wants to do the easy part (coding/tech), nobody wants to relentlessly service customers and do marketing/distribution.
Coding is easy. Building a business is hard, whether indie or VC backed.
how many changes (% of all changes) need an entire infra stack spun up? have you tried just deploying the changes to dev with a locking mechanism?
Does anyone get actually insightful reviews from these code review tools? From most people I've spoken with, they catch things like code complexity, linting, etc., but nothing that actually relates to business logic, because there's no way they could know about the business logic of the product
I built an LLM that has access to documentation before doing code reviews and forces devs to update it with each PR.
Needless to say, most see it as an annoyance not a benefit, me included.
It's not like it's useless, but... people tend to hate reviewing LLM output, especially on something like docs that requires proper review (nope, an article and a product are different, an order and a delivery note are as well, and those are the most obvious examples).
Code can be subpar or even gross and still do the job, but docs cannot be subpar, as they compound confusion.
I've even built a glossary to make sure the correct terms are used, and kind of forced it, but an LLM that gets 95% right is less useful than one that gets 0%, as the wrong 5% tends to be harder to spot and compounds inaccuracies over time.
It's difficult, it really is. There's everything involved, from behaviour to processes to human psychology to LLM instructing and tuning. Those are difficult problems to solve unless your team's budget allows hiring a functional analyst who could double as a technical and business writer, and such figures are both rare and hard to sell to management. And at that point an LLM is hardly needed.
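The glossary enforcement described above can be sketched as a simple pre-review check. Every term here is invented for illustration (the actual glossary isn't shown in the comments), but the mechanism is the recurring one in this thread: hand the deterministic part to a program, and only flag violations for a human or LLM to act on.

```python
"""Hypothetical glossary check: flag non-canonical terms in doc text."""
import re

# Canonical term -> variants that should be rewritten to it (invented examples).
GLOSSARY = {
    "delivery note": ["shipping slip", "packing note"],
    "order": ["purchase request"],
}


def find_violations(text):
    """Return (found variant, canonical term) pairs present in the text."""
    hits = []
    for canonical, variants in GLOSSARY.items():
        for variant in variants:
            if re.search(rf"\b{re.escape(variant)}\b", text, re.IGNORECASE):
                hits.append((variant, canonical))
    return hits


print(find_violations("Attach the shipping slip to each purchase request."))
# -> [('shipping slip', 'delivery note'), ('purchase request', 'order')]
```

A check like this catches the easy 95% deterministically; the hard part the comment describes, terms that are wrong in context rather than wrong by spelling, still needs a human.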
I have gotten code reviews from OpenAI's Codex integration that do point out meaningful issues, including across files and using significant context from the rest of the app.
Sometimes they are things I already know but was choosing to ignore for whatever reason. Sometimes it's like "I can see why you think this would be an issue, but actually it's not". But sometimes it's correct and I fix the issue.
I just looked through a couple of PRs to find a concrete example. I found a PR review comment from Codex pointing out a genuine bug where I was not handling a particular code path. I happened to know that no production data would trigger that code path as we had migrated away from it. It acted as a prompt to remove some dead code.
we're more focused on the core extraction layer itself rather than workflow tooling. we train our own vision models for layout detection, ocr, and table parsing from scratch. the key thing for us is determinism and auditability, so outputs are reproducible run over run, which matters a lot for regulated enterprises.