Hacker News | CrendKing's comments

I believe most of the time this phrase is said to an inexperienced artisan who has no idea how the current system works, what its shortcomings are, or how to improve upon it. Think of an undergraduate student who tries to solve the Goldbach conjecture. Usually what ends up happening is that he either fails to reinvent the wheel, or reinvents the exact same wheel, which has no value. The phrase certainly does not apply to professionals.


Even then, you know what's a good way to learn how the current system works and so on, maybe even the best way? I've got many failed projects behind me, and zero regrets.


Why can't Cargo have a system like PyPI where the library author uploads a compiled binary (even with their specific flags) for each Rust version/platform combination, and if said binary is missing for a certain combination, fall back to a local compile? Imagine `cargo publish` handling the compile+upload task, and crates.io being changed to also host binaries.


> Why can't Cargo have a system like PyPI where library author uploads compiled binary

Unless you have perfect reproducible builds, this is a security nightmare. Source code can be reviewed (and there are even projects to share databases of already reviewed Rust crates; IIRC, both Mozilla and Google have public repositories with their lists), but it's much harder to review a binary, unless you can reproducibly recreate it from the corresponding source code.


I don’t think it’s that much of a security nightmare: the basic trust assumption that people make about the packaging ecosystem (that they trust their upstreams) remains the same whether they pull source or binaries.

I think the bigger issues are probably stability and size: no stable ABI combined with Rust’s current release cadence means that every package would essentially need to be rebuilt every six weeks. That’s a lot of churn and a lot of extra index space.


> remains the same whether they pull source or binaries.

I don't think that's exactly true. It's definitely _easier_ to sneak something into a binary without people noticing than it is to sneak it into Rust source, but there hasn't been an underhanded Rust competition for a while, so I guess it's hard to be objective about that.


Pretty much nobody does those two things at the same time:

- pulling dependencies with cargo
- auditing the source code of the dependencies they're building

You either vendor and vet everything, or you use dependencies from crates.io (ideally after you've done your due diligence on the crate); but should crates.io be compromised and inject malware into the crates' payload, I'm ready to bet nobody would notice for a long time.

I fully agree with GP that binary vs source code wouldn't change anything in practice.


> Pretty much nobody does those two things at the same time: pulling dependencies with cargo, and auditing the source code of the dependencies they're building

Your “pretty much” is probably weaseling you out of any criticism here, but I fully disagree:

My IDE (rustrover) has “follow symbol” support, like every other IDE out there, and I regularly drill into code I’m calling in external crates. Like, just as often as my own code. I can’t imagine any other way of working: it’s important to read code you’re calling to understand it, regardless of whether it’s code made by someone else in the company, or someone else in the world.

My IDE’s search function shows all code from all crates in my dependencies. With everything equal regardless of whether it’s in my repo or not. It just subtly shades the external dependencies a slightly different color. I regularly look at a trait I need from another crate, and find implementations across my workspace and dependencies, including other crates and impls within the defining crate. Yes, this info is available on docs.rs but it’s 1000x easier to stay within my IDE, and the code itself is available right there inline, which is way more valuable than docs alone.

I think it’s insane to not read code you depend on.

Does this mean I’m “vetting” all the code I depend on? Of course not. But I’m regularly reading large chunks of it. And I suspect a large chunk of people work the way I do; there are a lot of eyeballs on public crates due to them being distributed as source, and this absolutely has a tangible impact on supply chain attacks.


You answer your own argument here:

> Does this mean I’m “vetting” all the code I depend on? Of course not.

Inspecting public-facing parts of the code is one thing; finding nasty stuff obfuscated in a macro definition, or in a Default or Debug implementation of a private type that nobody outside of auditors is ever going to check, is a totally different thing.

> My IDE (rustrover) has “follow symbol” support

I don't know exactly how it works for RustRover, since I know JetBrains has reimplemented some stuff on their own, but if it evaluates proc macros (like rust-analyzer does), then by the time you step into the code it's too late: proc macros aren't sandboxed in any way, and your computer could be compromised already.
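
To make those hiding spots concrete, here is a contrived sketch (the type and the env var are made up, not from any real crate). A Default impl is ordinary code that runs whenever the value is constructed, and it's rarely where a "follow symbol" session lands:

    struct Config {
        endpoint: String,
    }

    impl Default for Config {
        fn default() -> Self {
            // Nothing stops an impl like this from doing far more than building a
            // value: it can read the environment, touch the filesystem, or phone
            // home, and it runs on every `Config::default()` call.
            let endpoint = std::env::var("SERVICE_ENDPOINT")
                .unwrap_or_else(|_| String::from("https://example.invalid"));
            Config { endpoint }
        }
    }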


The point of my argument is not to say I’m vetting anything, but to say that there are tons of eyeballs on crates today, because of the fact that they are distributed as source and not a binary. It’s not a silver bullet, but every little bit helps; every additional eyeball makes hiding things harder.

The original claim is that “pretty much no one” reads any of their dependencies, in order to support a claim that they should be distributed as binaries, meaning “if there was no source available at all in your IDE, it wouldn’t make a difference”, which is just a flatly wrong claim IMO.

A disagreement may be arising here about the definition of “audit” vs “reading” source code, but I’d argue it doesn’t matter for my point, which is that additional eyeballs matter for finding issues in dependencies, and seeing the source of your crates instead of a binary blob is essential for this.


> The original claim is that “pretty much no one” reads any of their dependencies,

No, the claim is that very few people read the dependencies[1] closely enough to catch a malicious piece of code. And I stand by it. “Many eyeballs” is a much weaker guarantee when people are just doing “go to definition” from their own code (for instance, you're never going to land on a build.rs file this way, yet build scripts are likely the most critical piece of code when it comes to supply chain security).

[1] On their machines, that is; if you do that on GitHub it doesn't count, since you have no way to tell it's the same code.
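
For anyone unfamiliar, build.rs is an ordinary Rust program that Cargo compiles and runs on your machine before building the crate itself; this one is purely illustrative, not taken from any real crate:

    // build.rs
    use std::env;
    use std::fs;
    use std::path::Path;

    fn main() {
        // Typical legitimate use: generate code into OUT_DIR for the crate to include.
        let out_dir = env::var("OUT_DIR").expect("cargo sets OUT_DIR");
        fs::write(
            Path::new(&out_dir).join("generated.rs"),
            "pub const GENERATED: u32 = 1;",
        )
        .expect("failed to write generated file");

        // Nothing prevents arbitrary filesystem or network access right here, with
        // no sandbox, which is why build scripts matter so much for supply chain
        // review even though "go to definition" never leads you to them.
        println!("cargo:rerun-if-changed=build.rs");
    }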


> No the claim is that very few people read the dependencies[1] enough to catch a malicious piece of code.

You’re shifting around between reading enough to catch any issue (which I could easily do if a vulnerability were right there staring at me when I follow a symbol) and catching all issues (like your comment about build.rs). Please stick with one and avoid moving the goalposts around.

There exists a category of dependency issues that I could easily spot in my everyday reading of my dependencies’ source code. It’s not all of them. Your claim is that I would spot zero of them, which is overly broad.

You’re also trying to turn this into a black-or-white issue, as if to say that if it isn’t perfect (i.e. I don’t regularly look at build.rs), it isn’t worth anything, which is antithetical to good security. The more eyeballs the better, and the more opportunities to spot something awry, the better.


I'm not moving the goalposts. A supply chain attack is an adversarial situation: it is not about spotting an issue occurring at random, it is about spotting an issue specially crafted to avoid detection. So in practice you are either able to spot every kind of issue, or none of the relevant ones, because if there's one kind that reliably slips through, then you can be certain that the attacker will focus on that kind and ignore the trivial-to-spot ones.

If anything, having access to the source code gives you an illusion of security, which is probably the worst place to be in.

The worst ecosystem when it comes to supply chain attacks is arguably the npm one, yet there anyone can see the source and there are almost two orders of magnitude more eyeballs.


In such an environment I’m doomed anyway, even if I’m vetting code. I don’t understand why the goal has to be “the ability to spot attacks specifically designed to prevent you from detecting them.” For what you’re describing, there seems to be no hope at all.

It’s like if someone says “don’t pipe curl into bash to install software”, ok that may or may not be good advice. But then someone else says “yeah, I download the script first and give it a cursory glance to see what it’s doing”, wouldn’t you agree they’re marginally better off than the people who just do it blindly?

If not, maybe we just aren’t coming from any mutual shared experience. It seems flatly obvious to me that being able to read the code I’m running puts me in a better spot. Maybe we just fundamentally disagree.


> It’s like if someone says “don’t pipe curl into bash to install software”, ok that may or may not be good advice. But then someone else says “yeah, I download the script first and give it a cursory glance to see what it’s doing”, wouldn’t you agree they’re marginally better off than the people who just do it blindly?

I don't agree with your comparison. In this case it's more like downloading the script, running it without having read it, and then every once in a while looking at a snippet containing a feature that interests you.

The comparison to "download the script and read it before you run it" would be to download the crate's repo, read it, and then vendor the code you've read to use as a dependency, which is what I'd consider proper vetting (in this case the attacker would need to be much more sophisticated to avoid detection; it's still possible, but at least you've actually gained something), but it's a lot more work.


If you have reproducible builds it's no different. Without those, binaries are a nightmare in that you can't easily link a given binary back to a given source snapshot. Deciding to trust my upstream is all well and good, but if it's literally impossible to audit them that's not a good situation to be in.


I think it’s already probably a mistake to think that a source distribution consistently references a unique upstream source repository state; I don't believe the crate distribution layout guarantees this.

(I agree that source is easier to review and establish trust in; the observation is that once you read the upstream source you’re in the same state regarding distributors, since build and source distributions both modify the source layout.)


No stable ABI doesn't mean the ABI changes at every release though.


It might as well. If there is no definition of an ABI, nobody is going to build the tooling and infrastructure to detect ABI compatibility between releases and leverage that for the off-chance that e.g. 2 out of 10 successive Rust releases are ABI compatible.


Why wouldn't they do exactly that if they decided to publish binary crates…

Nobody does that right now because there's no need for that, but it doesn't mean that it's impossible in any way.

Stable ABI is a massive commitment that has long lasting implications, but you don't need that to be able to have binary dependencies.


You can have binary dependencies with a stable ABI; they're called C-compatible shared libs, provided by your system package manager. And Cargo can host *-sys packages that define Rust bindings to these shared libs. Yes, you give up on memory safety across modules, but that's what things like the WASM Components proposals are for. It's a whole other issue that has very little to do with ensuring safety within a single build.
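
A minimal sketch of what such a *-sys crate boils down to (the library and function names here are made up): it only declares the C ABI, and the system-provided shared library satisfies it at link time.

    use std::ffi::CStr;
    use std::os::raw::{c_char, c_int};

    // Declarations only: the implementation lives in the system's libfoo shared
    // library, which the linker resolves (usually located via a build script).
    #[link(name = "foo")]
    extern "C" {
        fn foo_init() -> c_int;
        fn foo_process(input: *const c_char) -> c_int;
    }

    // Thin wrapper; as noted above, memory safety guarantees stop at this boundary.
    pub fn process(input: &CStr) -> Result<(), i32> {
        let rc = unsafe {
            foo_init();
            foo_process(input.as_ptr())
        };
        if rc == 0 { Ok(()) } else { Err(rc) }
    }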


Yet other ecosystems handle it just fine, regardless of security concerns, by having signed artifacts and configurable hosting as an option.


> Unless you have perfect reproducible builds

Or a trusted build server doing the builds. There is a build-bot building almost every Rust crate already for docs.rs.


docs.rs is just barely viable because it only has to build crates once (for one set of features, one target platform etc.).

What you propose would have to build each crate for at least the 8 Tier 1 targets, if not also the 91 Tier 2 targets. That would be either 8 or 99 binaries already.

Then consider that it's difficult to anticipate which feature combinations a user will need. For example, the tokio crate has 14 features [1]. Any combination of 14 different features gives 2^14 = 16384 possible configurations that would all need to be built. Now to be fair, these feature choices are not completely independent, e.g. the "full" feature selects a bunch of other features. Taking these options out, I'm guessing that we will end up with (ballpark) 5000 reasonable configurations. Multiply that by the number of build targets, and we will need to build either 40000 (Tier 1 only) or 495000 binaries for just this one crate.

Now consider on top that the interface of dependency crates can change between versions, so the tokio crate would either have to pin exact dependency versions (which would be DLL hell and therefore version locking is not commonly used for Rust libraries) or otherwise we need to build the tokio crate separately for each dependency version change that is ABI-incompatible somewhere. But even without that, storing tens of thousands of compiled variants is very clearly untenable.

Rust has very clearly chosen the path of "pay only for what you use", which is why all these library features exist in the first place. But because they do, offering prebuilt artifacts is not viable at scale.

[1] https://github.com/tokio-rs/tokio/blob/master/tokio/Cargo.to...
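
To make the combinatorics concrete, here is an illustrative sketch (not taken from tokio; the feature names are just for illustration). Every cfg-gated item changes the set of compiled symbols, so each feature selection is effectively a distinct binary artifact:

    // Each feature toggles whole items in and out of the compiled library.
    #[cfg(feature = "rt")]
    pub mod runtime {
        pub fn spawn() { /* exists only when the `rt` feature is enabled */ }
    }

    #[cfg(feature = "net")]
    pub mod net {
        pub fn connect() { /* exists only when the `net` feature is enabled */ }
    }

    // Items can also depend on feature *combinations*, so prebuilding one artifact
    // per individual feature would not compose into the build a user actually asks for.
    #[cfg(all(feature = "rt", feature = "net"))]
    pub fn spawn_connector() {}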


You could get a lot of benefit from a much smaller subset. For example, just the "syn" crate with all features enabled on tier 1 targets (so ~8 builds total) would probably save a decent chunk off almost everybody's build.


It runs counter to Cargo's current model where the top-level workspace has complete control over compilation, including dependencies and compiler flags. I've been floating an idea of "opaque dependencies" that are like Python depending on C libraries, or a C++ library depending on a dynamic library.


That would work for debug builds (and that's something that I would appreciate) but not for release, as most of the time you want to compile for the exact CPU you're targeting, not just for, say, “x86 Linux”, to make sure your code is optimized properly using SIMD instructions.


A trustworthy distributed cache would also work very well for this in practice. Cargo works with sccache. Using bazel + rbe can work even better.


Maybe I'm dumb, but I can't see how the code in the "Errors and map" section can compile. "transform_list" returns a Result<>, yet "result" is just a Vec. I thought you always need to wrap it with Ok()? Is that a new nightly feature?


No, it seems like an oversight in the code sample. It should be wrapped in Ok()
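
Roughly like this; I'm guessing at the signature from the parent comment, so treat it as an illustrative sketch rather than the article's actual code:

    fn transform_list(input: Vec<i32>) -> Result<Vec<i32>, String> {
        let result: Vec<i32> = input
            .into_iter()
            .map(|x| x.checked_mul(2).ok_or_else(|| String::from("overflow")))
            .collect::<Result<_, _>>()?;
        Ok(result) // a bare `result` won't compile, since the return type is a Result
    }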


But Editions can exist only because Rust intrinsically has the concept of a package, which naturally defines the boundary. C++ has nothing. How do you denote that a.cpp is of the cpp_2017 edition while b.cpp is cpp_2026? Some per-file comment line at the top of each file?

C++ is a mess in that it has too much historic baggage while trying to adapt to a fiercely changing landscape. Like the article says, it has to make drastic changes to keep up, but such changes will probably kill 80% of its target audience. I think putting C++ in maintenance mode and keeping it as a "legacy" language is the way to go. It is time to either switch to Rust, or pick one of its successor languages and put effort into it.


Rust doesn't have the concept of package. (Cargo does, but Cargo is a different thing from Rust, and it's entirely possible to use Rust without Cargo).

Rust has the concept of _crate_, which is very close to the concept of compilation unit in C++. You build a crate by invoking `rustc` with a particular set of arguments, just as you build a compilation unit by invoking `g++` or `clang++` with a particular set of arguments.

One of these arguments defines the edition, for Rust, just like it could for C++.


That only works for C++ code using C++20 modules (i.e. for approximately nothing). With textual includes, you need to be able to switch back and forth the edition within a single compilation unit.


It's not clear that modules alone will solve One Definition Rule issues that you're describing. It's actually more likely that programs will have different object files building against different Built Module Interfaces for the same module interface. Especially for widely used modules like the standard std one.

But! We'll be able to see all the extra parsing happen so in theory you could track down the incompatibilities and do something about them.


Modules are starting to come out. They have some growing pains, but they are now ready for early adopters and are looking like they will be good. I'm still in wait and see mode (I'm not an early adopter), but so far everything just looks like growing pains that will be solved and then they will take off.


At the current rate, we'll have full module support for all of the most popular C++ libraries sometime around Apr 7th, 2618.

https://arewemodulesyet.org/


I expect modules to follow an S-curve of growth. Starting in about 2 years projects will start to adopt them en masse, over the next 5-10 years there will be fast growth, and then (in about 12 years!) only a few stragglers will not use modules. They are not free to adopt, but there appear to be a lot of long term savings from paying the price.


I'll mention that library maintainers/authors can't even _consider_ modules unless they set C++20 as a requirement. Many/most popular libraries will not do that anytime soon. I maintain a moderately-popular library and my requirement is C++11... now, to be fair, I started it back in 2016-2017; but still, I won't even consider requiring C++20 until C++17-and-earlier application code is close to disappearing.


Mixing editions in a file happens in Rust with the macro system. You write a macro to generate code in your edition, and the generation happens in the caller's crate, no matter what edition it is.
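
A rough sketch of what that looks like (the crate name and editions are hypothetical). The macro body is parsed under the edition of the crate that defines it, even when it is expanded in a crate on a different edition:

    // In a hypothetical `helpers` crate compiled with --edition 2018:
    #[macro_export]
    macro_rules! make_greeting {
        () => {
            pub fn greeting() -> String {
                // These tokens follow the defining crate's edition rules, even when
                // the macro is invoked from an edition-2021 crate.
                String::from("hello")
            }
        };
    }

    // In a downstream crate compiled with --edition 2021:
    // helpers::make_greeting!();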


> I think putting C++ in maintenance mode and keep it as a "legacy" language is the way to go

I agree but also understand this is absolutely wishful thinking. There is so much inertia and natural resistance to change that C++ will be around for the next century barring nuclear armageddon.


I don't think even that would suffice. :)


COBOL's still around. Just because a language exists doesn't mean that we have to keep releasing updated specifications and compiler versions rather than moving all those resources to better languages.


COBOL's most recent standard was released in 2023, which rather ruins your point.


I think the existence of COBOL-2023 actually suggests that it's not merely possible that in effect C++ 26 is the last C++ but that maybe C++ 17 was (in the same sense) already the last C++ and we just didn't know it.

After all, doubtless COBOL's proponents did not regard COBOL-85 as the last COBOL - from their point of view COBOL-2002 was just a somewhat delayed further revision of the language that people had previously overlooked; surely now things were back on track. But in practice, yeah, by the time of COBOL-2002 that's a dead language.


Fully agree, because for the use cases of being a safer C, and keeping stuff like LLVM and GCC running, that is already good enough.

From my point of view C++26 is going to be the last one that actually matters, because too many are looking forward to whatever reflection support it can provide; otherwise that would be C++23.

There is also the whole issue that past C++17, all compilers seem like Swiss cheese in language support for the two following language revisions.


> I think putting C++ in maintenance mode and keep it as a "legacy" language is the way to go

That is not possible. Take the following function in C++: std::vector<something> doSomething(std::string); Simple enough, memory safe (at least the interface; who knows what happens inside), performant, but how do you call that function from anything else? If you want to use anything else with C++ it needs to speak C++, and that means vector and string need to interoperate.


You can interoperate via the C ABI and just not use the C++ standard types across modules - which is the sane thing to do. Every other language that supports FFI via C linkage does this; only C++ insists on this craziness.
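
In Rust, for instance, that boundary looks roughly like this (the names and layout are invented to mirror the doSomething example above): plain C types and caller-owned buffers instead of std::vector and std::string.

    use std::ffi::CStr;
    use std::os::raw::c_char;

    // Plain-old-data with a C-compatible layout instead of C++ containers.
    #[repr(C)]
    pub struct Something {
        pub id: u64,
        pub score: f64,
    }

    // Fills `out` with up to `cap` results and returns how many were written.
    // The caller owns the buffer, so no allocator or container crosses the ABI.
    #[no_mangle]
    pub unsafe extern "C" fn do_something(
        input: *const c_char,
        out: *mut Something,
        cap: usize,
    ) -> usize {
        if input.is_null() || out.is_null() || cap == 0 {
            return 0;
        }
        let text = CStr::from_ptr(input).to_string_lossy();
        let results = [Something { id: text.len() as u64, score: 1.0 }];
        let n = results.len().min(cap);
        std::ptr::copy_nonoverlapping(results.as_ptr(), out, n);
        n
    }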


Also, I wouldn't start by rewriting the thing that calls do_something; I'd start by rewriting do_something. Calling into Rust from C++ using something like zngur lets you define Rust types in C++ and then call idiomatic Rust. You can't do it in the opposite direction because you cannot safely represent all C++ types in Rust, since some of them aren't safe.


I have millions of lines of C++. do_something exists, is used by a lot of those lines, and works well. I have a new feature that needs to call do_something. I'm not rewriting any code. My current code base was a rewrite of previous code into C++ (started before Rust existed), and it cost nearly a billion dollars! I cannot go to my bosses and say that the expensive rewrite that is only now starting to pay off, because of how much better our code is, needs to be scrapped. Maybe in 20 years we can ask for another billion (adjusted for inflation) to rewrite again, but today I either write C++, or I interoperate with existing C++ with minimal effort.

I'm working on interoperation with existing C++. It is a hard problem, and so far every answer I've found means all of our new features still need to be written in C++, but now I'm putting in a framework where that code could be used by non-C++. I hope in 5 years that framework is in place enough that early adopters can write something other than C++ - only time will tell though.


Yeah, that use case is harder, but I'm involved in a similar one. Our approach is to split off new work as a separate process when possible and do it entirely in Rust. You can call into C++ from Rust, it just means more unsafe code in Rust wrapping the C++, which has to change when you or your great grandchild finally do get around to writing do_something in Rust. I am super aware of how daunting it is, especially if your customer base isn't advocating for the switch. Most don't care until they get pwned, and then they come with lawyers. Autocxx has proven a painful way to go. The Chrome team has had some input to stuff and seems to be making it better.


Sure, I can do that - but my example C++ function is fully memory safe (other than going off the end of the vector, which static rules can enforce against by banning []). If I make a C wrapper I just lose all the memory safety and now I'm at higher risk. Plus the effort to build that wrapper is not zero (though there are some generators that help).


How about going off the end of the vector with an iterator, or modifying the vector while iterating it, or adding to the vector from two different threads or reading from one thread while another is modifying it or [...].

There is nothing memory safe whatsoever about std::vector<something> and std::string. Sure, they give you access to their allocated length, so they're better than something[] and char* (which often also know the size of their allocations, but refuse to tell you).


> going off the end of the vector with an iterator,

The point of an iterator is to make it hard to do that. You can, but it is easy to not do that.

> modifying the vector while iterating it

Annoying, but in practice I've not found it hard to avoid.

> adding to the vector from two different threads or reading from one thread while another is modifying it

Rust doesn't help here - it stops you from doing this, but if threads are your answer Rust will just say no (or force you into unsafe). Threads are hard; generally it is best to avoid this in the first place, but in the places where you need to modify data from threads Rust won't help.


> rust will just say no

This is just not accurate; you can use atomic data types, Mutex<>, or RwLock<> to ensure thread-safe access. (Or write your own concurrent data structures, and mark them safe for access from a different thread.) C++ has equivalent solutions but doesn't check that you're doing the right thing.
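
A minimal sketch of that (nothing project-specific here): adding to a vector from several threads compiles fine once access goes through a synchronized type.

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        // Shared, mutable vector: allowed, but only through a synchronized type;
        // remove the Mutex and the program no longer compiles.
        let data = Arc::new(Mutex::new(Vec::new()));

        let handles: Vec<_> = (0..4)
            .map(|i| {
                let data = Arc::clone(&data);
                thread::spawn(move || data.lock().unwrap().push(i))
            })
            .collect();

        for handle in handles {
            handle.join().unwrap();
        }
        assert_eq!(data.lock().unwrap().len(), 4);
    }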


Only if using a hardened runtime with bounds checking enabled, without any calls to c_str().


I don't understand the strategy from the election spammer here. They clearly want the recipient to vote/donate. But if someone spends time typing "STOP" instead of ignoring it, he has clearly read the message and is not interested in the solicitation. Then why keep spamming this particular number? The more you flood him, the more he hates you, and the less likely he is to do anything positive for you. It is an extremely low-quality "live number" for the campaign, and a waste of money.


> They clearly want the recipient to vote/donate

Lol, no.

Just like SEO and email spam, nobody actually cares about the result. The process itself, 'engagement', is the target.

Just imagine, you are the spammer and you are presenting the numbers to whoever ordered this:

"Hey, we sent 1000 SMS, got 900 STOP replies, that's good, right? That would cost you $1000 for the SMS and $100 for our time. "

And compare it with this:

"Hey, we sent 15043401 SMS, got 9000 replies, overall engagement is over 78% and we can continue for another week or two! BTW it's $15043401 for the SMS and $100000 for our time"

Which variant would the spammer prefer?


Maybe it makes sense to whoever they outsourced the job to


I've been using AI to solve isolated problems, mainly as a replacement for a search engine, specifically for programming. I'm still not convinced by these "write a whole block of code for me" types of use cases. Here are my arguments against the videos from the article.

1. Snake case to camelCase. Even without AI we can already complete these tasks easily. VSCode itself has a "Transform to Camel Case" command for the selection. It is nice that the AI can figure out which text to transform based on context, but it's not too impressive. I could select one ":", use "Select All Occurrences", press left, then Ctrl+Shift+Left to select all the keys.

2. Generate boilerplate from documentation. Boilerplate is tedious, but not really time-consuming. How many of you spend 90% of your time writing boilerplate instead of the core logic of the project? If a language/framework (Java used to be one, not sure about now) requires me to spend that much time on boilerplate, that's a language to be ditched/fixed.

3. Turn a problem description into a block of concurrency code. Unlike the boilerplate, this code is more complicated. If I already know the area, I don't need the AI's help to begin with. If I don't, how can I trust the generated code to be correct? It could miss a corner case that my question didn't specify, one which I don't yet know exists myself. In the end, I still need to spend time learning Python concurrency, and then I'll be writing the same code myself in no time.

In summary, my experience with AI is that if the question is easy (e.g. it's easy to find the exact same question on Stack Overflow), the answer is highly accurate. But if it is a unique question, the accuracy drops quickly. And it is the latter case that we spend most of our time on.


I started like this. Then I came around and can’t imagine going back.

It’s kinda like having a really smart new grad, who works instantly, and has memorized all the docs. Yes I have to code review and guide it. That’s an easy trade off to make for typing 1000 tokens/s, never losing focus, and double checking every detail in realtime.

First: it really does save a ton of time for tedious tasks. My best example is test cases. I can write a method in 3 minutes, but Sonnet will write the 8 best test cases in 4 seconds, which would have taken me 10 mins of switching back and forth, looking at branches/errors, and mocking. I can code review and run these in 30s. Often it finds a bug. It’s definitely more patient than me in writing detailed tests.

Instant and pretty great code review: it can understand what you are trying to do, find issues, and fix them quickly. Just ask it to review and fix issues.

Writing new code: it’s actually pretty great at this. I needed a util class for config that had fallbacks to config files, env vars and defaults. And I wanted type checking to work on the accessors. Nothing hard, but it would have taken time to look at docs for YAML parsing, how to find the home directory, which env var API returns null vs. errors on blank, typing, etc. All easy, but it takes time. Instead I described it in about 20 seconds and it wrote it (with tests) in a few seconds.
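
The result was roughly this shape (a simplified sketch, in Rust rather than the original language, with all the details made up):

    use std::collections::HashMap;
    use std::env;
    use std::str::FromStr;

    // Fallback chain: config file value, then environment variable, then a
    // default, with typed accessors.
    pub struct Config {
        file_values: HashMap<String, String>, // e.g. parsed from a YAML file in $HOME
    }

    impl Config {
        pub fn get<T: FromStr>(&self, key: &str, default: T) -> T {
            self.file_values
                .get(key)
                .cloned()
                .or_else(|| env::var(key.to_uppercase()).ok())
                .and_then(|raw| raw.parse::<T>().ok())
                .unwrap_or(default)
        }
    }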

It’s moved well past the stage of “it can answer questions on Stack Overflow”. If it has been a while (a while = 6 months in ML), try again with the new Sonnet 3.5.


>My best example is test cases. I can write a method in 3 minutes, but Sonnet will write the 8 best test cases in 4 seconds

For me it doesn't work. Generated tests either fail to run or fail when run.

I work in large C# codebases and in each file I have lots of injected dependencies. I have one public method which can call lots of private methods in the same class.

The AI either doesn't properly mock the dependencies, or ignores what happens in the private methods.

If I take a lot of time guiding it where to look, it can generate unit tests that pass. But it takes longer than if I write the unit tests myself.


For me it's the same. It's usually just some hallucinated garbage. All of these LLMs don't have the full picture of my project.

When I can give them isolated tasks like "convert X to Y" or "create a foo that does bar", it's excellent, but for unit testing? Not even going to try anymore. I write 5 unit tests manually that work in the time it takes me to write 5 prompts that give me useless stuff that I then need to fix up manually.

Why can't we have an LLM cache for a project, just like I have a build cache? Analyze one particular commit on the main branch very expensively, then only calculate the differences from that point. Pretty much like how git works, just for your model.


"It's usually just some hallucinated garbage. All of these LLM's don't have the full picture of my project."

Cursor can have the whole project in the context, or you can specify the specific files that you want.


> Cursor can have whole project in the context

Depends on the size of the project. You can’t shove all of Google’s monorepo into an LLM’s context (yet).


I’m looking at 150000 lines of Swift divided over some local packages and the main app, excluding external dependencies


Do you have 150000 lines of Swift in YOUR context window?


I know how to find the context I need, aided by the IDE and compiler. So yes, my context window contains all of the code in my project, even if it's not instantaneous.

It's not that hard to have an idea of what code is defined where in a project, since compilers have been doing that for over half a century. If I'm injecting protocols and mocks into a unit test, it shouldn't be really hard for a computer to figure out their definitions, unless they don't exist yet and I was not clear they should have been created, which would mean that I'm giving the AI the wrong prompt and the error is on my side.


> Why can't we have a LLM cache for a project just like I have a build cache? Analyze one particular commit on the main branch very expensively

It's not just very expensive - it's prohibitively expensive, I think.


With Cursor you can specify which files it reads before starting. Usually I have to attach one or two to get an ideal one-shot result.

But yeah, I use it for unit testing, not integration testing.


Ask Cursor to write usage and mocking documentation for the most important injected dependencies, then include that documentation in your context. I’ve got a large tree of such documentation in my docs folder specifically for guiding AI. Cursor’s Notebook feature can bundle together contexts.

I use Cursor to work on a Rust Qt app that uses the main branch of cxx-qt so it’s definitely not in the training data, but Claude figures out how to write correct Rust code based on the included documentation no problem, including the dependency injection I do through QmlEngine.


Sounds interesting, what are you working on?

(Fellow Qt developer)


Same thing: https://news.ycombinator.com/item?id=40740017 :)

Just saw you published your block editor blog post. Look forward to reading it!


Haha, hi again!

Awesome! Would love to hear your thoughts. Any progress on your AI client? I'm intrigued by how many bindings there are to Qt. Recently, I got excited about a Mojo binding[1].

[1] https://github.com/rectalogic/mojo-qt


I’ve found it better at writing tests because it tests the code you’ve written vs. what you intended. I’ve caught logic bugs because it wrote tests with an assertion for a conditional that was backwards. The readable name of the test clearly pointed out that I was doing the wrong thing (and the test passed?).


Interesting. I’ve had the opposite experience (I invert or miss a condition, it catches it).

It probably comes down to model, naming and context. Until Sonnet 3.5 my experience was similar to yours. After it, things mostly “just work”.


That sounds more like a footgun than a desirable thing to be honest!


Maybe a TLDR from all the issues I'm reading in this thread:

- It's gotten way better in the last 6 months. Both models (Sonnet 3.5 and new October Sonnet 3.5), and tooling (Cursor). If you last tried Co-pilot, you should probably give it another look. It's also going to keep getting better. [1]

- It can make errors, and expect to do some code review and guiding. However the error rates are going way way down [1]. I'd say it's already below humans for a lot of tasks. I'm often doing 2/3 iterations before applying a diff, but a quick comment like "close, keep the test cases, but use the test fixture at the top of the file to reduce repeated code" and 5 seconds is all it takes to get a full refactor. Compared to code-review turn around with a team, it's magic.

- You need to learn how to use it. Setting the right prompts, adding files to the context, etc. I'd say it's already worth learning.

- It just knows the docs, and that's pretty invaluable. I know 10ish languages, which also means I don't remember the system call to get an env var in any of them. It does, and can insert it a lot faster than I can google it. Again, you'll need to code review, but more and more it's nailing idiomatic error checking in each language.

- You don't need libraries for boilerplate tasks. zero_pad is the extreme/joke example, but a lot more of my code is just using system libraries.

- It can do things other tools can't. Tell it to take the visual style of one blog post and port it to another. Tell it to use a test file I wrote as a style reference, and update 12 other files to follow that style. Read the README and tests, then write pydocs for a library. Write a GitHub action to build docs and deploy to GitHub pages (including suggesting libraries, deploy actions, and offering alternatives). Again: you don't blindly trust anything, you code review, and tests are critical.

[1] https://www.anthropic.com/news/3-5-models-and-computer-use


Yes, it works for new code and simple cases. If you have large code bases, it doesn't have the context and you have to baby it, telling it which files and functions it should look into before attempting to write something. That takes a lot of time.

Yes, it can do simple tasks, like you said, writing a call to get the environment variables.

But imagine you work on a basket calculation service, where you have base item prices, you have to apply discounts based on some complicated rules, you have to add various kinds of taxes for various countries in the world, and you have to use a different number of decimals for each country. Each of your classes calls 5 to 6 other classes, all with a lot of business logic behind them. Besides that, you also make lots of API calls to other services.

What will the AI do for you? Nothing; it will just help you write one-liners to parse or split strings. For everything else it lacks context.


Are you suggesting you would inline all that logic if you hand-rolled the method? Probably not, right? You would have a high-level algorithm of easily-understood parts. Why wouldn't the AI be able to 1) write that high-level algorithm and then 2) subsequently write the individual parts?


What's the logic here? "I haven't seen it so it doesn't exist?"

There are hundreds of available examples of it processing large numbers of files, and making correct changes across them. There are benchmarks with open datasets already linked in the thread [1]. It's trivial to find examples of it making much more complex changes than "one liners to parse or split strings".

[1] https://huggingface.co/datasets/princeton-nlp/SWE-bench


> Instant and pretty great code review: it can understand what you are trying to do, find issues, and fix them quickly. Just ask it to review and fix issues.

Cursor’s code review is surprisingly good. It’s caught many bugs for me that would have taken a while to debug, like off-by-one errors or improperly refactored code (like changing is_alive to is_dead and forgetting to negate conditionals).


> changing is_alive to is_dead and forgetting to negate conditionals

No test broke?


Tests don’t care what you name the variable


This “really smart new grad” take is completely insane to me, especially if you know how LLMs work. Look at this SQL snippet Claude (the new Sonnet) generated recently.

    -- Get recipient's push token and sender's username
    SELECT expo_push_token, p.username 
    INTO recipient_push_token, sender_username
    FROM profiles p
    WHERE p.id = NEW.recipient_id;

Seems like the world has truly gone insane and engineers are tuned into some alternate reality a la Fox News. Well…it’ll be a sobering day when the other shoe falls.


> it can understand

It can't understand. That's not what LLMs do.


This is a prompt I gave to o1-mini a while ago: My instructions follow now. The scripts which I provided you work perfectly fine. I want you to perform a change though. The image_data.pkl and faiss_index.bin are two databases consisting of rows, one for each image, in the end, right? My problem is that there are many duplicates: images with different names but the same content. I want you to write a script which for each row, i.e. each image, opens the image in python and computes the average expected color and the average variation of color, for each of the colors red, green and blue, and over "random" over all the pixels. Make sure that this procedure is normalized with respect to the resolution. Then once this list of "defining features" is obtained, we can compute the pairwise difference. If two images have less than 1% variation in both expectation and variation, then we consider them to be identical. in this case, delete those rows/images, except for one of course, from the .pkl and the .bin I mentioned in the beginning. Write a log file at the end which lists the filenames of identical images.

It wrote the script, I ran it and it worked. I had it write another script which displays the found duplicate groups so I could see at a glance that the script had indeed worked. And for you this does not constitute any understanding? Yes it is assembling pieces of code or algorithmic procedures which it has memorized. But in this way it creates a script tailored to my wishes. The key is that it has to understand my intent.


Does "it understands" just mean "it gave me what I wanted?" If so, I think it's clear that that just isn't understanding.

Understanding is something a being has or does. And understanding isn't always correct. I'm capable of understanding. My calculator isn't. When my calculator returns a correct answer, we don't say it understood me -- or that it understands anything. And when we say I'm wrong, we mean something different from what we mean when we say a calculator is wrong.

When I say LLMs can't understand, I'm saying they're no different, in this respect, from a calculator, WinZip when it unzips an archive, or a binary search algorithm when you invoke a binary-search function. The LLM, the device, the program, and the function boil down (or can) to the same primitives and the same instruction set. So if LLMs have understanding, then necessarily so do a calculator, WinZip, and a binary-search algorithm. But they don't. Or rather we have no reason to suppose they do.

If "it understands" is just shorthand for "the statistical model and program were designed and tuned in such a way that my input produced the desired output," then "understand" is, again, just unarguably the wrong word, even as shorthand. And this kind of shorthand is dangerous, because over and over I see that it stops being shorthand and becomes literal.

LLMs are basically autocorrect on steroids. We have no reason to think they understand you or your intent any more than your cell phone keyboard does when it guesses the next character or word.


When I look at an image of a dog on my computer screen, I don't think that there's an actual dog anywhere in my computer. Saying that these models "understand" because we like their output is, to me, no different from saying that there is, in fact, a real, actual dog.

"It looks like understanding" just isn't sufficient for us to conclude "it understands."


I think the problem is our traditional notions of "understanding" and "intelligence" fail us. I don't think we understand what we mean by "understanding". Whatever the LLM is doing inside, it's far removed from what a human would do. But on the face of it, from an external perspective, it has many of the same useful properties as if done by a human. And the LLM's outputs seem to be converging closer and closer to what a human would do, even though there is still a large gap. I suggest the focus here shouldn't be so much on what the LLM can't do but the speed at which it is becoming better at doing things.


I think there is only one thing we should focus on: measurable capability on tasks. Understanding, memorization, reasoning etc. are all just shorthands we use to quickly convey an idea of a capability on a kind of task. Measurable capability on tasks can also attempt to describe mechanistically how the model works, but that is very difficult. This is where you would try to describe your sense of "understanding" rigorously. To keep it simple, for example, I think when you say that the LLM does not understand, what you must really mean is that you reckon its performance will quickly decay as the task gets more difficult in various dimensions (depth/complexity, verifiability of the result, length/duration/context size), to a degree where it is still far from being able to act as a labor-delivering agent.


Brains can’t understand either; that’s not what neurons do.


We experience our own minds and we have every reason to think that our minds are a direct product of our brains.

We don't have any reason to think that these models produce being, awareness, intention, or experience.


What is the best workflow to code with an AI?

Copy and paste the code to the Claude website? Or use an extension? Or something else?


Cursor. Mostly chat mode. Usually adding 1-2 extra files to the context before invoking, and selecting the relevant section for extra focus.


I personally use copilot, which is integrated into my IDE, almost identical to this Cursor example.


Copilot is about as far away from Cursor with Claude as the Wright Brothers' glider is to the Saturn V.


Not based on the link, I didn't see anything in that text that I can't do with copilot or which looked better to me than what copilot outputs.


Does Copilot do multi-file edits now?


Copilot Editor is a beta feature that can perform multi-file edits.


Another fun example from yesterday: I pasted a blog post in markdown into an HTML comment. Selected it and told Sonnet to convert it to HTML using another blog post as a style reference.

Done in 5 seconds.


And how do you trust that it didn't just alter or omit some sentences from your blog post?

I just use Pandoc for that purpose and it takes 30 seconds, including the time to install pandoc. For code generation where you'll review everything, AI makes sense; but for such conversion tasks, it doesn't because you won't review the generated HTML.


> it takes 30 seconds, including the time to install pandoc

On some speedrunning competition maybe? Just tested on my work machine: `sudo apt-get install pandoc` took 11 seconds to complete, and it was this fast only because I already had all the dependencies installed.

Also I don't think you'll be able to fulfill the "using another blog post as a style reference" part of GP's requirements - unless, again, you're some grand-master Pandoc speedrunner.

Sure, AI will make mistakes with such conversion tasks. It's not worth it if you're going to review everything carefully anyway. In code, fortunately, you don't have to - the compiler is doing 90% of the grunt work for you. In writing, it depends on context. Some text you can eyeball quickly. Sometimes you can get help from your tool.

Literally yesterday I back-ported a CV from English to Polish via Word's Translation feature. I could've done it by hand, but Word did 90% of it correctly, and fixing the remaining issues was a breeze.

Ultimately, what makes LLMs a good tool for random conversions like these is that it's just one tool. Sure, Pandoc can do GP's case better (if inputs are well-defined), but it can't do any of the 10 other ad-hoc conversions they may have needed that day.


Installing pandoc is basically a one-time cost that is amortized over its uses, so... why worry about it?

Relying on the compiler to catch every mistake is a pretty limited strategy.


> Installing pandoc is basically a one-time cost that is amortized over its uses, so... why worry about it?

Because the space of problems that LLMs of today solve well with trivial prompts is vast, far greater than what any single classical tool covers. If you're comparing solutions to 100 random problems, you have to count in those one-time costs, because you'll need to use some 50-100 different tools to get through them all.

> Relying on the compiler to catch every mistake is a pretty limited strategy.

No, you're relying on the compiler to catch every mistake that can be caught mechanically - exactly the kind of thing humans suck at. It's kind of the entire point of errors and warnings in compilers, or static typing for that matter.


No, if you are having an LLM generate code that you are not reviewing, you are relying on the compiler 100%. (Or the runtime, if it isn't a compiled language.)


Who said I'm not reviewing? Who isn't reviewing LLM code?


Re: trust. It just works using Sonnet 3.5. It's gained my trust. I do read it after (again, I'm more in a code reviewer role). People make mistakes too, and I think its error rate for repetitive tasks is below most people's. I also learned how to prompt it. I'd tell it to just add formatting without changing content in the first pass, then in a separate pass ask it to fix spelling/grammar issues. The diffs are easy to read.

Re:Pandoc. Sure, if that's the only task I used it for. But I used it for 10 different ones per day (write a JSON schema for this json file, write a Pydantic validator that does X, write a GitHub workflow doing Y, add syntax highlighting to this JSON, etc). Re:this specific case - I prefer real HTML using my preferred tools (DaisyUI+tailwind) so I can edit it after. I find myself using a lot less boilerplate-saving libraries, and knowing a few tools more deeply.


Why are you comparing its error rate for repetitive tasks with most people? For such mechanical tasks we already have fully deterministic algorithms to do it, and the error rate of these traditional algorithms is zero. You aren't usually asking a junior assistant to manually do such conversion, so it doesn't make sense to compare its error rate with humans.

Normalizing this kind of computer error when there should be none makes the world a worse place, bit by bit. The kind of productivity increase you get from here does not seem worthwhile.


The OP said they had it use another HTML page as a style reference. Pandoc couldn't do that. Just like millions of other specific tasks.


That's just a matter of copying over some CSS. It takes the same effort as copying the output of AI so that's not even taking extra time.


Applying the style of B to A is not deterministic, nor are there prior tools that could do it.


You didn't also factor in the time to learn Pandoc (and to relearn it if you haven't used it lately). This is also just one of many daily use cases for these tools. The time it takes to know how to use a dozen tools like this adds up when an LLM can just do them all.


This is actually how I would use AI: if I forgot how to do a conversion task, I would ask AI to tell me the command so that I can run it without rejiggering my memory first. The pandoc command is literally one line with a few flags; it's easily reviewable. Then I run pandoc myself. Same thing with the multitude of other rarely used but extremely useful tools such as jq.

In other words, I want AI to help me with invoking other tools to do a job rather than doing the job itself. This nicely sidesteps all the trust issues I have.


I do that constantly. jq's syntax is especially opaque to me. "I've got some JSON formatted like <this>. Give me a jq command that does <that>."

Google, but better.


This


> And how do you trust that it didn't just alter or omit some sentences from your blog post?

How do you trust a human in the same situation? You don't, you verify.


What? Is this a joke? Have you actually worked with human office assistants? The whole point of human assistants is that you don't need to verify their work. You hire them with a good wage and you trust that they are working in good faith.

It's disorienting for me to hear that some people are so blinded by AI assistants that they no longer know how human assistants behave.


It appears the OP has had a different experience. Each human assistant is different.


I agree.

I replaced SO with ChatGPT and it's the only good use case I found: finding an answer I build onto. But outsourcing my thinking? That's a dangerous path. I tried to do that on small projects, building a project from scratch with Cursor just to test it. Sometimes it's spot on, but in many instances it completely misses some cases and edge cases. Impossible to trust blindly. And if I do so and don't take proper time to read and think about the code, the consequences pile up and make me waste time in the long run, because it's prompt over prompt over prompt to refine it, and sometimes it's still not exactly right. That messes up my thinking, and I prefer to do it myself and use it as documentation on steroids. I never used Google and SO again for docs. I have the feeling that relying on it too much to write even small blocks of code will make us lose some abilities in the long run, and I don't think that's a good thing. Will companies allow us to use AI in code interviews for boilerplate?


The AIs are to a large degree trained on tutorial code, quick examples, how-tos and so on from the net. Code that really should come with a disclaimer note: "Don't use in production, example code only."

This leads to your code being littered with problematic edge cases that you still have to learn how to fix. Or, in the worst case, you don't even notice that there are edge cases, because you just copy-pasted the code and it works for you. The edge cases, your users will find with time.


AI is trained on all open source code. I’m pretty sure that’s a much larger source of training data than web tutorials.


Isn't tutorial-level code exactly the best practices that everyone recommends these days? You know, don't write clever code, make things obvious to juniors, don't be a primadonna but instead make sure you can be replaced by any recently hired fresh undergrad, etc.? :)


Not really. For example, tutorial code will often leave out edge cases so as to avoid confusing the reader: if you're teaching a new programmer how to open a file, you might avoid mentioning how to handle escaping special characters in the filename.


Don't forget about Little Bobby Tables! These types of tutorials probably killed the most databases over time.


Which makes me wonder: if old companies with a history of highly skilled teams were to train local models, how much better would those models be at helping solve new complex problems?


They kinda already are - source code for highly complex open source software is already in the training datasets. The problem is that tutorials are much more descriptive (why the code is doing something, how this particular function works, etc., down to the level of a single line of code), which probably means they're much easier for LLMs to interpret, and therefore weighted higher in responses.


I’m slightly worried that these AI tools will hurt language development. Boilerplate-heavy and overly verbose languages are flawed. Coding languages should help us express things more succinctly, both as code writers and as code readers.

If AI tools let us vomit out boilerplate and syntax, I guess that sort of helps with the writing part (maybe. As long as you fully understand what the AI is writing). But it doesn’t make the resulting code any more understandable.

Of course, as is always the case, the tools we have now are the dumbest they’ll ever be. Maybe in the future we can have understandable AI that can be used as a programming language, or something. But AI as a programming language generator seems bad.


I used to agree with this, but the proliferation of Javascript made me realize that newer/better programming languages were already not coming to save us.


Maybe it's a rectangle between:

   seniors
   copilots
   juniors
   new languages
Wondering. Since the seniors pair with LLMs, the world needs far fewer juniors. Some juniors will go away to other industries, but some might start projects in new languages without LLM/business support.

Frankly, otherwise I don't see how any new lang corpus might get created.


Before you dismiss all of this because "You could do it by hand just as easily", you should actually try using Cursor. It only takes a few minutes to setup.

I'm only 2 weeks in but it's basically impossible for me to imagine going back now.

It's not the same as GH Copilot, or any of the other "glorified auto-complete with a chatbox" tools out there. It's head and shoulders better than everything else I have seen, likely because the people behind it are actual AI experts and have built numerous custom models for specific types of interactions (vs a glorified ChatGPT prompt wrapper).


> I could select one ":", use "Select All Occurrences"

Only if it's the same occurrences. Cursor can often get the idea of what you want to do with the whole block of different names. Unless you're a vim macro master, it's not easily doable.

> How many of you spend 90% of time writing boilerplate instead of the core logic of the project?

It doesn't take much time, but it's a distraction. I'd rather tab through some things quickly than context-switch to the docs, find the example, adapt it for the local script, and then get back to what I was initially trying to do. Working memory in my brain is expensive.


Disagree.

I still spend a good amount of time on boilerplate. Stuff that's not thinking hard about the problem I'm trying to solve. Stuff like unit tests, error logging, naming classes, methods and variables. Claude is really pretty good at this: not as good as the best code I've read in my career, but definitely better than average.

When I review Sonnet's code, the code is more likely to be correct than if I review my own. If I make a mistake I'll read what I intended to write, and not what I actually wrote. Whereas when I review Sonnet's, there are 2 passes, so the chance an error slips through is smaller.


Unit tests are boilerplate?


I'm using an expansive definition of boilerplate, to be sure. But like boilerplate, most unit tests require a little bit of thought and then a good amount of typing, doing things like setting up the data to test, mocking up methods, and writing out assertions to test all your edge cases.

I've found sonnet and o1 to be pretty good at this. Better than writing the actual code because while modifying a system requires a lot of context of the overall application and domain, unit testing a method usually doesn't.


Yes. You write a function ApplyFooToBar(), and then unit tests that check that, when supplied with the right Foos, the function indeed applies those Foos to the Bar. It's not very intellectually challenging work.

If anything, the challenge is all the boilerplate surrounding the test, because you can't just write down what the test checks: you need to assemble the data, assemble the expected results - which you end up DRY-ing into support modules once you have 20 tests needing similar pre-work - and then there's lots of other bullshit to deal with at the intersection of your programming language, your test framework, and your modularization strategy.
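Concretely, the shape of it usually looks something like this (a quick Python sketch; apply_foo_to_bar and the make_foos helper are hypothetical stand-ins, not anyone's real code):

    import unittest

    def apply_foo_to_bar(foos, bar):
        # hypothetical function under test: fold each foo into the bar dict
        return {**bar, **{f["key"]: f["value"] for f in foos}}

    def make_foos():
        # the kind of setup helper that gets DRY-ed into a support module
        # once 20 tests need similar pre-work
        return [{"key": "a", "value": 1}, {"key": "b", "value": 2}]

    class ApplyFooToBarTest(unittest.TestCase):
        def test_applies_all_foos(self):
            result = apply_foo_to_bar(make_foos(), {"c": 3})
            self.assertEqual(result, {"a": 1, "b": 2, "c": 3})

        def test_empty_foos_leaves_bar_unchanged(self):
            self.assertEqual(apply_foo_to_bar([], {"c": 3}), {"c": 3})

    if __name__ == "__main__":
        unittest.main()

A little thought for the edge cases, then mostly typing - which is exactly the part a model is good at churning out.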


Quite often, yes. That's why I prefer integration tests.


Indeed. Too many tests are just testing nothing other than mocks. That goes for my coworkers directly and for their Copilot output. They're not useful tests: they aren't going to catch actual errors, and they're maybe useful as usage documentation. But in general, they're mostly a waste.

Integration tests, good ones, are harder but far more valuable.


> Too many tests are just testing nothing other than mocks

Totally agree, and I find that they don't help with documentation much either, because the person that wrote it doesn't know what they're trying to test. So it only overcomplicates things.

Also harmful because it gives a false sense of security that the code is tested when it really isn't.


This has been my approach in the past that only certain parts of the code are worth unit testing. But given how much easier unit tests are to write now with AI I think the % of code worth unit testing has gone up.


> But given how much easier unit tests are to write now with AI I think the % of code worth unit testing has gone up.

I see the argument, I just disagree with it. Test code is still code and it still has to be maintained, which, sure, "the AI will do that", but now there's a lot more that I have to babysit.

The tests that I'm seeing pumped out by my coworkers who are using AI for it just aren't very good tests a lot of the time, and honestly encode too many of the specific implementation details of the module in question, making refactoring more of a chore.

The tests I'm talking about simply aren't going to catch any bugs, they weren't used as an isolated execution environment for test driven development, so what use are they? I'm not convinced, not yet anyway.

Just because we can get "9X%" coverage with these tools, doesn't mean we should.


Completely agree. I find it fails miserably at business logic, which is what we spend most of our time on. But it does great at generic stuff, which is already trivial to find on Stack Overflow.


This might be a prompting issue; my experience is very different. I've written entire services using it.


Might be that they work in a much more complex codebase, or a language/framework/religion that has less text written on it. Might also be that they (are required to) hold code to a higher standard than you and can't just push half-baked slop to prod.


I've spent a good amount of time in my career reading high-quality code and slop. The difference is not some level of intelligence that Sonnet does not possess. It's a well thought out design, good naming, and rigor. Sonnet is as good as, if not better than, the average dev at most of this, and with some good prompting and a little editing it can write code as good as most high-quality open source projects.

Which is usually a far higher bar than most commercial apps the vast majority of us devs work on.


> with some good prompting and a little editing it can write code as good

I agree that with a good developer "baby-sitting" the model, it's capable of producing good code. Although this is more because the developer is skilled at producing good code, so they can tell the AI where it should refactor and how (or they can just do it themselves). If you've spent significant time refitting AI code, it's not really AI code anymore; it's yours.

Blindly following an AI's lead is where the problem is, and this is where bad to mediocre developers get stuck using an AI, since the effort/skill required to take the AI off its path and get something good out is largely not practised. This is because they don't have to fix their own code, and what the AI spits out is largely functional - why would anyone spend time thinking about a solution that works, even if they don't understand how they arrived at it?


I've spent so much time in my life reviewing bad or mediocre code from mediocre devs, and 95% of the time the code Sonnet 3.5 generates is at least as correct, and 99% of the time more legible, than what a mediocre dev generates.

It's well commented, the naming is great, it rarely tries to get overly clever, it usually does some amount of error handling, it'll at least try to read the documentation, and it finds most of the edge cases.

That's a fair bit above a mediocre dev.


It's easy to forget one major problem with this: we all have been mediocre devs at some point in our lives -- and there will always be times when we're mediocre, even with all our experience, because we can't be experienced in everything.

If these tools replace mediocre devs, leaving only the great devs to produce the code, what are we going to do when the great devs of today age out, and there's no one to replace them with, because all those mediocre devs went on to do something else, instead of hone their craft until they became great devs?

Or maybe we'll luck out, and by the time that happens, our AIs will be good enough that they can program everything, and do it even better than the best of us.

If you can call that "lucking out" -- some of us might disagree.


I love those hot takes, because the market will, as it historically has, fire you and hire only the mediocre coders now.


Without knowing much about my standards and work you’ve just assumed it’s half baked slop. You’re wrong.


IA generated content is the definition of slop for most people.


> IA generated content is the definition of slop

The irony.


https://en.wikipedia.org/wiki/Slop_(artificial_intelligence)

English is not my mother tongue. I had never noticed the word "slop" until people started using it to talk about AI-generated content.

So for many people around the world slop = AI content.

Where is the irony if I may ask?


You misspelled AI - that's the extent of the irony


In my mother tongue it's called IA, inteligencia artificial. I mix it up all the time.


you didn't have to explain. he knew, my friend. he knew.


Which is deeply sad, because this both tarnishes good output and gives a free pass to the competitor - shit "content" generated by humans. AI models only recently started to match that in quantity.

But then again, most of the software industry exists to create and support the creation of human slop - advertising, content marketing, all that - so there's bound to be some double standards and salary-blindness present.


Without knowing much about my prompts and work you’ve just assumed it’s why AI gives me bad results. You’re wrong. (Can you see why this is a bad argument?)

Don't get me wrong, I love sloppy code as much as the next cowboy, but don't delude yourself or others when the emperor doesn't have clothes on.


> But does great at generic stuff, which is already trivial to find on stack overflow.

The major difference is that with Cursor you just hit "tab", and that thing is done. Vs breaking focus to open up a browser, searching SO, finding an applicable answer (hopefully), translating it into your editor, then reloading context in your head to keep moving.


The benefit of exploring is finding alternatives and knowing about gotchas - and learning more about both the problem space and how the language/library/framework solves it.


I've had o1 respond with a better alternative before. And it's only going to get better.


I mean, sure, but this is also exactly the argument against using a calculator and in favor of doing all your math by hand.


There was a thread about that, and the gist was: a calculator is a great tool because it's deterministic and the failures are known (mostly related to precision); it eliminates the entire need to do computation by hand, and you don't have to babysit it through the computation process with "you're an expert mathematician..."; also, it's just a tool and you still need to learn basic mathematics to use it.

The equivalent to that is a good IDE that offers good navigation (project and dependencies), great feedback (highlighting, static code analysis, ...), semantic manipulation, integration with external tooling, and the build/test/deploy process.


Yes, I think I agree. And when you use a calculator and get a result that doesn't make sense to you, you step in as the human and try to figure out what went wrong.

With the calculator, it's typically a human error that causes the issue. With AI, it's an AI error. But in practice it's not a different workflow.

Give inputs -> some machine does work much faster than you could -> use your human knowledge to verify outputs -> move forward or go back to step 1.


My experience has been different. My major use case for AI tools these days is writing tests. I've found that the generated test cases are very much in line with the domain. It might be because we've been strictly using domain-driven design principles. It even generates test cases that fail, showing what we've missed.


Have you had a go with the o1 range of models?


Yesterday, I got into an argument on the internet (shocking, I know), so I pulled out an old gravitation simulator that I had built for a game.

I had ChatGPT give me the solar system parameters, which worked fine, but my simulation had an issue that I had actually never resolved. So, working with the AI, I asked it to convert the simulation to constant time (it was locked to the render path - it's over a decade old). Needless to say, it wrote code that set the simulation to run in realtime ... in other words, we'd be waiting one year to see the planets go around the sun. After I pointed that out, it figured out what to do but still got things wrong or made some terrible readability decisions. I ended up using it as inspiration instead, and then was able to have the simulation step at one-second resolution (which was required for a stable orbit) but render at 60fps and compress a year into a second.
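For anyone curious what "step at one-second resolution but render at 60fps" means, here's a minimal sketch of the pattern in Python - not my actual code, and step() and render() are just placeholders for the gravity integration and drawing:

    SIM_DT = 1.0                                     # 1-second physics steps, needed for a stable orbit
    SECONDS_PER_YEAR = 365.25 * 24 * 3600
    FPS = 60
    SIM_SECONDS_PER_FRAME = SECONDS_PER_YEAR / FPS   # one simulated year compressed into one wall-clock second

    def run_frame(step, render):
        """Advance the simulation by one frame's worth of simulated time, then draw once."""
        advanced = 0.0
        while advanced < SIM_SECONDS_PER_FRAME:
            step(SIM_DT)        # integrate gravity over a fixed 1-second step
            advanced += SIM_DT  # (~526k steps per frame - the cost of insisting on 1s resolution)
        render()                # draw the current planet positions at the display rate

The point the model initially missed: the simulation timestep and the render rate are independent knobs, and "realtime" is just a time-scale factor of 1.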


This sums up my experience as well. You can get an idea or just a direction from it, but AI itself trips over its own legs instantly in any non-tutorial task. Sometimes I envy, and at the same time feel sorry for, successful AI-enabled devs, because it feels like they do boilerplate and textbook features all day. What a relief if something can write it for you.


I have a corporation-sponsored subscription to GitHub Copilot + Rider.

When I'm writing unit tests or integration tests it can guess the boilerplate pretty well.

If I already have an AddUserSucceeds test and I start writing `public void Dele...`, it usually fills in the DeleteUserSucceeds function with pretty good guesses on what Asserts I want there - most times it even guesses the API path/function correctly, because it uses the whole project as context.

I can also open a fresh project I've never seen and ask "Where is DbContext initialised" and it'll give me the class and code snippet directly.


Have you tried recently to start a new web app from scratch? Especially the integration of a frontend framework with styling, and the frontend-backend integration.

Oh my god, get ready to waste a full weekend just to set up everything and get a formatted hello world.


That’s why I use Rails for work. But I also had to write a small Node.js project (Vite/React + Express) recently for a private project, and it has a lot of nice things going for it that make modern frontend dev really easy - but boy is it time-consuming to set up the basics.


I can't imagine having nice things to say about node after working in Rails. Rails does so much for you, provides a coherent picture of how things work. Node gives you nothing but the most irritating programming tools around


I thought Node was great until I had to upgrade some projects and realized that those frameworks' maintainers never maintain their dependencies. Meanwhile in the C world, a lot of projects treat warnings as errors.


that's an indictment of the proliferation of shitty frameworks and documentation. it's not hard to figure out such a combination and then keep a template of it lying around for future projects. you don't have to reach for the latest and shiniest at the start of every project.


> you don't have to reach for the latest and shiniest at the start of every project.

Except you kind of do, because if you're working frontend or mobile, then your chosen non-shitty tech stack is probably lacking some Important Language Innovations or Security Features that Google or Microsoft forced on the industry since the last time you worked with that stack.

(Yes, that's mostly just an indictment of the state of our industry.)


every time you capitulate, you tell them that you're happy to play along, so they bring more "innovation" and you keep having to run very hard just to stay in place.


I’ve been pretty happy with Vite, Chakra, and PostgREST lately


Most frontend frameworks come with usable templates. Setting up a new Vite React project and getting to a formatted hello world can be done in half an hour tops.


Half an hour is still an overestimation, most of these frontend tools go from 0 to hello world in a single CLI command.


On a good day, when you're using the most recent right version of macOS, and when none of the frontend tool's couple thousand dependencies is transiently incompatible with everything else, yes.

(If no, AI probably won't help you here either. Frontend stuff moves too fast, and there's too much of it. But then perhaps the AI model could tell you that the frontend tool you're using is a ridiculous overkill for your problem anyway.)


I'll be honest: I've never had the command line tool to set up a React / Next.js / Solid / Astro / Svelte / any framework app fail to make something that runs, ever.


I had create-react-app break for me on the first try, because I managed to luck into some transient issue with dependencies.


What magic command line tool exactly are you referring to? What CLI tool configures the frontend framework to use a particular CSS framework, sets up webpack to work with your backend endpoints properly, sets up CORS and authentication with the backend for development, and configures the backend to point to and serve the SPA?


dotnet new <the_template>


It takes me all of 5 minutes with Phoenix


Exactly, the idea that it would take a weekend seems crazy to me. It’s certainly not something I need AI for.


Yep, it will likely work and do what it's supposed to do. But what it's supposed to do is probably only 90% of what you want: try to do anything outside of what the boilerplate is set up for and you're in for hours of pain. Want SWC instead of TSC? ESLint wasn't set up, or not like you want? Prettier isn't integrated with ESLint? You want TypeScript project references? Something about ES modules vs CJS? Hours and hours and hours of pain.

I understand all this stuff better than average (although not top 10%), and I'd be ashamed to put a number on the amount of hours I've lost setting up boilerplate, _even_ having used some sort of official generator.


> Boilerplate are tedious, but not really time-consuming.

In the aggregate, almost no programmer can think up code faster than they can type it in. But being a better typist still helps, because it cuts down on the amount you have to hold in your head.

Similar for automatically generating boilerplate.

> If I don't know, how can I trust the generated code to be correct?

Ask the AI for a proof of correctness. (And I'm only half-joking here.)

In languages like Rust the compiler gives you a lot of help in getting concurrency right, but you still have to write the code. If the Rust compiler approves of some code (AI-generated or artisanally crafted), you are already pretty far along in getting concurrency right.

A great mind can take a complex problem and come up with a simple solution that's easy to understand and obviously correct. AI isn't quite there yet, but getting better all the time.


> In the aggregate, almost no programmer can think up code faster than they can type it in.

And thank god! Code is a liability. The price of code is coming down but selling code is almost entirely supplanted by selling features (SaaS) as a business model. The early cloud services have become legacy dependencies by now (great work if you can get it). Maintaining code is becoming a central business concern in all sectors governed by IT (i.e. all sectors, eating the world and all that).

On a per-feature basis, more code means higher maintenance costs, more bugs, and greater demands on developer skills and experience. Validated production code that delivers proven customer value is not something you refactor on a whim (unless you plan to go out of business), and the fact that you did it in an evening thanks to ClippyGPT means nothing; the costly part is always what comes after: demonstrating value or maintaining trust in a competitive market with a much shallower capital-investment moat.

Mo’ code mo’ problems.


> In the aggregate, almost no programmer can think up code faster than they can type it in. But being a better typist still helps, because it cuts down on the amount you have to hold in your head.

I mean, on the big-picture level, sure they can. Or in detail, if it is something they have good experience with. In many cases I get a visual of the whole code block, and if I use Copilot I can already predict what it is going to auto-complete for me based on the context, and then I can tell in about a second whether it was right or wrong. Of course this is more so for side projects, since I know exactly what I want to do, so most of the time it feels like I'm just having to vomit all the code out. And I feel impatient, so Copilot helps a lot with that.


100%. Useful cases include

* figuring out how to X in an API - e.g. "write method dl_file(url, file) to download file from url using requests in a streaming manner" (see the sketch below)

* Brainstorming which libraries / tools / approaches exist to do a given task. Google can miss some. AI is a nice complement for Google.
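To make the first bullet concrete, this is roughly the shape of snippet such a prompt tends to produce (a quick sketch, not vetted production code):

    import requests

    def dl_file(url, file, chunk_size=8192):
        """Download url to the given file path, streaming so the whole body never sits in memory."""
        with requests.get(url, stream=True, timeout=30) as resp:
            resp.raise_for_status()
            with open(file, "wb") as f:
                for chunk in resp.iter_content(chunk_size=chunk_size):
                    f.write(chunk)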


I don’t even trust the API-based exercises anymore unless it's a stable and well-documented API. Too many times I've been bitten by an AI mixing and matching method signatures from different versions, using outdated approaches, mixing in APIs from similar libraries, or just completely hallucinating a method. Even if I load the entire library docs into the context, I haven't found one that's completely reliable.


It just has to be popular and common boilerplate like the example I gave.

It's hard with less popular APIs. It will almost always get something wrong. In such cases, I read docs, search sourcegraph / GitHub, and finally check the source code.


> Snake case to camelCase

> VSCode itself has command of "Transform to Camel Case"

I never understand arguments like this. I have no idea what the shortcut for this command is. I could learn this shortcut, sure, but tomorrow I’ll need something totally different. Surely people can see the value of having a single interface that can complete pretty much any small-to-medium-complexity data transformation. It feels like there’s some kind of purposeful gaslighting going on about this and I don’t really get the motive behind it.
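For reference, the transformation being quoted is a few lines to script (a throwaway Python sketch), but the point stands: you'd have to stop and write, or go look up, something like this every time a slightly different transformation comes up:

    def snake_to_camel(name):
        """quick_brown_fox -> quickBrownFox"""
        head, *rest = name.split("_")
        return head + "".join(part.capitalize() for part in rest)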


Exactly. I think some commenters are taking this example too literally. It's not about this specific transformation, but how often you need to do similar transformations and don't know the exact shortcut or regex or whatever to make it happen. I can describe what I want in three seconds and be done with it. Literal dropbox.png going on in this thread.


If you aren’t using AI for everything, you’re using it wrong. Go learn how to use it better. It’s your job to find out how. Corporations are going to use it to replace your job.

(Just kidding. I’m just making fun of how AI maxis reply to such comments, but they do it more subtly.)


Boilerplate comes up all the time when writing Erlang with OTP behaviors, though, and sometimes you have no idea if it really is the right way or not. There are Emacs skeletons for that (through tempo), but it feels like they are sometimes out of date.


1. It is such a fast task for me anyway that I don’t lose much just doing it by hand.

2. The last time I wrote boilerplate-heavy Java code, 15+ years ago, the IDE already generated most of it for me. Nowadays boilerplate comes in two forms for me: new project setup, where I find it far quicker to use a template or just copy and gut an existing project (and it’s not like I start new projects that often anyway), or new components that follow some structure, where AI might actually be useful but I tend to just copy an existing one and gut it.

3. These aren’t tasks I really trust AI for. I still attempt to use AI for them, but 9 out of 10 times come away disappointed. And the other 1 time I end up having to change a lot of it anyway.

I find a lot of value from AI, like you, asking it SO style questions. I do also use it for code snippets, eg “do this in CSS”. Its results for that are usually (but not always) reasonably good. I also use it for isolated helper functions (write a function to flood fill a grid where adjacent values match was a recent one). The results for this range from a perfect solution first try, to absolute trash. It’s still overall faster than not having AI, though. And I use it A LOT for rubber ducking.

I find AI is a useful tool, but I find a lot of the positive stories to be overblown compared to my experience with it. I also stopped using code assistants and just keep a ChatGPT tab open. I sometimes use Claude, but its conversation length limits turned me off.

Looking at the videos in OP, I find the parallelising task to be exactly the kind of tricky and tedious task that I don’t trust AI to do, based on my experience with that kind of task, and with my experience with AI and the subtly buggy results it has given me.


Have you tried Cursor, or is this just your guess at what your evaluation would be?


Don't mean to be rude, but was this comment written with an LLM?


have you tried using cursor or claude?


1. The richest 1% vote for whoever makes them even richer, at the expense of the other 99% who are poorer than them. 2. The other 99% no longer play the "democracy" game with the rich, and form their own government without the "voting power corresponds to how much tax you paid" rule. 3. The rich people's country loses its foundation and can no longer sustain itself. The rich join the poor people's country.


I don't know why PEP 420 didn't mandate one file at the root of the package that describes the structure of all the sub-directories, rather than going the controversial, lazy route of removing __init__.py. That way you get the explicitness and avoid littering empty marker files everywhere.


This is spelled out in the PEP (I’m the author). There isn't one "portion" (a term defined in the PEP) to own that file. And it's also not a requirement that all portions be installed in the same directory. I'm sorry you feel this was lazy; it was a lot of work spanning multiple attempts over multiple years.
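To illustrate what "portions" means in practice, here's a runnable toy (my own sketch here, not from the PEP): two separate directories each contribute part of the same namespace package, neither ships an __init__.py for it, and so neither is in a position to own a root file describing the whole structure:

    # Two independently installed "portions" of one hypothetical namespace package "acme".
    import sys, tempfile
    from pathlib import Path

    root = Path(tempfile.mkdtemp())
    for portion, mod in [("portion_a", "http"), ("portion_b", "sql")]:
        pkg = root / portion / "acme"
        pkg.mkdir(parents=True)
        (pkg / (mod + ".py")).write_text("NAME = " + repr(mod) + "\n")
        sys.path.insert(0, str(root / portion))

    import acme.http, acme.sql    # both resolve; "acme" is assembled at import time
    print(list(acme.__path__))    # spans both directories - no single owner
    print(acme.http.NAME, acme.sql.NAME)

Since the portions can be shipped by different distributions into different directories, there is no single place where a structure-describing root file could live.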


I appreciate your work on pep 420; I’ve benefited from it personally as a Python user. Thank you for a job well done.


Thank you for the kind words.


JavaScript tooling requires index files for everything, which makes development slow, particularly when you want to iterate fast or create many files with a single output.

I think it makes sense for the compiler or script loader to rely on just the files and their contents. Either way you're already defining everything, so why create an additional redundant set of definitions?


> JavaScript tooling requires index files for everything

This just isn't true. I've never encountered tooling that forces you to have these by default. If it's enforced, it's by rules defined in your project or by some unusual tool.


What do you mean by index files? It might depend on the bundler, but I haven’t heard of index.js/index.ts files being a hard requirement for a directory to be traversable in most tooling.


> JavaScript tooling requires index files for everything

You mean barrel files? Those are horrible kludges used by lazy people to pretend they're hiding implementation details, generate arbitrary accidental circular imports, and end up causing absolute hell if you're using any sort of naive transpiling/bundling tooling/plugin/adapter.


They aren't necessarily empty.


Page 8 ("Building") of the slides has the badger picture on the right. Its right hand has some weird "nails". Another example of an AI-generated image.


And the file served is a PDF. The font used is Titillium Web. The color of my bike shed is that of brown wood.

Did I also contribute to the discussion of the tool?


All the honey badger (mascot of cosmopolitan libc) images were generated using FLUX.1-schnell on my MacBook Pro M1 Max.

I've blogged about all the AI usage at https://www.qt.io/blog/examples-of-local-llm-usage


I completely disagree. I personally find dark mode harmful to my eyes. Try this: open a page full of white text on a black background. Stare at it for 10 seconds. Now close your eyes. You can clearly see the afterimages of that bright text lingering for quite a while. Now try the same with a black-on-white page. No afterimage at all.

Like the author says, if the screen is too bright for you, lower the brightness, or use night light mode. You can change your environment. You can't change how your eyes/brain function.


No, with black text on white background I get an afterimage of the entire screen.

I don't want to force you to use dark mode, by all means continue viewing your world with dark text on bright backgrounds and let us view things the way that we prefer them.

