coder543's comments | Hacker News

https://en.wikipedia.org/wiki/String_interning

It's an established term in software. I would speculate that the word "intern" is short for "internalizing" here.


I think it's from the Lisp "intern" function, which creates an internal symbol if it doesn't already exist and there's no external symbol of the same name.


"Intern" (job) -> staying at a workplace for some time to be trained and do some work

"Intern" (software) -> staying resident in memory to be referred to canonically

I find that often words do come about with a surprising amount of sense.
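Just to sketch the idea (all names here are made up for illustration): interning amounts to keeping one canonical copy of each distinct string value, so equal strings can share storage and be handed back from a lookup table.

```go
package main

import "fmt"

// interner maps string contents to a single canonical copy.
// (Illustrative sketch only; not safe for concurrent use without a lock.)
type interner map[string]string

func (in interner) intern(s string) string {
	if canonical, ok := in[s]; ok {
		return canonical // already interned: hand back the stored copy
	}
	in[s] = s // first occurrence becomes the canonical copy
	return s
}

func main() {
	in := interner{}
	a := in.intern("hello")
	b := in.intern("hello")
	fmt.Println(a == b) // true: both are the same interned value
}
```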


  Location: Birmingham, AL
  Remote: Yes
  Willing to relocate: Yes
  Technologies: Go, Rust, TypeScript, React, Postgres, Kafka, AWS, GCP, etc.
  Résumé/CV: https://drive.google.com/file/d/1VNC272B3n7ZEfppMHkm2wGgaINwYl4Av/view
  Email: listed on resume
I have 8+ years of experience. Mostly backend-focused full-stack engineer. I especially enjoy working on efficient, reliable, high-scale systems that have a beneficial, tangible impact.

I’ve taken about a year to explore different business ideas, but I’m very interested in getting back to a more normal job. I did get more experience with native iOS Swift+SwiftUI development and experience using LLMs in a production environment, resulting in a really polished app that I use every day. Even though I am backend focused, I can contribute to all parts of the stack.

I'm open to relocating. SoCal would be great, but I'm open to just about anywhere for the right job.


> I'm a bit surprised there was no mention of semiconductors (mostly capacitors?) going bad on 1990's hard drives.

I don’t think any kind of capacitor is a semiconductor.


*Dons glasses* Well, etymologically... the "semi-" in semiconductor means "partial": something between an insulator and a conductor. A capacitor is an insulator between two conductors.

(But this approach fails on the temperature coefficient of resistance test: capacitor ESR increases with temperature while semiconductors have a negative coefficient.)


I think the term to search for is reflink. Btrfs is one example: https://btrfs.readthedocs.io/en/latest/Reflink.html

Like with Hyperspace, you would need to use a tool that can identify which files are duplicates, and then convert them into reflinks.


I thought reflink is provided by the underlying FS, and Hyperspace is a dedup tool that finds the duplicates.


Yes. Hyperspace is finding the identical files and then replacing all but one copy with a reflink copy using the filesystem's reflink functionality.

When you asked about the filesystem, I assumed you were asking about which filesystem feature was being used, since hyperspace itself is not provided by the filesystem.

Someone else mentioned[0] fclones, which can do this task of finding and replacing duplicates with reflinks on more than just macOS, if you were looking for a userspace tool.

[0]: https://news.ycombinator.com/item?id=43173713


Hyperspace uses built-in APFS features; it just applies them to existing files.

You only get CoW on APFS if you copy a file with certain APIs or tools.

If you have a program that does the copy manually, if you copied a duplicate to somewhere on your disk from some other source, or if your files already existed on the file system when you converted to APFS because you've been carrying them around for a long time, then you'd have duplicates.

APFS doesn’t look for duplicates at any point. It just keeps track of those that it knows are duplicates because of copy operations.


You can do the same with `cp -c` on macOS, or `cp --reflink=always` on Linux, if your filesystem supports it.
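For anyone who wants to try it, a quick demo (illustrative only; reflinks need a capable filesystem like APFS, Btrfs, or XFS, and on other filesystems `cp --reflink=always` refuses to copy rather than silently falling back):

```shell
tmp=$(mktemp -d)
cd "$tmp"
echo "hello" > original.txt
# Try a reflink clone first (Linux flag, then the macOS spelling); fall
# back to a plain copy where reflinks are unsupported, just so the demo
# still runs end to end.
cp --reflink=always original.txt clone.txt 2>/dev/null \
  || cp -c original.txt clone.txt 2>/dev/null \
  || cp original.txt clone.txt
cmp original.txt clone.txt && echo "contents identical"
```

The clone shares blocks with the original until one of them is modified, at which point the changed blocks are copied.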


https://support.apple.com/en-us/102651

I think this is the better link. Advanced Data Protection is end to end encrypted, without the key being backed up to Apple’s servers.


This of course only helps if ADP is available in your country and you've turned it on.


The bigger the service, the more financial incentive they have to be smart and not pay absurd prices for things, since they can give themselves higher profit margins by controlling their costs.


How much do you think bandwidth costs? I’m trying to understand what 10x cheaper would look like to you, as an actual $/TB number.

I think a lot of people have misconceptions about how much bandwidth really costs.


I am mainly mentioning this with regard to the egress prices of Azure and other providers. And in Europe, on-prem stuff is expensive if you are peering to other countries.


The last time I had to care professionally about bandwidth pricing for CDN price optimization in the US, wholesale bandwidth pricing was following a pattern similar to Moore's law, with either bandwidth doubling or price halving every 18-21 months. This was partly why you could get what looked like good deals from CDN providers for multi-year contracts: they knew their prices were just going to fall.

Part of what drives this is that we keep finding ways to better utilize fiber, so there's a technical aspect, but a lot of it also comes down to adding more physical connections. There's even network consolidation happening, where two companies will do enough data sharing that they will get peering agreements and just add a cat6 patch between servers hosted in the same datacenter, short-circuiting the network.

It’s been almost a decade, so it’s possible things have slowed considerably or demand has outstripped supply, but given how much data Steam seems to be willing to throw at me, I know pricing is likely nowhere near what it was last I looked (it’s the only metered thing I regularly see, and it’s downloading tens of GB daily for a couple of games in my collection).

Using egress pricing is also the wrong metric. You’d be better off looking at data costs between regions/datacenters to get a better idea about wholesale costs, since high egress costs are likely a form of vendor lock-in, while looking at cross-region rates avoids any “free” data costs through patch cables skewing the numbers.

Not sure about bandwidth between countries; there’s different economics there. I’d expect some self-similarity, but laying trunks might be so costly that finding ways to utilize fiber better is the only real way to increase supply.


Azure and the other mega clouds seem to enjoy massive profit margins on bandwidth… why would they willingly drop those prices when they can get away with high prices?

If bandwidth costs are important, there are plenty of options that will let you cut the cost by 10x (or more). Either with a caching layer like an external CDN (if that works for your application), or by moving to any of the mid-tier clouds (if bandwidth costs are an important factor, and caching won’t work for your application).

AWS, GCP, and Azure are the modern embodiment of the phrase “nobody ever got fired for buying IBM.”

Most companies don’t benefit from those big 3 mega clouds nearly as much as they think they do.

So, sure, send a note to your Azure rep complaining about the cost of bandwidth… nothing will change, of course, because companies aren’t willing to switch away from the mega clouds.

> and other providers

Other providers, like Hetzner, OVH, Scaleway, DigitalOcean, Vultr, etc., do not charge anywhere near the same for bandwidth as Azure. I think they are all about 8x to 10x cheaper.
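To put rough numbers on that gap, here is the arithmetic at illustrative list rates (ballpark figures, not quotes: roughly $0.09/GB for a mega-cloud's first egress tier versus roughly $0.01/GB of overage at a mid-tier provider):

```go
package main

import "fmt"

// egressCost returns the monthly egress bill in dollars for a given
// number of terabytes at a flat $/GB rate. (Real pricing is tiered;
// the rates used below are rough illustrations, not quotes.)
func egressCost(tb, dollarsPerGB float64) float64 {
	return tb * 1024 * dollarsPerGB
}

func main() {
	const tb = 100.0 // 100 TB/month of egress
	mega := egressCost(tb, 0.09)    // ballpark mega-cloud list rate
	midTier := egressCost(tb, 0.01) // ballpark mid-tier overage rate
	fmt.Printf("mega cloud: $%.0f/mo, mid-tier: $%.0f/mo (%.0fx)\n",
		mega, midTier, mega/midTier)
}
```

Even at modest volumes, the ratio between the two rates is the whole story; the absolute dollar amounts just scale linearly with usage.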


A CDN will increase your bandwidth costs, not lower them.

E.g. Fastly prices: US/Europe $0.10/GB, India $0.28/GB.

Not all bandwidth is equal. E.g. Hetzner will pay for fast traffic into Europe, but doesn't pay the premium that others like AWS do to ensure it gets into Asia uncongested.


BunnyCDN charges significantly less for data that they serve, for example.

I didn’t say all CDNs are cheaper. Some CDNs see an opportunity to charge a premium, and they do!

Fastly sees themselves as far more than just a CDN. They call themselves an “edge cloud platform”, not a CDN.

> Not all bandwidth is equal. eg Hetzner will pay for fast traffic into Europe but don't pay the premium that others like AWS do to ensure it gets into Asia uncongested.

Sure… there are sometimes tradeoffs, but for bandwidth-intensive apps, you’re sometimes (often?) better off deploying regional instances that are closer to your customers, rather than paying a huge premium to have better connectivity at a distance. Or, for CDN-compatible content, you’re probably better off using an affordable CDN that will bring your content closer to your users.

If you absolutely need to use AWS’s backbone for customers in certain geographic regions, there’s nothing stopping you from proxying those users through AWS to your application hosted elsewhere, by choosing the AWS region closest to your application and putting a proxy there. You’ll be paying AWS bandwidth plus your other provider’s bandwidth, but you’ll still be saving tons of money to route the traffic that way if those geographic regions only represent a small percentage of your users… and if they represent a large percentage, then you can host something more directly in their region to make the experience even better.

For many types of applications, having higher latency / lower bandwidth connectivity isn’t even a problem if the data transfer is cheaper and saves money… the application just needs to do better caching on the client side, which is a beneficial thing to do even for clients that are well-connected to the server.

It depends, and I am not convinced there is a one-size-fits-all solution, even if you were to pay through the nose for one of the hyperscalers.

I have plenty of professional experience with AWS and GCP, but I also have professional experience with different degrees of bare metal deployment, and experience with mid-tier clouds. If costs don’t matter, then sure, do whatever.


Or what they’re actually buying when they’re looking at the bandwidth line item on their invoices.


https://platform.openai.com/docs/models/#o1

> The latest o1 model supports both text and image inputs


But not multimodal reasoning: the intermediate and output tokens are text only, at least in the released version. They probably have actual multimodal reasoning that hasn't been shown yet, as they already showed gpt-4o can output image tokens, but that hasn't been released either.


That wasn’t the question… they asked if any multimodal models had been reasoning trained. o1 fits that criterion precisely, and it can reason about the image input.

They didn’t ask about a model that can create images while thinking. That’s an entirely unrelated topic.


I recommend ignoring the other reply you just got. They are clearly building a bad faith argument to try to make Go look terrible while claiming to sing its praises. That is not at all how that would look in Go. The point being made was that the exception-based code has lots of hidden gotchas, and being more explicit makes the control flow more obvious.

Something like this:

    a, err := f()
    if err != nil {
        c, err := h()
        if err != nil {
            return fmt.Errorf("h failed: %w", err)
        }
        cleanupC(c)
        return fmt.Errorf("f failed: %w", err)
    }
    defer cleanupA(a)

    b, err := a.g()
    if err != nil {
        c, err := h()
        if err != nil {
            return fmt.Errorf("h failed: %w", err)
        }
        cleanupC(c)
        return fmt.Errorf("a.g failed: %w", err)
    }
    defer cleanupB(b)

    // the rest of the function continues after here
It’s not crazy.

With Java’s checked exceptions, you at least have the compiler helping you to know (most of) what needs to be handled, compared to languages that just expect you to find out what exceptions explode through guess and check… but some would argue that you should only handle the happy path and let the entire handler die when something goes wrong.

I generally prefer the control flow that languages like Rust, Go, and Swift use.

Errors are rarely exceptional. Why should we use exceptions to handle all errors? Most errors are extremely expected, the same as any other value.

I’m somewhat sympathetic to the Erlang/Elixir philosophy of “let it crash”, where you have virtually no error handling in most of your code (from what I understand), but it is a very different set of tradeoffs.


Or, if you really hate duplication, you could optionally do something like this, where you extract the common error handling into a closure:

    handleError := func(origErr error, context string) error {
        c, err := h()
        if err != nil {
            return fmt.Errorf("%s: h failed: %w", context, err)
        }
        cleanupC(c)
        return fmt.Errorf("%s: %w", context, origErr)
    }

    a, err := f()
    if err != nil {
        return handleError(err, "f failed")
    }
    defer cleanupA(a)

    b, err := a.g()
    if err != nil {
        return handleError(err, "a.g failed")
    }
    defer cleanupB(b)

    // the rest of the function continues after here


No... sigmoid10 was comparing with o1 (not o1-pro), which is accessible for $20/mo, not $200/mo. So the "Elon factor" in your math is +$20/user/month (2x) for barely any difference in performance (a hard sell), not -$160/user/month. While we have no clear answer to whether either of them is making a profit at that price, it would be surprising if OpenAI Plus users were not profitable, given the reasonable rate limits OpenAI imposes on o1 access, and the fact that most Plus users probably aren't maxing out their rate limits anyway.

o1-pro requires vastly more compute than o1 for each query, and OpenAI was providing effectively unlimited access to o1-pro for Pro users, with users who want tons of queries gravitating to that subscription. The combination of those factors is certainly why Sam Altman claimed they weren't making money on Pro users.

lmarena has also become less and less useful over time for comparing frontier models as all frontier models are able to saturate the performance needed for the kind of casual questions typically asked there. For the harder questions, o1 (not even o1-pro) still appears to be tied for 1st place with several other models... which is yet another indication of just how saturated that benchmark is.


“The impression overall I got here is that this is somewhere around o1-pro capability”.

“Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI's strongest models (o1-pro, $200/month)”.


The comment I was replying to had replied to an lmarena benchmark link. Perhaps you think that person should have replied to someone else? And, if you want to finish the quote, Karpathy's opinion on this is subjective. He admits it isn't a "real" evaluation.

"[...] though of course we need actual, real evaluations to look at."

His own tests are better than nothing, but hardly definitive.


I understood numpad0 to continue the comparison to o1-pro, after sigmoid10 expressed the opinion that the comparison is warranted.


Yes, numpad0 did... but I was pointing out that this choice was illogical. The lmarena results they were replying to only supported a comparison against o1, since o1 effectively matches Grok 3 on the benchmark being replied to (with o1-pro nowhere to be found), and then they immediately leapt into a bunch of weird value-proposition math. As I said, perhaps you think they should have replied to someone else? Replying to an lmarena benchmark indicates that numpad0 was using that benchmark as part of the justification of their math. I also pointed out the limitations of lmarena as a benchmark for frontier models.

I don't think anyone is arguing that ChatGPT Pro is a good value unless you absolutely need to bypass the rate limits all the time, and I cannot find a single indication that Premium+ has unlimited access to Grok 3. If Premium+ doesn't have unlimited rate limits, then it's definitely not comparable to ChatGPT Pro, and other than one subjective comment by Karpathy, we have no benchmarks that indicate that Grok 3 might be as good as o1-pro. You already get 99% of the value with just ChatGPT Plus compared to ChatGPT Pro for half the price of Premium+.

numpad0 was effectively making a strawman argument by ignoring ChatGPT Plus here... it is very easy for anyone to beat up a strawman, so I am here to point out a bad argument when I see one.


You're the one that came in and told him about the "factor in your math". Like you said, it's his comparison, not yours. If you want to do your own comparison, feel free. But don't come in and tell him he's not allowed to do his comparison. I for one like his comparison.


Guys, y'all forget GIGO. First principles.

This thing is produced by Musk.

