benesch's comments | Hacker News

It’s hard to overstate the amount of service Ian provided to the Go community, and the programming community at large. In addition to gccgo, Ian wrote the gold linker, has blogged prolifically about compiler toolchains, and maintains huge swaths of the gcc codebase [0]. And probably much, much more that I’m not aware of.

I’ve had the pleasure of trading emails with Ian several times over the years. He’s been a real inspiration to me. Amidst whatever his responsibilities and priorities were at Google he always found time to respond to my emails and review my patches, and always with insightful feedback.

I have complicated feelings about the language that is Go, but I feel confident in saying the language will be worse off without Ian involved. The original Go team had strong Bell Labs vibes—a few folks who understood computers inside and out and did it all: an assembler, a linker, two compilers, a language spec, a documentation generator, a build system, and a vast standard library. It has blander, corporate vibes now, as the language has become increasingly important to Google and standard practices for scaling software projects have kicked in. Such is the natural course of things, I suppose. I suspect this cultural shift is what Ian alluded to in his message, though I am curious about the specific tipping point that led to his decision to leave.

Ian, I hope you take a well-deserved break, and I look forward to following whatever projects you pursue next.

[0]: https://github.com/gcc-mirror/gcc/blob/master/MAINTAINERS


It's very important for both of Go's compiler toolchains to continue working well, for redundancy and feature-design validation purposes. However, I'm genuinely curious -- do people or organizations use gccgo for some use cases?


GCC Go does not support generics, so it's currently not very useful.


I assume it will follow in gcj's footsteps if no one steps up for maintenance.

GCC has a high bar for having frontends added to the standard distribution, and if there isn't a viable reason to keep them around, they eventually get removed.

What kept gcj around for so many years, after being almost left for dead, was that it was the only frontend project that had unit tests for specific compilation scenarios.

Eventually someone took the effort to migrate those tests, and remove gcj.


It has its niche uses, such as compiling Go for lesser-used architectures. It's a bit awkward to not have full language capabilities, but it still feels nicer than writing C/C++.


> GCC Go does not support generics, so it's currently not very useful.

I don't think a single one of the Go programs I use (or have written) use generics. If generics is the only sticking point, then that doesn't seem to be much of a problem at all.


You’re also at the mercy of the libraries you use, no? Which likely makes this an increasingly niche case?


> You’re also at the mercy of the libraries you use, no?

To a certain extent. No one says you must use the (presumably newer) version of a library that uses generics, or even use libraries at all. Although for any non-trivial program, this is probably not how things will shake out for you.

> Which likely makes this an increasingly niche case?

This assumes that dependencies in general will, on average, converge on using generics. If your assertion is that this is the case, I'm going to have to object on the basis that there are a great many libraries out there today that were feature-complete before generics existed and are therefore effectively only receiving bug-fix updates, with no retrofit of generics in sight. And there is no rule that dictates that all new libraries being written _must_ use generics.


I just used them today to sort a list of browser releases by their publication date. They're not universal hammers, but sometimes you do encounter something nail-shaped that they're great at.


Yes, the three major open table formats are all quite similar.

When AWS launched S3 Tables last month I wrote a blog post with my first impressions: https://meltware.com/2024/12/04/s3-tables

There may be more in-depth comparisons available by now, but it's at least a good starting point for understanding how S3 Tables integrates with Iceberg.


Cool, thank you. It feels like Athena + S3 Tables has the potential to be a very attractive serverless data lakehouse combo.


> I'm sure they'd quickly argue it's wire compatibility, but even then it's a slippery slope and wire compatible is left open to however the person wants to interpret it.

I actually think that they'd argue they intend to close the feature gap for full Postgres semantics over time. Indeed their marketing was a bit wishful, but on Bluesky, Marc Brooker (one of the developers on the project) said they reused the parser, planner, and optimizer from Postgres: https://bsky.app/profile/marcbrooker.bsky.social/post/3lcghj...

That means they actually have a very good shot at approaching reasonably full Postgres compatibility (at a SQL semantics level, not just at the wire protocol level) over time.


> I liked the author's write-up, but as an old programmer take umbrage at the idea that changing your parser in the middle of a program is "crazy", we used to do this... well maybe not all the time... but with a greater frequency than we do today.

I think Justin addresses that point, though! He writes:

> The development of programming languages over the past few decades has been, at least in part, a debate on how best to allow users to express ways of building new functionality out of the semantics that the language provides: functions, generics, modules.

And indeed by modern PL standards patching the parser at runtime is very unusual.

The "modern" language that I've worked in that comes closest is Ruby, since the combination of monkey patching and the lack of symbols in the function call syntax is well suited to constructing DSLs. But most teams I've worked with that use Ruby eventually developed a strict "no monkey patching" rule, based on lived experience. At scale allowing developers to invent DSLs on the fly via monkey patching made the programs as a whole too complicated to reason about—too hard to move between modules in the codebase if every module essentially had its own syntax that needed to be learned.

I suppose describing this as "dark, demonic pathways" is a bit overstated for comedic effect but indeed "change the language syntax at runtime" does seem to be generally accepted these days as a bad software engineering practice. Works fine at a small scale, but doesn't age well as a team and codebase grows.


Yes! I’m actively working on it, in fact. We’re waiting on the next release of the Rust `object_store` crate, which will bring support for S3’s native conditional puts.

If you want to follow along: https://github.com/slatedb/slatedb/issues/164


> Anecdotally I have had to do this in js a few times. I have never had to do this in Rust. Probably because Rust projects are likely to ship with fewer bugs.

Still anecdotal, but I have worked on a large Rust codebase (Materialize) for six years, worked professionally in JavaScript before that, and I definitely wouldn’t say that Rust projects have fewer bugs than JavaScript projects. Rust projects have plenty of bugs. Just not memory safety bugs—but then you don’t have those in JavaScript either. And with the advent of TypeScript, many JS projects now have all the correctness benefits of using a language with a powerful type system.

We’ve forked dozens of Rust libraries over the years to fix bugs and add missing features. And I know individual Materialize developers have had to patch log lines into our dependencies while debugging locally many a time—no record of that makes it into the commit log, though.


It could be that I just haven't written enough Rust to encounter this issue. Thanks for the insight!


> It would be so much better if this were a Postgres extension instead.

I've thought about this counterfactual a lot. (I'm a big part of the reason that Materialize was not built as a PostgreSQL extension.) There are two major technical reasons that we decided to build Materialize as a standalone product:

1. Determinism. For IVM to be correct, computations must be strictly deterministic. PostgreSQL is full of nondeterministic functions: things like random(), gen_random_uuid(), pg_cancel_backend(), etc. You can see the whole list with `SELECT * FROM pg_proc WHERE provolatile <> 'i'` (see the sketch after this list). And that's just scratching the surface. Query execution makes a number of arbitrary decisions (e.g., ordering or not) that can cause nondeterminism in results. Building an IVM extension within PostgreSQL would require hunting down every one of these nondeterministic moments and forcing determinism on them—a very long game of whack-a-mole.

2. Scale. PostgreSQL is fundamentally a single-node system. But much of the reason you need to reach for Materialize is that your computation exceeds what a single machine can handle. If Materialize were a PostgreSQL extension, IVM would be competing for resources (CPU, memory, disk, network) with the main OLTP engine. But since Materialize is a standalone system, you get to offload all that expensive IVM work to a dedicated cluster of machines, leaving your main PostgreSQL server free to spend all of its cycles on what it's uniquely good at: transaction concurrency control.
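
(To make point 1 concrete, here's a slightly expanded version of that catalog query. This is plain PostgreSQL; the columns are standard pg_proc columns, and the exact output will vary by version and installed extensions.)

    -- Functions PostgreSQL does not mark as immutable ('i'), i.e. functions that
    -- are stable ('s') or volatile ('v') and therefore potential sources of
    -- nondeterminism that an IVM engine would have to tame.
    SELECT proname, provolatile
    FROM pg_proc
    WHERE provolatile <> 'i'
    ORDER BY proname;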

So while the decision to build Materialize as a separate system means there's a bit more friction to getting started, it also means that you don't need to have a plan for what happens when you exceed the limits of a single machine. You just scale up your Materialize cluster to distribute your workload across multiple machines.

One cool thing we're investigating is exposing Materialize via a PostgreSQL foreign data wrapper [0]. Your ops/data teams would still be managing two separate systems, but downstream consumers could be entirely oblivious to the existence of Materialize—they'd just query tables/views in PostgreSQL like normal, and some of those would be transparently served by Materialize under the hood.

[0]: https://www.postgresql.org/docs/current/postgres-fdw.html
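
If you're curious what that might look like in practice, here's a rough sketch using the stock postgres_fdw pointed at Materialize, which speaks the PostgreSQL wire protocol. Everything here is illustrative: the host, port, user, and table names are hypothetical, and the eventual integration may not work exactly this way.

    -- On the PostgreSQL side (illustrative only).
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER materialize
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'mz.example.com', port '6875', dbname 'materialize');

    CREATE USER MAPPING FOR app_user
        SERVER materialize
        OPTIONS (user 'app_user', password '...');

    -- Expose a Materialize-maintained view as if it were a local table.
    CREATE FOREIGN TABLE order_totals (order_id bigint, total numeric)
        SERVER materialize
        OPTIONS (schema_name 'public', table_name 'order_totals');

    -- Downstream consumers just query it; the incremental maintenance
    -- happens in Materialize.
    SELECT * FROM order_totals WHERE order_id = 42;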


(Materialize CTO here.)

> It's becoming more mainstream with Materialize, which is technically open-source, but they are quite aggressive with pushing their expensive cloud and obfuscating on-prem usage.

Quick but important clarification: Materialize is source available, not open source. We've been licensed under the BSL [0] from the beginning. We feel that the BSL is the best way to ensure we can build a sustainable business to fund Materialize's development, while still contributing our research advances back to the scientific community.

> Quite underrated, it has so much promise.

I'm glad you think so. We think so too. One of the best parts of my job is watching the "aha" moment our prospects have when they realize how much of the complex code they've been writing is neatly expressed as a SUBSCRIBE over a SQL materialized view.

[0]: https://github.com/MaterializeInc/materialize/blob/main/LICE...
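
For readers who haven't seen the pattern, here's a minimal sketch of what that looks like in Materialize SQL. The table and view names are made up; the point is just that the complex code collapses into a view definition plus a subscription.

    -- Maintain a continuously up-to-date aggregate as bids arrive...
    CREATE MATERIALIZED VIEW winning_bids AS
        SELECT auction_id, max(amount) AS amount
        FROM bids
        GROUP BY auction_id;

    -- ...and stream every change to the result as it happens, rather than polling.
    SUBSCRIBE TO winning_bids;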


It's crazy to me that the most updated file in your repository is the license - pushing back the open source date by a day every day.


Those updates are not retroactive; they apply on a go-forward basis. Each day's changes become Apache 2.0 licensed on that day four years in the future.

For example, v0.28 was released on October 18, 2022 [0], and becomes Apache 2.0 licensed four years after that date, on October 18, 2026 (i.e., about 2.5 years from today).

[0]: https://github.com/MaterializeInc/materialize/blob/76cb6647d...


I love this concept. Did you all come up with this or is there prior art? Is there a name for this concept?


We did not originate the Business Source License (BSL/BUSL). It was originally developed by the folks behind MariaDB. Wikipedia has a good article that covers the history: https://en.wikipedia.org/wiki/Business_Source_License

Other large projects using the BSL include CockroachDB and (somewhat infamously) Terraform.

We're very glad to have been using the BSL for Materialize since our very first release. Relicensing an existing open source project under the BSL can be a painful transition.


I was actually asking about the automatic timed re-license to Apache :)


Ah, I misunderstood! Yes, we may have invented that. I whipped up the cron job a few years back in response to concerns from our legal team. I’m not aware of any prior art for automatically advancing the change date for the BSL.


Hey Benesch, is Materialize used by TimescaleDB to create materialized views? I noticed a similar approach.


Not to my knowledge. I believe TimescaleDB has their own incremental view maintenance engine.


OK, so I was wondering if your solution is faster. I noticed their materialized views are not as fast for real-time data.


We haven't benchmarked TimescaleDB, so I can't say. Results tend to vary heavily by workload, too.

What I can say is that the research at the heart of Materialize (https://dl.acm.org/doi/10.1145/2517349.2522738) allows us to efficiently maintain computations that are more complex than what a lot of other IVM systems can handle.

Your best bet is to run your own benchmark of both systems using data that's representative of your workload. We offer a free seven-day playground if you'd like to run such a benchmark: https://console.materialize.com/account/sign-up

We also have a community Slack where a number of Materialize employees hang out and answer questions: http://materialize.com/s/chat


Thanks!


(Materialize CTO here.)

Partial materialization is indeed Noria's major contribution to dataflow technology, and it's impressive stuff. But I want to call out that there are a number of techniques that folks use with Materialize to avoid paying for O(entire materialized view). The two most common techniques are demand-driven queries using lateral joins [0] and temporal filters [1]. Noria's approach to partial materialization is automatic but gives the user less explicit control; Materialize's approach is manual, but gives the user more explicit control.

The other major divergence between Materialize and Noria is around consistency. Noria is eventually consistent, while Materialize is strongly consistent. There is a caveat to Materialize's consistency guarantees today: we don't offer strong consistency across your upstream {Kafka, PostgreSQL, MySQL} and Materialize. You only get strong consistency within Materialize itself. But we've got an improvement for that in the works that'll be rolling out in the next few months.

[0]: https://materialize.com/blog/lateral-joins-and-demand-driven...

[1]: https://materialize.com/docs/transform-data/patterns/tempora...
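
To give a flavor of the temporal filter technique from [1]: a minimal sketch, with made-up table and column names, is a view whose WHERE clause compares the event time against mz_now(), Materialize's logical-time function. (Depending on your column types you may need a cast or two; see the linked docs for the exact recipe.)

    -- Keep only the last five minutes of events in the maintained result, so the
    -- view's footprint tracks the window rather than all of history.
    CREATE MATERIALIZED VIEW recent_events AS
        SELECT *
        FROM events
        WHERE mz_now() <= event_ts + INTERVAL '5 minutes';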


Thanks for the great reply; I didn’t know about lateral joins.


> Materialize no longer provides the latest code as open-source software that you can download and try. It turned from a single-binary design to cloud-only microservices

Materialize CTO here. Just wanted to clarify that Materialize has always been source available, not OSS. Since our initial release in 2020, we've been licensed under the Business Source License (BSL), like MariaDB and CockroachDB. Under the BSL, each release does eventually transition to Apache 2.0, four years after its initial release.

Our core codebase is absolutely still publicly available on GitHub [0], and our developer guide for building and running Materialize on your own machine is still public [1].

It is true that we substantially rearchitected Materialize in 2022 to be more "cloud-native". The new cloud offering provides horizontal scalability and fault tolerance—our two most requested features in the single-binary days. I wouldn't call the new architecture a microservices design, though! There are only 2-3 services, each quite substantial, in the new architecture (loosely: a compute service, an orchestration service, and, soon, a load-balancing service).

We do push folks to sign up for a free trial of our hosted cloud offering [2] these days, rather than having them start off by running things locally, as we generally want folks' first impressions of Materialize to be of the version that we support for production use cases. An all-in-one, single-machine Docker image does still exist, if you know where to look, but it's very much use-at-your-own-risk and we don't recommend it for anything serious; it's there to support, e.g., academic work that wants to evaluate Materialize's ability to incrementally maintain recursive SQL queries.

If folks have questions about Materialize, we've got a lively community Slack [3] where you can connect directly with our product and engineering teams.

[0]: https://github.com/MaterializeInc/materialize/tree/main

[1]: https://github.com/MaterializeInc/materialize/blob/main/doc/...

[2]: https://materialize.com/playground/

[3]: https://materialize.com/s/chat

