rekwah · 2025-12-11T19:18:33

Now do this with DuckDB.

sundbry · 2025-12-11T20:27:15

Use Iceberg tables for that in DuckDB.

rekwah · 2025-09-15T16:22:08

Asc vs. desc sort order is doing a lot of lifting here.

rekwah · 2025-05-09T06:50:48

I'm fine with forks, but it's hard to see @jeffail's marketing/persona of Benthos forked too. It was quirky, to say the least. It's a difficult line to walk: forking the code brings the repo docs along, but reading that voice anywhere other than the original feels insincere.

jeffail · 2025-05-09T10:40:42

Personally, I'm happy to see them evolving it with my juices left intact. When we transitioned benthos.dev to https://docs.redpanda.com/redpanda-connect we gained a much larger (and more competent) docs team, but as a consequence had to trim a lot of the old personality. I don't regret the transition but I do take a bit of comfort seeing the WarpStream peeps continue to have fun with it.

It's like shedding an almost intact full body skin and finding comfort in watching colleagues take it bowling, having a great time.

ordinarily · 2025-05-09T14:11:29

Was hoping you'd appreciate our efforts to retain your original quirky vision. We named the rice ball Geoff as an homage to you (intentionally spelled the silly way): https://warpstreamlabs.github.io/bento/docs/about

ralferoo · 2025-05-10T12:02:12

FWIW in the UK, Geoff is the usual spelling (from Geoffrey) of the name. Jeff (from Jeffrey) also exists in the UK, but is much rarer, even if it's the most common form in the US.

rekwah · on Dec 19, 2024

Congrats on the launch! I can definitely understand the pain point; frequent plan/pricing iteration in the early days always leaves a pile of "grandfathered entitlements" that get carried around.

Two questions.

1) Entitlements seem to permeate systems in a few ways: pricing pages and billing systems, as you've called out, but also feature flag systems like LaunchDarkly (my plan grants access to a beta feature channel) and authorization systems (RBAC, FGA, etc.). Do you see your SDK replacing those systems, or is Planship more of an integrator that helps keep them synchronized?

2) I couldn't tell from glancing through your SDK docs (I might have missed it), but do you provide any audit/temporal history? If I store user events in a data warehouse (timestamp, customer_id, action_performed), can I determine a customer's plan as of that historical timestamp, or only their current plan?
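Answering the second question amounts to a point-in-time ("as-of") lookup over a customer's plan history. A minimal sketch, assuming a hypothetical history format of (effective_from, plan_name) pairs sorted by date; `plan_as_of` and the plan names are illustrative and not part of Planship's SDK:

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical plan history for one customer: (effective_from, plan_name) pairs,
# sorted by effective_from. Each plan stays in force until the next entry starts.
PlanHistory = List[Tuple[datetime, str]]


def plan_as_of(history: PlanHistory, ts: datetime) -> Optional[str]:
    """Return the plan in force at `ts`, or None if `ts` predates the history."""
    starts = [start for start, _ in history]
    # bisect_right finds the first entry starting strictly after ts;
    # the entry just before it is the one in force at ts.
    i = bisect_right(starts, ts) - 1
    return history[i][1] if i >= 0 else None


history = [
    (datetime(2024, 1, 1), "free"),
    (datetime(2024, 3, 15), "pro"),
    (datetime(2024, 9, 1), "pro-2024-grandfathered"),
]

# A warehouse event: (timestamp, customer_id, action_performed).
event = (datetime(2024, 5, 2, 12, 0), "cust_42", "export_csv")
print(plan_as_of(history, event[0]))  # → pro
```

In SQL the same lookup is usually a range join on effective dates (DuckDB, for one, has an ASOF JOIN for it), which only works if the plan history is actually exposed.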

LeFever · on Dec 19, 2024

Thanks! Great questions.

1. Today, we coexist with entitlement solutions (feature flags, auth systems, etc.) by either working alongside them or feeding into entitlement aggregators. Basically, we handle the pricing-related data, logic, and aggregation that eventually reduces down to flags (or numeric values, lists of items, or other value types). We offer SDKs to make these pricing entitlements easily accessible within a product marketing page, app, etc., but our API can just as easily be used to integrate with other systems.

2. We store complete subscription renewal history, but the API for accessing it isn’t public yet. Audit trails, both for customer behavior like you mentioned and for admin tasks (e.g., entitlement value X was added to plan Y), will be available via our API and in our console.
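The reduction described in point 1, pricing data collapsing into flags, numeric values, or lists, can be pictured with a small sketch. Everything in it is hypothetical (the plan catalog, `resolve_entitlements`, and the override mechanism standing in for grandfathered terms); it is not Planship's API:

```python
from typing import Dict, List, Optional, Union

# Hypothetical entitlement value types: flags, numeric limits, or lists of items.
EntitlementValue = Union[bool, int, List[str]]

# Hypothetical plan catalog; plan names and entitlement keys are illustrative.
PLANS: Dict[str, Dict[str, EntitlementValue]] = {
    "free": {"beta_channel": False, "seats": 3, "export_formats": ["csv"]},
    "pro": {"beta_channel": True, "seats": 25, "export_formats": ["csv", "parquet"]},
}


def resolve_entitlements(
    plan: str, overrides: Optional[Dict[str, EntitlementValue]] = None
) -> Dict[str, EntitlementValue]:
    """Start from the plan's entitlements, then apply per-customer overrides
    (e.g. grandfathered terms). The plan catalog itself is never modified."""
    resolved = dict(PLANS[plan])
    resolved.update(overrides or {})
    return resolved


ents = resolve_entitlements("free", overrides={"seats": 10})
print(ents)  # → {'beta_channel': False, 'seats': 10, 'export_formats': ['csv']}

# Downstream feature-flag or authz checks only ever see the reduced values.
if ents["beta_channel"]:
    print("show beta channel")
```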

rekwah · on Dec 6, 2024

I started looking into this, but DeleteObject doesn't support these conditional headers on general purpose buckets, only on directory buckets (S3 Express One Zone).

rekwah · on Feb 27, 2024

> just put it there, it might be useful later

> Also note that we have never mentioned anything about cardinality. Because it doesn’t matter - any field can be of any cardinality. Scuba works with raw events and doesn’t pre-aggregate anything, and so cardinality is not an issue.

This is how we end up with very large, very expensive data swamps.

_visgean · on Feb 27, 2024

That depends on the sampling rate, no? I would much rather have a rich log record sampled at 1% than more records that don't contain enough info to debug.

kiitos · on Feb 28, 2024

It is a tragedy of the current generation of observability systems that they have inculcated the notion that telemetry data should be sampled. Absolute nonsense.

growse · on Feb 27, 2024

The people feeling the pain of (and paying for) the expensive data swamp are often not the same people who are yolo'ing the sample rate to 100% in their apps, because why wouldn't you want to store every event?

Put another way, you're in charge of a large telemetry event sink. How do you incentivise the correct sampling behaviour by your users?

Spivak · on Feb 27, 2024

Don't let the user pick the sampling rate. In Honeycomb land this is called the EMA Dynamic Sampler.

https://docs.honeycomb.io/manage-data-volume/refinery/sampli...
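A rough sketch of the idea behind server-chosen, per-key dynamic sampling with EMA-smoothed counts. This is a simplified approximation written for illustration, not Refinery's actual EMA Dynamic Sampler (the Honeycomb docs describe the real one); the class name, the `goal_sample_rate`/`weight` parameters, and the even per-key budget are assumptions:

```python
import random
from collections import Counter
from typing import Dict


class SimplifiedEMASampler:
    """Per-key sample rates derived from EMA-smoothed traffic counts.

    Illustrative only: the server, not the user, picks each key's rate.
    """

    def __init__(self, goal_sample_rate: int = 10, weight: float = 0.5):
        self.goal_sample_rate = goal_sample_rate
        self.weight = weight  # how strongly the latest interval moves the EMA
        self.ema: Dict[str, float] = {}
        self.current: Counter = Counter()  # counts seen in the current interval
        self.rates: Dict[str, int] = {}

    def end_interval(self) -> None:
        """Fold this interval's counts into the EMA and recompute per-key rates."""
        for key in set(self.ema) | set(self.current):
            old = self.ema.get(key, 0.0)
            self.ema[key] = self.weight * self.current[key] + (1 - self.weight) * old
        self.current.clear()
        total = sum(self.ema.values())
        if total == 0:
            self.rates = {}
            return
        # Split the kept-event budget evenly across keys, so rare keys keep everything.
        per_key_budget = total / self.goal_sample_rate / len(self.ema)
        self.rates = {k: max(1, round(c / per_key_budget)) for k, c in self.ema.items()}

    def sample_rate(self, key: str) -> int:
        """Record one event for `key` and return its current rate (1 = keep all)."""
        self.current[key] += 1
        return self.rates.get(key, 1)  # unseen keys are kept until the next interval

    def keep(self, key: str) -> bool:
        return random.random() < 1 / self.sample_rate(key)


s = SimplifiedEMASampler(goal_sample_rate=10, weight=1.0)
for _ in range(990):
    s.sample_rate("GET /health 200")
for _ in range(10):
    s.sample_rate("POST /checkout 500")
s.end_interval()
print(s.rates["GET /health 200"], s.rates["POST /checkout 500"])  # → 20 1
```

Noisy health checks end up at 1-in-20 while the rare checkout errors are all kept. Because rare keys can't use their full share, this version keeps fewer events than the goal; a production sampler would redistribute that unused budget.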

kiitos · on Feb 28, 2024

You should never need to sample telemetry data.

gtirloni · on Feb 27, 2024

Sampling metrics, sure, but sampling logs? When an end-to-end transaction for a very important task breaks, do I get *some* breadcrumbs to debug it?

_visgean · on March 1, 2024

I have used that approach before with Sentry. It was a non-issue. It depends on the nature of the project, of course: we had a system that ran every second, so when it failed it generated a lot of data.

goosejuice · on Feb 28, 2024

I agree. Sampling logs sounds dangerous. Obviously every system is different.

At least in GCP, you can apply exclusion filters to prevent ingestion and set different retention periods on log buckets. This can help control costs without missing important entries.

isburmistrov · on Feb 28, 2024

Sampling can be smart, e.g. based on some field all events have (can be called traceId, haha).
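A minimal sketch of the trace-ID approach: hash the ID and keep the whole trace if the hash lands under the sample rate. SHA-256 and the 10,000-bucket granularity are arbitrary choices for illustration, and `keep_trace` is a hypothetical helper. Because every event in a trace shares the ID, every service makes the same decision without coordination, and a kept trace keeps all of its breadcrumbs, which speaks to the end-to-end debugging concern raised earlier in the thread:

```python
import hashlib


def keep_trace(trace_id: str, sample_percent: float) -> bool:
    """Deterministically keep or drop all events of a trace based on its ID."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # sample_percent=1 keeps buckets 0..99, i.e. about 1% of traces.
    return bucket < sample_percent * 100


# Roughly 1,000 of 100,000 traces survive at 1%, and the same IDs always do.
kept = sum(keep_trace(f"trace-{i}", 1) for i in range(100_000))
print(kept)
```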

rekwah · on Nov 23, 2023

The Postgres wire protocol is indirectly getting there. Plenty of tools speak it with wildly different storage engines on the other end.
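What those tools share is the protocol's message format, not any storage layout. As a small illustration, here is a sketch that builds the first message a client sends after connecting, the StartupMessage: an Int32 length (counting itself), an Int32 protocol version (3.0, encoded as 196608), then null-terminated parameter name/value pairs followed by one extra null byte. It only builds the bytes; a real server would also handle SSLRequest, authentication, and query messages. The `startup_message` helper is hypothetical:

```python
import struct
from typing import Dict

PROTOCOL_3_0 = 3 << 16  # 196608: major version 3 in the high 16 bits, minor 0


def startup_message(params: Dict[str, str]) -> bytes:
    """Encode a PostgreSQL v3 StartupMessage (note: it has no type byte)."""
    body = struct.pack("!I", PROTOCOL_3_0)
    for name, value in params.items():
        body += name.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00"
    body += b"\x00"  # terminator after the last name/value pair
    # The length prefix counts itself (4 bytes) plus the body.
    return struct.pack("!I", len(body) + 4) + body


msg = startup_message({"user": "alice", "database": "analytics"})
print(len(msg))  # → 39
```

Any backend that can parse this, plus the messages that follow it, can accept connections from existing Postgres clients regardless of how it stores data.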

A clean room implementation would likely yield different results but there appears to be some appetite for a solution.

rekwah · on May 16, 2023

Don't leave us hanging. Did you get a discount?! ;)

rekwah · on April 24, 2023

"1Password Unlocks $620M Round, Reaches $6.8B Valuation" would be my guess.

rekwah · on Nov 10, 2022

Curious if cuelang just ended up being too much of a hurdle for onboarding. I like it and have used it quite a bit but there's something about the syntax that makes it impenetrable for many.

shykes · on Nov 10, 2022

There's some of that. CUE is incredibly powerful, but it can be polarizing. But the fundamental problem is that developers don't want to learn a new language to write CI/CD pipelines: they want to use the language they already know and love.

So, no matter what language we had chosen for our first SDK, we would eventually have hit the same problem. The only way to truly solve the "CI/CD as code" problem for everyone is to have a common engine and API that can be programmed with (almost) any language.

aliasxneo · on Nov 11, 2022

In my case, I just simply didn't like it (CUE). I'm much more optimistic about Nickel at this point.