gdubya's comments


Public sector? In my country, working for the energy grid operator feels like a constructive and positive contribution.

Sorry about your dark place. Keep looking for the light!


If everyone works for the public sector, who's gonna work in the "evil private sector" to make the tax money that funds the public sector?


Is that really true?

Isn't it the other way around: the public sector funds the private sector that allows people to amass their hoards?


No


Very interesting! Who made this?


A bit of digging in the T&Cs, Companies House, and LinkedIn pointed me to an individual working for the NHS who has put this together as a side project.

I work in this space (https://www.woodmac.com/), mostly with natural gas data but have worked on power in the past so I'm always interested to see if it's anyone I know (in this case it isn't).

Building something like this isn't really that difficult: all of the data is publicly accessible, and if you can transform it, pull it into a database, and build a front-end app, then you're pretty much there (a rough sketch of that pipeline is below). The developer has stated that the main source for this is https://bmrs.elexon.co.uk/, but other good sources of energy data (across Europe) are https://transparency.entsoe.eu/ for power and https://transparency.entsog.eu/ for gas. Also useful are https://alsi.gie.eu/ for LNG imports and https://agsi.gie.eu/ for gas storage.
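
To make that concrete, here is a minimal Python sketch of such a pipeline. The endpoint and field names are placeholders of my own, not the real BMRS or ENTSO-E APIs; each of those has its own schema and authentication.

```python
import sqlite3

import requests

# Placeholder endpoint: the real sources (Elexon BMRS, ENTSO-E,
# ENTSOG, GIE) each expose their own APIs, formats, and auth schemes.
DATA_URL = "https://example.com/generation.json"

# Fetch the raw records (assumed here to be a JSON list of objects
# with period / fuel / output_mw keys).
records = requests.get(DATA_URL, timeout=30).json()

# Load them into a local database that a front-end app can query.
conn = sqlite3.connect("energy.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS generation "
    "(period TEXT, fuel TEXT, output_mw REAL)"
)
conn.executemany(
    "INSERT INTO generation VALUES (:period, :fuel, :output_mw)",
    records,
)
conn.commit()
```

From there, a charting front-end over the database is most of the remaining work.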


I don’t know.

I had noticed the Norway interconnect was running at 0 MW, and I was trying different sites to see if it was the data feed.

It wasn’t the feed. It seems Norway's NO2 area has its hydro reservoirs at around 50% capacity; the other regions seem OK, but NO2 is the region that exports power to the UK, Germany, and Denmark.

The little animations of power moving along lines are very cool.


The technique can be applied by any engine, not just DataFusion. Each engine would have to know about the indexes in order to make use of them, but the fallback to parquet standard defaults means that the data is still readable by all.


But does DataFusion publish a specification of how this metadata can be read, along with a test suite for verifying implementations? Because if they don't, this cannot be reliably used by any other implementation.


Parquet files include a field called key_value_metadata in the FileMetadata structure; it sits in the footer of the file. See: https://github.com/apache/parquet-format/blob/master/src/mai...

The technique described in the article seems to use this key-value pair to store pointers to the additional metadata (in this case a distinct index) embedded in the file. Note that we can embed arbitrary binary data in the Parquet file between each data page. This is perfectly valid since all Parquet readers rely on the exact offsets to the data pages specified in the footer.

This means that DataFusion does not need to specify how the metadata is interpreted. It is already well specified as part of the Parquet file format itself. DataFusion is an independent project -- it is a query execution engine for OLAP / columnar data, which can take in SQL statements, build query plans, optimize them, and execute them. It is an embeddable runtime with numerous ways for the host program to extend it. Parquet is a file format supported by DataFusion because it is one of the most popular ways of storing columnar data in object stores like S3.

Note that the readers of Parquet need to be aware of any metadata to exploit it. But if not, nothing changes - as long as we're embedding only supplementary information like indices or bloom filters, a reader can still continue working with the columnar data in Parquet as it used to; it is just that it won't be able to take advantage of the additional metadata.
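
As a concrete illustration, here is a minimal sketch in Python with pyarrow (not the article's DataFusion code) of writing and reading custom key-value metadata in a Parquet footer. The `my_index_offset` key is a made-up example.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Attach application-specific key-value metadata to the schema;
# pyarrow writes it into the footer's key_value_metadata field.
# The key name below is invented for illustration.
table = pa.table({"city": ["Oslo", "Bergen"], "load_mw": [1200.0, 800.0]})
table = table.replace_schema_metadata({b"my_index_offset": b"12345"})
pq.write_table(table, "example.parquet")

# Any reader can inspect the footer; readers that don't recognise
# the key simply ignore it and read the columnar data as usual.
footer = pq.read_metadata("example.parquet")
print(footer.metadata.get(b"my_index_offset"))  # b'12345'
```

An engine could store, say, the byte offset of an embedded index under such a key, while every other Parquet reader keeps working from the standard footer offsets.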


> Note that the readers of Parquet need to be aware of any metadata to exploit it. But if not, nothing changes

The one downside of this approach, which is likely obvious but I haven't seen mentioned, is that the resulting parquet files are larger than they would be otherwise, and the increased size only benefits engines that know how to interpret the new index.

(I am an author)


So, can we take that as a "no"?


There is no spec. Personally I hope that the existing indexes (bloom filters, zone maps) get re-designed to fit into a paradigm where parquet itself has more first-class support for multiple levels of indexes embedded in the file, along with conventions for how those common index types are represented. That is, start with the Wild West and define specs as needed.


> That is, start with Wild West and define specs as needed

Yes this is my personal hope as well -- if there are new index types that are widespread, they can be incorporated formally into the spec

However, changing the spec is a non-trivial process and requires significant consensus and engineering.

Thus the methods described in the blog post can be used to add indexes prior to any spec change, and potentially as a way to prototype / prove out potential new indexes.

(note I am an author)


The story here isn't that they've invented a new format for user-defined indexes (the one proposed here is sort of contrived and I probably wouldn't recommend it in production) but rather a demonstration of how the user-defined metadata space of the parquet format can be used for application-specific purposes.

I work on a database engine that uses parquet as our on-storage file format and we make liberal use of the custom metadata area for things specific to our product that any other parquet readers would just ignore.


Watch out for fulings on the plains!


And deathsquitos!



Thanks, I've fixed my comment as well.


That transcription reads very much like a NotebookLM "podcast" summarising the actual article at https://www.scientificamerican.com/article/outrage-fatigue-i...


Hilbert curves are used in modern data lakehouse storage optimisation techniques, such as Databricks' "liquid clustering" [1]. This can replace the need for more traditional "hive-style" partitioning, in which the data files are partitioned based on a folder structure (e.g. `mydatafiles/YYYY/MM/DD/`). A toy sketch of the curve mapping itself is below.

1. https://docs.databricks.com/en/delta/clustering.html
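
To make the idea concrete, here is the textbook xy2d Hilbert mapping in Python (a toy sketch, not Databricks' implementation): points that are close in (x, y) get nearby one-dimensional indexes, which is what makes the curve useful as a clustering key.

```python
def xy2d(n, x, y):
    """Map (x, y) on an n x n grid (n a power of two) to its
    distance d along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the recursion lines up.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Sorting rows by xy2d keeps spatial neighbours in nearby files, so
# range filters on either dimension touch fewer files than a plain
# lexicographic (hive-style) sort would.
points = [(0, 0), (1, 0), (1, 1), (0, 1), (3, 3)]
print(sorted(points, key=lambda p: xy2d(4, *p)))
```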


I wanted to build a Windows container image but I really did not want to install Docker Desktop. After some digging around I found my way to the Docker server / client binaries for Windows page that allowed me to do this: https://docs.docker.com/engine/install/binaries/#install-ser...


Thanks, that's good to keep in mind if I ever need to run Docker Engine on Windows!


Interactive LLM Powered NPCs is an open-source project that completely transforms your interaction with non-player characters (NPCs) in any game!

