robertkoss · 2025-12-11T14:16:51 1765462611

I will never understand how people honestly think that there is such a thing as a central DB. Do you really think that Gov Agencies from all over the world deploy Gotham just connected to the internet without controlling inflow / outflow of data? I would bet money that 99% of critical systems are not even connected to the internet but air-gapped because, believe it or not, people at those agencies are not that stupid.

robertkoss · 2025-12-10T16:04:22 1765382662

I think Foundry is insanely impressive tbh. If you set it up correctly, it's insanely powerful.

lolive · 2025-12-10T19:24:31 1765394671

I second that. My company is really changing its point of view on data at scale thanks to their tools. [note: SAP announces DataSphere for 2026, and their stack is surprisingly similar :)]

robertkoss · 2025-12-11T14:13:47 1765462427

Yeah, but Foundry is so far ahead that I honestly don't see DataSphere competing there. The only reason to pick it is that you are already on SAP and don't want a second system.

Also the engineering / product culture @Palantir is diametrically opposed to what exists at SAP, so I favour Palantir.

robertkoss · 2025-12-02T17:23:17 1764696197

Chat, what do I see here?

robertkoss · 2025-12-02T16:51:08 1764694268

The problem that this article has is essentially this:

> Thiel by contrast is profiting from the use of AI weapons targeting systems used in the Ukraine war and the genocide in Gaza.

Thiel is IMO not doing this for profit. He is deeply ideological, which should be more worrisome.

shevy-java · 2025-12-02T16:58:49 1764694729

He 100% uses this for profit. Why does he not give away his money?

It is profit. The craziness is just the cover for it.

philipallstar · 2025-12-02T17:03:58 1764695038

> He 100% uses this for profit. Why does he not give away his money?

You haven't given away your money. Does that mean everything you do (including writing the above comment) is for profit?

galleywest200 · 2025-12-02T17:10:33 1764695433

Sure, but I don't profit from wars.

BugsJustFindMe · 2025-12-02T17:30:12 1764696612

Everything else aside, this is a really stupid comparison. A multi-billionaire giving away even 99% of their billions of dollars is not the same as any normal person giving away a substantially smaller percentage of their money. Thiel can give away 99% of his money and would still have hundreds of millions of dollars left, which is already quite literally more money than any normal person can spend in their entire lifetime. Even at only 3% interest, he would get more interest money alone per year than the income of almost anyone on the planet.

philipallstar · 2025-12-03T07:45:44 1764747944

> which is already quite literally more money than any normal person can spend in their entire lifetime

They don't spend it - if you think someone with a net worth of X has that money in cash then you need to go to school.

Even if they did - that's a greedy person's way of thinking. They invest money they do have, which employs people and pushes world R&D forwards.

michaelmrose · 2025-12-02T17:01:35 1764694895

What AI weapons systems are in use in Ukraine?

NoOn3 · 2025-12-02T21:25:34 1764710734

I don't really know, but maybe drones.

robertkoss · 2025-11-26T14:37:51 1764167871

That was a great read!

wjsdj2009 · 2025-11-26T14:46:51 1764168411

Thanks a lot! Glad you enjoyed it!

robertkoss · 2025-11-13T15:01:09 1763046069

Germany here. Seems to be global then.

robertkoss · 2025-09-24T09:24:31 1758705871

yup, it's an ad in disguise.

robertkoss · 2025-09-04T11:01:53 1756983713

That is a false dichotomy. You can use SQL tools but still have to choose the instance type.

Especially when considering testability and composability, using a DataFrame API inside regular languages like Python is far superior IMO.

gigatexal · 2025-09-04T12:47:08 1756990028

Yeah it makes no sense.

Why is the dataframe approach getting hate when you’re talking about runtime details?

I get that folks find SQL almost conversational compared to the dataframe API, but the other points make no difference.

If you’re a competent dev/data person and are productive with dataframes, then yay. Setup, creating test data and such is also straightforward: it’s all objects and functions after all. If anything it’s better than the horribad experience of ORMs.

drej · 2025-09-04T11:16:21 1756984581

As a user? No, I don't have to choose. What I'm saying is that analysts (who this Polars Cloud targets, just like Coiled or Databricks) shouldn't worry about instance types, shuffling performance, join strategies, JVM versions, cross-AZ pricing etc. In most cases, they should just get a connection string and/or a web UI to run their queries, everything abstracted from them.

Sure, Python code is more testable and composable (and I do love that). Have I seen _any_ analysts write tests or compose their queries? I'm not saying these people don't exist, but I have yet to bump into any.

robertkoss · 2025-09-04T11:27:49 1756985269

You were talking about data engineering. If you do not write tests as a data engineer, what are you doing then? Just hoping that you don't fuck up editing a 1000+ line SQL script?

If you use Athena you still have to worry about shuffling and joining, it is just hidden. It is Trino / Presto under the hood, and if you click explain you can see the execution plan, which is essentially the same as looking into the Spark UI.
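To make the "it is just hidden" point concrete, here is a minimal sketch. It uses SQLite from Python's standard library purely as a stand-in (an assumption on my side: this is not Athena or Trino, and their plans look different), and the `users` / `events` tables are hypothetical. Even a plain SQL join has a plan with scan and lookup steps that the engine picked for you.

```python
import sqlite3

# In-memory database with two hypothetical tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT)")
conn.execute("CREATE TABLE events (user_id INTEGER, kind TEXT)")

query = """
    SELECT u.country, COUNT(*)
    FROM events AS e
    JOIN users AS u ON u.id = e.user_id
    GROUP BY u.country
"""

# EXPLAIN QUERY PLAN returns one row per plan step; the fourth column
# holds the human-readable description of that step.
plan_steps = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan_steps:
    print(step)
```

The exact wording of the steps depends on the SQLite version, so no expected output is shown here.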

Who cares about JVM versions nowadays? No one is hosting Spark themselves.

Literally every tool now supports DataFrame AND SQL APIs, and to me there is no reason to pick up SQL if you are familiar with a little bit of Python.

datadrivenangel · 2025-09-04T14:36:57 1756996617

Way too many data engineers are running in clown mode just eyeballing the results of 1000 line SQL scripts....

https://ludic.mataroa.blog/blog/get-me-out-of-data-hell/

drej · 2025-09-04T11:54:10 1756986850

I was talking about data engineering, because that was my job and all analysts were downstream of me. And I could see them struggle with handling infrastructure and way too many toggles that our platform provided them (Databricks at the time).

Yes, I did write tests and no, I did not write 1000-line SQL (or any SQL for that matter). But I could see analysts struggle and I could see other people in other orgs just firing off simple SQL queries that did the same as the non-portable Python mess that we had to keep alive. (Not to mention the far superior performance of database queries.)

But I knew how this all came to be - a manager wanted to pad their resume with some big data acronyms and as a result, we spent way too much time and money migrating to an architecture that made everyone worse off.

ritchie46 · 2025-09-04T11:46:43 1756986403

With Polars Cloud you don't have to choose those either. You can pick cpu/memory and we will offer autoscaling in a few months.

Cluster configuration is optional if you want this control. Anyhow, this doesn't have much to do with the query API, be it SQL or DataFrame.

ayhanfuat · 2025-09-04T11:22:55 1756984975

I really doubt that Polars Cloud targets analysts doing ad-hoc analyses. It is much more likely aimed at people who build data pipelines for downstream tasks (ML etc).

ritchie46 · 2025-09-04T11:40:57 1756986057

We also target ad-hoc analysis. If your data doesn't fit on your laptop, you can spin up a larger box or a cluster and run interactive queries.

riku_iki · 2025-09-04T21:47:12 1757022432

> analysts (who this Polars Cloud targets, just like Coiled or Databricks) shouldn't worry about instance types, shuffling performance, join strategies,

I think this part (query optimization) is in general not solved/solvable, and it is sometimes/often (depending on the domain) necessary to dig into the details to make a data transformation work.

mr_toad · 2025-09-04T12:14:53 1756988093

Analysts don’t because it’s not part of the training & culture. If you’re writing tests you’re doing engineering.

That said the last Python code I wrote as a data engineer was to run tests on an SQL database, because the equivalent in SQL would have been tens of thousands of lines of wallpaper code.
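A rough illustration of why Python is handy for this: a minimal sketch, assuming an in-memory SQLite database as a stand-in for the real one, with a hypothetical `orders` table and a hypothetical helper `null_counts`. The loop generates one NULL check per column, which in pure SQL would be one hand-written query per column and table.

```python
import sqlite3


def null_counts(conn, table, columns):
    """Return {column: number of NULL rows}, running one generated query per column."""
    counts = {}
    for col in columns:
        # Identifiers cannot be bound as parameters, so they are quoted inline;
        # only pass trusted table/column names here.
        sql = f'SELECT COUNT(*) FROM "{table}" WHERE "{col}" IS NULL'
        counts[col] = conn.execute(sql).fetchone()[0]
    return counts


# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "a", 10.0), (2, None, 5.5), (3, "c", None)],
)

result = null_counts(conn, "orders", ["id", "customer", "amount"])
# Columns that violate the "no NULLs" expectation.
failing = [col for col, n in result.items() if n > 0]
```

The same loop scales to every table in a schema without copy-pasting queries, which is the "wallpaper code" being avoided.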

gigatexal · 2025-09-04T12:47:41 1756990061

Again, the issue you’re having is with the skill level of the audience you keep bringing up, not the tool.

drej · 2025-09-04T12:58:54 1756990734

I find it much more beneficial to lower the barrier for entry (oftentimes without any sacrifices) instead of spending time and money on upskilling everyone, just because I like engineering.

gigatexal · 2025-09-04T14:07:23 1756994843

Right, but nobody is saying Polars or dataframes are meant to replace SQL or are even for the masses. It’s a tool for skilled folks. I personally think the API makes sense, but SQL is easier to pick up. Use whatever tools work best.

But coming into such a discussion dunking on a tool cuz it’s not for the masses makes no sense.

drej · 2025-09-04T14:30:36 1756996236

Read my posts again, I'm not complaining it's not for the masses, I know it isn't. I'm complaining that it's being forced upon people when there are simpler alternatives that help people focus on business problems rather than setting up virtual environments.

So I'm very much advocating for people to "[u]se whatever tools work best".

(That is - now I'm doing this. In the past I taught a course on pandas data analytics and spoke at a few PyData conferences and meetups, partly about dataframes and how useful they are. So I'm very much guilty of all of the above.)

gigatexal · 2025-09-04T15:02:34 1756998154

Who is doing the forcing? In my decade as a data engineer, I’ve not found a place that forced dataframes on would-be and capable SQL analysts.

robertkoss · 2025-09-04T09:57:53 1756979873

OG polars announcement: https://news.ycombinator.com/item?id=23768227

dvko · 2025-09-04T10:11:52 1756980712

Never forget! Crazy to see how far it's come. And how lackluster the initial reception on HN was back then.

robertkoss · 2025-09-04T09:57:22 1756979842

Love it!

Still don't get why one of the biggest players in the space, Databricks, is overinvesting in Spark. For startups, Polars or DuckDB are completely sufficient. Other companies like Palantir already support bring your own compute.

whyever · 2025-09-04T10:35:09 1756982109

That's a good question! Especially after Frank McSherry's COST paper [1], it's hard to imagine where the sweet spot for Spark is. I guess for Databricks it makes sense to push Spark, since they are the ones who created it. In a way, it's their competitive advantage.

[1]: https://www.usenix.org/system/files/conference/hotos15/hotos...

mr_toad · 2025-09-04T13:18:34 1756991914

Databricks is targeting large enterprises, who have a variety of users. Having both Python and SQL as first class languages is a selling point.