AFAIK these two joins are exactly the same once you get past the parsing. It's just a different way to write an inner join. It's translated into the same AST and so there's no difference in planning/execution.
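Something like this, to make it concrete (hypothetical table names, just to illustrate):

    -- "comma" join, with the join condition in the WHERE clause
    SELECT *
      FROM orders o, customers c
     WHERE o.customer_id = c.id;

    -- the same query with explicit INNER JOIN syntax
    SELECT *
      FROM orders o
      JOIN customers c ON o.customer_id = c.id;

    -- EXPLAIN should show identical plans for both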
Perhaps in this very basic case they are exactly the same, but is that still true if you add secondary WHERE conditions that apply to just one table, or if one "table" is actually a view with a complex query definition, or in any of the many other ways a very simple "example" can quickly get complicated?
In general, you split up the WHERE condition at every top-level AND. Then you do either pullup or pushdown or both (it depends a bit on how your planner looks on the inside). In the end, you end up with the two cases being exactly the same before you start planning.
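A sketch with the same hypothetical tables as above - a single-table condition ends up in the same place regardless of the syntax:

    -- comma join with an extra single-table condition
    SELECT *
      FROM orders o, customers c
     WHERE o.customer_id = c.id
       AND c.country = 'CZ';

    -- after splitting the WHERE clause at the top-level AND,
    -- this is the same as
    SELECT *
      FROM orders o
      JOIN customers c ON o.customer_id = c.id
     WHERE c.country = 'CZ';

    -- either way, the c.country = 'CZ' qual gets pushed down to the
    -- scan of customers, below the join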
For outer joins (left/right/full), it's different, and there you absolutely need the explicit join syntax (for correctness). And semijoins are not expressed the same way at all, partially for weird historical reasons.
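A quick sketch of the outer join case (same hypothetical tables):

    -- keeps customers with no orders; their order columns become NULL
    SELECT *
      FROM customers c
      LEFT JOIN orders o ON o.customer_id = c.id;

    -- there is no comma-join equivalent - moving the condition into
    -- WHERE filters out the NULL-extended rows, so this is an inner
    -- join again
    SELECT *
      FROM customers c, orders o
     WHERE o.customer_id = c.id;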
That is part of the key idea, yes. It's more elaborate, because it can split the aggregate - do part of it before the join and finalize it after the join - similar to what we do for parallel queries.
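Roughly this transformation, written out by hand (hypothetical fact/dim tables; the planner does this internally, of course):

    -- original query: join first, aggregate at the end
    SELECT d.name, sum(f.amount)
      FROM fact f
      JOIN dim d ON f.dim_id = d.id
     GROUP BY d.name;

    -- with the aggregate split: partial aggregate below the join,
    -- finalized above it
    SELECT d.name, sum(p.partial_sum)
      FROM (SELECT dim_id, sum(amount) AS partial_sum
              FROM fact
             GROUP BY dim_id) p
      JOIN dim d ON p.dim_id = d.id
     GROUP BY d.name;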
As for indexes, they can help, but not in this particular example - the "code" tables are tiny, and the planner adds Memoize nodes anyway, so it acts like an ad hoc index.
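The plan shape I mean looks roughly like this (a sketch, not verbatim EXPLAIN output, and assuming a primary key on the code table):

    EXPLAIN
    SELECT * FROM fact f JOIN code c ON c.id = f.code_id;

    -- Nested Loop
    --   -> Seq Scan on fact f
    --   -> Memoize
    --        Cache Key: f.code_id
    --        -> Index Scan using code_pkey on code c
    --             Index Cond: (id = f.code_id)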
Indexes are more of a complementary improvement, not an alternative to this optimization (i.e. neither makes the other unnecessary). FWIW in this case the indexes won't help very much - if you use more data in the code tables, it'll use a hash join, not a nested loop / merge join.
That doesn't mean we couldn't do better with indexes; there probably are smart execution strategies for certain types of queries. But indexes also come with quite a bit of overhead (even in read-only workloads).
It's not about not knowing about an optimization. The challenge is knowing when to apply it, so that it doesn't cause regressions for cases that can't benefit from it. That may be less risky in specialized systems - BI systems, for example, typically don't need to worry about regressing OLTP workloads. Postgres absolutely needs to be careful about that.
I believe that's one of the reasons why it took ~8 years (the original patch was proposed in 2017).
IMHO the whole point of Qubes is that it does not do the compartmentalization at the level of individual applications, but at the level of groups of applications. Otherwise you'd need to specify very clearly how/when exactly the applications can exchange data, what data, etc. I'm not saying it's impossible, but "apps in the same qube VM can do whatever" is a much easier concept.
I believe there are reasons why e.g. io_uring could be inherently slower in some cases, and I tried to point some of those out.
With io_uring everything happens in the backend process, and so it consumes some of the CPU time that might otherwise be spent executing the query. All the checksum verification, memcpy into shared buffers, etc. happen in the backend, and those things can be quite expensive. With the worker method this happens in other processes, spreading the overhead.
Of course, on a truly I/O-bound workload (one actually waiting on the I/O), this may not make a huge difference. For warmed-up cases it may be more significant.
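If you want to compare the two on your own workload, the choice is exposed as the io_method GUC in Postgres 18 (restart required; io_uring is Linux-only):

    -- I/O handled by a pool of dedicated worker processes
    ALTER SYSTEM SET io_method = 'worker';
    ALTER SYSTEM SET io_workers = 8;    -- size of the worker pool

    -- I/O issued from the backend process itself via io_uring
    ALTER SYSTEM SET io_method = 'io_uring';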
You can have an io_uring worker per thread and send jobs into a queue for them to pick up and execute, so it won't max out on CPU. Each worker would do cooperative async scheduling internally between the tasks, so nothing delays other things too much.
But this is more difficult than spawning thousands of OS threads and running sync workers on them.
Ah, my mistake. I meant to write something like: you can have a thread per CPU core, each running an io_uring instance, the way you'd have a pool of workers in the "worker" implementation.
If there are no threads, then make that a process per CPU core.
> Right now async IO is used for sequential scans and bitmap scans, not for index scans. My initial guess would be that it mostly helps for complex queries (that use multiple indexes, so bitmap scans) and unoptimized queries (sequential scans), not so much for straightforward and/or optimized queries that use a single index. But this is really just a guess, I'm way out of my depth at this point. I'm curious how much it'll help once it is implemented for index scans as well.
Those are good guesses, IMHO.
For sequential scans, some of the "async" work could already be done by kernel read-ahead, but AIO makes it explicit and loads the data into shared buffers, not just the page cache. For bitmap scans we already had prefetching via fadvise, which is somewhat similar to read-ahead (also into the page cache), but there were some bugs that made it ineffective in some cases; AIO fixes that.
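For reference, the prefetch depth here is controlled by effective_io_concurrency (the same GUC covers the old fadvise-based prefetching and, as far as I know, the new AIO read streams):

    -- how far ahead a bitmap heap scan prefetches
    SET effective_io_concurrency = 32;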
For index scans the difference can be an order of magnitude (say, 5-10x). Doing random I/O block by block is simply awful; prefetching data is important. I was just doing some testing on TPC-H, and on scale 50 I see Q8 going from 130s to 20s, and Q19 from 50s to 8s. And smaller improvements for a couple more queries. Of course, it depends on what else the query is doing - if it spends 1% of the time on the index scan, you won't notice a difference.
I did a lot of tests comparing the io_method choices, and I've yet to see a realistic query where it makes a significant difference of more than a couple percent (in either direction). I'm sure it's possible to construct such queries, and that's interesting, but for real workloads it's mostly not noticeable.
At least that's how I see it right now; we'll see how that works on a much wider range of hardware and systems. The github repo linked from the pgsql-hackers post has a lot more results, and some charts include results for the index prefetching patch - there it makes more of a difference in some cases. But that patch is still fairly rough, so it could also be a bug in the patch, and it has changed a lot since August.
The funny thing is the local communist newspaper "Red Truth" (as if there were non-communist ones, ...) published a review of LOTR in 1977, in which they pretty much took the side of Mordor. (It might be a made-up joke from the 90s, but the spirit of absurdity is spot on for 1977.)
The reasoning was roughly:
* Mordor is obviously meant to be the USSR, as it's in the east.
* The orcs are clearly heavy industry workers, building the world of the future.
* Bilbo is obviously a son of a bourgeois family, disgusted by hard work.
* The West is represented by elves = aristocracy, men = bourgeoisie, hobbits = landowners.
* A group of reactionaries, led by Gandalf, stokes fear of a made-up "threat from the east".
* Gandalf = a reactionary ideologue, keeping people in a state of fear of progress and knowledge.
* Saruman = protector of the oppressed, declared a traitor and destroyed by the reactionaries.
* But socialism can't be destroyed by throwing something into the fire. All power to Mordor, surrounded by reactionary neighbors.
And these days many Russian ultra-patriots proudly accept the representation of Russia as Mordor and of its soldiers as orcs in the war in Ukraine.
The outlined reasons are cartoonishly communist, but the worldview of "The Last Ringbearer" is not communist. It is more like "the West are liars". And yes, that does resonate in Russia.
Well, they are not entirely wrong: Tolkien was a big fan of "old England" as he saw it, with its primarily agricultural focus, and held a rather dim view of industrialization and modernism. The Soviets, of course, were fanatics of industrialization and considered the petty bourgeoisie and kulaks (and the Bagginses certainly look a lot like kulaks) their mortal enemies. Tolkien was very adamant that he did not do allegory, and yet the Soviets were right to consider him ideologically opposed to them. He was also a devout Catholic and an ardent anti-communist, so whether or not the Red Truth really declared him an enemy, they certainly would have had strong reasons to.
This is unlikely, because "The Lord of the Rings" was translated into Russian much later. "The Hobbit" was first published in 1976, with an announcement that there was more to come, but the first volume of "The Lord of the Rings" was published only in 1983.
I found an article in English [1] that mentions a newspaper piece called "Tolkien's Cosmos" that does indeed find political meaning in "The Lord of the Rings". But that piece was written much later, in 1997. I cannot find the piece itself, but judging from the time and the newspaper ("The Independent Newspaper", proudly liberal) I would guess the author was not a proponent of communism but the opposite: he equated Mordor or Saruman with the Soviet Union because he considered himself part of the winning forces of the West.
I've been involved in a couple of those cases, where a large company ran into an issue and chose to solve it by migrating to something else. And while the issues certainly exist (and are being addressed), the technical reasons often turned out to be a rather tiny part of the story; in the end it was really about internal politics and incentives.
In several such cases, the company was repeatedly warned about how they had implemented some functionality, and that it would cause severe issues with bloat/vacuuming, etc., along with suggestions for how to modify the application to avoid those issues. Their 10x engineers chose to completely ignore that advice, because they had constructed an "ideal database" in their minds and concluded that anything that behaves differently is "wrong", and that it's not their application that should change. Add a dose of politics - a new CTO who wants to rebuild everything from scratch, engineers with NIH syndrome, etc. It's about incentives: if you migrate to a new system, you can write flashy blog posts about how the new system is great and saved everything.
You can always argue the original system would have been worse - everyone saw it had issues - you just leave out the detail that you chose not to address those issues. The engineering team is unlikely to argue against that, because that'd be against their interests too.
I'm absolutely not claiming the problems don't exist. They certainly do. Nor am I claiming Postgres is the ideal database for every possible workload. It certainly is not. But the worst examples I've seen were due to deliberate choices, driven by politics - and that part is never described anywhere. In public everyone pretends it's just about the tech.
Politics is an unavoidable aspect of larger groups, but it gets a lot worse when coupled with wrong incentives that reward heroic disaster mitigation over active disaster avoidance.
When you design a system around a database, it pays off to plan in advance how you'll mitigate the performance issues you might face in the future - often just a simple document explaining which directions to evolve the system in, depending on the perceived cause. You might want to add extra read replicas, introduce degraded modes for when writes aren't available, move some functions to their own databases, shard big tables, and so on. With a somewhat clear roadmap, your successors don't need to panic when the next crisis appears.
For extra points, leave recordings dressed as Hari Seldon.