
Anyone want to chime in with the largest app they've deployed where prod Postgres was reachable over a Unix domain socket?


Used to work developing standalone kiosks and things (think the kind of interactive displays you'd find at a museum), so absolutely not the use case you're thinking of.

In a number of cases, we were working with data that would really benefit from actual GIS tooling, so PostGIS was the natural thing to reach for. Many of these kiosks had slow, intermittent, or just straight-up no internet connection available.

So we just deployed PostGIS directly on the kiosk hardware. Usually little embedded industrial machines with passive cooling working with something like a generation or two out-of-date Celeron, 8GB RAM, and a small SSD.

We'd load up shapefiles covering a bunch of features across over a quarter million square miles while simultaneously running Chromium in kiosk mode.
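
For loading, it was roughly this shape (a sketch; file/table/user/db names here are made up, and it assumes geopandas with the geoalchemy2 extra installed):

    import geopandas as gpd
    from sqlalchemy import create_engine

    # No TCP host: connect through the local unix domain socket.
    engine = create_engine(
        "postgresql+psycopg2://kiosk@/gis?host=/var/run/postgresql"
    )

    gdf = gpd.read_file("features.shp")            # read the shapefile
    gdf.to_postgis("features", engine, if_exists="replace")  # PostGIS table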

I know for a fact some of those had around 1k DAU. I mean, never more than one simultaneously but we did have them! I'm sure it would have handled more just fine if it weren't for the damn laws of physics limiting the number of people that can be physically accessing the hardware at the same time.

That said, we had the benefit of knowing that our user counts would never really increase and due to real-world limitations we'd never be working with a remote database because there was literally no internet available. In general I'd still say colocating your database is not an excuse to allow N+1 query patterns to slip in. They'll be fine for now and just come back to bite you in the ass when your app _does_ outgrow colocating the app/database.
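
To make the N+1 thing concrete, here's the shape of it (table names made up):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # unix socket by default, no host
    cur = conn.cursor()

    # The N+1 pattern: one query for the parents, then one per child.
    cur.execute("SELECT id FROM exhibits")
    for (exhibit_id,) in cur.fetchall():
        cur.execute("SELECT * FROM artifacts WHERE exhibit_id = %s",
                    (exhibit_id,))
        rows = cur.fetchall()  # N extra round trips: cheap locally, slow remotely

    # The fix: one joined query, one round trip.
    cur.execute("""
        SELECT e.id, a.*
        FROM exhibits e
        JOIN artifacts a ON a.exhibit_id = e.id
    """)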


I'm surprised no one is chiming in from the LAMP [0] era. Maybe I was just in a bubble, but I feel like before modern cloud providers, folks deploying on VPSs tried to keep operational complexity low by running their database on the same box until they couldn't anymore.

In my experience, I just cached the living hell out of the apps to avoid the I/O from the database. Providers in those days had huge issues with noisy neighbors, so I/O performance was quite poor.
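
The pattern was basically cache-aside. A minimal sketch of the shape (a plain dict standing in for what was memcached in practice):

    import time

    CACHE, TTL = {}, 60  # seconds

    def get_page(slug, fetch_from_db):
        hit = CACHE.get(slug)
        if hit and time.time() - hit[1] < TTL:
            return hit[0]              # cache hit: no database I/O at all
        val = fetch_from_db(slug)      # cache miss: the only time we hit the DB
        CACHE[slug] = (val, time.time())
        return val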

[0]: https://en.wikipedia.org/wiki/LAMP_(software_bundle)


Back in my consulting days, one of my clients had an application with nearly all of the business logic done in-database. The app was essentially a frontend for stored procedures galore, and it sat on an absolute monster of a system for its time (~256GB of RAM, 24 or 32 cores I think). It handled something like 100k DAU with no issue. That was just the transactional system; they replicated data to other (similar style) systems for less mission-critical things like reporting, billing, etc.
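
Schematically, the app side looked something like this (names made up; place_order is a hypothetical stored procedure):

    import psycopg2

    customer_id, sku, qty = 42, "A-100", 3   # made-up inputs

    conn = psycopg2.connect("dbname=erp")    # db name made up
    with conn, conn.cursor() as cur:
        # validation, inventory updates, audit rows: all inside the procedure
        cur.execute("SELECT place_order(%s, %s, %s)", (customer_id, sku, qty))
        order_id = cur.fetchone()[0]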

I want to be clear though, I would never recommend anyone do this. For its time it was impressive, but I suspect by now it's been re-engineered away from that model, if for no other reason than it's a lot of eggs in one basket.


It's a great way to deploy if you have the RAM and disk.

Bump up shared_buffers, and use a multithreaded or pooled driver for your app server so connections aren't being opened and closed all the time.

It really 'flies' under such a scenario.
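
A minimal sketch of the client side, assuming psycopg2 (pool sizes are illustrative, not a recommendation):

    from psycopg2.pool import ThreadedConnectionPool

    # Server-side, the matching knob would be something like
    # shared_buffers = '8GB' in postgresql.conf, sized to your RAM.
    pool = ThreadedConnectionPool(minconn=4, maxconn=32, dsn="dbname=app")

    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
    finally:
        pool.putconn(conn)  # returned to the pool, not closed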

Largest app was a big analytics and drug tracking application for a small pharmaceutical company. Not sure of the size but it was very reporting heavy.


I would wager that the median IT shop/dept is running off a single-node RDBMS, and that the majority of those also have the spare resources to run the primary application (which has 1-100 DAU) alongside it.

No practical way to test this, of course.


Had an instance with almost 2TB on disk in 2015; it's probably a lot more by now, if it's still running as-is. It was for analytics/data-warehousing. The big table was partitioned, though, so queries were always nice and snappy. The machine had 128GB of RAM back then, but probably never needed that much. The applications working with the data ran on that same server, so processing latency was usually <10ms even for larger queries.
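
For flavor, the layout was along these lines. This sketch uses today's declarative partitioning (Postgres 10+); in 2015 it was actually inheritance plus constraint exclusion, but the effect on queries is the same. Names made up:

    import psycopg2

    conn = psycopg2.connect("dbname=warehouse")  # name made up
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE events (
                ts      timestamptz NOT NULL,
                payload jsonb
            ) PARTITION BY RANGE (ts);

            -- queries on a date range only ever scan the relevant partition
            CREATE TABLE events_2015_01 PARTITION OF events
                FOR VALUES FROM ('2015-01-01') TO ('2015-02-01');
        """)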


is it cheating when pgbouncer is being accessed via socket? lol


Yes, that is cheating.


I don't have any context around this: why is that cheating?


Because the point of the Unix domain socket (in this argument) is eliminating network round trips.
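
Concretely, with psycopg2 the difference is just the host parameter (the socket directory shown is the Debian/Ubuntu default):

    import psycopg2

    # unix domain socket: host is a directory path, no TCP involved
    local = psycopg2.connect("dbname=app host=/var/run/postgresql")

    # TCP loopback: same machine, but every query crosses the network stack
    tcp = psycopg2.connect("dbname=app host=127.0.0.1 port=5432")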


Oh gotcha, I assumed they meant all 3 were on one host. I'm not too familiar with the pg ecosystem.


Not me personally, but my understanding is that some Cisco products have Postgres "embedded." If they're not using unix domain sockets, they're at least using loopback.

...come to think of it, they may not be the only network device maker that does this.


Are there significant limits to the size of an app that could be deployed alongside Postgres versus the size of an app that could be deployed alongside SQLite?


No, the big difference is write speed. SQLite is limited in the number of writes it can do at the same time (typically 1); this is what "concurrent writes" refers to. Last I checked, SQLite still can't do more than one write at a time, but in WAL mode it can emulate concurrent writes well enough that it isn't normally a problem for most applications. Postgres has no such limit (though there can be contention when updating a particular row concurrently, obviously).

Otherwise both are generally limited by the physical resources of the machine (memory, disk, etc). Generally speaking, you can scale a box much, much farther than your data size for most applications.
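
For reference, WAL mode is just a pragma; a minimal sketch in Python:

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA journal_mode=WAL")    # persistent: stored in the db file
    conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5s behind the writer
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    with conn:
        conn.execute("INSERT OR REPLACE INTO kv VALUES ('a', '1')")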


> it can emulate concurrent writes in WAL mode

Seems a bit unfair to call WAL mode an emulation of "true" concurrent writes, as I'm pretty sure a write-ahead log (WAL) is exactly how other databases implement multiple concurrent writes. It's just always-on rather than being opt-in.


Well, but that's not really what it's doing. WAL mode in SQLite is different from what it is in your typical RDBMS: it just gets the write path out of the read path, but there is still only one write allowed at any given time. The "fake" concurrency comes from letting write transactions queue up and wait for the lock.

See https://sqlite.org/wal.html under "2.2. Concurrency", where it says:

> "However, since there is only one WAL file, there can only be one writer at a time."

SQLite is awesome, I'm a huge fan, but if you need to do lots and lots of writes, then SQLite is not your friend. Luckily that's a very rare application. There's a reason you never see SQLite as the endpoint for logs and other write-heavy applications.
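
You can watch the single-writer limit happen directly (sketch; the second writer waits out its timeout and then errors instead of proceeding concurrently):

    import sqlite3

    a = sqlite3.connect("demo.db", isolation_level=None)
    b = sqlite3.connect("demo.db", isolation_level=None, timeout=0.5)
    a.execute("PRAGMA journal_mode=WAL")
    a.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")

    a.execute("BEGIN IMMEDIATE")           # first writer takes the write lock
    a.execute("INSERT INTO t VALUES (1)")
    try:
        b.execute("BEGIN IMMEDIATE")       # second writer: no concurrent slot
    except sqlite3.OperationalError as e:
        print(e)                           # "database is locked"
    a.execute("COMMIT")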


We use managed Postgres on GCP, and it is served over a unix socket.


Kind of by definition, GCP's managed postgres is served up over the network. I'm guessing you mean "unix socket" in the sense of the socket API, not unix domain sockets.



