I have been running the exact same pair programming interview with ~100 candidates so far. Make a little exercise that is representative of your work and run everyone through it. Whatever you test for, that's what you should expect to get.
He was down-leveled to a first-level manager at the company you are at? He accepted this? Why? Do you think he / the new company chose wisely? What ended up happening?
I’m not sure why he accepted it, I never pried too much. It was his first big tech job. It’s very possible he still made more money as a first-level manager, so it might’ve still been a net win for him.
He was a great manager, he’s since moved up the ranks but he’s still at the same big tech co. So from both the company’s and his perspective, I suppose everyone’s happy.
Wouldn't be surprised if it was money. A family member of mine runs a software company; salaries came up recently and I found out I make as much as their director.
This lasts right up until an important customer can't access your services. Executives don't care about downtime until they have it, then they suddenly care a lot.
You can often have services available for VIPs, and be down for the public.
Unless there's a misconfiguration, apps are usually visible internally to staff anyway, so there's an existing method to follow for making them visible to VIPs.
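A minimal sketch of that kind of gate, with made-up flag and allowlist names:

```python
from dataclasses import dataclass

MAINTENANCE_MODE = True                 # toggled by ops during an incident
VIP_ACCOUNTS = {"acme-corp", "globex"}  # hypothetical allowlist

@dataclass
class User:
    account: str
    is_staff: bool = False

def can_access(user: User) -> bool:
    """Staff and allow-listed VIP accounts bypass the downtime page."""
    if not MAINTENANCE_MODE:
        return True
    return user.is_staff or user.account in VIP_ACCOUNTS
```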
But sometimes none of that is necessary. At a $1B market cap company, I've seen a failure case where the solution was manual execution by customer success reps while the computers were down. It was slower, but not many people complained that their reports took 10 minutes to arrive after being parsed by Eye Ball Mk 1s, instead of the 1 minute of wait time they were used to.
"Pick a random number between 1 and 10" is also a clear and consistent method, and also not particularly meaningful.
The point I took from the article is that we should stop paying attention to this meaningless metric. I didn't read it as a request to replace it with another metric.
It's not the leaking that's the concern. It's that making the names of objects hard to enumerate is a strongly security-enhancing feature of a system.
Yes of course everyone should check and unit test that every object is owned by the user or account loading it, but demanding more sophistication from an attacker than taking "/my_things/23" and loading "/my_things/24" is a big win.
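A rough sketch of that ownership check, assuming a DB-API style connection with sqlite-style placeholders (table and column names are made up):

```python
class NotFound(Exception):
    """Raised identically for missing and not-owned objects."""

def get_thing(conn, current_user_id: int, thing_id: int):
    # Scoping the query by owner_id means a guessed /my_things/24
    # 404s unless the caller actually owns object 24.
    cur = conn.execute(
        "SELECT id, name FROM things WHERE id = ? AND owner_id = ?",
        (thing_id, current_user_id),
    )
    row = cur.fetchone()
    if row is None:
        raise NotFound()  # same response whether missing or not yours
    return row
```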
With a single sequence and a busy system, the ids for most high-level tables/collections are extremely sparse. This doesn't mean they can't be enumerated, but you will probably notice if you suddenly start getting hammered with 404s or 410s or whatever your system generates on "not found".
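A toy version of that detection signal (a real system would use sliding time windows; the threshold here is arbitrary):

```python
from collections import Counter

NOT_FOUND_THRESHOLD = 50  # hypothetical per-window limit
not_found_counts: Counter = Counter()

def record_404(client_ip: str) -> bool:
    """Returns True once a client looks like it's walking the id space."""
    not_found_counts[client_ip] += 1
    return not_found_counts[client_ip] >= NOT_FOUND_THRESHOLD
```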
Also, if most of your endpoints require auth, this is not typically a problem.
It really depends on your application. But yes, that's something to be aware of. If you need some ids to be unguessable, make sure they are not predictable :-)
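One common pattern, sketched here: keep the sequential integer as the internal primary key, but expose a separate unguessable public id generated from a CSPRNG:

```python
import secrets

def new_public_id() -> str:
    # ~128 bits of entropy, URL-safe; store alongside the internal int PK.
    return secrets.token_urlsafe(16)
```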
If you have a busy system, a single sequence is going to be a pretty big performance bottleneck, since every resource creation will need to acquire a lock on that sequence.
> Also, if most of your endpoints require auth, this is not typically a problem.
Many systems are not sparse, and separately, that's simply wrong. Unguessable names are not a primary security measure, but a passive remediation for bugs or bad code. Broken access control remains in the OWASP Top 10, and IDOR is a piece of that. Companies still get popped for this.
Well, the article is literally about what happens when you're a leader of such a movement, not when you're a random person on the street talking with other random people.
People are generally neither super closed nor super open about it, although some individuals were more guarded. Most seemed honest when asked about it, but again, YMMV.
This is incredibly database-specific. In Postgres random PKs are bad. But in distributed databases like Cockroach, Google Cloud Datastore, and Spanner it is the opposite - monotonic PKs are bad. You want to distribute load across the keyspace so you avoid hot shards.
In Google Cloud Bigtable we had the issue that our domain's primary key was a sequential integer autogenerated by another app. So we just reversed it, and it distributed automatically quite nicely.
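Roughly, the reversal trick looks like this (a sketch; the zero-padding width is an assumption). Reversing the decimal digits puts the fastest-changing digit first, so lexicographically sorted row keys spread across tablets instead of piling onto one:

```python
def reversed_row_key(seq_id: int, width: int = 12) -> str:
    # 100041 -> "140001000000", 100042 -> "240001000000":
    # adjacent ids land far apart in the sorted keyspace.
    return str(seq_id).zfill(width)[::-1]
```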
It is, although you can have sharded PostgreSQL, in which case I agree with your assessment that you want random PKs to distribute them.
It's workload-specific, too. If you want to list ranges of them by PK, then of course random isn't going to work. But then you've got competing tensions: listing a range wants the things you list to be on the same shard, but focusing a workload on one shard undermines horizontal scale. So you've got to decide what you care about (or do something more elaborate).
It's also application-specific. If you have a workload that's write-heavy, has temporal skew, and is highly concurrent, but rarely creates new records, you're probably better off with a random PK, even in PG.
Even in a distributed database you want increasing (even if not monotonic) keys since the underlying b-tree or whatever will very likely behave badly for entirely random data.
UUIDv7 is very useful for these scenarios since:

A: a hash or modulus of the key will be practically random, due to the lower bits being random or pseudo-random (i.e. it distributes well between nodes), and

B: the first bits are sortable, thus the underlying storage on each node won't go bananas.
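A back-of-the-envelope sketch of the UUIDv7 layout (per RFC 9562: 48-bit unix-ms timestamp up front, then version, then random bits), showing why both properties hold:

```python
import os
import time
import uuid

def uuid7_like() -> uuid.UUID:
    """Builds a UUIDv7-layout value by hand; real libraries do this for you."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")      # 80 random bits
    value = ((ts_ms & ((1 << 48) - 1)) << 80) | rand  # timestamp up front
    value = (value & ~(0xF << 76)) | (0x7 << 76)      # version nibble = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)      # RFC variant bits
    return uuid.UUID(int=value)

u = uuid7_like()
print(u)           # sorts roughly by creation time (property B)
print(u.int % 16)  # modulus driven by the random tail (property A)
```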
I wouldn't say it is incredibly database-specific; it is more database-type-specific. For most general, non-sharded databases, random key values can be a problem, as they lead to excess fragmentation in b-trees and similar structures.
As long as the key has sufficient entropy (i.e. not monotonic sequential ints), that ensures the keyspace is evenly distributed, correct? So UUID>=v4, ULID, KSUID, and possibly Snowflake should be fine for the sake of even distribution of the hashes.
I think they address this in the article when they say that this advice is specific to monolithic applications, but I may be misremembering (I skimmed).
I'm not making any claims at all, I was just adding context from my recollection of the article that appeared to be missing from the conversation.
Edit: What the article said:
> The kinds of web applications I’m thinking of with this post are monolithic web apps, with Postgres as their primary OLTP database.
So you are correct that this does not disqualify distributed databases.
100%. You can use rendezvous hashing to determine the shard(s). The hash of a sequence should be randomly distributed, since changing the LSB should propagate to a ~50% change in the output bits.
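A minimal rendezvous (highest-random-weight) hashing sketch; the shard names are hypothetical:

```python
import hashlib

def pick_shard(key: str, shards: list[str]) -> str:
    # Score the key against every shard; the highest score wins.
    # Adding/removing a shard only remaps the keys that shard owned.
    def score(shard: str) -> int:
        digest = hashlib.blake2b(f"{shard}:{key}".encode(), digest_size=8)
        return int.from_bytes(digest.digest(), "big")
    return max(shards, key=score)

print(pick_shard("things/100042", ["shard-a", "shard-b", "shard-c"]))
```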