This works until you have an A->B->C hierarchy of entities and want to monitor t...

skyde · on April 30, 2021

if C is a "child entity" of C then it's primary key should have the primary key of B as a prefix.

This is how I design all my table schema. and it make database partitioning easier too.

derefr · on April 30, 2021

I wish I could do this, but I already have more than four billion As, so my A IDs need 64 bits. I use composite keys for my Bs (A ID + sequence number per A), which allows me to search on the pkey index using just the A column; but by the time I get to C, that gets unwieldy. So my Cs have UUIDs, my Bs also have UUIDs (secondary unique ID, not the pkey) and my Cs have a foreign key pointing to their B. While also having an indexed A ID field.

I'd love to just just have "A-B-C" as my Cs' IDs... but it'd only work for my use-case (i.e. be performant) if it was running on a computer with 256-bit registers.

skyde · on April 30, 2021

Curious why your primary key are GUID? Also if the primary key of table C is made of 3 column (A GUID, B GUID, C GUID) the index in PostgreSQL will not be that big with compression enabled because all the common prefix will not be stored redundantly.

Also having a single compound index on table C covering column (A ForeingKey, B ForeingKey, C GUID) is much better than having multiple index on table C.

tehlike · on May 1, 2021

Having the primary key generatable at application level has performance benefits especially on inserts. You can write all related entities in one go vs waiting for db to generate the the pk, and passing that to dependent entities.

skyde · on May 6, 2021

Also the problem with clustered indexes in a GUID field are that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the records into the middle of the table.

skyde · on May 6, 2021

you can have "id" Generatable at application level without using GUID. This is what twitter is doing "snowflake".

It generate tens of thousands of ids per second. Those id fit in 64 bits. And those id are sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another.

See: https://blog.twitter.com/engineering/en_us/a/2010/announcing...

Ombudsman · on April 30, 2021

GUIDs are extremely useful if you want to move and merge data around without having to worry about primary key collisions.

saurik · on April 30, 2021

I would consider this model "usually incorrect" in that it is quite likely you will eventually want to be able to move an entity to a different parent. I would thereby always give people the advice that one "should" have primary keys for every entity type that are at least global for that local entity type (if not truly global for all entity types, as there is a lot of power doing such) and then implement this child relationship as a unique foreign key constraint, unless you have some very specific functionality or performance requirement that would force primary key conflation (and FWIW, "I want to use this WAL watcher" might count, but it seems like a steep price to pay being unable to do sane reparenting).

skyde · on May 6, 2021

I agree the children entity ID must be already unique without having to use the parent entity id.

It's just that in my experience have the children table primary b-tree sorted on ParentID then childrenID make join much more efficient unless you can use table interleaving like in Google SpannerDB https://cloud.google.com/spanner/docs/schema-and-data-model#...

JakaJancar · on April 30, 2021

“Should” is pretty strong. The choice of PK has its own tradeoffs. But you are correct, if you go the prefix route, it makes the WAL approach trivial, at least while everything really is a tree and there are no many-to-many relationships.