Tell HN: SQLite3 does have something like stored procedures

zzo38computer · on June 29, 2022

I know that and I have done that many times in the past (it probably is really obvious, like you say). However, triggers cannot return result rows, and a SELECT query that a view is defined as cannot take parameters (although there is a work-around by using a virtual table which contains a single row whose value is whichever value it is constrained to be). There is also WITH RECURSIVE, which also has many uses.

bbkane · on June 29, 2022

Hi! I'm from the other comment chain you originally posted this on. One question - how should I keep track of generated primary key IDs for future inserts? The best I could come up with is:

  CREATE TRIGGER role_user_view_trigger
  INSTEAD OF INSERT ON role_user_view
  FOR EACH ROW
  BEGIN
  INSERT INTO role(name) VALUES(new.role_name);
  INSERT INTO user(name) VALUES(new.user_name);
  INSERT INTO role_user(role_id, user_id, created_at) VALUES(
    (SELECT id FROM role WHERE name = new.role_name),
    (SELECT id FROM user WHERE name = new.user_name),
    "now"
  );
  END;

Full code at https://gist.github.com/bbkane/ed385787f06efe4473f3cb7a3b32c...

Do you know of a better way than those select statements (which I presume are slow) to get the IDs?

cryptonector · on June 29, 2022

First, you should know that you should only use single quotes for quoting strings, even in SQLite3 which allows double quotes.

Second, yeah, `last_insert_rowid()` doesn't work in this case, but what you might do is a) have a timestamp column and/or maybe a transaction ID column as well that you set in the INSERTs in the trigger body, b) maybe have audit/log tables that you insert into in those same trigger bodies, where you might use `last_insert_rowid()`.

Third, I urge you to think about why you care about the inserted row IDs. If you're doing INSERTs w/o `OR IGNORE`, then if your INSERT into the view succeeds then you know you can SELECT to see what IDs you got, then COMMIT. If you're doing `OR IGNORE`, then maybe you don't care whether you got IDs assigned just now or in the past, but you still want to know what IDs those rows have, so you select for them.

Lastly, I'd think about whether you want row IDs at all. I mean, they can be useful. And when it comes to things like "users", you generally want some sort of internal identifier other than the name. Like, if you rename users or roles often (but, you really should not allow that!) then using row IDs in the foreign keys means you don't have to worry about cascading renames. But you might be able to get away w/o role IDs. Also, you might find it better to have a single table for all "numeric ID" assignments, and then you could make your users and roles tables not have an INTEGER PRIMARY KEY, just the name as the PRIMARY KEY and a numeric ID column as a foreign key referencing that one single table.

E.g.,

  CREATE TABLE IF NOT EXISTS IDs
    (id INTEGER PRIMARY KEY,
     kind TEXT NOT NULL,
     name TEXT NOT NULL,
     UNIQUE (kind, name));
  
  CREATE TABLE IF NOT EXISTS user
    (name TEXT PRIMARY KEY,
     id INTEGER FOREIGN KEY REFERENCES IDs (id))
  WITHOUT ROWID;
  
  CREATE TABLE IF NOT EXISTS role
    (name TEXT PRIMARY KEY,
     id INTEGER FOREIGN KEY REFERENCES IDs (id))
  WITHOUT ROWID;

then in your INSTEAD OF INSERT trigger you'd

  - insert into IDs twice to get IDs for the user and role
  - insert into user
  - insert into role

like this:

  CREATE TABLE role_user (
    role_kind TEXT NOT NULL DEFAULT('role')
                            CHECK(role_kind = 'role'),
    user_kind TEXT NOT NULL DEFAULT('user')
                            CHECK(user_kind = 'user'),
    role_name TEXT, -- should be NOT NULL bc SQLite3
                    -- does not force that in PKs
    user_name TEXT, -- ditto
    PRIMARY KEY (role_name, user_name),
    FOREIGN KEY (role_kind, role_name)
        REFERENCES role(kind, name)
        ON DELETE CASCADE
        ON UPDATE CASCADE,
    FOREIGN KEY (user_kind, user_name)
        REFERENCES user(kind, name)
        ON DELETE CASCADE
        ON UPDATE CASCADE
  ) STRICT, WITHOUT ROWID;

  CREATE TRIGGER role_user_view_trigger
  INSTEAD OF INSERT ON role_user_view
  FOR EACH ROW
  BEGIN
  INSERT OR IGNORE INTO IDs(kind, name)
    VALUES('role', NEW.role_name),
    VALUES('user', NEW.user_name);
  INSERT OR IGNORE INTO role(name, id)
    SELECT NEW.role_name, id
    FROM IDs
    WHERE kind = 'role' AND name = NEW.role_name;
  INSERT OR IGNORE INTO user(name, id)
    SELECT NEW.user_name, id
    FROM IDs
    WHERE kind = 'user' AND name = NEW.user_name;
  INSERT OR IGNORE INTO role_user(role_name, user_name)
    VALUES(NEW.role_name, NEW.user_name);
  END;

I think you really want `OR IGNORE` in all cases, as it makes this trigger idempotent.

Want to know the numeric IDs assigned to the users and roles possibly created this way? Just SELECT for them!

bbkane · on June 29, 2022

I hadn't thought of this approach - It looks a little weird to me but I might be able to get used to it. I don't think it eliminates scanning the table - it even puts all IDs in a table that will grow large on its own.

I see what you mean about `OR IGNORE`. I do worry that this means I'm failing other checks like NOT NULL, but I could handle that in application code.

I really appreciate you taking the time to write all this out for me. While I enjoy it, I don't get to talk to many people about schema design, and I don't have a lot of experience with it. Could you recommend any reading I should do?

cryptonector · on June 29, 2022

In SQLite3 PRIMARY KEY and UNIQUE constraints beget an (hidden) index, so there should be no table scans.

Also, PRIMARY KEY implies NOT NULL for the columns that make it up, except that SQLite3 has a bug in it that causes it to allow NULLs in PK columns (sadpanda.png).

> Could you recommend any reading I should do?

First of all, I would start with the O'Reilly pocket book on SQL in general.

Second, I recommend you spend some time reading through SQLite3's docs and wikis. There's a trove of generally-useful stuff there. Pay special attention to everything to do with the WITHOUT ROWID feature. Row IDs can be a crutch that slows things down. That said, sometimes you really want an INTEGER PRIMARY KEY, and in SQLite3 that functions like a row ID.

As for schema design, you should know about normalization. The general advice is to normalize to the max then denormalize till your queries are performant. Much schema design wisdom derives naturally from that. For example, I showed you how to avoid row IDs because they seem unnecessary and they lead to extra duplication. For example, your role_user, if it use row IDs for its references, then queries on it will have to join to the user and role tables to get the names, or, if you end up having both, row IDs and names in it to avoid that, then now you have extra duplication and you'll even need a second set of FOREIGN KEY references in your role_user table to keep those names up to date.

I don't really have a good resource for this kind of knowledge. I've absorbed a lot of it by osmosis over many years, and I've developed some of it on my own.

You could use reading about EAV (entity-attribute-value) schema designs, too. Many love to hate them. I don't love them myself. But EAV schemas trivially lend themselves to writing graph queries. For example, one system I work with has a notion of relationships and, interestingly, of relationship groups that are interesting for graph queries, and it represents relationships as an EAV table, so one can easily write a recursive SQL query that traverses all paths from one entity on one side of the graph to the another (or all reachable ones) on the other side without having to make reference to the names of the relationship types.

I am of the opinion that an RDBMS could provide both, a relational table metaschema, and an EAV equivalent, so that one could write very clear queries against the former, but also very simple and powerful graph queries against the latter. However, I've not seen that anywhere. I want this.

BTW, again, when it comes to security-relevant entities, I highly recommend that you a) not allow renames, b) don't allow deletions -- users can be tombstoned deleted/terminated/separated, but do not allow their names to be reused. The reason is that others will tend to write down usernames, group names, etc., and use those for authorization, and then name reuse can cause you grief. You might allow aliases so that users can get different canonical names if they insist on it, but don't allow their earlier names to be reused.