Variant Types in PostgreSQL

Typed variants, case classes, tagged unions, algebraic data types or just enums: variant types are a feature common to many programming languages but are an awkward fit for SQL. Most languages have something like tuple types and these map cleanly to databases; but polymorphism — in the form of unions, inheritance or traits — has no parallel in the language of data.

Tagged unions provide a means to model several kinds of polymorphism in storage, because all storage must ultimately be of concrete data. In this essay, we model traits with database-level tagged unions and we develop a SQL macro that provides for tagged unions in Postgres.

Typed Alternatives

Polymorphic associations cover a broad family of designs where we model what is essentially a struct with a field that could have one of a few representations.

Sometimes, using semi-structured data to model alternatives is a viable option; but not always:

  • It’s hard to write good constraints for semi-structured data.
  • It’s difficult to reverse the schema to understand its intent, as must often be done by data scientists.
  • Polymorphic fields can result from structures that store values with a certain “trait” or “protocol” or “interface”. This ad-hoc polymorphism can bring two types together at one time and not at another, which can make it very hard to say what table to put them in. (We’ll discuss this point more in an example, below.)

Let’s consider the case of ticket admissions for an events processor. Sometimes people are admitted with tickets, sometimes with season passes, and sometimes as guests — but all admissions must be recorded. We might start out with an admission table and a text column saying “ticket”, “pass” or “guest” but pretty soon that gets unwieldy. We want to know: which ticket? The guest of whom?

struct Admission {
let asOf: NSDateTime
let pass: Pass
let person: NSPerson
}
enum Pass {
case SeasonPass(code: String)
case Guest(person: NSPerson)
case SingleTicket(id: NSUUID)
case Misc(note: String) // The Queen, &c
}

When we model this in SQL, we can use a JSON column to store the details of a ticket:

CREATE TABLE admission (
id uuid PRIMARY KEY,
as_of timestamptz NOT NULL DEFAULT now(),
pass uuid NOT NULL REFERENCES pass,
person uuid NOT NULL REFERENCES person
);
CREATE TABLE pass (
id uuid PRIMARY KEY,
issued timestamptz NOT NULL DEFAULT now(),
type text NOT NULL,
data json NOT NULL
);

Now this all very nice until…

Ad-hoc Polymorphism

Every so often we have to figure out how much people paid for tickets. We have a bunch of different CSVs we use to correlate them. Customer service ends up browsing the database together with all these CSVs in Dropbox to address customer issues; but as the dataset grows, browsing gets slower and a privacy disaster looms. Time to pick out some of the metadata and store it in the database. In our application we end up with a struct like this:

struct Payment {
let pass: Pass
let as_of: NSDateTime
let price: Double
let processor: String // PayPal, *Star, &c.
let taxes: Double
let taxJurisdiction: String // NYC, Canada, &c.
}

When it comes to the schema, though, we are struck by the oddity of payments being recordable for a guest pass or miscellaneous event.

CREATE TABLE payment (
id uuid PRIMARY KEY,
pass uuid NOT NULL REFERENCES pass,
as_of timestamptz NOT NULL DEFAULT now(),
price numeric(10,2) NOT NULL,
processor text NOT NULL,
taxes numeric(10,2) NOT NULL,
tax_realm text NOT NULL
);

This is true at the application layer, too; and it seems like asking for trouble. A guest pass with a payment applied is probably a mistake not of addition but of omission — there is probably some ticket out there without a payment attached. Let’s break up our passes into separate types and use protocols (Swift for traits) to mark them as being of the appropriate type:

protocol Pass {}
protocol FreePass: Pass {}
protocol PayPass: Pass {
func findPayment() -> Payment?
}
struct Guest: FreePass {
let person: NSPerson
}
struct Misc: FreePass {
let note: String
}
struct SeasonPass: PayPass {
let code: String
}
struct SingleTicket: PayPass {
let id: NSUUID
}
struct Admission { /* Same as before. */ }
struct Payment {
let pass: PayPass
/* Remainder unchanged. */
}

If we model tickets as before, with a single table, then we will be unable to encode the invariant for payments: that they reference only tickets that cost money. But if we use separate tables, then it is not possible for an admission to have a foreign key declared that points to any kind of pass, because foreign keys can reference only one table.

It is this modeling problem – and a very similar one presented by inheritance – that suggests the use of tagged unions at the database level.

Unions in SQL

When we try to write a schema for the application datatypes mentioned above, can model nearly everything with precision — everything but the restriction on the difference between the kinds of passes that an Admission and a Payment can refer to:

CREATE TABLE admission (
id uuid PRIMARY KEY,
as_of timestamptz NOT NULL DEFAULT now(),
pass uuid NOT NULL REFERENCES ???,
person uuid NOT NULL REFERENCES person
);
CREATE TABLE payment (
id uuid PRIMARY KEY,
pass uuid NOT NULL REFERENCES ???,
as_of timestamptz NOT NULL DEFAULT now(),
price numeric(10,2) NOT NULL,
processor text NOT NULL,
taxes numeric(10,2) NOT NULL,
tax_realm text NOT NULL
);
CREATE TABLE guest_pass (
id uuid PRIMARY KEY,
issued timestamptz NOT NULL DEFAULT now(),
person uuid NOT NULL REFERENCES person
);
CREATE TABLE misc_pass (
id uuid PRIMARY KEY,
issued timestamptz NOT NULL DEFAULT now(),
note text NOT NULL
);
CREATE TABLE season_pass (
id uuid PRIMARY KEY,
issued timestamptz NOT NULL DEFAULT now(),
code text NOT NULL
);
CREATE TABLE single_pass (
id uuid PRIMARY KEY,
issued timestamptz NOT NULL DEFAULT now()
);

If we could create a union of two types — really two tables — then we’d be able to able to fill in the REFERENCES constraints above:

CREATE UNION pay_pass (id uuid) OF (single_pass, season_pass);
CREATE UNION free_pass (id uuid) OF (guest_pass, misc_pass);
CREATE UNION pass (id uuid) OF (pay_pass, free_pass);
CREATE TABLE admission (
id uuid PRIMARY KEY,
as_of timestamptz NOT NULL DEFAULT now(),
pass uuid NOT NULL REFERENCES pass,
person uuid NOT NULL REFERENCES person
);
CREATE TABLE payment (
id uuid PRIMARY KEY,
pass uuid NOT NULL REFERENCES pay_pass,
as_of timestamptz NOT NULL DEFAULT now(),
price numeric(10,2) NOT NULL,
processor text NOT NULL,
taxes numeric(10,2) NOT NULL,
tax_realm text NOT NULL
);

Although we didn’t model the types in our application with a tagged union, we still adopted a tagged union approach at the database level. If we were to switch the application layer declarations to make PayPass and FreePass enums with two constructors each, it wouldn’t change our database-level representation. The distinction between certain kinds of polymorphism — traits, inheritance or tagged unions — is flattened away by storage: we can only write and read concrete values. Information about polymorphism has to find a concrete representation in storage, which we preserve here by storing the IDs of concrete values redundantly in “pseudo-tables”, the unions.

Implementing Disjoint Unions in With PL/pgSQL

Direct extension of the Postgres syntax and semantics with stored procedures is not possible with simple plugins; but we can get very close to our union declaration with a macro that generates some triggers, some foreign keys and a view.

CREATE TABLE pay_pass (id uuid PRIMARY KEY);
SELECT variants.variant('pay_pass', 'single_pass');
SELECT variants.variant('pay_pass', 'season_pass');
  • Foreign keys tie the subtables to the union. This ensures that deleting something from the union deletes it from the subtables.
  • Triggers on the subtables propagate inserts, updates and deletes to the union; and this has the effect of forcing changes to subtables to go through the primary key constraint on the union.
  • A view allows us to fetch each row in the union, embedding it as a field, with one column for each subtable. All of the subtable columns are null save one. (The view prevents us from having to store the data multiple times.)

The full code is available as a single SQL file, which adds a few functions and tables in the variants namespace. Our solution relies on the meta module from macaroon, a utility library for Postgres introspection, which in turn relies on the information schema or catalog, a little known feature of modern SQL databases: tables and views which describe all the tables, views, types, functions and constraints in the databases.

Unions, SQL & Portability

One of the strengths of SQL is its portability. SQL does not closely resemble the type system of any application language; but it offers a wide variety of types and validation techniques that facilitate storage from many languages. Many programming environments share in the benefit of work that has gone into optimizing and improving SQL databases.

A complete and comprehensive model of ADTs or inheritance would limit SQL’s usefulness, tying it to a particular kind of language or programming. Tagged unions as here presented allow one to model the essence of polymorphism in SQL.

Like what you read? Give Jason Dusek a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.