Let's poke at this old dispute – or a source of confusion, depending on how you look at it.
-- Plural naming convention:
select * from users
-- vs. singular naming one:
select * from user
Traditional textbooks often suggest using singular, but in practice we see a lot of the plural convention out there – although it is not universal (Django, famously, defaults to singular).
So, which one should it be?
Many of us see this as a stylistic choice. It could be one – but is it really? I think the core confusion comes from the “programmer's mindset” of treating tables as collections of objects. After all, the visual model of a table does suggest such a notion:
users = [User{id:1, name:"Alice"}, User{id:2, name:"Bob"}]
This is actually part of the reason why document databases became popular: they map perfectly into the programmer's mental model of collections containing objects that can encapsulate other objects.
And many argue that it sounds far more natural in English to say "select everything from the collection of users" than "select everything from the collection of user".
But the database is actually a set of logical propositions describing the world, and a relation is effectively a type (or a shape) of a proposition.
user(1, "Alice") ← This proposition is true
user(2, "Bob") ← This proposition is true
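To make the proposition view concrete, here is a minimal runnable sketch using SQLite via Python's sqlite3 (standing in for Postgres; note that in Postgres the bare name user is a reserved word and would need quoting, while SQLite accepts it as-is). The `holds` helper is a hypothetical name for illustration: asking whether a proposition is true is just asking whether the row is in the relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The relation "user" is the shape (type) of a proposition;
# each row asserts one concrete proposition.
conn.execute('CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL)')
conn.executemany('INSERT INTO user VALUES (?, ?)', [(1, 'Alice'), (2, 'Bob')])

def holds(conn, uid, name):
    """Is the proposition user(uid, name) asserted in the database?"""
    (found,) = conn.execute(
        'SELECT EXISTS(SELECT 1 FROM user WHERE id = ? AND name = ?)',
        (uid, name)).fetchone()
    return bool(found)

print(holds(conn, 1, 'Alice'))  # True: user(1, "Alice") is asserted
print(holds(conn, 1, 'Bob'))    # False: never asserted
```

Under this reading, a SELECT is not "fetching objects from a collection" but enumerating the propositions of a given shape that the database currently asserts.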
With this knowledge, it becomes clearer that we can explain our query in English as "select everything in every instance of a user".
And this "type" angle? It is not purely theoretical – for example, Postgres defines a type for each relation. So if you create a relation called users, you will have a type called users, too. And that starts sounding odd, doesn't it? Even if we assume this to be a quirk of Postgres, the real question remains:
Does this connection to theory help us – or is it theory for the sake of theory, with little practical benefit?
Let's think about our wide denormalized relations for a little bit. If every relation is a logical proposition, then we may end up with statements like: "A user identified as 1 with a name of John Doe and a birthdate of Jan 1st, 2001, email of [email protected], password hash $2y$10$71.rcRm4j3BaeQd12kXXI.D.PlgXgH4WAfFgw/TenOXqKkIriWQWe, male, title of Mr, phone number 555-555-555, signed up Aug 1, 2025 and last updated Aug 5, 2025.".
It's a mouthful – still a proposition, alright, but you can't quite say it in one breath. Why does this matter? Because it asserts a lot at once. It absolutely works – most of the world's databases are like that. But here's an interesting thing: the most constant thing in the world is change, and with change come the woes of schema migrations. You may have seen it all. Why do we change the definitions of relations? Because our understanding of the world changes, so we change the definitions of our statements to reflect that – even though the previous definition didn't become invalid on its own.
The original definition doesn't stop being true just because it gets superseded by a new one:
user{id,name,dob,email,password_hash,
gender,title,phone_number,signed_up,updated_at}
→
user{id,name,dob,email,password_hash,
gender,title,phone_number,tax_id,signed_up,updated_at}
But now the database is tasked with augmenting the storage, figuring out what to do with non-nullable values, and so on. If, however, we had a very lean user relation, we could have just added a user_tax_id relation and called it a day.
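Here is a sketch of that move, again in SQLite via Python's sqlite3 (the table and column names mirror the article's schema). The lean user relation is never altered; learning about tax IDs becomes a new relation, and the absence of a row encodes the absence of a fact – no ALTER TABLE, no backfill, no nullable column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A lean user relation: its definition and existing rows never change.
conn.execute('CREATE TABLE user (id INTEGER PRIMARY KEY)')
conn.executemany('INSERT INTO user VALUES (?)', [(1,), (2,)])

# Learning about tax IDs later means adding a new relation,
# not rewriting the old one.
conn.execute('''CREATE TABLE user_tax_id (
    user_id INTEGER PRIMARY KEY REFERENCES user(id),
    tax_id  TEXT NOT NULL)''')
conn.execute('INSERT INTO user_tax_id VALUES (?, ?)', (1, 'TX-001'))

# Users without a tax ID simply have no row in user_tax_id.
rows = conn.execute('''
    SELECT u.id, t.tax_id
    FROM user u LEFT JOIN user_tax_id t ON t.user_id = u.id
    ORDER BY u.id''').fetchall()
print(rows)  # [(1, 'TX-001'), (2, None)]
```

Note that tax_id can stay NOT NULL in the narrow relation: the "we don't know it yet" case lives in the join, not in the column.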
And even when we simply query the relation, do we really need all the data at all times? For comprehensive summary data exports, sure. But in real practical scenarios, we typically want to find users that match one or two criteria, and we want to know one or two things about them: we want to check if the password matches the hash, or find all users who signed up after a certain date, and so on.
user{id}
user_name{user_id,name}
user_dob{user_id,dob}
...
user_tax_id{user_id,tax_id}
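A quick sketch of querying such a decomposed schema (SQLite via sqlite3 again; user_signed_up is a hypothetical narrow relation in the same style as the list above). The point is that a typical question – "who signed up after a certain date, and what are their names?" – touches only the two relations it actually needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE user           (id INTEGER PRIMARY KEY);
CREATE TABLE user_name      (user_id INTEGER PRIMARY KEY REFERENCES user(id),
                             name TEXT NOT NULL);
CREATE TABLE user_signed_up (user_id INTEGER PRIMARY KEY REFERENCES user(id),
                             signed_up TEXT NOT NULL);
''')
conn.executemany('INSERT INTO user VALUES (?)', [(1,), (2,)])
conn.executemany('INSERT INTO user_name VALUES (?, ?)',
                 [(1, 'Alice'), (2, 'Bob')])
conn.executemany('INSERT INTO user_signed_up VALUES (?, ?)',
                 [(1, '2025-08-01'), (2, '2025-07-01')])

# "Find users who signed up after a date, and their names":
# only the two relevant narrow relations participate.
rows = conn.execute('''
    SELECT n.name
    FROM user_signed_up s
    JOIN user_name n ON n.user_id = s.user_id
    WHERE s.signed_up > '2025-07-15'
''').fetchall()
print(rows)  # [('Alice',)]
```

The password hash, gender, title, and the rest of the wide row never enter the picture.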
Interestingly enough, if you squint, it feels like you don't even need relation names at this point, as they largely duplicate the meaning. The relation's content defines what the relation is. This illustrates the simplicity of such assertions:
user{id}
{user_id,name}
{user_id,dob}
...
{user_id,tax_id}
Of course, this elegant theoretical approach runs headfirst into the classic performance reality. The knock-down argument for denormalization has most often been performance. And there's no way to tiptoe around it: yes, the performance of multi-join queries leaves a lot to be desired. This is why many architects choose to forego normalization. SQL syntax issues (joins just make queries look big) are, on the surface, trivially solvable given the willingness. We are more interested in the fundamental problem: can joins be made fast?
We're still prototyping, but there might be an opportunity for Postgres (and other databases) to revisit this problem and decades-long assumptions. What if indices were not hard-bound to a singular relation? What if we used our knowledge of relational cardinality estimates to advise the database? What if joins were truly parallel, down to the SSD level?
But until we fix the fundamentals, what do you do if you want both normalization and a performant joined view? At this point, materialized views may fit the bill – especially incremental ones. In practice, this often turns out to be a very reasonable solution: you maintain logical clarity, and you get performance for those cases where you need it, without sacrificing yourself on the altar of schema migrations.
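In Postgres, this is spelled CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW. SQLite has no materialized views, so the sketch below emulates one as a precomputed snapshot table (user_wide and refresh_user_wide are illustrative names, not real APIs): the narrow relations stay the source of truth, and the wide join is just a disposable cache.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE user      (id INTEGER PRIMARY KEY);
CREATE TABLE user_name (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE user_dob  (user_id INTEGER PRIMARY KEY, dob TEXT NOT NULL);
''')
conn.execute('INSERT INTO user VALUES (1)')
conn.execute('INSERT INTO user_name VALUES (1, ?)', ('Alice',))
conn.execute('INSERT INTO user_dob VALUES (1, ?)', ('2001-01-01',))

def refresh_user_wide(conn):
    """Rebuild the precomputed join -- the moral equivalent of
    Postgres's REFRESH MATERIALIZED VIEW (a full, non-incremental refresh)."""
    conn.executescript('''
    DROP TABLE IF EXISTS user_wide;
    CREATE TABLE user_wide AS
        SELECT u.id, n.name, d.dob
        FROM user u
        JOIN user_name n ON n.user_id = u.id
        JOIN user_dob  d ON d.user_id = u.id;
    ''')

refresh_user_wide(conn)
print(conn.execute('SELECT * FROM user_wide').fetchall())
# [(1, 'Alice', '2001-01-01')]
```

Incremental maintenance – updating only the affected rows of the snapshot on each write – is the part that makes this genuinely pleasant, and is exactly what incremental materialized view engines provide.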