Avatar

SQL patterns, tips and tricks

← Back to list
Posted on 07.08.2023
Image by derekb on Flickr
Refill!

I don't face SQL challenges every day. Sometimes I find a good solution for a case I had to crack, but then as time passes by, I tend to forget such a valuable solution I've found earlier. So I either have to google again, or re-discover the approach.

Well, not any longer. Here I will post a list of cases I sometimes work on, along with appropriate solutions.

Joining data that was prepared beforehand

The task is: we have a table of the main entity, and we also have another table that holds some auxiliary entity that references the first one. We need to pre-calculate some info and join it with the main entity. A good example would be a product and users saving that product to favourites.

So the following query gives us the information we need:

SELECT
product_id,
bool_or(is_favourite) AS is_favourite,
bool_or(is_popular) AS is_popular
FROM customer_preferences
GROUP BY product_id;
The code is licensed under the MIT license

And later on we can make use of JOIN (SELECT ...) syntax, but that would mean the sub-query forcing Postgres to create a temporary table for immediate join.

To prevent that, a materialised view can be used. A materialised view is the way to store the data selection beforehand and avoid unnecessarily repetitive data selection.

😭😆 From experience: materialized views must not be used without having a plan B of what to show to a user if the MV is not up-to-date. MV are better be used on rather static, rarely changed data.

CREATE MATERIALIZED VIEW products_user_info IF NOT EXISTS AS
SELECT
product_id,
bool_or(is_favourite) AS is_favourite,
bool_or(is_popular) AS is_popular
FROM customer_preferences
GROUP BY product_id
WITH DATA;
The code is licensed under the MIT license

Then the view can be joined as a regular table:

SELECT * FROM products p
INNER JOIN products_user_info pui ON pui.product_id = p.id;
The code is licensed under the MIT license

Note two things:

  1. A materialised view consumes space on a hard drive just like a regular table does.
  2. A materialised view does not automatically refresh itself on every change of the related data, it needs to be re-populated manually or on a trigger, using the following SQL command:
    REFRESH MATERIALIZED VIEW products_user_info;
    The code is licensed under the MIT license
  3. To maintain accessibility, a materialized view should be better refreshed concurrently. In this case you'll need a covering index on such MV.
    REFRESH MATERIALIZED VIEW CONCURRENTLY products_user_info;
    The code is licensed under the MIT license

Making use of triggers

So, the materialized view needs to be updated. One can do it on the application level, but sometimes when there is no bulk imports/updates expected as a business need, it makes sense to run the refresher after every SQL statement execution that modifies the source data.

😭😆 From experience: before creating a trigger, please make sure the table is not heavily written! If that is the case, the CPU will spike to the clouds.

CREATE OR REPLACE FUNCTION refresh_products_user_info()
RETURNS TRIGGER LANGUAGE plpgsql
AS $$
BEGIN
REFRESH MATERIALIZED VIEW products_user_info;
RAISE LOG 'refresh_products_user_info() was executed';
RETURN NULL;
END $$;
CREATE TRIGGER refresh_products_user_info
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE
ON products
FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_products_user_info();
The code is licensed under the MIT license

Note the logging statement, it can be used to check if the trigger was actually executed on prod later on.

How to disable a trigger and make sure it is disabled:

ALTER TABLE products DISABLE TRIGGER refresh_products_user_info;
SELECT tgname, (case when tgenabled = 'D' then 'N' else 'Y' end) as enabled FROM pg_trigger WHERE tgrelid = 'public.products'::regclass;
The code is licensed under the MIT license

Implementing counters

The task: I have an entity that can have different states. I need to query total count of that entity in different states. A good example is a task that can be "open", "in progress", "done", etc. I need to know how many tasks I have in each of the states.

So there are three approaches:

  1. Calculate the counts on every query. That solution is resource and time ineffective.
  2. Store the counters in a separate table and update the counters on every task insertion, update or deletion. This approach is read-optimal, however it is not concurrent-safe.
  3. Use the audit table pattern I have discussed before in details. The approach is concurrent-safe, however it may be not that read-performant and it has other limitations as well.

There is no right or wrong approach, choose one that suits your case better.

Storing nested data structures, such as trees

There are again three ways to do implement trees

  1. A straight-forward, naive approach would be to just keep a reference to the left and right node by storing primary keys of the corresponding records. This way is far from being read-optimal, as it requires a recursive algorithm to be in place in order to get a sub-set of items, and moreover the approach can only allow storing binary trees.
  2. The other solution is the Nested Sets pattern. This method is suitable for keeping tree-only structures and optimised for reading tree sub-branches.
  3. The close competitor of the previous one is the Adjacency List pattern. The adjacency list pattern can work with graphs of a general topology, but more space-consuming.

Note that there are more suitable databases on the market to work with such data structures, called Graph Databases.

Links

Here is a list of amazing articles I've found on the subject.

Well, this article is a work in progress, just like the life of an engineer itself :) I will add more things as I keep discovering.


Avatar

Sergei Gannochenko

Business-oriented fullstack engineer, in ❤️ with Tech.
React, Node, Go, Docker, AWS, Jamstack.
15+ years in dev.