Today I Learned

4 posts about #sql

Bulk Upsert through PostgreSQL 9.5+

Warning: This only works for PostgreSQL 9.5+


Given the following:

  1. a users table:

    • id
    • name
    • created_at
    • updated_at
  2. users table only has 1 user with values:

    • id: 1
    • name: Alejandro
    • created_at: '2010-10-10 10:00:00.000000'
    • updated_at: '2010-10-10 10:00:00.000000'

You can upsert using the following SQL query:

INSERT INTO
  users(id, name, created_at, updated_at)
VALUES
  (1, 'Alexander', NOW(), NOW()),
  (2, 'Belle', NOW(), NOW())
ON CONFLICT (id)
  DO UPDATE SET
    name=EXCLUDED.name,
    updated_at=NOW()
;

Results:

  1. User(id=1) will be renamed from Alejandro to Alexander. The updated_at value will be set to current time.
  2. User(id=2) will be inserted to users table with name = Belle, and both created_at and updated_at will be set to current time.

Using Trigram Indexes to Speed Up LIKE Queries

Queries containing ILIKE '%anything%' result in table scans since they have a leading wildcard. Typical btree indexes can not help improve performance in this case.

Trigrams are 3-character slices of words:

SELECT show_trgm('Ruby');
-- {"  r"," ru","by ",rub,uby}

They are useful as indexes since they can speed up LIKE, ILIKE, ~, and ~* clauses.

Let's query a table of 14K email addresses:

CREATE TABLE emails (email TEXT);

\copy emails FROM 'emails.csv' DELIMITER ',' CSV;

EXPLAIN ANALYZE SELECT * FROM emails WHERE email ILIKE '%ion%';

The query plan shows a table scan:

Seq Scan on emails
  Filter: (email ~~* '%ion%'::text)
  Rows Removed by Filter: 14040

with an average execution time of 15ms. Let's create a trigram index and try again:

-- postgres 9.2+
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX emails_search_email_idx ON emails USING GIN (email gin_trgm_ops);

EXPLAIN ANALYZE SELECT * FROM emails WHERE email ILIKE '%ion%';

The improved query plan is:

Bitmap Heap Scan on emails
  Recheck Cond: (email ~~* '%ion%'::text)
  Heap Blocks: exact=98
    -> Bitmap Index Scan on emails_search_email_idx
       Index Cond: (email ~~* '%ion%'::text)

and averages 0.8ms. A 19X improvement without updating any SQL queries! YMMV

Use PostgreSQL table as queue skipping locked rows

This command only works in PG 9.5

First, are you sure you shouldn't be using some in-memory queue? Or some message broker? Or Redis?

If you are really convinced that you want this sort of behaviour on top of the goo'old PostgreSQL you can achieve a queue-like access pattern by locking and skipping locked rows.

One way to ensure that only one database client can modify a set of rows is to select the rows for update as in

BEGIN;
SELECT * FROM queue ORDER BY priority DESC LIMIT 1 FOR UPDATE;
...

This will block any other UPDATE or SELECT .. FOR UPDATE until the above transaction is finished. Now if you want concurrent database clients to fetch the next row available in the priority list just use:

BEGIN;
SELECT * FROM queue ORDER BY priority DESC LIMIT 1 FOR UPDATE SKIP LOCKED;
...

The above command could be interpreted as select the next row from queue that nobody has yet locked.

SQL's WITH RECURSIVE Query

While optimizing calls to a recursive table, we found a neat SQL solution. It uses a common table expression as a working table to query against iteratively.

Here's an example of using WITH RECURSIVE with a modified nested set example of clothing categories that find all paths through the categories:

CREATE TEMPORARY TABLE categories (id INT, name text, parent_category_id INT);

INSERT INTO categories VALUES
  (1, 'Clothing', null),
  (2, 'Mens''s', 1),
  (3, 'Women''s', 1),
  (4, 'Suits', 2),
  (5, 'Dresses', 3),
  (6, 'Skirts', 3),
  (7, 'Jackets', 4),
  (8, 'Evening Gowns', 5);

WITH RECURSIVE category_hierarchies AS
(SELECT id, parent_category_id, name AS full_path
 FROM categories
 WHERE parent_category_id is NULL

 UNION ALL

 SELECT child_categories.id,
        child_categories.parent_category_id,
        parent_categories.full_path || ' -> ' || child_categories.name as full_path
 FROM categories AS child_categories
 INNER JOIN category_hierarchies AS parent_categories
   ON child_categories.parent_category_id = parent_categories.id
)
SELECT full_path FROM category_hierarchies ORDER BY full_path;

Produces paths through all categories:

  • Clothing
  • Clothing -> Mens's
  • Clothing -> Mens's -> Suits
  • Clothing -> Mens's -> Suits -> Jackets
  • Clothing -> Women's
  • Clothing -> Women's -> Dresses
  • Clothing -> Women's -> Dresses -> Evening Gowns
  • Clothing -> Women's -> Skirts

Read more about WITH RECURSIVE queries