Using Trigram Indexes to Speed Up LIKE Queries
Queries containing ILIKE '%anything%'
result in table scans since they have a leading wildcard. Typical btree
indexes can not help improve performance in this case.
Trigrams are 3-character slices of words:
SELECT show_trgm('Ruby');
-- {" r"," ru","by ",rub,uby}
They are useful as indexes since they can speed up LIKE
, ILIKE
, ~
, and ~*
clauses.
Let’s query a table of 14K email addresses:
CREATE TABLE emails (email TEXT);
\copy emails FROM 'emails.csv' DELIMITER ',' CSV;
EXPLAIN ANALYZE SELECT * FROM emails WHERE email ILIKE '%ion%';
The query plan shows a table scan:
Seq Scan on emails
Filter: (email ~~* '%ion%'::text)
Rows Removed by Filter: 14040
with an average execution time of 15ms
. Let’s create a trigram index and try again:
-- postgres 9.2+
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX emails_search_email_idx ON emails USING GIN (email gin_trgm_ops);
EXPLAIN ANALYZE SELECT * FROM emails WHERE email ILIKE '%ion%';
The improved query plan is:
Bitmap Heap Scan on emails
Recheck Cond: (email ~~* '%ion%'::text)
Heap Blocks: exact=98
-> Bitmap Index Scan on emails_search_email_idx
Index Cond: (email ~~* '%ion%'::text)
and averages 0.8ms
. A 19X improvement without updating any SQL queries! YMMV
Written on April 16, 2017 by jasonschweier