Fast autocomplete in PostgreSQL with pg_trgm
Fast autocomplete in PostgreSQL with pg_trgm
Autocomplete functionality can significantly enhance UX, especially in comments or forums where users often mention each other by name or username. Implementing this is very efficient in PostgreSQL using the pg_trgm
extension, which allows for fast, fuzzy text searches. This post will guide you through setting up autocomplete for usernames in mentions using pg_trgm
.
Why pg_trgm?
The pg_trgm
(trigram) extension in PostgreSQL is designed for fast pattern matching. It breaks down strings into sequences of three consecutive characters (trigrams) and uses them to calculate similarity between strings. This makes it an ideal tool for implementing autocomplete, where you need to suggest usernames as users type.
Setting Up pg_trgm
First, ensure that the pg_trgm
extension is enabled in your PostgreSQL database:
create extension pg_trgm;
create extension citext; -- case-insensitive text columns
Create a simple users
table with unique usernames:
create table users (
id bigserial primary key,
username citext unique not null collate pg_catalog."C" -- case-insensitive, only ASCII.
);
Add the trigram index on the username
column:
create index users_username_trgm on users using gist (username gist_trgm_ops);
Now insert millions of rows into the users
table.
PostgreSQL and pg_trgm
is great for fuzzy search on small(ish) strings.
select
id,
username,
similarity(username, 'jaso') as score
from
users
where
username % 'jaso'
order by
score desc
limit
10
id | username | score
-----+-------------+------------
170883 | jason73 | 0.44444445
66533 | jason55 | 0.44444445
921950 | jason90 | 0.44444445
8853 | jasongraves | 0.30769232
I have a PG 12 database with 140,000,0000 users and this query takes 30ms.
Here’s an example query plan:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=20.07..20.10 rows=10 width=22) (actual time=0.386..0.389 rows=4 loops=1)
-> Sort (cost=20.07..20.10 rows=10 width=22) (actual time=0.384..0.386 rows=4 loops=1)
Sort Key: (similarity((username)::text, 'jaso'::text)) DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on user_ (cost=4.22..19.91 rows=10 width=22) (actual time=0.351..0.373 rows=4 loops=1)
Recheck Cond: ((username)::text % 'jaso'::text)
Heap Blocks: exact=4
-> Bitmap Index Scan on user_username_trgm (cost=0.00..4.22 rows=10 width=0) (actual time=0.336..0.337 rows=4 loops=1)
Index Cond: ((username)::text % 'jaso'::text)
Planning Time: 0.478 ms
Execution Time: 0.453 ms
So that’s …Pretty, pretty good