Emadideen Ghannam

Postgres + Elasticsearch: when to add a search index

Postgres full-text search is good enough for longer than most teams admit. Here is the threshold I use to decide when Elasticsearch earns its keep.

Adding Elasticsearch to a stack is a one-way door. New infra, new failure mode, new sync pipeline, new operational surface. I have shipped systems with both Postgres-only search and Postgres + Elasticsearch, and the answer to “should we add it” is almost always “not yet.”

Here is the threshold I actually use.

What Postgres can do

Postgres full-text search has more capability than most teams use. You get:

  • Tokenization and stemming via to_tsvector.
  • Ranked results via ts_rank and ts_rank_cd.
  • Multi-language dictionaries.
  • Trigram fuzzy matching via pg_trgm and similarity() (a sketch follows this list).
  • Phrase search via the <-> operator.
  • GIN indexes that scale to a few million documents on commodity hardware.
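
To make the trigram point concrete, here is a minimal fuzzy-match sketch. It reuses the customers table from the example further down; pg_trgm ships with Postgres but must be enabled as an extension.

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- trigram index so the % operator can use an index scan
CREATE INDEX customers_name_trgm_idx ON customers USING gin (name gin_trgm_ops);

-- % matches above pg_trgm.similarity_threshold (0.3 by default),
-- so the typo 'jne doe' still finds 'Jane Doe'
SELECT id, name, similarity(name, 'jne doe') AS score
FROM customers
WHERE name % 'jne doe'
ORDER BY score DESC
LIMIT 10;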

For most product search problems - find a customer by name, find a product by description, find an order by reference - Postgres is enough. Not “barely enough.” Genuinely enough. With proper indexes and a tsvector column, queries return in under 50ms on a 5 million row table.

-- generated column keeps the tsvector in sync with name/email on every write
ALTER TABLE customers ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (to_tsvector('simple', coalesce(name, '') || ' ' || coalesce(email, ''))) STORED;

-- GIN index so @@ queries don't have to scan the table
CREATE INDEX customers_search_idx ON customers USING gin(search_vector);

SELECT id, name FROM customers
WHERE search_vector @@ plainto_tsquery('simple', 'jane doe')
ORDER BY ts_rank(search_vector, plainto_tsquery('simple', 'jane doe')) DESC
LIMIT 20;

That is a real working search box. No new infra.

When Postgres stops being enough

The threshold I look for, in order:

1. Multilingual stemming where Postgres dictionaries don’t exist. Postgres has good dictionaries for English, Spanish, French, German, and a handful of others. It does not handle Arabic, Hebrew, Persian, Thai, or Japanese well out of the box. If your product is multilingual in a language Postgres does not stem, you need a real analyzer. That is the cleanest reason to add Elasticsearch. (A quick probe of your build’s coverage is sketched after this list.)

2. Faceted aggregations on filtered search. “Show me all listings tagged X in city Y between price A and B, faceted by category, all in one round trip.” Postgres can get there with careful SQL (a sketch follows this list), but Elasticsearch’s aggregation API is built for it. If your UI needs ten facets that update with every keystroke, Elasticsearch’s p95 response time will beat Postgres’s at the same scale.

3. Documents in the tens of millions with sub-100ms p95 search. Postgres GIN indexes hold up well to 5-10 million rows. Beyond that, query plans get fragile and you start fighting autovacuum on the tsvector column. Elasticsearch is built to spread that load across shards.

4. Synonyms, custom analyzers, multi-field weighting. “Search the title with 3x weight, the description with 1x, the tags with 2x, expand ‘tv’ to ‘television’ and ‘apartment’ to ‘flat’.” You can express this in Postgres with setweight() calls concatenated into one tsvector column (a sketch follows this list), but it gets ugly. Elasticsearch’s analyzer chain is the right tool.
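
For point 1, you can check what your build actually ships and how a given dictionary behaves before committing either way. A minimal probe:

-- list the text search configurations this Postgres build knows about
SELECT cfgname FROM pg_ts_config ORDER BY cfgname;

-- 'english' folds inflections; 'simple' only lowercases
SELECT to_tsvector('english', 'running runs ran');  -- 'ran':3 'run':1,2
SELECT to_tsvector('simple',  'running runs ran');  -- 'ran':3 'running':1 'runs':2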
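
For point 2, the closest Postgres gets to one-round-trip facets is GROUPING SETS. The listings schema here is a hypothetical stand-in:

-- counts for two facets in a single pass over the filtered set
SELECT category, bedrooms, count(*) AS n
FROM listings
WHERE search_vector @@ plainto_tsquery('simple', 'loft')
  AND city = 'Amman'
  AND price BETWEEN 100 AND 500
GROUP BY GROUPING SETS ((category), (bedrooms));
-- rows where bedrooms IS NULL are the category facet, and vice versa

It works, but every additional facet widens this query, and it still re-runs in full on each keystroke.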
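
For point 4, this is roughly what multi-field weighting looks like on the Postgres side before it gets ugly. The table, columns, and weight values are assumptions, and synonym expansion (‘tv’ to ‘television’) would additionally need a custom dictionary, which is not shown:

-- assumes title, tags, description are plain text columns
ALTER TABLE listings ADD COLUMN weighted_search tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('simple', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('simple', coalesce(tags, '')), 'B') ||
    setweight(to_tsvector('simple', coalesce(description, '')), 'C')
  ) STORED;

-- ts_rank weights are ordered {D, C, B, A}: description 1x, tags 2x, title 3x
SELECT id, title FROM listings
WHERE weighted_search @@ plainto_tsquery('simple', 'television')
ORDER BY ts_rank('{0.1, 0.33, 0.66, 1.0}'::float4[], weighted_search,
                 plainto_tsquery('simple', 'television')) DESC
LIMIT 20;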

If none of those apply, do not add Elasticsearch.

What “adding” Elasticsearch actually costs

Three real costs, often underestimated.

Sync pipeline. Postgres is the source of truth. Elasticsearch is the index. They have to stay in sync. You can do change data capture over logical decoding (Debezium + Kafka, or wal2json feeding your own consumer), or application-level dual writes, where the app writes to both stores; note that Elasticsearch cannot join a Postgres transaction, so dual writes are never truly atomic. Each approach has failure modes. The hardest bug I have ever debugged in a production search system was a 4-row gap between Postgres and Elasticsearch caused by a Debezium offset race.
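
As a sketch of the wal2json route (the slot name is made up, and the plugin must be installed on the server):

-- logical decoding slot that emits row changes as JSON
SELECT * FROM pg_create_logical_replication_slot('search_sync', 'wal2json');

-- a consumer polls this (or streams via the replication protocol)
-- and feeds each change to the indexer
SELECT lsn, data FROM pg_logical_slot_get_changes('search_sync', NULL, NULL);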

Schema drift. When the Postgres schema changes, the Elasticsearch mapping has to follow. There is no FK to keep them honest. Most teams forget this and discover six months later that a new column is invisible to search.

Re-indexing. Many mapping changes cannot be applied in place; to change a field from text to keyword you re-index the whole corpus. Plan for re-indexes. Make them cheap. If a full re-index of your corpus takes two days, you are one bad mapping away from a two-day outage.

A pattern that works

For a directory platform I built, I stayed Postgres-only until the corpus crossed about 200,000 listings and we needed Arabic stemming. Then I added Elasticsearch as the search layer with Postgres as source of truth.

The pattern:

  • Every write goes to Postgres in a transaction.
  • A consumer reads the WAL via Debezium and publishes to a queue.
  • A small worker indexes documents into Elasticsearch.
  • The search API queries Elasticsearch first, falls back to Postgres on failure.

The fallback is the safety net. If Elasticsearch goes down, search degrades; the product doesn’t.

Adding Elasticsearch took two weeks of focused work. Removing it would take six months. Pick the moment carefully.

