A Small DuckDB Embeddings Incident

DuckDB kept popping up in my feed, so I decided to try it for a fairly small project at work.

At first it was great. I had a dataset of around 400k rows, added some labels and scores, then added embeddings to make it semantically searchable and useful for a few classifiers. Clean, simple, civilized.

Then the project needed deeper analysis, so the dataset grew by roughly 10x and landed at around 5 million rows. That was where the civilized part ended. When it was time to build the embedding index in DuckDB, the server jammed itself into OOM territory, because building the HNSW index meant materializing a large graph structure in RAM on top of the raw vectors rather than letting DuckDB do its usual trick of spilling intermediate state to disk. It was not a huge machine, but up to that point memory had not looked like the bottleneck. The 10x was not part of the original brief, which is how these things usually happen. Fast, cheap, good, pick two.

DuckDB was still excellent for the relational and analytical parts. But the vector index just stopped behaving like a convenient thing.

Short version: DuckDB is fine for storing embeddings and doing brute-force or moderate-scale vector search, but once the vector index itself becomes large, the memory behavior becomes the constraint.

Why DuckDB gets memory intensive with embeddings

Part of the problem is simple enough: embeddings are big. The other part is that DuckDB’s vss extension uses an HNSW index, and DuckDB is fairly explicit about the catch:

  • the HNSW index must fit in RAM
  • the index is not buffer-managed
  • the size of the HNSW index in memory does not count toward DuckDB’s memory_limit
  • on restart, the persisted HNSW index is deserialized back into main memory in its entirety once accessed

So the usual DuckDB mental model, where the engine spills to disk and keeps things tidy, stops being the full story once the vector index walks in.

The first-order reason: raw embeddings are already expensive

If the embeddings are FLOAT32, the raw storage cost is:

rows * dimensions * 4 bytes

Examples:

Dims Raw vector memory at 5M rows
384 about 7.7 GB
768 about 15.4 GB
1536 about 30.7 GB

That is before adding:

  • the HNSW graph structure
  • build-time working memory
  • the rest of the table columns
  • normal query execution memory

So a smallish server can look perfectly fine while the table is growing, then fall over the moment the ANN index gets built.

Why HNSW makes it worse

HNSW is graph-based approximate nearest neighbor search. It is fast, but it trades memory for speed and recall.

Both DuckDB and FAISS expose the same basic tradeoff:

  • higher M means more neighbors per point and more memory
  • higher ef_construction means more work and more build cost
  • higher ef_search means better recall but more query work

So the failure mode is not especially mysterious:

  1. the vectors are already large
  2. the HNSW graph adds substantial overhead
  3. DuckDB does not fully protect you with memory_limit
  4. index creation becomes the point where the process gets OOM-killed

The short term solution: DuckDB + FAISS

Use DuckDB as the structured store and FAISS as the vector index.

DuckDB handles:

  • ids
  • metadata
  • filtering
  • labels
  • joins
  • aggregations
  • analytics

FAISS handles:

  • ANN indexing
  • high-scale nearest-neighbor search
  • compressed vector indexes
  • GPU acceleration if needed
  • more explicit memory/recall tradeoffs

The short-term fix:

  1. keep DuckDB as the main search database
  2. keep DuckDB FTS for keyword search
  3. move semantic retrieval into a separate FAISS artifact

The operating pattern mattered as much as the index choice:

  • sync new rows into DuckDB incrementally
  • embed only the rows still missing embeddings
  • rebuild search assets on a shadow copy
  • swap the finished build into place atomically

That bought two immediate advantages:

  • search could stay available on the old version while the new one was building
  • the vector index no longer had to live as a large DuckDB-managed ANN structure inside the main process

On the FAISS side the useful move was to pick a compressed index shape and build it without doing anything theatrical:

  • train on a subset of vectors
  • stream vectors into the index in chunks
  • keep a separate id mapping so FAISS returns candidates and DuckDB resolves the actual rows

So the short-term architecture was basically:

DuckDB for storage + BM25, FAISS for semantic retrieval, atomic rebuilds for operations

Why FAISS is a better fit

FAISS gives you more control over index type and memory footprint.

Useful options:

  • IndexFlatIP / IndexFlatL2: exact search, simple, but raw-vector memory
  • IndexHNSWFlat: similar graph approach, still memory-heavy
  • IndexIVFFlat: lower search cost, still stores full vectors
  • IndexScalarQuantizer: compress vectors
  • IndexPQ: much smaller vectors via product quantization
  • IndexIVFPQ: the usual scale-oriented compromise

That is the useful difference. With FAISS, you can move to compressed indexes when RAM becomes the problem instead of treating full-precision HNSW as destiny.

Practical architecture

Keep one stable numeric id per row in DuckDB.

Schema idea:

  • id
  • text
  • metadata columns
  • optional embedding blob/array if you still want it stored in DuckDB

Then:

  1. build a FAISS index over the embeddings using the same id
  2. search FAISS first to get top-k candidate ids
  3. pass those ids into DuckDB
  4. let DuckDB do filtering, joins, ranking, scoring, and result formatting

So the flow becomes:

query -> embedding -> FAISS top-k ids -> DuckDB fetch/filter/rerank

FAISS details that make this work

FAISS supports explicit ids:

  • wrap flat indexes with IndexIDMap if needed
  • IndexIVF* indexes store vector ids natively

That makes it pretty straightforward to keep DuckDB as the source of truth and FAISS as the retrieval layer.

Heuristic

Use DuckDB-only vector search when:

  • the dataset is still moderate
  • you want one-process simplicity
  • brute-force or a small ANN index is acceptable
  • analytics is the main job and vector search is secondary

Use DuckDB + FAISS when:

  • you are in the millions of rows
  • embedding dimensionality is high
  • memory is already tight
  • ANN is a core workload, not a side feature
  • you need compressed indexes or GPU options

Long term fit

If this stays a small custom project, DuckDB + FAISS is a good answer.

If it turns into a real product, I would treat DuckDB + FAISS as a strong transitional architecture, not the inevitable final form.

Why it is still good:

  • very fast to build
  • cheap to run
  • easy to understand
  • great when one machine can still hold the working set
  • good if data ingestion is mostly batchy and rebuild-oriented

Why it may stop being enough:

  • the FAISS index is still basically your responsibility to manage
  • filtering and vector retrieval live in separate systems you have to coordinate
  • high availability is not really built in
  • horizontal scaling is something you design yourself, not something the stack gives you
  • online updates, multi-tenant isolation, and operational failover become more work than they should be

This setup also has a filtering problem. With a setup like this, you often end up doing semantic search first and applying metadata filters afterward. That means you can end up with zero usable results even when rows matching the filters do exist, simply because the right candidates never made it into the first retrieval set. Systems like Qdrant have a cleaner story there, because filtering is part of the retrieval operation rather than a second pass after it.

You can blunt this a bit by over-fetching a larger candidate set before filtering, which is often good enough in practice. But that is still a workaround for the architecture, not the architecture being naturally good at filtered retrieval.

DuckDB + FAISS works. At some point, you may decide you do not want to keep owning the seams between storage, retrieval, filtering, and operations.

What to explore if this becomes a product

The most likely next step for this project would be Qdrant, mostly because we already use it and it is built for this job.

Option Why it comes up Where it looks strongest
Qdrant purpose-built for vector search, dedicated retrieval system, payload filtering, native BM25, dense+sparse hybrid queries, clustering/sharding/replication the most likely next step here, especially if semantic retrieval is turning into real infrastructure
Postgres + pgvector one database for metadata, filters, and vector search, supports HNSW and IVFFlat, more natural as a long-lived application database normal SaaS product with app data, transactions, online writes, and moderate hybrid search
Vespa strong hybrid retrieval model, nearestNeighbor plus lexical ranking, better story for ranking pipelines search-heavy product where lexical, vector, metadata, and business logic all need to be combined seriously
Milvus built for horizontal scaling, cloud-native architecture, aimed at large vector workloads genuinely large-scale vector infrastructure where distributed ops are already the problem

Version markers for the docs I checked on 2026-05-25:

  • Qdrant: unversioned official docs, retrieved 2026-05-25
  • Postgres + pgvector: PostgreSQL 18 and pgvector v0.8.2
  • Vespa: unversioned official docs, retrieved 2026-05-25
  • Milvus: v2.6.x docs

Sources

  • DuckDB vss extension docs: https://duckdb.org/docs/current/core_extensions/vss
  • DuckDB indexing docs: https://duckdb.org/docs/current/guides/performance/indexing
  • DuckDB out-of-memory guide: https://duckdb.org/docs/current/guides/performance/oom
  • DuckDB VSS announcement: https://duckdb.org/2024/05/03/vector-similarity-search-vss
  • FAISS wiki overview: https://github.com/facebookresearch/faiss/wiki
  • FAISS index types: https://github.com/facebookresearch/faiss/wiki/Faiss-indexes
  • FAISS id mapping: https://github.com/facebookresearch/faiss/wiki/Pre–and-post-processing
  • PostgreSQL 18 full-text search docs: https://www.postgresql.org/docs/18/textsearch.html
  • PostgreSQL 18 text search functions and ranking: https://www.postgresql.org/docs/18/functions-textsearch.html
  • pgvector v0.8.2 README, including hybrid search: https://github.com/pgvector/pgvector/tree/v0.8.2
  • Qdrant docs site, unversioned pages checked 2026-05-25: https://qdrant.tech/documentation/search/text-search/
  • Qdrant hybrid queries docs, unversioned pages checked 2026-05-25: https://qdrant.tech/documentation/search/hybrid-queries/
  • Qdrant distributed deployment docs, unversioned pages checked 2026-05-25: https://qdrant.tech/documentation/operations/distributed_deployment/
  • Vespa docs site, unversioned pages checked 2026-05-25: BM25 rank feature https://docs.vespa.ai/en/ranking/bm25.html
  • Vespa docs site, unversioned pages checked 2026-05-25: hybrid search tutorial https://docs.vespa.ai/en/learn/tutorials/hybrid-search
  • Milvus v2.6.x BM25 docs: https://milvus.io/docs/bm25-function.md
  • Milvus v2.6.x hybrid search patterns: https://milvus.io/docs/v2.6.x/search_patterns.md
  • Milvus architecture overview: https://milvus.io/docs/architecture_overview.md