A Small DuckDB Embeddings Incident

DuckDB kept popping up in my feed, so I decided to try it for a fairly small project at work.

At first it was great. I had a dataset of around 400k rows, added some labels and scores, then added embeddings to make it semantically searchable and useful for a few classifiers. Clean, simple, civilized.

Then the project needed deeper analysis, so the dataset grew by roughly 10x and landed at around 5 million rows. That was where the civilized part ended. When it was time to build the embedding index in DuckDB, the server jammed itself into OOM territory, because building the HNSW index meant materializing a large graph structure in RAM on top of the raw vectors rather than letting DuckDB do its usual trick of spilling intermediate state to disk. It was not a huge machine, but up to that point memory had not looked like the bottleneck. The 10x was not part of the original brief, which is how these things usually happen. Fast, cheap, good, pick two.

DuckDB was still excellent for the relational and analytical parts. But the vector index just stopped behaving like a convenient thing.

Short version: DuckDB is fine for storing embeddings and doing brute-force or moderate-scale vector search, but once the vector index itself becomes large, the memory behavior becomes the constraint.

Why DuckDB gets memory intensive with embeddings

Part of the problem is simple enough: embeddings are big. The other part is that DuckDB’s vss extension uses an HNSW index, and DuckDB is fairly explicit about the catch:

the HNSW index must fit in RAM
the index is not buffer-managed
the size of the HNSW index in memory does not count toward DuckDB’s memory_limit
on restart, the persisted HNSW index is deserialized back into main memory in its entirety once accessed

So the usual DuckDB mental model, where the engine spills to disk and keeps things tidy, stops being the full story once the vector index walks in.

The first-order reason: raw embeddings are already expensive

If the embeddings are FLOAT32, the raw storage cost is:

rows * dimensions * 4 bytes

Examples:

Dims	Raw vector memory at 5M rows
`384`	about `7.7 GB`
`768`	about `15.4 GB`
`1536`	about `30.7 GB`

That is before adding:

the HNSW graph structure
build-time working memory
the rest of the table columns
normal query execution memory

So a smallish server can look perfectly fine while the table is growing, then fall over the moment the ANN index gets built.

Why HNSW makes it worse

HNSW is graph-based approximate nearest neighbor search. It is fast, but it trades memory for speed and recall.

Both DuckDB and FAISS expose the same basic tradeoff:

higher M means more neighbors per point and more memory
higher ef_construction means more work and more build cost
higher ef_search means better recall but more query work

So the failure mode is not especially mysterious:

the vectors are already large
the HNSW graph adds substantial overhead
DuckDB does not fully protect you with memory_limit
index creation becomes the point where the process gets OOM-killed

The short term solution: DuckDB + FAISS

Use DuckDB as the structured store and FAISS as the vector index.

DuckDB handles:

ids
metadata
filtering
labels
joins
aggregations
analytics

FAISS handles:

ANN indexing
high-scale nearest-neighbor search
compressed vector indexes
GPU acceleration if needed
more explicit memory/recall tradeoffs

The short-term fix:

keep DuckDB as the main search database
keep DuckDB FTS for keyword search
move semantic retrieval into a separate FAISS artifact

The operating pattern mattered as much as the index choice:

sync new rows into DuckDB incrementally
embed only the rows still missing embeddings
rebuild search assets on a shadow copy
swap the finished build into place atomically

That bought two immediate advantages:

search could stay available on the old version while the new one was building
the vector index no longer had to live as a large DuckDB-managed ANN structure inside the main process

On the FAISS side the useful move was to pick a compressed index shape and build it without doing anything theatrical:

train on a subset of vectors
stream vectors into the index in chunks
keep a separate id mapping so FAISS returns candidates and DuckDB resolves the actual rows

So the short-term architecture was basically:

DuckDB for storage + BM25, FAISS for semantic retrieval, atomic rebuilds for operations

Why FAISS is a better fit

FAISS gives you more control over index type and memory footprint.

Useful options:

IndexFlatIP / IndexFlatL2: exact search, simple, but raw-vector memory
IndexHNSWFlat: similar graph approach, still memory-heavy
IndexIVFFlat: lower search cost, still stores full vectors
IndexScalarQuantizer: compress vectors
IndexPQ: much smaller vectors via product quantization
IndexIVFPQ: the usual scale-oriented compromise

That is the useful difference. With FAISS, you can move to compressed indexes when RAM becomes the problem instead of treating full-precision HNSW as destiny.

Practical architecture

Keep one stable numeric id per row in DuckDB.

Schema idea:

id
text
metadata columns
optional embedding blob/array if you still want it stored in DuckDB

Then:

build a FAISS index over the embeddings using the same id
search FAISS first to get top-k candidate ids
pass those ids into DuckDB
let DuckDB do filtering, joins, ranking, scoring, and result formatting

So the flow becomes:

query -> embedding -> FAISS top-k ids -> DuckDB fetch/filter/rerank

FAISS details that make this work

FAISS supports explicit ids:

wrap flat indexes with IndexIDMap if needed
IndexIVF* indexes store vector ids natively

That makes it pretty straightforward to keep DuckDB as the source of truth and FAISS as the retrieval layer.

Heuristic

Use DuckDB-only vector search when:

the dataset is still moderate
you want one-process simplicity
brute-force or a small ANN index is acceptable
analytics is the main job and vector search is secondary

Use DuckDB + FAISS when:

you are in the millions of rows
embedding dimensionality is high
memory is already tight
ANN is a core workload, not a side feature
you need compressed indexes or GPU options

Long term fit

If this stays a small custom project, DuckDB + FAISS is a good answer.

If it turns into a real product, I would treat DuckDB + FAISS as a strong transitional architecture, not the inevitable final form.

Why it is still good:

very fast to build
cheap to run
easy to understand
great when one machine can still hold the working set
good if data ingestion is mostly batchy and rebuild-oriented

Why it may stop being enough:

the FAISS index is still basically your responsibility to manage
filtering and vector retrieval live in separate systems you have to coordinate
high availability is not really built in
horizontal scaling is something you design yourself, not something the stack gives you
online updates, multi-tenant isolation, and operational failover become more work than they should be

This setup also has a filtering problem. With a setup like this, you often end up doing semantic search first and applying metadata filters afterward. That means you can end up with zero usable results even when rows matching the filters do exist, simply because the right candidates never made it into the first retrieval set. Systems like Qdrant have a cleaner story there, because filtering is part of the retrieval operation rather than a second pass after it.

You can blunt this a bit by over-fetching a larger candidate set before filtering, which is often good enough in practice. But that is still a workaround for the architecture, not the architecture being naturally good at filtered retrieval.

DuckDB + FAISS works. At some point, you may decide you do not want to keep owning the seams between storage, retrieval, filtering, and operations.

What to explore if this becomes a product

The most likely next step for this project would be Qdrant, mostly because we already use it and it is built for this job.

Option	Why it comes up	Where it looks strongest
`Qdrant`	purpose-built for vector search, dedicated retrieval system, payload filtering, native `BM25`, dense+sparse hybrid queries, clustering/sharding/replication	the most likely next step here, especially if semantic retrieval is turning into real infrastructure
`Postgres + pgvector`	one database for metadata, filters, and vector search, supports `HNSW` and `IVFFlat`, more natural as a long-lived application database	normal SaaS product with app data, transactions, online writes, and moderate hybrid search
`Vespa`	strong hybrid retrieval model, `nearestNeighbor` plus lexical ranking, better story for ranking pipelines	search-heavy product where lexical, vector, metadata, and business logic all need to be combined seriously
`Milvus`	built for horizontal scaling, cloud-native architecture, aimed at large vector workloads	genuinely large-scale vector infrastructure where distributed ops are already the problem

Version markers for the docs I checked on 2026-05-25:

Qdrant: unversioned official docs, retrieved 2026-05-25
Postgres + pgvector: PostgreSQL 18 and pgvector v0.8.2
Vespa: unversioned official docs, retrieved 2026-05-25
Milvus: v2.6.x docs

Sources

DuckDB vss extension docs: https://duckdb.org/docs/current/core_extensions/vss
DuckDB indexing docs: https://duckdb.org/docs/current/guides/performance/indexing
DuckDB out-of-memory guide: https://duckdb.org/docs/current/guides/performance/oom
DuckDB VSS announcement: https://duckdb.org/2024/05/03/vector-similarity-search-vss
FAISS wiki overview: https://github.com/facebookresearch/faiss/wiki
FAISS index types: https://github.com/facebookresearch/faiss/wiki/Faiss-indexes
FAISS id mapping: https://github.com/facebookresearch/faiss/wiki/Pre–and-post-processing
PostgreSQL 18 full-text search docs: https://www.postgresql.org/docs/18/textsearch.html
PostgreSQL 18 text search functions and ranking: https://www.postgresql.org/docs/18/functions-textsearch.html
pgvector v0.8.2 README, including hybrid search: https://github.com/pgvector/pgvector/tree/v0.8.2
Qdrant docs site, unversioned pages checked 2026-05-25: https://qdrant.tech/documentation/search/text-search/
Qdrant hybrid queries docs, unversioned pages checked 2026-05-25: https://qdrant.tech/documentation/search/hybrid-queries/
Qdrant distributed deployment docs, unversioned pages checked 2026-05-25: https://qdrant.tech/documentation/operations/distributed_deployment/
Vespa docs site, unversioned pages checked 2026-05-25: BM25 rank feature https://docs.vespa.ai/en/ranking/bm25.html
Vespa docs site, unversioned pages checked 2026-05-25: hybrid search tutorial https://docs.vespa.ai/en/learn/tutorials/hybrid-search
Milvus v2.6.x BM25 docs: https://milvus.io/docs/bm25-function.md
Milvus v2.6.x hybrid search patterns: https://milvus.io/docs/v2.6.x/search_patterns.md
Milvus architecture overview: https://milvus.io/docs/architecture_overview.md