A Small DuckDB Embeddings Incident
DuckDB kept popping up in my feed, so I decided to try it for a fairly small project at work.
At first it was great. I had a dataset of around 400k rows, added some labels and scores, then added embeddings to make it semantically searchable and useful for a few classifiers. Clean, simple, civilized.
Then the project needed deeper analysis, so the dataset grew by roughly 10x and landed at around 5 million rows. That was where the civilized part ended. When it was time to build the embedding index in DuckDB, the server jammed itself into OOM territory, because building the HNSW index meant materializing a large graph structure in RAM on top of the raw vectors rather than letting DuckDB do its usual trick of spilling intermediate state to disk. It was not a huge machine, but up to that point memory had not looked like the bottleneck. The 10x was not part of the original brief, which is how these things usually happen. Fast, cheap, good, pick two.
DuckDB was still excellent for the relational and analytical parts. But the vector index just stopped behaving like a convenient thing.
Short version: DuckDB is fine for storing embeddings and doing brute-force or moderate-scale vector search, but once the vector index itself becomes large, the memory behavior becomes the constraint.
Why DuckDB gets memory intensive with embeddings
Part of the problem is simple enough: embeddings are big. The other part is that DuckDB’s vss extension uses an HNSW index, and DuckDB is fairly explicit about the catch:
- the
HNSWindex must fit in RAM - the index is not buffer-managed
- the size of the
HNSWindex in memory does not count toward DuckDB’smemory_limit - on restart, the persisted
HNSWindex is deserialized back into main memory in its entirety once accessed
So the usual DuckDB mental model, where the engine spills to disk and keeps things tidy, stops being the full story once the vector index walks in.
The first-order reason: raw embeddings are already expensive
If the embeddings are FLOAT32, the raw storage cost is:
rows * dimensions * 4 bytes
Examples:
| Dims | Raw vector memory at 5M rows |
|---|---|
384 |
about 7.7 GB |
768 |
about 15.4 GB |
1536 |
about 30.7 GB |
That is before adding:
- the
HNSWgraph structure - build-time working memory
- the rest of the table columns
- normal query execution memory
So a smallish server can look perfectly fine while the table is growing, then fall over the moment the ANN index gets built.
Why HNSW makes it worse
HNSW is graph-based approximate nearest neighbor search. It is fast, but it trades memory for speed and recall.
Both DuckDB and FAISS expose the same basic tradeoff:
- higher
Mmeans more neighbors per point and more memory - higher
ef_constructionmeans more work and more build cost - higher
ef_searchmeans better recall but more query work
So the failure mode is not especially mysterious:
- the vectors are already large
- the HNSW graph adds substantial overhead
- DuckDB does not fully protect you with
memory_limit - index creation becomes the point where the process gets OOM-killed
The short term solution: DuckDB + FAISS
Use DuckDB as the structured store and FAISS as the vector index.
DuckDB handles:
- ids
- metadata
- filtering
- labels
- joins
- aggregations
- analytics
FAISS handles:
- ANN indexing
- high-scale nearest-neighbor search
- compressed vector indexes
- GPU acceleration if needed
- more explicit memory/recall tradeoffs
The short-term fix:
- keep DuckDB as the main search database
- keep DuckDB FTS for keyword search
- move semantic retrieval into a separate FAISS artifact
The operating pattern mattered as much as the index choice:
- sync new rows into DuckDB incrementally
- embed only the rows still missing embeddings
- rebuild search assets on a shadow copy
- swap the finished build into place atomically
That bought two immediate advantages:
- search could stay available on the old version while the new one was building
- the vector index no longer had to live as a large DuckDB-managed ANN structure inside the main process
On the FAISS side the useful move was to pick a compressed index shape and build it without doing anything theatrical:
- train on a subset of vectors
- stream vectors into the index in chunks
- keep a separate id mapping so FAISS returns candidates and DuckDB resolves the actual rows
So the short-term architecture was basically:
DuckDB for storage + BM25, FAISS for semantic retrieval, atomic rebuilds for operations
Why FAISS is a better fit
FAISS gives you more control over index type and memory footprint.
Useful options:
IndexFlatIP/IndexFlatL2: exact search, simple, but raw-vector memoryIndexHNSWFlat: similar graph approach, still memory-heavyIndexIVFFlat: lower search cost, still stores full vectorsIndexScalarQuantizer: compress vectorsIndexPQ: much smaller vectors via product quantizationIndexIVFPQ: the usual scale-oriented compromise
That is the useful difference. With FAISS, you can move to compressed indexes when RAM becomes the problem instead of treating full-precision HNSW as destiny.
Practical architecture
Keep one stable numeric id per row in DuckDB.
Schema idea:
idtextmetadata columns- optional embedding blob/array if you still want it stored in DuckDB
Then:
- build a FAISS index over the embeddings using the same
id - search FAISS first to get top-k candidate ids
- pass those ids into DuckDB
- let DuckDB do filtering, joins, ranking, scoring, and result formatting
So the flow becomes:
query -> embedding -> FAISS top-k ids -> DuckDB fetch/filter/rerank
FAISS details that make this work
FAISS supports explicit ids:
- wrap flat indexes with
IndexIDMapif needed IndexIVF*indexes store vector ids natively
That makes it pretty straightforward to keep DuckDB as the source of truth and FAISS as the retrieval layer.
Heuristic
Use DuckDB-only vector search when:
- the dataset is still moderate
- you want one-process simplicity
- brute-force or a small ANN index is acceptable
- analytics is the main job and vector search is secondary
Use DuckDB + FAISS when:
- you are in the millions of rows
- embedding dimensionality is high
- memory is already tight
- ANN is a core workload, not a side feature
- you need compressed indexes or GPU options
Long term fit
If this stays a small custom project, DuckDB + FAISS is a good answer.
If it turns into a real product, I would treat DuckDB + FAISS as a strong transitional architecture, not the inevitable final form.
Why it is still good:
- very fast to build
- cheap to run
- easy to understand
- great when one machine can still hold the working set
- good if data ingestion is mostly batchy and rebuild-oriented
Why it may stop being enough:
- the FAISS index is still basically your responsibility to manage
- filtering and vector retrieval live in separate systems you have to coordinate
- high availability is not really built in
- horizontal scaling is something you design yourself, not something the stack gives you
- online updates, multi-tenant isolation, and operational failover become more work than they should be
This setup also has a filtering problem. With a setup like this, you often end up doing semantic search first and applying metadata filters afterward. That means you can end up with zero usable results even when rows matching the filters do exist, simply because the right candidates never made it into the first retrieval set. Systems like Qdrant have a cleaner story there, because filtering is part of the retrieval operation rather than a second pass after it.
You can blunt this a bit by over-fetching a larger candidate set before filtering, which is often good enough in practice. But that is still a workaround for the architecture, not the architecture being naturally good at filtered retrieval.
DuckDB + FAISS works. At some point, you may decide you do not want to keep owning the seams between storage, retrieval, filtering, and operations.
What to explore if this becomes a product
The most likely next step for this project would be Qdrant, mostly because we already use it and it is built for this job.
| Option | Why it comes up | Where it looks strongest |
|---|---|---|
Qdrant |
purpose-built for vector search, dedicated retrieval system, payload filtering, native BM25, dense+sparse hybrid queries, clustering/sharding/replication |
the most likely next step here, especially if semantic retrieval is turning into real infrastructure |
Postgres + pgvector |
one database for metadata, filters, and vector search, supports HNSW and IVFFlat, more natural as a long-lived application database |
normal SaaS product with app data, transactions, online writes, and moderate hybrid search |
Vespa |
strong hybrid retrieval model, nearestNeighbor plus lexical ranking, better story for ranking pipelines |
search-heavy product where lexical, vector, metadata, and business logic all need to be combined seriously |
Milvus |
built for horizontal scaling, cloud-native architecture, aimed at large vector workloads | genuinely large-scale vector infrastructure where distributed ops are already the problem |
Version markers for the docs I checked on 2026-05-25:
Qdrant: unversioned official docs, retrieved2026-05-25Postgres + pgvector:PostgreSQL 18andpgvector v0.8.2Vespa: unversioned official docs, retrieved2026-05-25Milvus:v2.6.xdocs
Sources
- DuckDB
vssextension docs: https://duckdb.org/docs/current/core_extensions/vss - DuckDB indexing docs: https://duckdb.org/docs/current/guides/performance/indexing
- DuckDB out-of-memory guide: https://duckdb.org/docs/current/guides/performance/oom
- DuckDB VSS announcement: https://duckdb.org/2024/05/03/vector-similarity-search-vss
- FAISS wiki overview: https://github.com/facebookresearch/faiss/wiki
- FAISS index types: https://github.com/facebookresearch/faiss/wiki/Faiss-indexes
- FAISS id mapping: https://github.com/facebookresearch/faiss/wiki/Pre–and-post-processing
- PostgreSQL
18full-text search docs: https://www.postgresql.org/docs/18/textsearch.html - PostgreSQL
18text search functions and ranking: https://www.postgresql.org/docs/18/functions-textsearch.html pgvector v0.8.2README, including hybrid search: https://github.com/pgvector/pgvector/tree/v0.8.2- Qdrant docs site, unversioned pages checked
2026-05-25: https://qdrant.tech/documentation/search/text-search/ - Qdrant hybrid queries docs, unversioned pages checked
2026-05-25: https://qdrant.tech/documentation/search/hybrid-queries/ - Qdrant distributed deployment docs, unversioned pages checked
2026-05-25: https://qdrant.tech/documentation/operations/distributed_deployment/ - Vespa docs site, unversioned pages checked
2026-05-25: BM25 rank feature https://docs.vespa.ai/en/ranking/bm25.html - Vespa docs site, unversioned pages checked
2026-05-25: hybrid search tutorial https://docs.vespa.ai/en/learn/tutorials/hybrid-search - Milvus
v2.6.xBM25 docs: https://milvus.io/docs/bm25-function.md - Milvus
v2.6.xhybrid search patterns: https://milvus.io/docs/v2.6.x/search_patterns.md - Milvus architecture overview: https://milvus.io/docs/architecture_overview.md