Performance of HNSW in Weaviate with Highly Partitioned Data
I’m evaluating Weaviate for vector and hybrid search over our dataset. The data is highly partitioned: each user has access to only a small slice of the database. This restriction must be applied dynamically via filters, so I can’t rely on static or semi-static techniques such as separate collections.
I’ve read that HNSW performance can degrade in such scenarios, since the algorithm may struggle to efficiently navigate when filters exclude large portions of the graph. To test this, I set up an experiment:
- Setup:
  - Created a collection with a filterable property and a vectorized field.
  - Built a dataset with queries, relevance scores, and a function to compute evaluation metrics.
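For concreteness, here is a minimal sketch of the kind of metric function I mean, assuming a plain recall@k over retrieved object IDs (the IDs below are made-up placeholders, not my actual data):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of all relevant items that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Toy example: 2 of the 3 relevant docs appear in the top-5 results.
retrieved = ["d1", "d7", "d3", "d9", "d2"]
relevant = {"d1", "d2", "d4"}
print(recall_at_k(retrieved, relevant, k=5))  # → 0.6666666666666666
```

The real evaluation averages this over all queries, but the shape is the same.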
- Experiment 1:
  - Small dataset (~400 objects), all stored under a single value of the filterable property (let’s call it A).
  - Ran the evaluation and obtained baseline scores.
- Experiment 2:
  - Extended the dataset to ~2000 objects by adding variations of the original items.
  - The new objects were stored under different values of the same property (B, C, etc.).
  - Re-ran the evaluation, expecting performance degradation due to filtering overhead.
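Back-of-the-envelope, using the approximate counts above (A keeps its original ~400 objects in both runs), the filter selectivity changes like this between the two experiments:

```python
# Approximate object counts from the two experiments above.
objects_in_a = 400   # partition A is untouched by Experiment 2
exp1_total = 400     # everything is under A
exp2_total = 2000    # A plus new variations under B, C, ...

selectivity_exp1 = objects_in_a / exp1_total
selectivity_exp2 = objects_in_a / exp2_total
print(f"Exp 1: a filter on A matches {selectivity_exp1:.0%} of the index")
print(f"Exp 2: a filter on A matches {selectivity_exp2:.0%} of the index")
```

So in Experiment 2 the filter already excludes 80% of the graph, which is why I expected to see some effect.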
To my surprise, the results were identical to those from the smaller dataset.
How is this possible?
Should I keep scaling the dataset further to observe metric degradation?
Or does Weaviate’s HNSW implementation include optimizations that mitigate the filtering performance issues typically associated with partitioned data?
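If further scaling is the way to find out, I’d sweep something like this, keeping partition A fixed so the filter becomes progressively more selective (`run_evaluation` is a hypothetical stand-in for my actual evaluation against Weaviate; only the selectivity arithmetic runs here):

```python
partition_a_size = 400  # keep A fixed while the rest of the index grows

def run_evaluation(total_objects: int) -> float:
    """Placeholder for the real filtered-search evaluation against Weaviate."""
    raise NotImplementedError

for total in [2_000, 10_000, 50_000, 250_000]:
    selectivity = partition_a_size / total
    print(f"total={total:>7}  filter on A matches {selectivity:.2%} of the index")
```

At the larger sizes the filter would exclude well over 99% of the graph, which should surface any navigation problems if they exist.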