Performance of HNSW in Weaviate with Highly Partitioned Data
I’m evaluating Weaviate for vector and hybrid search over our dataset. The data is highly partitioned: each user has access to only a small slice of the database. This restriction must be applied dynamically via filters, so I can’t rely on static or semi-static techniques such as separate collections.
I’ve read that HNSW performance can degrade in such scenarios, since the algorithm may struggle to efficiently navigate when filters exclude large portions of the graph. To test this, I set up an experiment:
- Setup:
  - Created a collection with a filterable property and a vectorized field.
  - Built a dataset with queries, relevance scores, and a function to compute evaluation metrics.
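For concreteness, here is a minimal sketch of the kind of metric function I mean, assuming a plain recall@k over retrieved object IDs (the IDs below are made-up placeholders, not my actual data):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of all relevant items that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Toy example: 2 of the 3 relevant docs appear in the top-5 results.
retrieved = ["d1", "d7", "d3", "d9", "d2"]
relevant = {"d1", "d2", "d4"}
print(recall_at_k(retrieved, relevant, k=5))  # → 0.6666666666666666
```

The real evaluation averages this over all queries, but the shape is the same.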
- Experiment 1:
  - Small dataset (~400 objects), all stored under a single value of the filterable property (let’s call it A).
  - Ran the evaluation and obtained baseline scores.
- Experiment 2:
  - Extended the dataset to ~2000 objects by adding variations of the original items.
  - The new objects were stored under different values of the same property (B, C, etc.).
  - Re-ran the evaluation, expecting performance degradation due to filtering overhead.
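Back-of-the-envelope, using the approximate counts above (A keeps its original ~400 objects in both runs), the filter selectivity changes like this between the two experiments:

```python
# Approximate object counts from the two experiments above.
objects_in_a = 400   # partition A is untouched by Experiment 2
exp1_total = 400     # everything is under A
exp2_total = 2000    # A plus new variations under B, C, ...

selectivity_exp1 = objects_in_a / exp1_total
selectivity_exp2 = objects_in_a / exp2_total
print(f"Exp 1: a filter on A matches {selectivity_exp1:.0%} of the index")
print(f"Exp 2: a filter on A matches {selectivity_exp2:.0%} of the index")
```

So in Experiment 2 the filter already excludes 80% of the graph, which is why I expected to see some effect.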
To my surprise, the results were identical to those from the smaller dataset.
How is this possible?
Should I keep scaling the dataset further to observe metric degradation?
Or does Weaviate’s HNSW implementation include optimizations that mitigate the filtering performance issues typically associated with partitioned data?
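If further scaling is the way to find out, I’d sweep something like this, keeping partition A fixed so the filter becomes progressively more selective (`run_evaluation` is a hypothetical stand-in for my actual evaluation against Weaviate; only the selectivity arithmetic runs here):

```python
partition_a_size = 400  # keep A fixed while the rest of the index grows

def run_evaluation(total_objects: int) -> float:
    """Placeholder for the real filtered-search evaluation against Weaviate."""
    raise NotImplementedError

for total in [2_000, 10_000, 50_000, 250_000]:
    selectivity = partition_a_size / total
    print(f"total={total:>7}  filter on A matches {selectivity:.2%} of the index")
```

At the larger sizes the filter would exclude well over 99% of the graph, which should surface any navigation problems if they exist.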