Data model feasibility with extensive array filtering

Hello, I'm currently in the process of selecting a vector DB for my use case:

Tens of millions of dense + sparse vectors for hybrid search with extensive metadata filtering.

I expect to have a primary int array field with ~300 numbers on average, plus a few other indexed scalar fields that would be used together.

I tried Qdrant and it took more than a week on a 32-core machine to index the data. Is this something Weaviate could potentially do better? Is there any way to estimate the indexing/recall performance and memory usage?

Hi @Fogapod!!

Welcome to our community :hugs:

We do have some calculations for how much memory your vectors will use:

https://docs.weaviate.io/weaviate/concepts/resources
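As a quick sketch of the kind of estimate that page describes: the rule of thumb there is roughly 2× the raw vector size to account for the HNSW index and runtime overhead (treat the factor as an assumption to validate, not a guarantee; the function name and the 768-dimension example below are illustrative, not from your setup):

```python
def estimate_vector_memory_gib(num_vectors: int,
                               dimensions: int,
                               bytes_per_float: int = 4,
                               overhead_factor: float = 2.0) -> float:
    """Rough in-memory footprint of an HNSW vector index.

    overhead_factor=2.0 follows the ~2x rule of thumb from the
    resource planning docs; raw float32 vectors are 4 bytes/dim.
    """
    raw_bytes = num_vectors * dimensions * bytes_per_float
    return raw_bytes * overhead_factor / 1024**3

# Example: 50 million 768-dimensional float32 vectors
print(f"{estimate_vector_memory_gib(50_000_000, 768):.0f} GiB")  # → 286 GiB
```

Note this only covers the dense vectors themselves; the inverted indexes for your int array field and other filterable scalars add to this, which is harder to estimate up front.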

However, indexing performance will depend on the data types, replication, and sharding, so I believe it would be better to run a test first :thinking:

So some experimentation will be necessary to map out the changes you can make to improve performance.

Let me know if this helps!