Multi-collections or one collection and filter?

shadowlin · January 17, 2024, 12:57pm

If I have data for different companies that all share the same schema, should I use a separate collection for each company, or just one collection filtered by company name or ID?

The total number of companies could range from dozens to a few hundred.

DudaNogueira · January 17, 2024, 2:02pm

Hi there!

This is a classic Multi Tenancy scenario:

Check out our awesome CTO in a great talk on this subject:

Let me know if this helps

shadowlin · January 18, 2024, 7:45am

Thank you for pointing me to this video.
But my situation is a bit different. Isolating data by collection or tenant would work well for my application, but I also need to be able to search across all collections or tenants.
I guess if I go the multi-collection or multi-tenancy way, I must run the same query against every collection or tenant to do a global search. That sounds like an expensive operation.
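To make the cost concern concrete, here is a minimal in-memory sketch of what a "global" search looks like when data is split per tenant: one query per tenant, then a merge of the partial top-k lists. This is not Weaviate code; `search_tenant`, `global_search` and the toy vectors are hypothetical, and each per-tenant query is a brute-force stand-in for a real vector query.

```python
import heapq
import math


def cosine_distance(a, b):
    # 1 - cosine similarity; 0.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def search_tenant(objects, query, k):
    # Hypothetical per-tenant query: brute-force top-k within one tenant.
    return heapq.nsmallest(k, ((cosine_distance(v, query), oid) for oid, v in objects))


def global_search(tenants, query, k):
    # Fan out: one query per tenant, then merge the partial top-k lists.
    queries_issued = 0
    partial = []
    for name, objects in tenants.items():
        queries_issued += 1
        for dist, oid in search_tenant(objects, query, k):
            partial.append((dist, name, oid))
    return heapq.nsmallest(k, partial), queries_issued


tenants = {
    "company_1": [("a", [1.0, 0.0]), ("b", [0.0, 1.0])],
    "company_2": [("c", [0.9, 0.1])],
    "company_3": [("d", [-1.0, 0.0])],
}
results, n_queries = global_search(tenants, [1.0, 0.0], k=2)
print([oid for _, _, oid in results], n_queries)
```

The point of the sketch is the `queries_issued` counter: with a few hundred companies, a single global search becomes a few hundred queries plus a merge step.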

DudaNogueira · January 18, 2024, 11:22am

Right. Sorry, I was not aware of that requirement.

In that case, you could use a single class (collection) and add a property that is used to filter the data, e.g. a company ID.
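The single-collection idea can be sketched in plain Python, assuming a toy model rather than Weaviate's real internals: every object carries a `company_id`, an inverted index maps each `company_id` to its object IDs, and a filtered search only scores objects in that allow-list. An unfiltered search is then one query over everything. `SingleCollection` is a hypothetical class, and the brute-force scoring stands in for the real vector index.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


class SingleCollection:
    """Toy single collection: every object carries a company_id property."""

    def __init__(self):
        self.vectors = {}
        # Inverted index: company_id -> set of object ids (the filter's allow-list).
        self.by_company = defaultdict(set)

    def add(self, oid, company_id, vector):
        self.vectors[oid] = vector
        self.by_company[company_id].add(oid)

    def search(self, query, k, company_id=None):
        # No filter: one global query over all companies.
        # With a filter: only objects in the allow-list are scored.
        if company_id is None:
            candidates = self.vectors.keys()
        else:
            candidates = self.by_company.get(company_id, set())
        scored = sorted((cosine_distance(self.vectors[o], query), o) for o in candidates)
        return [o for _, o in scored[:k]]


col = SingleCollection()
col.add("a", 1, [1.0, 0.0])
col.add("b", 2, [0.9, 0.1])
col.add("c", 1, [0.0, 1.0])
print(col.search([1.0, 0.0], k=2))                # global search
print(col.search([1.0, 0.0], k=2, company_id=1))  # per-company search
```

Both the per-company and the global case are a single query against one collection, which is the property the multi-collection layout lacks.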

shadowlin · January 18, 2024, 12:31pm

@DudaNogueira
I am testing the all-in-one-collection approach.
I find that the filter can impact performance dramatically.
I use an INT field company_id and set it to filterable (I think this should build an inverted index, right?). I set its values to 1, 2 and 3 evenly, so filtering on company_id = 1 should restrict the search to only 1/3 of my dataset.
I have a text field that uses OpenAI to generate 1536-dimensional vectors.

I tested with 10k, 20k and 40k records.
The vector-only query times look OK: 0.0028, 0.0038 and 0.0041 sec.
But the filtered vector query (restricted to 1/3 of the dataset) looks bad: 0.0039, 0.0061 and 0.0106 sec. It increases linearly with the dataset size.

shadowlin · January 18, 2024, 2:05pm

I found out that the flat_search_cutoff setting could be causing the problem…
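A possible reading of this, sketched under assumptions: Weaviate's HNSW index has a flat-search cutoff (documented default 40,000) below which a filtered query brute-forces over the filter's allow-list instead of traversing the HNSW graph. The sketch below applies that simplified rule to the test setup above (company_id assigned round-robin to 1, 2, 3). The function names are hypothetical and the exact boundary comparison (`<` here) is an assumption. If the rule holds, every filtered test falls into the flat path, whose work grows with the allow-list size, which would match the roughly linear growth in the timings.

```python
# Assumption: mirrors Weaviate's documented default flat-search cutoff.
DEFAULT_FLAT_SEARCH_CUTOFF = 40_000


def choose_strategy(allow_list_size, flat_search_cutoff=DEFAULT_FLAT_SEARCH_CUTOFF):
    # Simplified rule (assumed): small allow-lists are brute-forced ("flat"),
    # large ones go through the HNSW graph with the filter applied.
    return "flat" if allow_list_size < flat_search_cutoff else "hnsw"


def simulate(total_records, n_companies=3, target_company=1):
    # Hypothetical reconstruction of the test data: company_id cycles 1, 2, 3.
    company_ids = [i % n_companies + 1 for i in range(total_records)]
    allow_list_size = sum(1 for c in company_ids if c == target_company)
    strategy = choose_strategy(allow_list_size)
    # A flat search computes one distance per allowed object; HNSW cost is not modeled.
    distance_computations = allow_list_size if strategy == "flat" else None
    return allow_list_size, strategy, distance_computations


for n in (10_000, 20_000, 40_000):
    print(n, simulate(n))
```

Under these assumptions, lowering the cutoff (or setting it to 0) would push filtered queries back onto the HNSW path; whether that is faster for a given dataset would need to be measured.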

DudaNogueira · January 18, 2024, 7:47pm

Not sure I got it. Can you clarify?
