Multiple collections, or one collection with a filter?

I have data from different companies that all share the same schema. Should I use a separate collection for each company, or just one collection and filter by company name or ID?

The total number of companies could range from dozens to a few hundred.

Hi there!

This is a classic multi-tenancy scenario:

Check out this great talk by our awesome CTO on the subject:

Let me know if this helps 🙂

Thank you for pointing me to this video.
But my situation is a bit different. Isolating data by collection or tenant would work well for my application, but I also need to be able to search across all collections or tenants.
I suspect that if I go with multiple collections or multi-tenancy, I would have to run the same query against every collection or tenant to do a global search. That sounds like an expensive operation.
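To make the cost of such a global search concrete: with one collection or tenant per company, a global search means running the same query once per tenant and then merging the partial results. Below is a minimal sketch of only the merge step, assuming each per-tenant query has already returned its own top-k as (distance, object id) pairs. The tenant names and the merge_global_top_k helper are hypothetical, and no database is involved.

```python
import heapq
from typing import Dict, List, Tuple

# (distance, object_id); a smaller distance means a better match
Hit = Tuple[float, str]


def merge_global_top_k(per_tenant_hits: Dict[str, List[Hit]],
                       k: int) -> List[Tuple[float, str, str]]:
    """Merge per-tenant top-k lists into one global top-k list."""
    # Flatten every tenant's partial result into (distance, tenant, object_id).
    candidates = (
        (distance, tenant, obj_id)
        for tenant, hits in per_tenant_hits.items()
        for distance, obj_id in hits
    )
    # Keep the k closest matches across all tenants.
    return heapq.nsmallest(k, candidates)


# Hypothetical results, as if the same query had been sent to three tenants.
hits = {
    "acme": [(0.12, "a1"), (0.40, "a2")],
    "globex": [(0.05, "g1"), (0.33, "g2")],
    "initech": [(0.50, "i1")],
}
print(merge_global_top_k(hits, 3))
# → [(0.05, 'globex', 'g1'), (0.12, 'acme', 'a1'), (0.33, 'globex', 'g2')]
```

The merge itself is cheap. The expense is the fan-out: one query per tenant, which means hundreds of queries per global search once there are a few hundred companies.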

Right. Sorry, I was not aware of that requirement.

In that case, you could use a single class and add a field that is used to filter the data.
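Here is a minimal in-memory sketch of the single-class approach. It assumes a plain Python list stands in for the database and a brute-force cosine distance stands in for the vector index; Record, search and the sample data are hypothetical, not Weaviate's API. The point is that a global search and a per-company search become the same query, run with or without the company filter.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    obj_id: str
    company_id: int          # the field used to filter by company
    vector: List[float]


def cosine_distance(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(records: List[Record], query: List[float], k: int,
           company_id: Optional[int] = None) -> List[str]:
    """Top-k object ids; company_id=None means a global search over all companies."""
    if company_id is None:
        pool = records
    else:
        # Per-company search is just a filter on the same collection.
        pool = [r for r in records if r.company_id == company_id]
    scored = sorted((cosine_distance(r.vector, query), r.obj_id) for r in pool)
    return [obj_id for _, obj_id in scored[:k]]


records = [
    Record("a", 1, [1.0, 0.0]),
    Record("b", 2, [0.9, 0.1]),
    Record("c", 1, [0.0, 1.0]),
]
print(search(records, [1.0, 0.0], k=2))                # → ['a', 'b']
print(search(records, [1.0, 0.0], k=2, company_id=1))  # → ['a', 'c']
```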

I am testing the all-in-one-collection approach.
I find that the filter can impact performance dramatically.
I use an INT field company_id, set to filterable (I think this builds an inverted index, right?), and I distribute its values 1, 2 and 3 evenly, so filtering on company_id = 1 should narrow the search to only 1/3 of my dataset.
I have a text field that is vectorized with OpenAI into 1536-dimensional vectors.
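A filterable property can be pictured as an inverted index from each value to the set of object ids that carry it, and the filter then becomes an allow list for the vector search. The toy sketch below uses a plain dict of sets (the real engine's data structures are more elaborate) and assumes 30,000 hypothetical objects whose company_id cycles through 1, 2 and 3, as described above.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(company_ids: List[int]) -> Dict[int, Set[int]]:
    """Map each company_id value to the ids of the objects that have it."""
    index: Dict[int, Set[int]] = defaultdict(set)
    for obj_id, company in enumerate(company_ids):
        index[company].add(obj_id)
    return dict(index)


n = 30_000
company_ids = [i % 3 + 1 for i in range(n)]   # values 1, 2, 3 spread evenly
index = build_inverted_index(company_ids)

allow_list = index[1]                         # the objects matched by company_id = 1
print(len(allow_list), len(allow_list) / n)   # → 10000 0.3333333333333333
```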

I tested with 10k, 20k, and 40k records.
Vector-only query times look OK: 0.0028, 0.0038, and 0.0041 sec.
But filtered vector query times (with the filter matching only 1/3 of the dataset) look bad: 0.0039, 0.0061, and 0.0106 sec. They increase linearly with the dataset size.
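A quick check of the numbers above, using only the six timings posted: the marginal cost per extra record stays almost constant for the filtered query (linear growth), while it shrinks for the vector-only query. Three data points per series are not much, so treat this as a rough reading.

```python
# Timings posted above (seconds), for 10k, 20k and 40k records.
sizes = [10_000, 20_000, 40_000]
vector_only = [0.0028, 0.0038, 0.0041]
filtered = [0.0039, 0.0061, 0.0106]   # filter matches ~1/3 of the records


def marginal_cost(ns, times):
    """Seconds added per extra record between consecutive measurements."""
    return [(t2 - t1) / (n2 - n1)
            for n1, n2, t1, t2 in zip(ns, ns[1:], times, times[1:])]


# How much slower the filtered query is than the vector-only one, per size.
slowdown = [round(f / v, 2) for f, v in zip(filtered, vector_only)]

print(marginal_cost(sizes, vector_only))  # roughly 1e-7, then 1.5e-8: growth flattens
print(marginal_cost(sizes, filtered))     # roughly 2.2e-7, then 2.25e-7: linear
print(slowdown)                           # → [1.39, 1.61, 2.59]
```

The filtered series adds about 2.2e-7 sec per record in both intervals (0.0022 sec per extra 10k records, then 0.0045 sec per extra 20k), whereas the vector-only series drops from about 1e-7 to 1.5e-8 sec per record.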

I found that the flat_search_cutoff setting could be what causes the problem…
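One way to read this suspicion, assuming Weaviate's documented behavior for the HNSW index: when a filter matches fewer objects than flatSearchCutoff (documented default 40,000), the query skips the HNSW graph and brute-forces over the matching objects. The sketch below is a simplified model of that rule, not Weaviate's code; the function names are hypothetical, and the assumed cutoff is the documented default, which may differ from your configuration.

```python
ASSUMED_FLAT_SEARCH_CUTOFF = 40_000  # assumption: documented default, left unchanged
DIMENSIONS = 1536                    # OpenAI vector size from the post above


def choose_search_path(matching_objects: int,
                       flat_search_cutoff: int = ASSUMED_FLAT_SEARCH_CUTOFF) -> str:
    """Simplified model: restrictive filters fall back to a flat (brute-force) search."""
    # The exact boundary (< vs <=) is an assumption; it does not matter for these sizes.
    return "flat" if matching_objects < flat_search_cutoff else "hnsw"


def flat_search_work(matching_objects: int, dims: int = DIMENSIONS) -> int:
    """Multiply-adds for brute force: one distance per matching vector."""
    return matching_objects * dims


for total in (10_000, 20_000, 40_000):
    matches = total // 3             # company_id = 1 matches ~1/3 of the data
    print(total, matches, choose_search_path(matches), flat_search_work(matches))
```

Under these assumptions every filtered query in the test (at most about 13,333 matches) would take the flat path, and the work of a flat search grows linearly with the number of matches, which would be consistent with the linear growth reported above.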

Not sure I got it. Can you clarify?