Best Practice for Multi-Tenant Architecture with Shared and User-Specific Documents

Paul_Indsoft · August 20, 2025, 9:10am

Hi Weaviate community,

I’m designing a production system where I need to handle two types of documents:

Common documents - Shared across all clients/users
User-uploaded documents - Specific to each client/user

I’m considering the following architecture:

One collection without multi-tenancy for all common/shared documents
One collection with multi-tenancy for user-uploaded documents (one tenant per client/user)

My reasoning:

Common documents don’t need isolation and can benefit from being in a single collection
User documents need proper isolation and access control, which multi-tenancy provides
This separation might make querying and maintenance easier

Questions for the community:

Is this a recommended approach, or are there better patterns?
Are there any performance implications I should be aware of?
How would you handle cross-collection queries if a user needs to search both common and their own documents?
Any gotchas or lessons learned from similar implementations?

Thanks

Mohamed_Shahin · August 20, 2025, 10:14am

Good morning @Paul_Indsoft

Absolutely great question — let’s go into some details.

First of all, you can create as many tenants as you want. The only real limitation is the resources available to your DB. Weaviate can handle millions of tenants.

Collections vs. Tenants

Collections: Generally, it’s not recommended to go over ~1,000 single-tenant collections, particularly when they would all share the same schema.
Tenants: Tenants live within a collection and are isolated (own shard, behave like their own collection) as you already know. If you’re using the same schema across multiple users or use cases, it’s much better to create one global collection and use tenants inside it. This approach is faster, more efficient, and easier to manage.
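Here is a minimal sketch of the "one collection, many tenants" pattern. It assumes the Weaviate Python client v4 (`weaviate.classes.config`, `weaviate.classes.tenants`). The collection name `Documents`, its properties, and the `customer_<id>` naming helper are hypothetical. No vectorizer is configured, so the application would supply the vectors.

```python
import re
from typing import Any, Iterable

COLLECTION_NAME = "Documents"  # hypothetical name for the single shared collection


def tenant_name_for(customer_id: str) -> str:
    """Hypothetical convention: one tenant per customer, named customer_<id>."""
    # Replace anything other than letters, digits, "_" and "-" with "_".
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "_", customer_id.strip())
    if not cleaned:
        raise ValueError("customer id must not be empty")
    return f"customer_{cleaned}"


def create_documents_collection(client: Any, customer_ids: Iterable[str]) -> Any:
    """Create one multi-tenant collection and a tenant for each customer."""
    # Client classes are imported here so tenant_name_for has no client dependency.
    from weaviate.classes.config import Configure, DataType, Property
    from weaviate.classes.tenants import Tenant

    documents = client.collections.create(
        name=COLLECTION_NAME,
        multi_tenancy_config=Configure.multi_tenancy(enabled=True),
        properties=[
            Property(name="title", data_type=DataType.TEXT),
            Property(name="body", data_type=DataType.TEXT),
        ],
    )
    # Every tenant shares the schema above but gets its own shard.
    documents.tenants.create([Tenant(name=tenant_name_for(c)) for c in customer_ids])
    return documents
```

With a local instance this would be called as `create_documents_collection(weaviate.connect_to_local(), ["1", "2", "3"])`. Adding a customer later means adding one tenant rather than a new collection. This keeps the collection count, and with it the schema rebuild cost, flat.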

Avoid too many collections. Use tenants inside a shared collection whenever the schema is the same. Why avoid too many collections?

The GraphQL schema needs to be rebuilt every time you add/remove a collection or restart Weaviate. With many collections, this becomes slow.
You can also mitigate this by:
- Setting the DISABLE_GRAPHQL environment variable, which disables GraphQL endpoints while keeping REST and gRPC operations.
- If you’re using GraphQL queries, they will fail once disabled, but REST/gRPC calls (and clients) will continue to work as expected.
- Weaviate clients are already moving away from GraphQL to gRPC, so if you use a client exclusively, this won’t affect you, and gRPC is faster in production.
Large numbers of collections can also slow down startup (pods might take hours to become ready). To address this, we have also implemented:
- Lazy loading shards.
- HNSW snapshots (introduced in 1.31)

Cross-references

Cross-references inside tenants are supported.
Cross-references from a single-tenant collection → multi-tenant collection (specific tenant) are not supported.

My recommendation here is to use a multi-tenant collection for everything, then:

Create a tenant named, e.g., common, and reference to and from it. That way you don’t need any single-tenant collections outside the MT collection.
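This also answers the question of searching both shared and private documents. Each query against a multi-tenant collection targets one tenant, so a search over a customer's tenant plus `common` can be sketched as two queries merged in the application. The sketch makes several assumptions: it uses the v4 Python client, vectors you compute yourself (`near_vector`), and the list form `return_metadata=["distance"]`, where `MetadataQuery(distance=True)` is the documented alternative. `search_own_and_common` and `merge_by_distance` are hypothetical helpers.

```python
from typing import Any, Iterable, List, Sequence

COMMON_TENANT = "common"  # shared tenant name suggested above


def merge_by_distance(result_sets: Iterable[Sequence[Any]], limit: int) -> List[Any]:
    """Combine hits from several tenants, most similar (smallest distance) first."""
    merged = [hit for hits in result_sets for hit in hits]
    merged.sort(key=lambda hit: hit.metadata.distance)
    return merged[:limit]


def search_own_and_common(collection: Any, user_tenant: str,
                          query_vector: Sequence[float], limit: int = 5) -> List[Any]:
    """Run one near_vector query per tenant and merge the results."""
    result_sets = []
    for tenant in (user_tenant, COMMON_TENANT):
        # Each tenant is its own shard, so each query only sees that tenant's objects.
        response = collection.with_tenant(tenant).query.near_vector(
            near_vector=list(query_vector),
            limit=limit,
            return_metadata=["distance"],
        )
        result_sets.append(response.objects)
    return merge_by_distance(result_sets, limit)
```

Sorting by raw distance is only reasonable because both tenants live in the same collection and therefore share one embedding space. Merging results from collections with different vectorizers would need a rank-based fusion instead.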

In general, cross-references can impact performance, especially on large datasets. If your datasets are guaranteed to stay small and your queries are not complex, performance is usually fine, and many of our customers and developers use them successfully.

Summary

Your approach is solid.
Performance-wise, prefer tenants inside multi-tenant collections rather than creating too many collections.
Cross-references: supported within tenants; use a common tenant if you need to share data.
I hope the above explanation addresses your main concerns.

Below you’ll find content and snippets from my GitHub repo (personal projects, Python-based), including a cross-reference example. Feel free to fork and adapt them for your use case; they might be helpful to you.

Feel free to ask any questions; I hope this was comprehensive.

Best regards,

Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)

Paul_Indsoft · August 20, 2025, 11:08am

Hi @Mohamed_Shahin, thank you for the detailed response! To confirm I understand correctly: you’re suggesting I create a single multi-tenant collection with tenants like “common” (for shared documents), “customer_1”, “customer_2”, etc. (for each customer’s documents). Is that the approach you’d recommend?

And if I need cross-references between documents, I should follow this same structure within the multi-tenant collection, correct?

Thanks

Mohamed_Shahin · August 20, 2025, 11:30am

Yes, exactly! Instead of having both an MT collection and a separate single-tenant collection (for common data), you would only use the MT collection. Inside it, you create tenants such as customer_1, customer_2, customer_3, and shared_documents. This way, you have four tenants: one isolated tenant each for customers 1, 2, and 3, plus the shared one, which all three can reference.
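The tenant layout from this reply can be written down together with the access rule it implies. Each customer reads its own tenant plus `shared_documents` and writes only to its own tenant. The tenant names come from the thread; `readable_tenants` and `writable_tenant` are hypothetical helpers. In this sketch the check lives in the application layer, before a request is routed to `collection.with_tenant(...)`.

```python
from typing import List

SHARED_TENANT = "shared_documents"
CUSTOMER_TENANTS = ("customer_1", "customer_2", "customer_3")
ALL_TENANTS = frozenset(CUSTOMER_TENANTS) | {SHARED_TENANT}


def readable_tenants(customer_tenant: str) -> List[str]:
    """Tenants a customer may search: their own tenant plus the shared one."""
    if customer_tenant not in CUSTOMER_TENANTS:
        raise PermissionError(f"not a customer tenant: {customer_tenant!r}")
    return [customer_tenant, SHARED_TENANT]


def writable_tenant(customer_tenant: str) -> str:
    """Customers only ever write into their own tenant, never the shared one."""
    if customer_tenant not in CUSTOMER_TENANTS:
        raise PermissionError(f"not a customer tenant: {customer_tenant!r}")
    return customer_tenant
```

`ALL_TENANTS` is the set that would be created once, e.g. with `collection.tenants.create([Tenant(name=t) for t in sorted(ALL_TENANTS)])` using `Tenant` from `weaviate.classes.tenants`.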

You’d design your schema with cross-references, similar to how relational databases use primary and foreign keys to define relationships.

This approach is recommended, as it’s more efficient operationally. Cross-references work exactly the same way as with multiple single-tenant collections; the only difference is that you specify the tenant.
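To make the primary-key/foreign-key analogy concrete, here is a sketch with two hypothetical multi-tenant collections. `Documents` is the parent and `Chunks` is the child, and the child's `ofDocument` reference acts like a foreign key. The sketch assumes the v4 Python client and that `Documents` already exists with multi-tenancy enabled. It also assumes both objects live in the same tenant, which is the case described as supported earlier in the thread. It does not show a reference that points into a different tenant.

```python
from typing import Any

DOCUMENTS = "Documents"  # hypothetical parent collection (multi-tenant)
CHUNKS = "Chunks"        # hypothetical child collection (multi-tenant)


def create_chunks_collection(client: Any) -> Any:
    """Child collection whose ofDocument reference plays the role of a foreign key."""
    # Client classes are imported here so link_chunk_to_document has no client dependency.
    from weaviate.classes.config import Configure, DataType, Property, ReferenceProperty

    return client.collections.create(
        name=CHUNKS,
        multi_tenancy_config=Configure.multi_tenancy(enabled=True),
        properties=[Property(name="text", data_type=DataType.TEXT)],
        references=[ReferenceProperty(name="ofDocument", target_collection=DOCUMENTS)],
    )


def link_chunk_to_document(chunks: Any, tenant: str, chunk_uuid: str, document_uuid: str) -> None:
    """Add the reference inside one tenant; the tenant is the only extra argument."""
    chunks.with_tenant(tenant).data.reference_add(
        from_uuid=chunk_uuid,
        from_property="ofDocument",
        to=document_uuid,
    )
```

Apart from `with_tenant(tenant)`, the `reference_add` call is the same one you would make on a single-tenant collection. That matches the point above that specifying the tenant is the only difference.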

Best regards,

Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)
