Great question. Let’s go into some details.
First of all, you can create as many tenants as you want. The only real limitation is the resources available to your DB. Weaviate can handle millions of tenants.
Collections vs. Tenants
Collections: As a general guideline, avoid going beyond ~1,000 single-tenant collections, especially when they all share the same schema.
Tenants: As you already know, tenants live within a collection and are isolated: each one has its own shard and behaves like its own collection. If you’re using the same schema across multiple users or use cases, it’s much better to create one global collection and use tenants inside it. This approach is faster, more efficient, and easier to manage.
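To make the "one global collection with tenants inside it" idea concrete, here is a minimal sketch. It assumes the Python client v4 (`weaviate-client`) and an already connected `client`. The collection name `Documents` and the `title`/`body` properties are placeholders I made up for illustration, so adapt them to your schema.

```python
from typing import Any

COLLECTION_NAME = "Documents"  # hypothetical name for the shared collection


def create_shared_collection(client: Any) -> Any:
    """Create one multi-tenant collection that every tenant shares."""
    # Imported lazily so this file loads even without weaviate-client installed.
    from weaviate.classes.config import Configure, DataType, Property

    return client.collections.create(
        name=COLLECTION_NAME,
        # Multi-tenancy has to be enabled when the collection is created.
        multi_tenancy_config=Configure.multi_tenancy(enabled=True),
        properties=[
            Property(name="title", data_type=DataType.TEXT),
            Property(name="body", data_type=DataType.TEXT),
        ],
    )


def add_tenant(client: Any, tenant_name: str) -> None:
    """Register a new tenant (its own shard) inside the shared collection."""
    from weaviate.classes.tenants import Tenant

    collection = client.collections.get(COLLECTION_NAME)
    collection.tenants.create([Tenant(name=tenant_name)])


def tenant_handle(client: Any, tenant_name: str) -> Any:
    """Return a handle whose reads and writes are scoped to a single tenant."""
    return client.collections.get(COLLECTION_NAME).with_tenant(tenant_name)
```

With a connected client you would call `create_shared_collection(client)` once, `add_tenant(client, "customer_1")` for each tenant, and then work through `tenant_handle(client, "customer_1")`, for example `handle.data.insert({"title": "Hello"})`.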
Avoid too many collections. Use tenants inside a shared collection whenever the schema is the same.
Why avoid too many collections?
The GraphQL schema needs to be rebuilt every time you add/remove a collection or restart Weaviate. With many collections, this becomes slow.
You can mitigate this as follows:
Set DISABLE_GRAPHQL, which disables the GraphQL endpoints while keeping REST and gRPC operations available.
Any GraphQL queries you run will fail once it is disabled, but REST/gRPC calls (and clients) will continue to work as expected.
Weaviate clients are already moving away from GraphQL to gRPC, so if you’re using a client exclusively, this won’t affect you, and gRPC is faster in production anyway.
Large numbers of collections can also slow down startup times (pods might take hours to be ready). For those cases, we have also implemented the following:
Lazy loading shards.
HNSW snapshots (introduced in 1.31)
Cross-references
Cross-references inside tenants are supported.
Cross-references from a single-tenant collection → multi-tenant collection (specific tenant) are not supported.
My recommendation here is to use a multi-tenant collection for everything, then:
Create a tenant named, for example, common, and reference from/to it. That way you do not need any single-tenant collections outside the MT collection.
In general, cross-references can impact performance, especially on large datasets. If your datasets are guaranteed to be small and your queries are not complicated, performance is usually fine, and many of our customers and developers use them successfully.
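As a rough illustration of the "cross-references inside tenants" case, the sketch below links a document to an author within the same tenant. It assumes the Python client v4 and that the tenant already exists in both collections. The `Author` and `Document` collections and the `writtenBy` reference are hypothetical names, not something from your setup.

```python
from typing import Any


def create_linked_collections(client: Any) -> None:
    """Create two multi-tenant collections joined by a cross-reference."""
    # Imported lazily so this file loads even without weaviate-client installed.
    from weaviate.classes.config import (
        Configure,
        DataType,
        Property,
        ReferenceProperty,
    )

    client.collections.create(
        name="Author",
        multi_tenancy_config=Configure.multi_tenancy(enabled=True),
        properties=[Property(name="name", data_type=DataType.TEXT)],
    )
    client.collections.create(
        name="Document",
        multi_tenancy_config=Configure.multi_tenancy(enabled=True),
        properties=[Property(name="title", data_type=DataType.TEXT)],
        # The reference plays a role similar to a foreign key pointing at Author.
        references=[
            ReferenceProperty(name="writtenBy", target_collection="Author")
        ],
    )


def insert_linked(client: Any, tenant_name: str, author_name: str, title: str) -> Any:
    """Insert an author and a document that references it, in one tenant."""
    authors = client.collections.get("Author").with_tenant(tenant_name)
    documents = client.collections.get("Document").with_tenant(tenant_name)

    author_id = authors.data.insert({"name": author_name})
    return documents.data.insert(
        properties={"title": title},
        references={"writtenBy": author_id},
    )


def fetch_with_authors(client: Any, tenant_name: str, limit: int = 10) -> Any:
    """Read documents for one tenant and resolve the author reference."""
    from weaviate.classes.query import QueryReference

    documents = client.collections.get("Document").with_tenant(tenant_name)
    return documents.query.fetch_objects(
        limit=limit,
        return_references=QueryReference(
            link_on="writtenBy", return_properties=["name"]
        ),
    )
```

Every call goes through `with_tenant(...)`, so the reference is written and resolved inside that tenant. Resolving references adds work at query time, which is where the performance cost on large datasets comes from.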
Summary
Your approach is solid.
Performance-wise, prefer tenants inside multi-tenant collections rather than creating too many collections.
Cross-references: they are supported within tenants; use a common tenant if you need to share data.
I hope the explanation above addresses your main concerns.
Below is some content and snippets from my GitHub repo (personal projects), which is Python-based and includes a cross-reference example. Feel free to fork and adapt it for your use case. It might be helpful to you.
Feel free to ask any questions. I hope this was comprehensive enough.
Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)
Hi @Mohamed_Shahin , thank you for the detailed response! To confirm I understand correctly: you’re suggesting I create a single multi-tenant collection with tenants like “common” (for shared documents), “customer_1”, “customer_2”, etc. (for each customer’s documents). Is that the approach you’d recommend?
And if I need cross-references between documents, I should follow this same structure within the multi-tenant collection, correct?
Yes, exactly! Instead of having both an MT Collection and a separate Single Collection (for common data), you would only use the MT Collection. Inside it, you create tenants such as customer_1, customer_2, customer_3, and shared_documents. This way, you have four tenants: three isolated ones for customers 1, 2, and 3, plus the shared one, which can be referenced by all three.
You’d design your schema with cross-references, similar to how relational databases use primary keys and foreign keys to define relationships.
This approach is recommended, as it’s more efficient operationally. Cross-references work exactly the same way as with multiple single collections; the only difference is that you specify the tenant.
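For the tenant layout described above, a small helper along these lines could keep the tenant list in sync. It is only a sketch: it assumes the Python client v4, where (as far as I know) `collection.tenants.get()` returns a dict keyed by tenant name. The helper names and the integer customer IDs are my own assumptions.

```python
from typing import Any, Iterable, List

SHARED_TENANT = "shared_documents"  # tenant name used in the reply above


def tenant_names(customer_ids: Iterable[int]) -> List[str]:
    """One tenant per customer, plus the shared tenant at the end."""
    return [f"customer_{cid}" for cid in customer_ids] + [SHARED_TENANT]


def missing_tenants(wanted: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Return the wanted tenant names that do not exist yet, in order."""
    existing_set = set(existing)
    return [name for name in wanted if name not in existing_set]


def ensure_tenants(collection: Any, customer_ids: Iterable[int]) -> List[str]:
    """Create any tenants that are missing and return their names."""
    # Imported lazily so this file loads even without weaviate-client installed.
    from weaviate.classes.tenants import Tenant

    existing = collection.tenants.get()  # assumed: dict of name -> Tenant
    to_create = missing_tenants(tenant_names(customer_ids), existing.keys())
    if to_create:
        collection.tenants.create([Tenant(name=name) for name in to_create])
    return to_create
```

Calling `ensure_tenants(collection, [1, 2, 3])` on an empty multi-tenant collection would create customer_1, customer_2, customer_3, and shared_documents; calling it again would create nothing.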
Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)