Multi-tenancy doesn't help in our scenario when the number of collections reaches 1,000

Description

We’re building a multi-tenancy system. Each tenant will manage their own collections with different properties.
As the number of tenants grows, the total number of collections will soon reach 1,000.
As far as I know, Weaviate's multi-tenancy configuration means a single collection's schema is shared by all of its tenants, which doesn't match our case.
How can I fix this? Should I increase the limit via MAXIMUM_ALLOWED_COLLECTIONS_COUNT? That isn't recommended, though.

Server Setup Information

  • Weaviate Server Version: 1.30
  • Deployment Method: Kubernetes
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python
  • Multitenancy?: False

Any additional Information

Good morning @Charlie_Chen and welcome to the community — it’s great to have you here! We’re excited to help however we can.

You can absolutely create as many tenants as you want. The only real limit is the resources available to your database: Weaviate can handle millions of tenants.

Now about collections: it's generally recommended not to go over 1,000 of them. But tenants are different: they live within a collection. So if you're using the same schema across multiple users or use cases, it's much better to create one global collection and use tenants inside it. Each tenant is isolated in its own shard and pretty much behaves like its own collection, but it's faster, more efficient, and easier to manage.

For example, say you’re building a chatbot app. Each of your users gets their own chatbot. Instead of creating a separate collection for every single user (which could really hurt performance), you’d just create one chatbot collection and make each user a tenant. Since all chatbots likely use the same schema, this setup works perfectly — you can scale to millions of users, and each tenant stays separate.
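As a rough illustration, here is what that pattern looks like with the v4 Python client; the collection name, tenant name, and properties are made up, and the connection details will differ in your deployment:

```python
import weaviate
from weaviate.classes.config import Configure, Property, DataType
from weaviate.classes.tenants import Tenant

client = weaviate.connect_to_local()  # adjust for your setup

# One shared collection with multi-tenancy enabled.
chatbots = client.collections.create(
    "Chatbot",
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
    properties=[
        Property(name="message", data_type=DataType.TEXT),
    ],
)

# Each user becomes a tenant, which gets its own isolated shard.
chatbots.tenants.create([Tenant(name="user_123")])

# All reads and writes are then scoped to that tenant.
user_bot = chatbots.with_tenant("user_123")
user_bot.data.insert({"message": "Hello from user_123's chatbot"})

client.close()
```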

The main takeaway: avoid creating too many collections; focus on using tenants inside a shared collection when the schema is the same.

If I understood you correctly, you have a lot of users, and each user's "collection" would map to tenants. If that's the case, I would revisit the schema plan from a different perspective; I'm not entirely sure what your exact use case is.

One last note: if you're planning to run in production at some point, or to test at high scale, make sure your setup includes multiple nodes and doesn't rely on just one. Also, Weaviate is now at version 1.32.1, so it's worth checking out the latest release notes.

Hope this clears things up!

Best,

Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)

Thanks for your response.

In my case — a low-code platform — each tenant can create their own collections, each with a different schema. That means tenants may define completely different sets of properties for their collections.

The only constraint I can enforce is the maximum number of collections each tenant can create — for example, 100 per tenant. But as you can imagine, with just 10 tenants, this could easily scale up to 1,000 collections, which is not sustainable.

I’ve considered two potential workarounds, but both have clear downsides:

  1. Single giant collection with a metadata text property:
    I could serialize tenant-defined properties into a JSON string and store them in a single metadata text field (see the sketch after this list). But this approach severely limits filtering capabilities: I won't be able to query by individual properties.

  2. Single giant collection with a metadata object property:
    I could store tenant-specific properties as nested fields inside a single metadata object. However, as far as I know, Weaviate currently doesn’t support filtering on nested object properties.
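As a minimal sketch of workaround 1 and why it hurts filtering (all names are hypothetical, and the vectorizer/connection setup will differ per deployment):

```python
import json
import weaviate
from weaviate.classes.config import Property, DataType, Tokenization
from weaviate.classes.query import Filter

client = weaviate.connect_to_local()

# Hypothetical "giant" collection: tenant-defined fields are flattened
# into one serialized-JSON text property.
records = client.collections.create(
    "GiantCollection",
    properties=[
        Property(name="tenant_id", data_type=DataType.TEXT),
        Property(name="metadata", data_type=DataType.TEXT,
                 tokenization=Tokenization.FIELD),
    ],
)

records.data.insert({
    "tenant_id": "tenant_123",
    "metadata": json.dumps({"color": "red", "size": 42}),
})

# The best we can do is coarse substring matching on the blob; there is
# no way to express a real per-property query like
# color == "red" AND size > 40.
hits = records.query.fetch_objects(
    filters=(
        Filter.by_property("tenant_id").equal("tenant_123")
        & Filter.by_property("metadata").like("*red*")
    ),
)

client.close()
```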

Do you have any recommendations or best practices for handling this kind of multi-tenant, dynamic-schema scenario?

Hey, the big problem with many collections is that the GraphQL schema needs to be rebuilt every time you add or remove a collection or restart Weaviate, and the more collections you have, the longer that takes.

I think the only way around this is to disable GraphQL using the `DISABLE_GRAPHQL` env var (this needs a restart). If you're using our Python/TS clients everything will continue to work, but our old Java/Go clients don't support gRPC yet. There is already a beta of the Java client with gRPC support, but it doesn't cover all features yet: Release 6.0.0-beta3 - Custom TrustStore, Fat JARs, Metadata Fields · weaviate/java-client · GitHub

Thanks! Are there any drawbacks?

hi @Charlie_Chen !!

`DISABLE_GRAPHQL` will disable all GraphQL endpoints while keeping REST and gRPC operations.

This means that any GraphQL queries you run will fail. gRPC and REST calls will work as expected, so using any of our clients, or connecting directly, is still possible.
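For example, with the v4 Python client (the ports shown are the usual defaults; adjust them to your deployment), everything keeps working over REST and gRPC even with GraphQL disabled:

```python
import weaviate

# With DISABLE_GRAPHQL=true the v4 Python client is unaffected, since it
# speaks REST and gRPC rather than GraphQL.
client = weaviate.connect_to_custom(
    http_host="localhost", http_port=8080, http_secure=False,
    grpc_host="localhost", grpc_port=50051, grpc_secure=False,
)
print(client.is_ready())  # REST liveness check
client.close()
```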

Hey @Charlie_Chen

This might be one of those cases where enabling DISABLE_GRAPHQL could save you a lot of headaches, at least in the short term.

Quick fix to try

DISABLE_GRAPHQL=true

Possible benefits:

  • Skips GraphQL schema rebuilds → much faster startup

  • Handles hundreds or thousands of collections better

  • Works fine with Python/TS clients

  • (Loss) You lose GraphQL queries + admin tools

If you just need things running smoothly right now, this could be the fastest way forward.

Longer-term architecture suggestion

Instead of keeping 1000+ collections, it might help to group them in a way that reduces the total number while still keeping tenant data separate. Two patterns that work well:

1. Schema-Version Based Collections

# One collection per (data type, schema version), shared across tenants.
data_type = "documents"          # hypothetical example values
schema_version = 2

collection_name = f"{data_type}_v{schema_version}"  # -> "documents_v2"

# Each object records which tenant and schema version it belongs to.
metadata = {
    "tenant_id": "tenant_123",
    "schema_version": schema_version,
    "custom_properties": {...}   # tenant-specific fields go here
}

2. Hybrid Grouping
Group by what the data is, not who owns it:

  • user_profiles (with tenant_id property)

  • documents (with document_type property)

  • analytics_events

This could reduce the count from 1000 → ~20, improve resource usage, and make cross-tenant analytics possible.

Handling different schemas per tenant

A schema registry pattern might help:

# Hypothetical registry mapping a schema id to tenant-defined property types.
schema_registry = {
    "tenant_123_users_v2": {
        "name": "TEXT",
        "custom_field_1": "INT"
    }
}

# Each stored object references its schema id and splits its fields into
# shared core properties and a tenant-specific extended blob.
document = {
    "tenant_id": "tenant_123",
    "schema_id": "tenant_123_users_v2",
    "core_properties": {...},        # properties common to all tenants
    "extended_properties": {...}     # JSON blob of tenant-defined fields
}

Core properties would remain fully filterable with Weaviate's filters; the extended blob, as you pointed out, could only be matched with coarse text filters rather than per-field queries.

My Suggestion

  1. Immediate → Try DISABLE_GRAPHQL

  2. Next step → Consolidate collections & add a schema registry

  3. Later → Add cross-tenant analytics, auto schema versioning, monitoring

This approach might keep tenant flexibility while making the whole setup much easier to maintain and scale.

hi @Chaitanya_Kulthe and @Charlie_Chen !!

There are some other issues to consider when keeping multiple customers' data in the same collection rather than separating them by collection or, preferably, by multi-tenancy.

These are the cases where you have a property like customer_id that is used to filter down to the slice of data you want.

Because of how HNSW indexes are built, each new object from any customer added to this single "for all" collection will use other customers' data to build the index, influence searches, and so on. While this works, it definitely will not scale.

Also, dropping a customer from that big single collection can be quite costly, and can impact both performance and accuracy.
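To make that concrete, this is the kind of query a shared collection forces; the names are hypothetical, and `near_text` assumes a vectorizer module is configured:

```python
import weaviate
from weaviate.classes.query import Filter

client = weaviate.connect_to_local()
shared = client.collections.get("SharedCollection")  # hypothetical name

# Every search must pre-filter on customer_id, but the HNSW graph the
# search traverses was still built from all customers' vectors together.
results = shared.query.near_text(
    query="quarterly report",
    filters=Filter.by_property("customer_id").equal("tenant_123"),
    limit=10,
)
for obj in results.objects:
    print(obj.properties)

client.close()
```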

Let me know if this helps!

Thanks for the detailed breakdown @DudaNogueira

That’s a really important point about HNSW indexing that I hadn’t fully considered. You’re absolutely right - when all customer data lives in a single collection, the vector index gets built using all objects regardless of tenant, which means each customer’s nearest neighbor searches could be influenced by completely unrelated tenant data, even after filtering by customer_id.

This creates two major scaling problems:

  1. Search quality degradation - nearest neighbors might come from other tenants before filtering, leading to less relevant results

  2. Index contamination - customers with very different data distributions could negatively impact each other’s search accuracy

And you’re spot on about the deletion cost - removing a large customer’s data from a shared collection isn’t just a simple delete operation. It potentially requires index rebalancing or rebuilding, which could cause performance issues and downtime for all other tenants sharing that collection.

Given these HNSW-specific constraints, I can see why proper multi-tenancy or dedicated collections are much safer for true tenant isolation, especially at scale. The search quality and operational risks of shared collections are definitely more significant than I initially outlined.


Thanks a lot for all the insights shared here!

I now understand that using a single large collection with just a tenant_id filter isn’t safe at scale because of HNSW index pollution, so the two real options are multi-tenancy (shared schema) or per-tenant collections. Since our tenants have very different schemas, we’ve been leaning toward per-tenant collections — but this could mean 1000+ collections, which worries me in terms of limits and operations.

So, currently, the best option for our case is DISABLE_GRAPHQL, am I right?


This will probably surface when you reach 1,000+ collections. It is important to note that 1,000 is just a baseline; problems may only start at 2,000 or more. It depends on how many properties you have, since one of the issues is the GraphQL schema build.

And bear in mind that disabling the GraphQL stack removes the GraphQL endpoints entirely, so if you are making GraphQL calls, this will affect you.

Our clients have been moving away from GraphQL calls to gRPC, so if you are using the client exclusively, it shouldn't affect you.

A second challenge that surfaces with a large number of collections is collection loading at startup. We don't want our pods taking hours to become ready, right?

For that, we have a lazy loading system in place that can mitigate it.

A second feature that helps with this challenge is HNSW snapshots, introduced in the recent 1.31 release.

On top of those configurations, you will want to group tenants that share the same properties/schema into a multi-tenancy collection, while each tenant with a unique schema gets its own collection.
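A rough sketch of that routing decision with the Python client; everything here, including the `shared_schemas` registry, is a hypothetical illustration:

```python
from weaviate.classes.tenants import Tenant

def place_tenant(client, tenant_id: str, schema: dict, shared_schemas: dict):
    """Route a tenant to a shared multi-tenant collection or its own one.

    shared_schemas: hypothetical registry mapping a multi-tenant
    collection name to the property schema it expects.
    """
    for collection_name, shared_schema in shared_schemas.items():
        if schema == shared_schema:
            # Same properties as an existing group: join it as a tenant.
            group = client.collections.get(collection_name)
            group.tenants.create([Tenant(name=tenant_id)])
            return group.with_tenant(tenant_id)
    # Unique schema: fall back to a dedicated collection for this tenant
    # (property definitions omitted for brevity).
    return client.collections.create(name=f"Tenant_{tenant_id}")
```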

Let me know if that helps!

Thanks!

It does help, thanks very much.