Consider this multi-tenant e-commerce search case. For each tenant, we have a product catalogue and multiple stores or markets. For each product, some fields are store-agnostic (e.g. name and description) and others are store-specific (e.g. availability and price). A product's vector is always the same across all stores. A product catalogue usually contains around 30K-100K products, and the number of stores ranges from 20 to as many as 200.
The store-specific fields are updated quite frequently, while the store-agnostic fields are rarely updated.
How would you suggest storing this kind of data in Weaviate?
Hi @janluke - that is a good question, and an interesting one.
Do you have a proposed structure in mind? It would help us give feedback, such as pros and cons, on a concrete design.
First let me add that:
- each tenant can have very different fields (since different e-commerce websites can sell completely different categories of products)
- we have very few tenants (assume 5), all paying, and we don’t expect exponential growth in the short term; supporting up to 100 tenants would already be plenty. Of course, this means we’d need to redesign the system if we grow more than expected.
I’m very new to Weaviate and HNSW, but I have the following concerns (I don’t know if they are real concerns):
- is having a different schema per tenant a problem if we use a single multi-tenant Product class?
- is having 20-200 replicas of the same product vector (one per store) in the same index a problem performance-wise?
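To put rough numbers on the second concern, here is a back-of-the-envelope sketch using only the figures quoted above. It assumes the worst case where every product is listed in every store, which may not hold in practice; the function name is made up for illustration.

```python
# Figures quoted earlier in this thread.
CATALOGUE_SIZES = (30_000, 100_000)  # products per catalogue (low, high)
STORE_COUNTS = (20, 200)             # stores per e-commerce tenant (low, high)


def vectors_in_shared_index(products: int, stores: int) -> int:
    """One object, and so one vector, per product per store, all in one index.

    Assumption: every product is listed in every store (worst case).
    """
    return products * stores


smallest = vectors_in_shared_index(CATALOGUE_SIZES[0], STORE_COUNTS[0])
largest = vectors_in_shared_index(CATALOGUE_SIZES[1], STORE_COUNTS[1])

print(f"vectors in one shared index: {smallest:,} to {largest:,}")
print(f"distinct vectors among them: {CATALOGUE_SIZES[0]:,} to {CATALOGUE_SIZES[1]:,}")
```

So a single e-commerce tenant could put somewhere between 600,000 and 20,000,000 vectors into one index, of which at most 100,000 are actually distinct.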
The options I can see so far are the following:
- Use a single multi-tenant class with a storeId field. In this case, we have both problems 1 and 2 (huge schema, replicated vectors).
- Use a class per tenant and a storeId field. In this case, we have only problem 2 (replicated vectors).
- Use a class per tenant, but use the multi-tenancy feature to have a separate vector index for each store, i.e. use the storeId as the tenant. In this case, we have neither problem 1 nor problem 2. We do need to replicate the store-agnostic product data, but that shouldn’t be a big issue.
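To make the third option concrete, here is a minimal sketch in plain Python of how the data would fan out. It does not call Weaviate at all: the dictionary keys stand in for Weaviate tenants (one per storeId), and the `Product` and `StoreOffer` types, the field names, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Product:
    product_id: str
    name: str                  # store-agnostic
    description: str           # store-agnostic
    vector: Tuple[float, ...]  # identical for every store


@dataclass(frozen=True)
class StoreOffer:
    product_id: str
    store_id: str
    price: float     # store-specific
    available: bool  # store-specific


def objects_per_store(products: List[Product], offers: List[StoreOffer]) -> Dict[str, list]:
    """Build one object per (product, store) and group the objects by storeId.

    Each key plays the role of one tenant, so each list is what would end up
    in that store's own vector index.
    """
    by_id = {p.product_id: p for p in products}
    per_store = defaultdict(list)
    for offer in offers:
        product = by_id[offer.product_id]
        per_store[offer.store_id].append({
            # Store-agnostic data is copied into every store's object.
            "productId": product.product_id,
            "name": product.name,
            "description": product.description,
            "vector": product.vector,
            # Store-specific data differs per object.
            "price": offer.price,
            "available": offer.available,
        })
    return dict(per_store)


products = [
    Product("p1", "Mug", "Ceramic mug", (0.1, 0.2)),
    Product("p2", "Tee", "Cotton t-shirt", (0.3, 0.4)),
]
offers = [
    StoreOffer("p1", "store-de", 9.5, True),
    StoreOffer("p1", "store-us", 11.0, False),
    StoreOffer("p2", "store-de", 19.0, True),
]
per_store = objects_per_store(products, offers)
for store_id, objects in per_store.items():
    print(store_id, [o["productId"] for o in objects])
```

The point of the sketch is the trade-off stated above: each store's index only holds that store's products, at the cost of copying the name, description, and vector into every store where the product is listed.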