E-commerce case: many-to-many relation between products and stores

Consider this multi-tenant e-commerce search case. For each tenant, we have a product catalogue and multiple stores or markets. For each product, some fields are store-agnostic (e.g. name and description) and others are store-specific (e.g. availability and price). A product vector is always the same for all stores. A product catalogue usually holds around 30K-100K products, and the number of stores ranges from 20 to as many as 200.

The store-specific fields are updated quite frequently, while the store-agnostic fields are rarely updated.

How would you suggest storing this kind of data in Weaviate?

Hi @janluke - that is a good question, and an interesting one.

Do you have a proposed structure in mind? It would help us give feedback, such as pros and cons, on a concrete design.

First let me add that:

  1. each tenant can have very different fields (since different e-commerce websites can sell completely different categories of products)
  2. we have very few tenants (assume 5), all paying, and we don’t expect exponential growth in the short term; supporting up to 100 tenants would already be plenty. Of course, this means we’d need to redesign the system if we actually grow more than expected.

I’m very new to Weaviate and HNSW, but I have the following concerns (I don’t know if they are real concerns):

  1. is having a different schema per tenant a problem if we use a single multi-tenant Product class?
  2. is having 20-200 replicas of the same product vector (one per store) in the same index a problem performance-wise?

The options I can see so far are the following:

  1. Use a single multi-tenant class with a storeId field. In this case, we have both problems 1 and 2 (huge schema, replicated vectors).

  2. Use a class per tenant and a storeId field. In this case, we have only problem 2 (replicated vectors).

  3. Use a class per tenant but use the multi-tenancy feature to have a separate vector index for each store, i.e. use the storeId as the tenant. In this case, we have neither problem 1 nor problem 2. We do need to replicate the store-agnostic product data, but that shouldn’t be a big issue.
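
To put rough numbers on the three options, here is a small back-of-the-envelope sketch in plain Python (no Weaviate calls). It rests on assumptions that are mine, not confirmed: every tenant has the same catalogue size and store count, each multi-tenancy tenant gets its own vector index, a class per tenant means one index per class, and all three options store one object per (product, store) pair. The `Layout` and `compare_layouts` names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    indexes: int                   # number of separate vector indexes
    vectors_per_index: int         # objects (and vectors) held by each index
    writes_per_shared_update: int  # objects rewritten when a store-agnostic field changes


def compare_layouts(tenants: int, products: int, stores: int) -> list[Layout]:
    """Estimate index count and size for the three options (assumptions above)."""
    # Every option keeps one copy of each product per store.
    per_store_copies = products * stores
    return [
        # Options 1 and 2: one index per e-commerce tenant, filtered by storeId.
        Layout("1: one multi-tenant class + storeId", tenants, per_store_copies, stores),
        Layout("2: class per tenant + storeId", tenants, per_store_copies, stores),
        # Option 3: one index per store, so each index holds the catalogue once.
        Layout("3: class per tenant, storeId as tenant", tenants * stores, products, stores),
    ]


if __name__ == "__main__":
    for layout in compare_layouts(tenants=5, products=100_000, stores=200):
        total = layout.indexes * layout.vectors_per_index
        print(
            f"{layout.name}: {layout.indexes} indexes x "
            f"{layout.vectors_per_index:,} vectors = {total:,} total; "
            f"{layout.writes_per_shared_update} writes per store-agnostic update"
        )
```

Under these assumptions the total number of stored vectors is the same in all three options; what changes is how they are split. At the upper end (100K products, 200 stores), options 1 and 2 put 20 million vectors in each index, while option 3 keeps 100K vectors per index spread over 1,000 small indexes. In every option, a store-agnostic update touches one object per store.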