Description
We are getting the following error when using the Weaviate Python client:
weaviate_db = WeaviateVectorDB(index_name)
May 31 05:30:05 PM File "/app/src/stackflow/vectordb/vectordb.py", line 816, in __init__
May 31 05:30:05 PM self.create_index()
May 31 05:30:05 PM File "/app/src/stackflow/vectordb/vectordb.py", line 852, in create_index
May 31 05:30:05 PM collection.tenants.create(
May 31 05:30:05 PM File "/app/.venv/lib/python3.10/site-packages/weaviate/collections/tenants.py", line 44, in create
May 31 05:30:05 PM self.__connection.post(
May 31 05:30:05 PM File "/app/.venv/lib/python3.10/site-packages/weaviate/connect/v4.py", line 486, in post
May 31 05:30:05 PM return self.__send(
May 31 05:30:05 PM File "/app/.venv/lib/python3.10/site-packages/weaviate/connect/v4.py", line 437, in __send
May 31 05:30:05 PM raise UnexpectedStatusCodeError(error_msg, response=res)
May 31 05:30:05 PM weaviate.exceptions.UnexpectedStatusCodeError: Collection tenants may not have been added properly for MultiTenancyClass! Unexpected status code: 422, with response body: {'error': [{'message': 'open cluster-wide transaction: concurrent transaction'}]}.
The relevant piece of code that triggers the error is:
class WeaviateVectorDB(VectorDB):
    def __init__(self, index_name: str, data: dict | None = None, emb_dim: int | None = None):
        """emb_dim: embedding dimension"""
        if not data:
            data = {}
        super().__init__(data=data)
        ...
        ...
        self.embedding_dimension = emb_dim
        self.top_k = 20
        self.class_name = "MultiTenancyClass"
        self.retrieval = "chunks"
        self.index_name_hash = convert_md5_to_letters(
            hashlib.md5(index_name.encode("ascii")).hexdigest()
        )
        self.client: weaviate.WeaviateClient = self._connect_to_weaviate()
        self.create_index()
        self.collection: weaviate.collections.Collection = self.client.collections.get(
            self.class_name
        ).with_tenant(tenant=self.index_name_hash)

    def _load_library() -> list[Library] | Library | None:
        pass

    def create_index(self):
        if not self.client.collections.exists(self.class_name):
            self.client.collections.create(
                self.class_name,
                properties=[
                    weaviate.classes.config.Property(
                        name="text", data_type=weaviate.classes.config.DataType.TEXT
                    ),
                    weaviate.classes.config.Property(
                        name="doc", data_type=weaviate.classes.config.DataType.TEXT
                    ),
                    weaviate.classes.config.Property(
                        name="lib", data_type=weaviate.classes.config.DataType.TEXT
                    ),
                    weaviate.classes.config.Property(
                        name="page", data_type=weaviate.classes.config.DataType.TEXT
                    ),
                    weaviate.classes.config.Property(
                        name="chunk", data_type=weaviate.classes.config.DataType.TEXT
                    ),
                ],
                vector_index_config=weaviate.classes.config.Configure.VectorIndex.hnsw(
                    distance_metric=weaviate.classes.config.VectorDistances.COSINE
                ),
                multi_tenancy_config=weaviate.classes.config.Configure.multi_tenancy(True),
            )
        collection = self.client.collections.get(self.class_name)
        collection.tenants.create(
            tenants=[weaviate.classes.tenants.Tenant(name=self.index_name_hash)]
        )
We have a Weaviate enterprise database cluster that we use to index documents using a single collection schema. Since we need the indexing to be independent for each user, we use tenants to ensure data isolation. We have about 500k tenants in total, with hundreds of new tenants being created every day.
The error happens occasionally (roughly once every half hour) and appears to be caused by the call to the tenants create method:
collection.tenants.create(
    tenants=[weaviate.classes.tenants.Tenant(name=self.index_name_hash)]
)
I have searched for the issue but did not find much information about the error message; the only related issue seems to be "Exception at multiprocessing". The documentation doesn't seem to mention cluster-wide transaction errors.
While trying to solve the problem, I checked the docs, and it looks like the newest version of the Python client has a method to check whether a tenant exists:
collection.tenants.get_by_name(weaviatedb.index_name_hash)
However, that method is not supported by our cluster version, as it requires the server to be at version 1.25.0 or higher and we are using version 1.23.14.
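As a possible fallback on 1.23.x, the existence check could be done by listing the tenants and testing membership. The sketch below assumes that `collection.tenants.get()` returns a mapping keyed by tenant name, which is my understanding of the v4 client but worth verifying against the installed version. `tenant_exists` is a hypothetical helper name, not part of the client. With around 500k tenants the listing is likely to be expensive, so this is probably only viable with some caching on top.

```python
from typing import Any


def tenant_exists(collection: Any, tenant_name: str) -> bool:
    """Return True if `tenant_name` is already registered on the collection.

    Assumption: `collection.tenants.get()` returns a dict-like object keyed
    by tenant name. This fetches the full tenant list on every call, which
    may be slow for collections with very many tenants.
    """
    existing = collection.tenants.get()
    return tenant_name in existing
```

Note that a check followed by a create is not atomic: two threads can both see the tenant as missing and both call create, and the check does nothing for concurrent creation of different tenants.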
Given the above, I have some questions:
- How do cluster-wide transactions work in Weaviate?
- Can we create multiple (different) tenants at the same time safely (i.e., multiple API threads creating different tenants as a result of different API calls)?
- Will this issue be resolved if we check whether the tenant exists before calling the Weaviate collection.tenants.create method?
- Is there anything else I am not taking into account?
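One workaround we are considering in the meantime is retrying the tenant creation when this specific error comes back. The following is a minimal sketch under assumptions, not a confirmed fix: it assumes the error is transient and that the text "concurrent transaction" appears in the exception message, as it does in the traceback above. `retry_on_concurrent_transaction` and its parameters are hypothetical names introduced for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_concurrent_transaction(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying with exponential backoff and jitter.

    Only exceptions whose message contains "concurrent transaction" are
    retried (an assumption based on the error text in the traceback).
    Any other exception, or the last failed attempt, is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            is_conflict = "concurrent transaction" in str(exc)
            if not is_conflict or attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ... plus random jitter
            # so that competing threads do not retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")
```

In `create_index`, the call could then be wrapped as `retry_on_concurrent_transaction(lambda: collection.tenants.create(tenants=[...]))`. This only hides the conflict from callers; whether retrying is actually safe here depends on the answers to the questions above.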
Thank you in advance!
Manuel.
Server Setup Information
- Weaviate Server Version: 1.23.14
- Deployment Method: Hosted by weaviate
- Multi Node? Number of Running Nodes:
- Client Language and Version: Python, v4.5.7
- Multitenancy: Yes
Any additional Information
weaviate.exceptions.UnexpectedStatusCodeError: Collection tenants may not have been added properly for MultiTenancyClass! Unexpected status code: 422, with response body: {'error': [{'message': 'open cluster-wide transaction: concurrent transaction'}]}.