Weaviate connection error 8/10 times but works the other two times in Kubernetes deployment

Description

We currently run a docker-swarm-based deployment in production and are moving to a Kubernetes-based deployment. Everything comes up, including Weaviate, but we hit Weaviate connection issues when inserting data in batches. It does not always fail, but roughly 8 out of 10 attempts end with a connection error (the client is not able to connect to Weaviate). Weaviate also restarts on its own while data is being inserted; the logs from the time of the restart are included below.
On the docker-swarm-based deployment, everything works fine.

Server Setup Information

  • Weaviate Server Version: 1.22.5
  • Deployment Method: K8s, helm
  • Multi Node? Number of Running Nodes: only 1 node
  • Client Language and Version: Python V3

Any additional Information

  • I am using AWS EKS and EKS nodes are in a private subnet.
  • I am using text2vec-openai as the vectorizer.
  • We have multi-tenancy enabled and the tenants are HOT as well
  • args while starting up weaviate:
args:
  - '--host'
  - '0.0.0.0'
  - '--port'
  - '8080'
  - '--scheme'
  - 'http'
  - '--config-file'
  - '/weaviate-config/conf.yaml'
  - --read-timeout=200s
  - --write-timeout=400s
  • The Weaviate logs are flooded with lines like the one below:
time="2024-05-19T15:13:32Z" level=trace msg="no segment eligible for compaction" action=lsm_compaction class=Paragraph index=paragraph path=/var/lib/weaviate/paragraph_DWI57WOAQ_lsm/property_metadata_searchable shard=DWI57WOAQ
  • While checking the Weaviate logs I can see requests to openid-configuration as well, even though I’ve explicitly set OpenID (OIDC) authentication to false

Troubleshooting steps used:

  • My app can reach Weaviate: a curl request returned both the schema and the metadata.
  • I have set timeout_retries when configuring my batch, and batch_size is only 10.
  • Resources do not appear to be the problem: I monitor them during training, I’ve doubled the resources of my server, and I’ve removed all the resource limits from K8s, but Weaviate is still unstable and works only 2/10 times.
  • Doubled the timeout_config compared to my docker-swarm-based setup.
  • Weaviate restarts automatically while inserting data. Below are the logs of the previous (restarted) pod, retrieved with kubectl logs <weaviate-pod-name> --previous
time="2024-05-22T11:44:40Z" level=trace msg="no segment eligible for compaction" action=lsm_compaction class=Paragraph index=paragraph path=/var/lib/weaviate/paragraph_DWI57WOAQ_lsm/property_content_searchable shard=DWI57WOAQ
time="2024-05-22T11:44:48Z" level=debug msg="received HTTP request" action=restapi_request method=GET url=/v1/.well-known/openid-configuration
time="2024-05-22T11:44:48Z" level=debug msg="received HTTP request" action=restapi_request method=GET url=/v1/meta
time="2024-05-22T11:44:51Z" level=debug msg="received HTTP request" action=restapi_request method=GET url=/v1/.well-known/openid-configuration
time="2024-05-22T11:44:51Z" level=debug msg="received HTTP request" action=restapi_request method=GET url=/v1/meta
time="2024-05-22T11:45:03Z" level=debug msg="received HTTP request" action=restapi_request method=GET url=/v1/.well-known/openid-configuration
time="2024-05-22T11:45:03Z" level=debug msg="received HTTP request" action=restapi_request method=GET url=/v1/meta
time="2024-05-22T11:45:06Z" level=debug msg="received HTTP request" action=restapi_request method=GET url=/v1/.well-known/openid-configuration
time="2024-05-22T11:45:06Z" level=debug msg="received HTTP request" action=restapi_request method=GET url=/v1/meta
time="2024-05-22T11:45:18Z" level=debug msg="received HTTP request" action=restapi_request method=GET url=/v1/.well-known/openid-configuration
time="2024-05-22T11:45:18Z" level=debug msg="received HTTP request" action=restapi_request method=GET url=/v1/meta
time="2024-05-22T11:45:21Z" level=debug msg="received HTTP request" action=restapi_request method=GET url=/v1/.well-known/openid-configuration
time="2024-05-22T11:45:21Z" level=debug msg="received HTTP request" action=restapi_request method=GET url=/v1/meta
time="2024-05-22T11:45:30Z" level=debug msg="received HTTP request" action=restapi_request method=POST url=/v1/graphql
time="2024-05-22T11:45:31Z" level=debug msg="received HTTP request" action=restapi_request method=POST url=/v1/schema/Paragraph/tenants
time="2024-05-22T11:45:31Z" level=trace msg="number of partitions for class \"Paragraph\" does not match number of requested tenants" #partitions=0 #requested=1 action=add_tenants
time="2024-05-22T11:45:31Z" level=debug msg="saving updated schema to configuration store" action=schema.add_tenants
time="2024-05-22T11:45:31Z" level=debug msg="received HTTP request" action=restapi_request method=POST url=/v1/batch/objects
time="2024-05-22T11:45:31Z" level=debug msg="received HTTP request" action=restapi_request method=DELETE url="/v1/batch/objects?tenant=DWI57WOAQ"
time="2024-05-22T11:45:31Z" level=trace msg="retrieving previous and determining status in KV took 43.872µs" action=store_object_store_determine_status took="43.872µs"
time="2024-05-22T11:45:31Z" level=trace msg="retrieving previous and determining status in KV took 61.211µs" action=store_object_store_determine_status took="61.211µs"
time="2024-05-22T11:45:31Z" level=trace msg="retrieving previous and determining status in KV took 14.351µs" action=store_object_store_determine_status took="14.351µs"
time="2024-05-22T11:45:31Z" level=trace msg="storing object data in KV took 31.778µs" action=store_object_store_upsert_object_data took="31.778µs"
time="2024-05-22T11:45:31Z" level=trace msg="storing object data in KV took 31.299µs" action=store_object_store_upsert_object_data took="31.299µs"
time="2024-05-22T11:45:31Z" level=trace msg="retrieving previous and determining status in KV took 119.613µs" action=store_object_store_determine_status took="119.613µs"
time="2024-05-22T11:45:31Z" level=trace msg="storing object data in KV took 27.862µs" action=store_object_store_upsert_object_data took="27.862µs"
time="2024-05-22T11:45:31Z" level=trace msg="retrieving previous and determining status in KV took 39.337µs" action=store_object_store_determine_status took="39.337µs"
time="2024-05-22T11:45:31Z" level=trace msg="storing object data in KV took 27.194µs" action=store_object_store_upsert_object_data took="27.194µs"
time="2024-05-22T11:45:31Z" level=trace msg="retrieving previous and determining status in KV took 53.342µs" action=store_object_store_determine_status took="53.342µs"
time="2024-05-22T11:45:31Z" level=trace msg="storing object data in KV took 28.427µs" action=store_object_store_upsert_object_data took="28.427µs"
time="2024-05-22T11:45:31Z" level=trace msg="storing object data in KV took 30.167µs" action=store_object_store_upsert_object_data took="30.167µs"
time="2024-05-22T11:45:31Z" level=trace msg="retrieving previous and determining status in KV took 115.14µs" action=store_object_store_determine_status took="115.14µs"
time="2024-05-22T11:45:31Z" level=trace msg="retrieving previous and determining status in KV took 18.414µs" action=store_object_store_determine_status took="18.414µs"
time="2024-05-22T11:45:31Z" level=trace msg="storing object data in KV took 16.989µs" action=store_object_store_upsert_object_data took="16.989µs"
time="2024-05-22T11:45:31Z" level=trace msg="storing object data in KV took 30.053µs" action=store_object_store_upsert_object_data took="30.053µs"

time="2024-05-22T11:45:31Z" level=trace msg="object batch took 4.784573ms" action=batch_objects batch_size=10 took=4.784573ms
panic: close of nil channel

goroutine 103 [running]:
github.com/weaviate/weaviate/adapters/repos/db.(*vectorQueue).releaseChunk(0xc002624150, 0xc02c7ea000)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/index_queue.go:732 +0x28
github.com/weaviate/weaviate/adapters/repos/db.asyncWorker(0x0?, {0x1d38810, 0xc0033d5480}, 0x0?)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/repo.go:366 +0x1b4
github.com/weaviate/weaviate/adapters/repos/db.New.func1()
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/repo.go:169 +0x6b
created by github.com/weaviate/weaviate/adapters/repos/db.New in goroutine 1
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/repo.go:166 +0x76d
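
When reading the output of kubectl logs <weaviate-pod-name> --previous, the trace-level lines can bury the one line that explains the restart. The following is a minimal Python sketch, not part of the original post, that pulls the Go panic message and the first stack frame out of such a log dump. The function name find_panic and the regular expressions are illustrative assumptions; they only cover the panic layout seen in the log above.

```python
import re
from typing import Iterable, Optional, Tuple

# Go runtime panics start with "panic: <message>" on a line of their own.
PANIC_RE = re.compile(r"^panic: (?P<msg>.+)$")
# A stack frame line looks like "pkg/path.(*Type).method(arg, arg)".
FRAME_RE = re.compile(r"^(?P<func>\S+)\(.*\)$")


def find_panic(lines: Iterable[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (panic message, first stack frame function), or None if no panic.

    The second element is None when no frame line follows the panic.
    """
    it = iter(lines)
    for line in it:
        match = PANIC_RE.match(line.strip())
        if match is None:
            continue
        # Skip blank lines and the "goroutine N [running]:" header,
        # then take the next line as the top stack frame.
        for candidate in it:
            candidate = candidate.strip()
            if not candidate or candidate.startswith("goroutine "):
                continue
            frame = FRAME_RE.match(candidate)
            return match.group("msg"), (frame.group("func") if frame else None)
        return match.group("msg"), None
    return None
```

Applied to the log above, this should report the message "close of nil channel" together with the (*vectorQueue).releaseChunk frame, which is the kind of detail worth quoting in a bug report.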

My application logs (newest entries first):

2024-05-22T11:45:31.50544861Z stdout F 2024-05-22 11:45:31.505 | DEBUG    | urllib3.connectionpool:_make_request:474 - http://weaviate:80 "POST /v1/batch/objects HTTP/1.1" 200 None
2024-05-22T11:45:31.505275694Z stderr F [2024-05-22 11:45:31,505: DEBUG/MainProcess] http://weaviate:80 "POST /v1/batch/objects HTTP/1.1" 200 None
2024-05-22T11:45:31.287547948Z stdout F 
2024-05-22T11:45:31.287545334Z stdout F celery.exceptions.InvalidTaskError: Failed to insert data into weaviate
2024-05-22T11:45:31.287543009Z stdout F 
2024-05-22T11:45:31.287540948Z stdout F     raise InvalidTaskError("Failed to insert data into weaviate")
2024-05-22T11:45:31.287538623Z stdout F   File "/app/backend/embeddings/service.py", line 106, in task_create_embeddings
2024-05-22T11:45:31.287536314Z stdout F 
2024-05-22T11:45:31.287534282Z stdout F     return self.run(*args, **kwargs)
2024-05-22T11:45:31.287531846Z stdout F   File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 760, in __protected_call__
2024-05-22T11:45:31.287529249Z stdout F     R = retval = fun(*args, **kwargs)
2024-05-22T11:45:31.287526631Z stdout F > File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 477, in trace_task
2024-05-22T11:45:31.287523642Z stdout F 
2024-05-22T11:45:31.28699494Z stderr F [2024-05-22 11:45:31,286: ERROR/MainProcess] Task backend.embeddings.service.task_create_embeddings[20518e81-af2a-4c29-a19d-d9c4625e3c4b] raised unexpected: InvalidTaskError('Failed to insert data into weaviate')



2024-05-22T11:45:31.2852683Z stdout F 2024-05-22 11:45:31.285 | DEBUG    | urllib3.connectionpool:_make_request:474 - http://weaviate:80 "DELETE /v1/batch/objects?tenant=DWI57WOAQ HTTP/1.1" 200 270
2024-05-22T11:45:31.285111655Z stderr F [2024-05-22 11:45:31,285: DEBUG/MainProcess] http://weaviate:80 "DELETE /v1/batch/objects?tenant=DWI57WOAQ HTTP/1.1" 200 270
2024-05-22T11:45:31.2833688Z stdout F 2024-05-22 11:45:31.283 | DEBUG    | urllib3.connectionpool:_get_conn:291 - Resetting dropped connection: weaviate
2024-05-22T11:45:31.283229575Z stderr F [2024-05-22 11:45:31,283: DEBUG/MainProcess] Resetting dropped connection: weaviate
2024-05-22T11:45:31.282412984Z stdout F 
2024-05-22T11:45:31.282407219Z stdout F requests.exceptions.ConnectionError: Batch was not added to weaviate.
2024-05-22T11:45:31.282404783Z stdout F 
2024-05-22T11:45:31.282402276Z stdout F           └ <class 'requests.exceptions.ConnectionError'>
2024-05-22T11:45:31.282395753Z stdout F     raise RequestsConnectionError("Batch was not added to weaviate.") from conn_err
2024-05-22T11:45:31.282386678Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 742, in _create_data
2024-05-22T11:45:31.282378524Z stdout F                └ <weaviate.batch.crud_batch.Batch object at 0x7f3a0214c280>
2024-05-22T11:45:31.282375411Z stdout F                │    └ <function Batch._create_data at 0x7f39feaffc10>
2024-05-22T11:45:31.282372926Z stdout F     response = self._create_data(
2024-05-22T11:45:31.282370749Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 1099, in _flush_in_thread
2024-05-22T11:45:31.282368107Z stdout F              └ None
2024-05-22T11:45:31.282365727Z stdout F              │        └ None
2024-05-22T11:45:31.282363534Z stdout F              │        │            └ None
2024-05-22T11:45:31.282361294Z stdout F     result = self.fn(*self.args, **self.kwargs)
2024-05-22T11:45:31.282356209Z stdout F   File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
2024-05-22T11:45:31.282353501Z stdout F           └ None
2024-05-22T11:45:31.282351137Z stdout F     raise self._exception
2024-05-22T11:45:31.282348709Z stdout F   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
2024-05-22T11:45:31.282346586Z stdout F            └ None
2024-05-22T11:45:31.282344277Z stdout F     return self.__get_result()
2024-05-22T11:45:31.282340587Z stdout F   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
2024-05-22T11:45:31.282329781Z stdout F                                    └ <Future at 0x7f3a025c8ee0 state=finished raised ConnectionError>
2024-05-22T11:45:31.282327321Z stdout F                                    │           └ <function Future.result at 0x7f3a02727310>
2024-05-22T11:45:31.28232506Z stdout F     response_objects, nr_objects = done_future.result()
2024-05-22T11:45:31.282322448Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 1151, in _send_batch_requests
2024-05-22T11:45:31.282319901Z stdout F     └ <weaviate.batch.crud_batch.Batch object at 0x7f3a0214c280>
2024-05-22T11:45:31.282317637Z stdout F     │    └ <function Batch._send_batch_requests at 0x7f39fea860d0>
2024-05-22T11:45:31.282314719Z stdout F     self._send_batch_requests(force_wait=False)
2024-05-22T11:45:31.282302087Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 1242, in _auto_create
2024-05-22T11:45:31.28229852Z stdout F     └ <weaviate.batch.crud_batch.Batch object at 0x7f3a0214c280>
2024-05-22T11:45:31.282295618Z stdout F     │    └ <function Batch._auto_create at 0x7f39fea86160>
2024-05-22T11:45:31.282292253Z stdout F     self._auto_create()
2024-05-22T11:45:31.282287575Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 569, in add_data_object
2024-05-22T11:45:31.282285173Z stdout F 
2024-05-22T11:45:31.282282937Z stdout F     └ <weaviate.batch.crud_batch.Batch object at 0x7f3a0214c280>
2024-05-22T11:45:31.282276776Z stdout F     │     └ <function Batch.add_data_object at 0x7f39feaffaf0>
2024-05-22T11:45:31.282274603Z stdout F     batch.add_data_object(
2024-05-22T11:45:31.282272183Z stdout F   File "/app/backend/utils.py", line 382, in batch_insert_data
2024-05-22T11:45:31.282265159Z stdout F 
2024-05-22T11:45:31.282262959Z stdout F           └ None
2024-05-22T11:45:31.282255779Z stdout F     raise self._exception
2024-05-22T11:45:31.282253444Z stdout F   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
2024-05-22T11:45:31.282251042Z stdout F            └ None
2024-05-22T11:45:31.282248506Z stdout F     return self.__get_result()
2024-05-22T11:45:31.282241499Z stdout F   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
2024-05-22T11:45:31.282239296Z stdout F                                    └ <Future at 0x7f3a025c8ee0 state=finished raised ConnectionError>
2024-05-22T11:45:31.282237053Z stdout F                                    │           └ <function Future.result at 0x7f3a02727310>
2024-05-22T11:45:31.282234831Z stdout F     response_objects, nr_objects = done_future.result()
2024-05-22T11:45:31.282231203Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 1151, in _send_batch_requests
2024-05-22T11:45:31.282227617Z stdout F     └ <weaviate.batch.crud_batch.Batch object at 0x7f3a0214c280>
2024-05-22T11:45:31.282224022Z stdout F     │    └ <function Batch._send_batch_requests at 0x7f39fea860d0>
2024-05-22T11:45:31.282220159Z stdout F     self._send_batch_requests(force_wait=True)
2024-05-22T11:45:31.28221752Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 1252, in flush
2024-05-22T11:45:31.282215315Z stdout F     └ <weaviate.batch.crud_batch.Batch object at 0x7f3a0214c280>
2024-05-22T11:45:31.282212882Z stdout F     │    └ <function Batch.flush at 0x7f39fea861f0>
2024-05-22T11:45:31.282210617Z stdout F     self.flush()
2024-05-22T11:45:31.282208306Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 1646, in __exit__
2024-05-22T11:45:31.282205936Z stdout F 
2024-05-22T11:45:31.28220378Z stdout F     return True
2024-05-22T11:45:31.282201666Z stdout F   File "/app/backend/utils.py", line 385, in batch_insert_data
2024-05-22T11:45:31.282199549Z stdout F 
2024-05-22T11:45:31.282197427Z stdout F           └ None
2024-05-22T11:45:31.282195114Z stdout F     raise self._exception
2024-05-22T11:45:31.282192284Z stdout F   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
2024-05-22T11:45:31.282190032Z stdout F            └ None
2024-05-22T11:45:31.282187796Z stdout F     return self.__get_result()
2024-05-22T11:45:31.282185522Z stdout F   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
2024-05-22T11:45:31.282183242Z stdout F     └ None
2024-05-22T11:45:31.28218058Z stdout F     │                              └ <Future at 0x7f3a025c8ee0 state=finished raised ConnectionError>
2024-05-22T11:45:31.282178398Z stdout F     │                              │           └ <function Future.result at 0x7f3a02727310>
2024-05-22T11:45:31.282172037Z stdout F     response_objects, nr_objects = done_future.result()
2024-05-22T11:45:31.282167096Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 1151, in _send_batch_requests
2024-05-22T11:45:31.282165054Z stdout F     └ <weaviate.batch.crud_batch.Batch object at 0x7f3a0214c280>
2024-05-22T11:45:31.282162978Z stdout F     │    └ <function Batch._send_batch_requests at 0x7f39fea860d0>
2024-05-22T11:45:31.282160702Z stdout F     self._send_batch_requests(force_wait=False)
2024-05-22T11:45:31.282158218Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 1242, in _auto_create
2024-05-22T11:45:31.282154646Z stdout F     └ <weaviate.batch.crud_batch.Batch object at 0x7f3a0214c280>
2024-05-22T11:45:31.282151111Z stdout F     │    └ <function Batch._auto_create at 0x7f39fea86160>
2024-05-22T11:45:31.282147332Z stdout F     self._auto_create()
2024-05-22T11:45:31.282143699Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 569, in add_data_object
2024-05-22T11:45:31.282140465Z stdout F 
2024-05-22T11:45:31.282138078Z stdout F     └ <weaviate.batch.crud_batch.Batch object at 0x7f3a0214c280>
2024-05-22T11:45:31.282135774Z stdout F     │     └ <function Batch.add_data_object at 0x7f39feaffaf0>
2024-05-22T11:45:31.282133317Z stdout F     batch.add_data_object(
2024-05-22T11:45:31.282130609Z stdout F   File "/app/backend/utils.py", line 382, in batch_insert_data
2024-05-22T11:45:31.282128249Z stdout F 
2024-05-22T11:45:31.282126079Z stdout F           └ None
2024-05-22T11:45:31.282123751Z stdout F     raise self._exception
2024-05-22T11:45:31.282121377Z stdout F   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
2024-05-22T11:45:31.282118753Z stdout F            └ None
2024-05-22T11:45:31.282116396Z stdout F     return self.__get_result()
2024-05-22T11:45:31.282108389Z stdout F   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
2024-05-22T11:45:31.282099414Z stdout F     └ None
2024-05-22T11:45:31.282097089Z stdout F     │                              └ <Future at 0x7f3a025c8ee0 state=finished raised ConnectionError>
2024-05-22T11:45:31.28209463Z stdout F     │                              │           └ <function Future.result at 0x7f3a02727310>
2024-05-22T11:45:31.282092296Z stdout F     response_objects, nr_objects = done_future.result()
2024-05-22T11:45:31.282089857Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 1151, in _send_batch_requests
2024-05-22T11:45:31.282087413Z stdout F     └ <weaviate.batch.crud_batch.Batch object at 0x7f3a0214c280>
2024-05-22T11:45:31.282085129Z stdout F     │    └ <function Batch._send_batch_requests at 0x7f39fea860d0>
2024-05-22T11:45:31.282082873Z stdout F     self._send_batch_requests(force_wait=True)
2024-05-22T11:45:31.28208024Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 1252, in flush
2024-05-22T11:45:31.282077525Z stdout F     └ <weaviate.batch.crud_batch.Batch object at 0x7f3a0214c280>
2024-05-22T11:45:31.282073453Z stdout F     │    └ <function Batch.flush at 0x7f39fea861f0>
2024-05-22T11:45:31.282070048Z stdout F     self.flush()
2024-05-22T11:45:31.282066665Z stdout F   File "/usr/local/lib/python3.9/site-packages/weaviate/batch/crud_batch.py", line 1646, in __exit__
2024-05-22T11:45:31.282063146Z stdout F 
2024-05-22T11:45:31.282060696Z stdout F     return True

Hi @Divyansh_Mishra! Welcome to our community! :hugs:

Do you get the same outcome with the latest version (1.25.2 as of now)?

Hey @DudaNogueira !
Do you mean the latest version of the Python client or of Weaviate?

Sorry!

Latest version of Weaviate Server.

Hey @DudaNogueira, migrating to a new Weaviate server version could be a problem for us, since it would involve codebase changes, upgrading the Weaviate client, and possibly some compatibility issues.
Since the current version works fine on the docker-swarm-based deployment, what could be causing it to fail on the K8s deployment?
Below are the startup arguments and the env variables I’ve set:

args:
  - '--host'
  - '0.0.0.0'
  - '--port'
  - '8080'
  - '--scheme'
  - 'http'
  - '--config-file'
  - '/weaviate-config/conf.yaml'
  - --read-timeout=200s 
  - --write-timeout=400s
env:
  - LIMIT_RESOURCES: true
  - ASYNC_INDEXING: true
  - MODULE_CONFIG_TIMEOUT: 10m 
  • I’ve also removed all the resource limits from my Weaviate Helm values and doubled the resources of my server

Hey @DudaNogueira!

I spent some more time debugging, took a careful look at the Weaviate logs, filled in a few knowledge gaps about the terminology, and found a pattern in when batch insertion fails for a particular tenant/shard.

I have multi-tenancy enabled, and batch insertion fails whenever the tenant/shard logs the following message:

time="2024-05-25T03:43:19Z" level=trace msg="no segment eligible for compaction" action=lsm_compaction class=Paragraph index=paragraph path=/var/lib/weaviate/paragraph/XXsW7PQdK/lsm/objects shard=XXsW7PQdK

If the segment is not eligible for compaction, Weaviate is not able to batch insert any data for that particular tenant/shard.

If the bloom filter build succeeded for a tenant/shard, the batch insert works fine. (The bloom filter build also seems somewhat random: for the same tenant/shard it sometimes logs the "no segment eligible for compaction" message and sometimes builds the bloom filter successfully.)

From the logs, I saw that when the bloom filter was built successfully for shard=shared, Weaviate was able to insert data, and when it logged "no segment eligible for compaction", Weaviate was not able to batch insert the data.

Some additional information:
Weaviate server version - 1.23.6
Client - Python V3
multi_tenancy is enabled and sharding is not enabled

A few questions

  • When is a segment not eligible for compaction? Is this related to the size of the data?
  • Can we configure these values?

I would love your thoughts on this, thanks :smiley:

Hi!

I believe it’s best to look into migrating. A lot of those issues have been fixed since then.

From what I could understand, that code is an extra check for finding eligible segments to compact, and that trace-level log is just a verbose way of saying “no segment found”.

Upgrading will not necessarily require changes to your codebase.

You can upgrade the server and use the new Python v4 package. It also includes the Python client v3 package, so you can migrate your codebase to the v4 client as needed, starting with the more critical areas first, like batching, querying, etc.

Let me know if this helps.

Thanks!

Hey @DudaNogueira, we’ve upgraded the Weaviate server from 1.22.5 to 1.23.6. We still see "no segment eligible for compaction" in the logs, but things are working fine and we can now batch insert the data successfully. We also increased the timeout_config, added timeout_retries, and reduced the batch_size.
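
The combination described above (small batches plus retries on dropped connections) can be sketched in plain Python. This is a generic illustration, not the Weaviate client's own retry logic: send_with_retries, chunked, and the send callable are hypothetical names, and the real client exposes its own settings such as timeout_retries. With the requests library you would likely pass requests.exceptions.ConnectionError in retry_on, since it is not a subclass of the built-in ConnectionError.

```python
import time
from typing import Callable, List, Sequence, Tuple, Type


def chunked(items: Sequence, size: int) -> List[Sequence]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def send_with_retries(
    send: Callable[[Sequence], None],  # hypothetical: sends one chunk
    items: Sequence,
    batch_size: int = 10,
    retries: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send items chunk by chunk, retrying a failed chunk with backoff.

    Returns the number of items sent. Re-raises the last error once a
    chunk has failed `retries + 1` times.
    """
    sent = 0
    for chunk in chunked(items, batch_size):
        for attempt in range(retries + 1):
            try:
                send(chunk)
                break
            except retry_on:
                if attempt == retries:
                    raise
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
        sent += len(chunk)
    return sent
```

Keeping the chunk size small limits how much work is repeated when a single request fails, at the cost of more round trips.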

With our tech stack, upgrading to the Python v4 client also means upgrading Pydantic to V2, which brings multiple compatibility issues with langchain, since langchain has not fully migrated to Pydantic V2 internally.

Things are fine as of now; we will work on migrating to the new client version gradually.

Thanks a lot for your help!

Hi @Divyansh_Mishra!

No worries about the "no segment eligible for compaction" messages: you are only seeing them because your LOG_LEVEL is probably set to trace.
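
If changing LOG_LEVEL on the server is not convenient, the same effect can be approximated on an already captured log. The sketch below is a hedged illustration in Python: count_levels, at_least, and the severity ordering are assumptions made for this example, based only on the level=... field visible in the log lines quoted in this thread.

```python
import re
from collections import Counter
from typing import Iterable, List

# Matches the level=<name> field of the structured log lines.
LEVEL_RE = re.compile(r"\blevel=(\w+)")
# Assumed ordering from least to most severe.
SEVERITY = {"trace": 0, "debug": 1, "info": 2, "warning": 3,
            "error": 4, "fatal": 5, "panic": 6}


def count_levels(lines: Iterable[str]) -> Counter:
    """Count lines per log level; lines without a level are 'unstructured'."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        counts[match.group(1) if match else "unstructured"] += 1
    return counts


def at_least(lines: Iterable[str], minimum: str = "info") -> List[str]:
    """Keep lines at or above `minimum`.

    Lines with no level field (for example a raw Go panic) or with an
    unknown level are kept, so nothing important is silently dropped.
    """
    threshold = SEVERITY[minimum]
    kept = []
    for line in lines:
        match = LEVEL_RE.search(line)
        if match is None or SEVERITY.get(match.group(1), threshold) >= threshold:
            kept.append(line)
    return kept
```

Running count_levels over a captured log gives a quick sense of how much of the volume is trace output, and at_least(lines, "info") leaves only the lines that are more likely to matter.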

Our integration with Langchain is up to date. It also supports the Python v4 client:

We have some updated recipes on how to use langchain with Weaviate here:

I will mark this topic as solved.

Thanks!
