Description
When I use embedded Weaviate with a sentence-transformers model loaded on the GPU, pthread_create errors occur and the client then fails to shut down cleanly. This does not happen when the model is on the CPU, nor when I run the same code in a notebook on Google Colab. Here’s a minimal reproducing example:
import weaviate
from sentence_transformers import SentenceTransformer
from weaviate.embedded import EmbeddedOptions

# With no explicit device, the model is placed on the GPU when CUDA is available.
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

with weaviate.WeaviateClient(
    embedded_options=EmbeddedOptions(
        persistence_data_path="./cache/weaviate",
        version="1.24.6",
        additional_env_vars={"AUTOSCHEMA_ENABLED": "false", "DISABLE_TELEMETRY": "true"},
    )
) as client:
    pass
Here is the output (skipping Pydantic validation errors from sentence-transformers):
Started /home/kyle/.cache/weaviate-embedded: process ID 3090963
{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-04-04T00:56:55Z"}
{"action":"startup","auto_schema_enabled":false,"level":"info","msg":"auto schema enabled setting is set to \"false\"","time":"2024-04-04T00:56:55Z"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-04-04T00:56:56Z"}
{"level":"warning","msg":"Multiple vector spaces are present, GraphQL Explore and REST API list objects endpoint module include params has been disabled as a result.","time":"2024-04-04T00:56:56Z"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50060","time":"2024-04-04T00:56:56Z"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://127.0.0.1:8079","time":"2024-04-04T00:56:56Z"}
E0404 00:56:56.698388244 3090875 thd.cc:157] pthread_create failed: Resource temporarily unavailable
E0404 00:56:56.698432787 3090875 thd.cc:157] pthread_create failed: Resource temporarily unavailable
E0404 00:56:56.698468367 3090875 thd.cc:157] pthread_create failed: Resource temporarily unavailable
^C{"action":"restapi_management","level":"info","msg":"Shutting down... ","time":"2024-04-04T00:57:06Z"}
{"action":"restapi_management","level":"info","msg":"Stopped serving weaviate at http://127.0.0.1:8079","time":"2024-04-04T00:57:06Z"}
The process hangs after those pthread_create errors. An interrupt causes the weaviate subprocess to stop (printing the final two log lines), but the main process still hangs and subsequent interrupts do nothing. I have been able to confirm that the main process is hanging in the client.close() method called by client.__exit__.
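For anyone trying to reproduce the hang, one way to see where the main process is stuck is the standard-library faulthandler module, which can dump every thread's stack after a timeout. The following is only a sketch: the helper name hang_watchdog and the 30-second default are hypothetical choices for illustration, not part of Weaviate or sentence-transformers.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(timeout_s=30.0, file=None):
    """Dump all thread stacks if the wrapped block runs longer than timeout_s.

    hang_watchdog is a hypothetical helper name. The target file must be
    backed by a real file descriptor (sys.stderr normally is).
    """
    faulthandler.dump_traceback_later(
        timeout_s, exit=False, file=file if file is not None else sys.stderr
    )
    try:
        yield
    finally:
        # Runs once the block exits (normally or via an exception):
        # disarm the pending dump so it does not fire later.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the whole `with weaviate.WeaviateClient(...) as client:` statement in `with hang_watchdog(30):` means the watchdog is still armed while client.__exit__ runs, so a hang there should produce a stack dump on stderr instead of a silent stall.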
Again, the pthread errors and the hanging do not occur when the model is on CPU. I don’t know much about CUDA or pthreads, but I guess I’m wondering if somehow the thread management of sentence-transformers and weaviate don’t play nice?
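As background for that question: "Resource temporarily unavailable" is the message for EAGAIN, which pthread_create returns when a thread or process limit has been reached or resources are otherwise exhausted. The thd.cc prefix also appears to come from gRPC's logging, which would place the failures in the Python process rather than in the Weaviate binary, but that is an assumption and not something confirmed here. A rough sketch for comparing the thread count against the per-user limit, run once with the model on CPU and once on GPU, could look like this (thread_budget is a hypothetical helper name, and reading /proc/self/status is Linux-specific):

```python
import resource
import threading


def thread_budget():
    """Collect numbers relevant to pthread_create failing with EAGAIN.

    thread_budget is a hypothetical helper name for illustration.
    """
    # RLIMIT_NPROC caps processes/threads per user; -1 (RLIM_INFINITY) means unlimited.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    info = {
        "rlimit_nproc_soft": soft,
        "rlimit_nproc_hard": hard,
        # Threads started through Python's threading module only.
        "python_threads": threading.active_count(),
        # All OS threads in this process, including ones created by native
        # libraries; stays None where /proc is not available.
        "os_threads": None,
    }
    try:
        with open("/proc/self/status") as fh:
            for line in fh:
                if line.startswith("Threads:"):
                    info["os_threads"] = int(line.split()[1])
                    break
    except OSError:
        pass
    return info


if __name__ == "__main__":
    print(thread_budget())
```

Calling thread_budget() right after constructing the SentenceTransformer model and again inside the `with` block would show whether the process is close to a limit when the errors appear. It does not by itself explain why the GPU case differs.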
Server Setup Information
- Weaviate Server Version: 1.24.6
- Deployment Method: embedded
- Multi Node? Number of Running Nodes: 1
- Client Language and Version: Python, 4.5.4