"pthread_create failed" when using embedded server

Description

When I use embedded Weaviate and have a sentence-transformers model on the GPU, pthread_create errors occur, and then the embedded server won’t close successfully. Notably, this does not occur when the model is on CPU, nor does it occur when I run it in a notebook on Google Colab. Here’s a minimal reproducing example:

import weaviate
from sentence_transformers import SentenceTransformer
from weaviate.embedded import EmbeddedOptions


# No device is passed, so sentence-transformers picks the GPU when CUDA is available.
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")


with weaviate.WeaviateClient(
    embedded_options=EmbeddedOptions(
        persistence_data_path="./cache/weaviate",
        version="1.24.6",
        additional_env_vars={"AUTOSCHEMA_ENABLED": "false", "DISABLE_TELEMETRY": "true"},
    )
) as client:
    pass

Here is the output (skipping Pydantic validation errors from sentence-transformers):

Started /home/kyle/.cache/weaviate-embedded: process ID 3090963
{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-04-04T00:56:55Z"}
{"action":"startup","auto_schema_enabled":false,"level":"info","msg":"auto schema enabled setting is set to \"false\"","time":"2024-04-04T00:56:55Z"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-04-04T00:56:56Z"}
{"level":"warning","msg":"Multiple vector spaces are present, GraphQL Explore and REST API list objects endpoint module include params has been disabled as a result.","time":"2024-04-04T00:56:56Z"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50060","time":"2024-04-04T00:56:56Z"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://127.0.0.1:8079","time":"2024-04-04T00:56:56Z"}
E0404 00:56:56.698388244 3090875 thd.cc:157]                           pthread_create failed: Resource temporarily unavailable
E0404 00:56:56.698432787 3090875 thd.cc:157]                           pthread_create failed: Resource temporarily unavailable
E0404 00:56:56.698468367 3090875 thd.cc:157]                           pthread_create failed: Resource temporarily unavailable
^C{"action":"restapi_management","level":"info","msg":"Shutting down... ","time":"2024-04-04T00:57:06Z"}
{"action":"restapi_management","level":"info","msg":"Stopped serving weaviate at http://127.0.0.1:8079","time":"2024-04-04T00:57:06Z"}

The process hangs after those pthread_create errors. An interrupt causes the weaviate subprocess to stop (printing the final two log lines), but the main process keeps hanging and subsequent interrupts do nothing. I have confirmed that the main process is hanging in the client.close() method called by client.__exit__.

Again, the pthread errors and the hanging do not occur when the model is on CPU. I don’t know much about CUDA or pthreads, but I guess I’m wondering if somehow the thread management of sentence-transformers and weaviate don’t play nice?

Server Setup Information

  • Weaviate Server Version: 1.24.6
  • Deployment Method: embedded
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python, 4.5.4

Hi @kylrth !

Does this happen only when running the embedded server?

Does it run as expected when Weaviate runs in Docker?

This is a hard one to reproduce. I can’t, as my Mac doesn’t have a GPU :stuck_out_tongue:
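In the meantime, it might be worth checking whether the process is running into a thread limit. "Resource temporarily unavailable" is the EAGAIN error, which pthread_create returns when the system lacks the resources for another thread or a system-imposed thread limit has been reached. Below is a rough diagnostic sketch using only the Python standard library. The function name thread_limit_report is made up for illustration, the /proc paths assume Linux, and it may well turn out that the limits are not the cause here.

```python
import os
import resource
import threading


def thread_limit_report():
    """Collect thread-related limits and usage for the current process.

    Hypothetical helper for illustration; the /proc entries exist on Linux only.
    """
    # RLIMIT_NPROC caps processes *and* threads for the real user ID.
    # A value of resource.RLIM_INFINITY means no limit is set.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    report = {
        "rlimit_nproc_soft": soft,
        "rlimit_nproc_hard": hard,
        # Threads started through Python's threading module only.
        "python_threads": threading.active_count(),
    }

    # The Threads: field counts every OS thread in this process,
    # including native threads created by C/C++ libraries.
    if os.path.exists("/proc/self/status"):
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("Threads:"):
                    report["os_threads"] = int(line.split()[1])

    # System-wide ceiling on the number of threads.
    if os.path.exists("/proc/sys/kernel/threads-max"):
        with open("/proc/sys/kernel/threads-max") as f:
            report["kernel_threads_max"] = int(f.read())

    return report


if __name__ == "__main__":
    for key, value in thread_limit_report().items():
        print(f"{key}: {value}")
```

Calling this right after loading the model, and again just before opening the client, should show whether the OS thread count is already close to one of the limits. Note that RLIMIT_NPROC applies to the user as a whole, so other processes running under the same account count against it too.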