Ollama Generative Query Timeout Issue

I’m experiencing a context deadline exceeded error on generative queries to an Ollama module. The query consistently fails after ~51 seconds, despite timeouts being set much higher.

Environment:

Weaviate Server: v1.31.0 (in Docker)
Modules: text2vec-ollama, generative-ollama
Model: phi4 (on a CPU-only host)

Problem Summary: The error occurs when Weaviate’s generative-ollama module calls the Ollama API: ...send POST request: Post ".../api/generate": context deadline exceeded...

I have already configured the following timeouts:

  1. Server-Side: Set EXTENSIONS_CLIENT_TIMEOUT: '1000s' in Weaviate’s Docker environment (confirmed active with docker inspect).
  2. Client-Side: Set the gRPC query timeout to 600s in the Python client (additional_config).
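For reference, the client-side timeout was set along these lines (a minimal sketch using the Python client v4 `AdditionalConfig`; the local connection helper is assumed from the Docker setup above):

```python
import weaviate
from weaviate.classes.init import AdditionalConfig, Timeout

# Raise the query timeout to 600s (defaults are much lower).
client = weaviate.connect_to_local(
    additional_config=AdditionalConfig(
        timeout=Timeout(init=30, query=600, insert=120)
    )
)
```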

Crucially, a direct curl request to the Ollama endpoint succeeds, but takes over 5 minutes to complete.
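The direct check was along these lines (host and model name assumed from the setup above; `stream: false` makes Ollama return a single response, which is also how the module calls it):

```shell
curl -s http://localhost:11434/api/generate \
  -d '{"model": "phi4", "prompt": "Why is the sky blue?", "stream": false}'
```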

Core Question: Since the query fails at ~51s, it seems another, lower timeout is taking precedence over both my client and server configurations. Is there a different, non-obvious timeout setting specific to the generative search module that I need to be aware of?

Thanks for your help.

Hi @Francesco_Lai !!

Welcome to our community :hugs:

Not sure what environment variable that one is :grimacing:

You can find a list of all supported env vars in the Weaviate docs, under Configuration → Environment variables.

You are most certainly hitting MODULES_CLIENT_TIMEOUT, which is by default, you guessed it, 50s.
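A minimal sketch of how that would look in the Docker Compose file (service name and value are assumptions; pick a value that comfortably covers your ~5 minute generations):

```yaml
services:
  weaviate:
    environment:
      # Timeout for Weaviate's module clients (e.g. generative-ollama).
      # Defaults to 50s, which matches the ~51s failures you are seeing.
      MODULES_CLIENT_TIMEOUT: '600s'
```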

Ah! And one note: once you find yourself raising Weaviate's default timeouts, you probably need to throw more resources at the model service instead: more memory, CPU, or nodes, or, in this case, a GPU.

Unless, of course, latency is not a problem :nerd_face:

Let me know if this helps!

Happy coding!

Side note: while the name of the env variable MODULES_CLIENT_TIMEOUT may not make sense at first, on closer look it does.

It sets the timeout for the module's client, which is what reaches out to your service. :wink: