weaviate.exceptions.WeaviateQueryError: Query call with protocol GRPC search failed with message send POST request: Post "serverurl:11435/api/generate": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Description

Can you please help me fix this issue? What could be causing it, and is there any configuration I can add to increase the execution time?
I have created the Weaviate index with Ollama and stored the documents. When I perform a generative search, I run into this error.

This command works:
!curl myserverurl:11434/api/generate -d '{"model": "llama3", "prompt": "What is a vector database?", "stream": false}'
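For reference, the same check can be run from Python, together with one against the embeddings endpoint that the vectorizer uses (a sketch; it assumes the requests package and the same host and port as the curl above):

    import requests

    # Same check as the curl command: Ollama's generate endpoint.
    r = requests.post(
        "http://myserverurl:11434/api/generate",
        json={"model": "llama3", "prompt": "What is a vector database?", "stream": False},
    )
    print(r.json()["response"])

    # The text2vec-ollama vectorizer calls the embeddings endpoint, so it is worth checking too.
    r = requests.post(
        "http://myserverurl:11434/api/embeddings",
        json={"model": "mxbai-embed-large", "prompt": "hello"},
    )
    print(len(r.json()["embedding"]))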

Server Setup Information

  • Weaviate Server Version: 1.25.4
  • Deployment Method: docker
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python v4
  • Multitenancy?: no

Any additional Information

Index schema:

    client.collections.create(
        collection_name,
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_ollama(
            api_endpoint="myserverurl:11435",
            model="mxbai-embed-large",
        ),
        generative_config=wvc.config.Configure.Generative.ollama(
            api_endpoint="myserverurl:11435",
            model="llama3",
        ),
        properties=[
            wvc.config.Property(
                name="text",
                data_type=wvc.config.DataType.TEXT,
                skip_vectorization=False,
                vectorize_property_name=True,
                tokenization=wvc.config.Tokenization.LOWERCASE,
            )
        ],
    )

Hi @Mariam!!

Can you post the full stacktrace?

My bet here is that the GRPC port is not exposed properly.

Also, could you share the docker compose you are using?

Considering you are mapping the ports in your Docker setup, and given both the curl command and the error message, it should be something like this:

    ports:
    - 11434:8080
    - 11435:50051

So the HTTP endpoint is mapped to 11434 and gRPC to 11435.
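With that mapping, a v4 client connection would look roughly like this (a sketch; myserverurl is a placeholder for your host):

    import weaviate

    client = weaviate.connect_to_custom(
        http_host="myserverurl",
        http_port=11434,    # mapped to Weaviate's 8080
        http_secure=False,
        grpc_host="myserverurl",
        grpc_port=11435,    # mapped to Weaviate's 50051
        grpc_secure=False,
    )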

Let me know if that helps!

Thanks!

Hi @DudaNogueira,

I updated my Docker configuration to include the following port mappings:
    ports:
    - 11434:8080
    - 11435:50051

However, I am still encountering the same issue when running the following code:

    result = client.collections.get(collection_name)
    response = result.generate.near_text(
        query=prompt,
        limit=1,
        grouped_task=f"Answer the question: {prompt}? only using the given context in {{chunk}}",
    )
    print(response.generated)

I suspect the problem might be related to the models I am using. Is that possible? The vectorizer model is "mxbai-embed-large" and the generative model is "llama3".
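On the original question about increasing the execution time: the v4 client also lets you raise its own timeouts, which can help when generation is slow. A sketch, with the port values assuming the mapping above; note that the deadline in the original error is hit on the Weaviate server while it calls Ollama, so fixing the endpoint matters more than client-side timeouts:

    import weaviate
    from weaviate.classes.init import AdditionalConfig, Timeout

    client = weaviate.connect_to_custom(
        http_host="myserverurl",
        http_port=11434,
        http_secure=False,
        grpc_host="myserverurl",
        grpc_port=11435,
        grpc_secure=False,
        # Raise the client-side timeouts (in seconds); LLM generation can be slow.
        additional_config=AdditionalConfig(timeout=Timeout(init=30, query=120, insert=180)),
    )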
@DudaNogueira Thank you for your prompt response and support.

Hi!

Oh, sorry. I didn't notice you are using Ollama.

Have you seen this recipe?

It uses Ollama, so it is pretty much the same setup as yours.

If you are running Ollama directly on your machine, you will need to change the base URL to host.docker.internal (check the notebook for more info).
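In that case, the module config would point at Ollama's own port (11434 by default) and include the scheme, roughly like this sketch:

    import weaviate.classes as wvc

    # Sketch: Weaviate runs inside Docker, Ollama runs on the host machine.
    vectorizer_config = wvc.config.Configure.Vectorizer.text2vec_ollama(
        api_endpoint="http://host.docker.internal:11434",
        model="mxbai-embed-large",
    )
    generative_config = wvc.config.Configure.Generative.ollama(
        api_endpoint="http://host.docker.internal:11434",
        model="llama3",
    )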

So I believe you can leave the ports mapped as:

    ports:
    - 8080:8080
    - 50051:50051

And now, set the Ollama URL to wherever it is actually running (again, depending on how you are running Ollama).
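With those default mappings, connecting from the same machine simplifies to:

    import weaviate

    # Defaults: HTTP on localhost:8080, gRPC on localhost:50051.
    client = weaviate.connect_to_local()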

Let me know if this helps, or how else I can help you.

We just had an office hours session today. This is an online event where I answer any questions you bring 🙂

Stay tuned for more here: Online Workshops & Events | Weaviate - Vector Database