OpenAI GPT4 use with the generate.near_text() function

First off - great product and the vectorisation and semantic search is working well for me. I have some problems with the generative functions using GPT4.

I’ve setup a collection and configure to use GPT 4


And issuing a generative query as follows:

# Run generative query
response = chunks.generate.near_text(

Chunks are set to be 200 words long

I have 2 problems that I’m encountering:

  1. When using GPT4 the response can take a while to be generated. If it goes above 40 seconds, I get the following error:

weaviate.exceptions.WeaviateQueryException: Query call failed with message send POST request: Post “”: context deadline exceeded (Client.Timeout exceeded while awaiting headers).

  1. Ideally, as GPT4 can handle a larger input context I want to pass more chunks when running the above, however if I go above limit=10 I get an error as follows:

WeaviateQueryException: Query call failed with message connection to: OpenAI API failed with status: 400 error: max_tokens is too large: 8097. This model supports at most 4096 completion tokens, whereas you provided 8097…

Am I missing something in my configuration / code that would resolve the issues?

When I look at the Verba codebase I see that the GPT4 requests are setup differently to the current Weaviate Python 4 implementation, and allow for streaming and passing the context, etc. Is there a plan to adapt the generate functions to allow for steaming and passing additional context?


Hi @Ed_Lambe !! Welcome to our community! :hugs:
and thanks for choosing Weviate :slight_smile:

I am afraid that that timeout error is OpenAi related :grimacing:

I have noted some users complaining about timeouts here and there recently. And it seems random. They would come and go without any pattern. Maybe your api-key or account was allocated in a bad pod for now…

I believe this second error (and I had to do some research on this) is because while gpt-4-1106-preview allows 128000 tokens as context window, it will output 4096 tokens.

This indeed seems misleading, becase you pass 128000 as max_tokens, but the output says the max_token is 4096 :thinking:

I have found a nice discussion here:

I have tought about an improvement that is about printing in logs the payload for Weaviate modules when in a verbose log_level. This would help catching the exac payload sent, and would help debugginig this very scenario as we could reproduce the payload request outside Weaviate.

Verba does the generation outside weaviate (using openai lib) to leverage the socket/stream capability.

This is not yet in our roadmap, but please, feel more than welcome to create a feature request so we can

Let me know if this helps :slight_smile:

Thanks for the reply.

On the timeout issue, having gone through the code base I see there is a configuration option that can be added that extends the timeout from the default of 50 seconds.

I’ve set this to 120s and this is no returning results :smile:


FYI - I don’t see this configuration option mentioned anywhere in the documentation.

I’ll dig a little deeper for the token limit passing the context.