First off - great product: the vectorisation and semantic search are working well for me. However, I have some problems with the generative functions using GPT-4.
I've set up a collection and configured it to use GPT-4:
…
generative_config=wvc.Configure.Generative.openai(
    model="gpt-4-1106-preview",
    max_tokens=128000,
),
…
I'm issuing a generative query as follows:
…
# Run generative query
response = chunks.generate.near_text(
    query=query,
    filters=filter,
    limit=10,
    return_properties=["chunk"],
    grouped_task=task,
)
…
Chunks are set to be 200 words long.
There are two problems I'm encountering:
- When using GPT-4, the response can take a while to generate. If it takes longer than about 40 seconds, I get the following error:
weaviate.exceptions.WeaviateQueryException: Query call failed with message send POST request: Post "https://api.openai.com/v1/chat/completions": context deadline exceeded (Client.Timeout exceeded while awaiting headers).
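I wondered whether the client-side timeout can be raised. From what I can tell from the v4 client, something like the following should apply; the connection function and the timeout values here are just my guesses, and I haven't confirmed this affects the generative call:

```python
import weaviate
from weaviate.classes.init import AdditionalConfig, Timeout

# Assumption on my side: raising the per-query timeout would let long
# GPT-4 generations finish instead of hitting the ~40s default.
client = weaviate.connect_to_local(  # or however the client is created
    additional_config=AdditionalConfig(
        timeout=Timeout(init=30, query=120, insert=120)  # seconds
    )
)
```

Is this the intended way to extend the deadline for generative queries?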
- Ideally, as GPT-4 can handle a larger input context, I'd like to pass more chunks in the query above. However, if I go above limit=10, I get an error as follows:
WeaviateQueryException: Query call failed with message connection to: OpenAI API failed with status: 400 error: max_tokens is too large: 8097. This model supports at most 4096 completion tokens, whereas you provided 8097…
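If I'm reading the error correctly, max_tokens counts completion tokens only (capped at 4096 for this model), not the 128k context window, so presumably my config should look more like this (my interpretation, not verified):

```python
generative_config=wvc.Configure.Generative.openai(
    model="gpt-4-1106-preview",
    max_tokens=4096,  # completion cap for this model; the 128k context window is separate
),
```

Even so, it's not obvious to me how limit interacts with the token budget, hence the question below.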
Am I missing something in my configuration or code that would resolve these issues?
When I look at the Verba codebase, I see that the GPT-4 requests are set up differently from the current Weaviate Python v4 implementation, allowing for streaming, passing the context, etc. Is there a plan to adapt the generate functions to allow for streaming and passing additional context?
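As a possible workaround in the meantime, I was considering fetching the chunks with a plain near_text query and calling OpenAI directly with stream=True. A rough sketch of what I mean; the build_messages helper and the prompt wording are my own invention, and I haven't verified this end to end:

```python
def build_messages(chunks: list[str], task: str) -> list[dict]:
    """Combine retrieved chunk texts and the grouped task into chat messages."""
    context = "\n\n".join(chunks)
    return [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"{task}\n\nContext:\n{context}"},
    ]

# Streaming call (requires the openai package and an API key):
# from openai import OpenAI
#
# messages = build_messages(
#     [o.properties["chunk"] for o in response.objects], task
# )
# stream = OpenAI().chat.completions.create(
#     model="gpt-4-1106-preview", messages=messages, stream=True
# )
# for event in stream:
#     print(event.choices[0].delta.content or "", end="")
```

This loses the convenience of grouped_task, though, which is why native streaming support would be preferable.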
Thanks