Facing maximum context length exceeded issue during vectorizing

Description

I am trying to vectorize a 1500-page PDF document (4 MB) into Weaviate. I am using a mix of static and dynamic chunking strategies that creates chunks of no more than 1000 words each, with a 200-word overlap; chunking this document produced 648 chunks.
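The overlap chunking is conceptually like this (a minimal sketch, assuming plain whitespace word splitting; our real pipeline also applies dynamic rules on top of this):

def chunk_words(text, max_words=1000, overlap=200):
    # slide a window of max_words over the word list,
    # stepping forward by (max_words - overlap) words each time
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)]

After chunking, I use the code below to vectorize the chunks and push them to Weaviate.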

for row in processedText:
    chunkOrder = row[0]        # sequence number of the chunk
    documentContent = row[1]   # actual chunk content

    otherDocDict = {}
    otherDocDict['businessApplicationNumber'] = appNumber   # not important; alphanumeric code, 10-12 characters long
    otherDocDict['applicationName'] = appName               # not important; a text string, 20-30 characters long
    otherDocDict['documentContent'] = documentContent
    otherDocDict['chunkOrder'] = chunkOrder

    print("The current chunk order is:", chunkOrder)
    otherDocUuid = otherDocCollection.data.insert(otherDocDict)   # otherDocCollection is our Weaviate collection
    otherDocUuids.append(otherDocUuid)                            # a list to keep track of all uploaded object uuids

    wv_Collection_Add_Ref(otherDocCollection, otherDocUuid, sysUuid, 'hasSystem')   # creating reference here
    wv_Collection_Add_Ref(sysCollection, sysUuid, otherDocUuid, 'hasOtherDocs')     # creating reference here, bi-directional
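wv_Collection_Add_Ref is our own helper; roughly, it does this (a sketch against the v4 cross-reference API; exact parameter shapes vary a little across 4.x client versions):

def wv_Collection_Add_Ref(collection, from_uuid, to_uuid, ref_property):
    # add a cross-reference on ref_property from one object to another
    collection.data.reference_add(
        from_uuid=from_uuid,
        from_property=ref_property,
        to=to_uuid,
    )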

Important configuration: we are using the Azure OpenAI API with the text-embedding-3-small embedding model.

It runs fine up to chunkOrder 647, but chunk 648 (the last chunk) fails with the error: Object was not added! Unexpected status code: 500, with response body: {'error': [{'message': "update vector: connection to: Azure OpenAI API failed with status: 400 error: This model's maximum context length is 8192 tokens, however you requested 15326 tokens (15326 in your prompt; 0 for the completion). Please reduce your prompt; or completion length."}]}.

Now, as I said, none of our chunks is more than 1000 words long, so I got suspicious and checked the word count and token count of chunk 648. It contains 983 words and, per tiktoken, about 2500 tokens, so it is well under the 8192-token limit. To check further, I changed the code so that only chunk 648 gets vectorized (all the other chunks are skipped), and it works without any issue!
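This is how I counted (a minimal sketch; chunk648 stands for the content of that chunk):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by text-embedding-3-small
print(len(chunk648.split()))       # word count: 983
print(len(enc.encode(chunk648)))   # token count: ~2500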

In light of this, my question is: does Weaviate bunch up requests and send them to the vectorizer at once? Or is it retaining context from previous requests, and that is why the context size is getting out of hand?
What is the fix for this issue? Is batching the requests going to solve it? (Roughly what I sketch below.)
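For reference, this is the kind of batching I have in mind (a sketch with the v4 client, reusing the names from my code above):

with otherDocCollection.batch.dynamic() as batch:
    for row in processedText:
        batch.add_object(properties={
            'businessApplicationNumber': appNumber,
            'applicationName': appName,
            'documentContent': row[1],
            'chunkOrder': row[0],
        })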

Please let me know if you have any questions.

Server Setup Information

  • Weaviate Server Version: 1.24.6
  • Deployment Method: k8s using EKS
  • Multi Node? Number of Running Nodes: 2
  • Client Language and Version: Python 3.9.7; weaviate 4.5.0


Hi @SMukherjee!! Welcome to our community! :hugs:

This is indeed strange.

Each chunk should be sent individually, and it should not retain any information from previous content, so I am not sure why this last chunk is getting this error.

When you print the chunk before passing it to the client, you see the expected content, right?

We should get a feature that will allow us to have a close look into the payload being sent. The idea is that, when a more verbose log level is enabled, that information will be printed to the stdout logs.

For now, a "hacky" way is to set the base URL to something different, like so:

import weaviate
from weaviate import classes as wvc

# assuming an already-connected v4 client, e.g.:
# client = weaviate.connect_to_local()

client.collections.delete("PayloadInspect")
collection = client.collections.create(
    "PayloadInspect",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
        # send the vectorizer's requests to a request-capturing endpoint instead of the real API
        base_url="http://webhook/6849fef3-d146-46a8-b6ca-e76ca6cdcbe7"
    ),
)

Now you can run your ingestion and check the exact payload being sent, as the requests now go to a webhook app that will log every incoming request.

The insertion will probably fail (the webhook cannot return a vector), but you can capture the request.
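If you wrap the insert, the run keeps going while the payloads are captured (a sketch; UnexpectedStatusCodeError is what the v4 client raises for non-2xx responses):

from weaviate.exceptions import UnexpectedStatusCodeError

for row in processedText:
    try:
        collection.data.insert({'documentContent': row[1], 'chunkOrder': row[0]})
    except UnexpectedStatusCodeError:
        # expected: the webhook endpoint does not return a vector
        pass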

Let me know if this helps.