Error loading data

Description

I have been using Ragtriever for a few months (with OpenAI and Cohere API keys) and loading hundreds of different documents. After a month with no use, and with no particular document to load that is different from the rest, I came back to the workstation and got the following message:

✘ Loading data failed [E088] Text of length 1053169 exceeds maximum of 1000000. The parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you’re not using the parser or NER, it’s probably safe to increase the nlp.max_length limit. The limit is in number of characters, so you can check whether your inputs are too long by checking len(text).
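For reference, E088 is a spaCy error code, so the check the message describes is a plain spaCy check. A minimal sketch (the model choice and file path below are placeholders, not my actual setup):

import spacy

nlp = spacy.blank("en")                 # any spaCy pipeline has the same limit
text = open("my_document.txt").read()   # placeholder path
print(len(text), nlp.max_length)        # e.g. 1053169 vs. the default 1000000

# If the parser/NER aren't needed, the message says it's safe to raise the limit:
nlp.max_length = 2_000_000
doc = nlp(text)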

Any idea what has gone wrong?

Server Setup Information

  • Weaviate Server Version:
  • Deployment Method:
  • Multi Node? Number of Running Nodes:
  • Client Language and Version:

Any additional Information

Hi! Do you mean Verba?

This seems like an issue during chunking :thinking:

What versions are you running?

Hi! Yes, Verba.
I am running weaviate-client==3.23.1 and cohere==4.33.
I am wondering if this has to do with my Cohere subscription tier, but I guess chunking happens before the embeddings.
Any possible fixes?

What is the chunking configuration?

Setting it too high can cause this error. It is basically saying that your text has more characters than the maximum they support.
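If lowering the chunk setting is not an option, a rough workaround is to split the oversized document below the 1,000,000-character limit before loading it. This is just a sketch (naive split on blank lines, placeholder path), not Verba's actual chunker:

MAX_CHARS = 1_000_000

def split_text(text, limit=MAX_CHARS):
    # Naive split on paragraph boundaries; a real chunker would respect sentences.
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > limit:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

pieces = split_text(open("my_document.txt").read())  # placeholder path
print([len(p) for p in pieces])  # each piece should now be under the limit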