Error : text too long for vectorization

Sumat_Mallick · September 30, 2024, 10:27am

Description

We are moving from v3 to v4. Now, when I use batch upload or multi-insert, I get an error message saying the text is too long for vectorization.

Can anyone help me regarding this?

Below is my code.

def vectorize_tag_page_data(texts, class_name, layer_name):

    # Append layer suffix to class name
    class_name = f"{class_name}01pagedata"

    # Load configuration from environment variables
    weaviate_url = os.getenv("URL")
    weaviate_auth_key = os.getenv("AUTH_KEY")
    openai_key = os.getenv("OPENAI_API_KEY")

    if not weaviate_url or not weaviate_auth_key or not openai_key:
        raise EnvironmentError("One or more required environment variables are missing")

    # Prepare data objects for insertion
    data_objs = [{"text": texts[key], "metadata": key} for key in texts]
    total = len(data_objs)

    print(f"\n{total} data objects prepared for insertion.\n")
    print(f"Layer URL: {weaviate_url}")

    # Initialize client with authentication
    client = initialize_weaviate_client(weaviate_url, weaviate_auth_key, openai_key)

    # Create collection in Weaviate
    try:
        response = client.collections.create(
            name=class_name,
            vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
            properties=[
                wvc.config.Property(name="text", data_type=wvc.config.DataType.TEXT),
                wvc.config.Property(name="metadata", data_type=wvc.config.DataType.TEXT),
            ],
        )
        print(response.config.get(simple=False))
    except Exception as e:
        print(f"Error while creating collection: {e}")
    finally:
        client.close()

    # Reinitialize client for data insertion
    client = initialize_weaviate_client(weaviate_url, weaviate_auth_key, openai_key)

    # Insert data using batching
    try:
        collection = client.collections.get(class_name)
        with collection.batch.dynamic() as batch:
            print("Batch insertion started.")
            for i, data_obj in enumerate(data_objs, 1):
                batch.add_object(properties=data_obj)
                print(f"Uploaded Tag Data: {i}/{total}")

            # Check for batch insertion errors
            if batch.number_errors > 0:
                print(f"Number of errors during batch insertion: {batch.number_errors}")
            else:
                print("Batch insertion completed successfully.")

        # Optional: Verify insertion by querying the collection
        try:
            result = collection.query.bm25(query="genAI", limit=10)
            print("\nQuery Results:", result)
        except Exception as e:
            print(f"Error while querying: {e}")
    except Exception as e:
        print(f"An exception occurred: {e}")
    finally:
        if client is not None:
            client.close()

def initialize_weaviate_client(url, auth_key, openai_key):

    client = wvc.Client(
        url=url,
        auth_client_secret=wvc.AuthApiKey(api_key=auth_key),
        additional_headers={
            "X-OpenAI-Api-Key": openai_key
        }
    )
    return client

Server Setup Information

Weaviate Server Version: 4.8.1
Deployment Method: I am using it directly from Python
Multi Node? Number of Running Nodes: 1
Client Language and Version: Python 3.12.3
Multitenancy?:

Any additional Information

[ErrorObject(message="WeaviateInsertManyAllFailedError('Every object failed during insertion. Here is the set of all errors: text too long for vectorization')", object_=_BatchObject(collection='Sswhhsdflesdfesssssr01pagedata', vector=None, uuid='fb5fc0a6-f652-4e64-bc36-d1bcc0536e0b', properties={'text': 'a', 'metadata': 'name1'}, tenant=None, references=None, index=0, retry_count=0), original_uuid=None),
 ErrorObject(message="WeaviateInsertManyAllFailedError('Every object failed during insertion. Here is the set of all errors: text too long for vectorization')", object_=_BatchObject(collection='Sswhhsdflesdfesssssr01pagedata', vector=None, uuid='e21581f3-c7dd-447b-b240-bf566314eea7', properties={'text': '1', 'metadata': 'value1'}, tenant=None, references=None, index=1, retry_count=0), original_uuid=None)]

Data I am trying to insert = [{'text': 'a', 'metadata': 'name1'}, {'text': '1', 'metadata': 'value1'}]
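
For anyone debugging a similar failure: the error list above repeats the same message once per object, so it can help to group the failures by message before reading individual entries. The sketch below assumes each entry exposes a `message` attribute, as in the output above. `FakeError` and `summarize_errors` are hypothetical names used so the snippet runs without a Weaviate connection; with the v4 Python client the real list would typically come from `collection.batch.failed_objects` once the `with` block has exited.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class FakeError:
    """Hypothetical stand-in for one entry of the error list shown above."""
    message: str
    object_: Any = None


def summarize_errors(errors: Iterable[Any]) -> Counter:
    """Count how many failed objects share each error message."""
    return Counter(err.message for err in errors)


if __name__ == "__main__":
    sample = [
        FakeError("text too long for vectorization", {"text": "a"}),
        FakeError("text too long for vectorization", {"text": "1"}),
    ]
    for message, count in summarize_errors(sample).items():
        print(f"{count} object(s) failed: {message}")
```

If every object fails with one identical message, as here, the cause is more likely on the vectorizer or server side than in any single object.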

DudaNogueira · September 30, 2024, 2:44pm

hi @Sumat_Mallick !!

Welcome to our community

Can you paste the entire stack trace?

Also, can you share here the code snippet so I can try to reproduce this?

Thanks!

Michael_Pont · October 1, 2024, 2:20pm

I am having the same issue.
See the following links for reference. @Dirk has also been investigating this.

github.com/weaviate/weaviate

Maximum content length - token count incorrectly counted for object (open ai vectorizer)

opened 07:33PM - 18 Aug 24 UTC

michael-pont

bug

### How to reproduce this bug?

Use v3 of typescript client ( "weaviate-clien…t": "^3.1.3",) batch update the object

```typescript
async function insertSingleBatch<T>({
  batch,
  collection,
}: {
  batch: (DataObject<T> | NonReferenceInputs<T>)[]
  collection: Collection<T, string>
}) {
  const failedItems: BatchObject<T>[] = []
  const errors = new Set()
  const response = await backOff(() => collection.data.insertMany(batch), {
    delayFirstAttempt: false,
    numOfAttempts: 6,
    startingDelay: 15000,
    jitter: 'full',
    timeMultiple: 2,
    retry: (err) => {
      logger.error(err, `Error batch inserting objects`)
      return true
    },
  })
  if (response.hasErrors) {
    for (const [_index, error] of Object.entries(response.errors)) {
      errors.add(error.message)
      if (error.message.includes('maximum context length')) {
        logger.error(
          { originalObject: error.object },
          'Maximum context length error - aborting as we do not want to keep retrying the same large text'
        )
        // For now, we do not want to keep retrying the same text so let's simply throw an error
        throw new VectorDbExportError(
          'Maximum context length error - aborting as we do not want to keep retrying the same large text'
        )
      }
      failedItems.push(error.object)
    }
  }
  if (errors.size > 0) {
    logger.error(
      {
        errors: Array.from(errors),
      },
      `Failed to insert ${failedItems.length} items to vector DB`
    )
  }
  return failedItems
}
```

### What is the expected behavior?

Object should be correctly vectorized and updated/created in database. Instead there is a token limit even though pasting the text into an online tokenizer shows around 2000 tokens.

### What is the actual behavior?

Object fails even though it is clearly smaller than 8192 tokens

### Supporting information

```json
{
  "originalObject": {
    "class": "WebPage",
    "properties": {
      "url": "https://www.avast.com/en-in/index",
      "categories": [
        "other"
      ],
      "text": "We’re sorry, your browser appears to be outdated.To see the content of this webpage correctly, please update to the latest version or install a new browser for free, such as Avast Secure Browser or Google Chrome. Avast Secure Browser Google Chrome Save 48% on Premium Security List of available regions Americas Argentina Brasil Canada (English) Canada (français) Chile Colombia EE.UU. (español) México USA (English) América Latina (español) Europe, Middle East & Africa België (Nederlands) Belgique (français) Česká republika Danmark Deutschland España France Italia Magyarország Nederland Norge Polska Portugal România Schweiz (Deutsch) Slovensko (česky) South Africa Suisse (français) Suomi Sverige Türkiye United Arab Emirates United Kingdom Ελλάδα ישראל Казахстан Россия Україна (українська) Украина (русский) المملكة العربية السعودية الدول العربية Europe (English) Worldwide (English) Asia & Pacific Australia India इंडिया (हिंदी) Indonesia (English) Indonesia (Bahasa Indonesia) Malaysia (English) Malaysia (Bahasa Melayu) New Zealand Philippines (English) Pilipinas (Filipino) Singapore Việt Nam 日本語 대한민국 简体中文繁體中文 ประเทศไทย Main regions Worldwide (English) América Latina (español) Europe (English) Free antivirus is your first step to online freedom Free We believe everyone has the right to be safe online, which is why we offer our award-winning free antivirus to millions of people around the world. 
free antivirus Free download Free download Download freefrom Google Play Download freefrom the App Store Also available for Mac, Android, and iOS Mac Android iOS Also available for PC, Android, and iOS PC Android iOS Also available for Mac, Android, and PC Mac Android PC Also available for PC, Mac, and iOS PC Mac iOS 2024Top Rated Top Rated 2024Best Protection Best Protection Trustpilot Award-winning Antivirus Catch even new & emerging threats Hundreds of millions of users worldwide We have 30+ years of experience Download free antivirus 2022Top Rated Top Rated Get it for free! Free download Free download Download freefrom Google Play Download freefrom the App Store Avast Free Antivirus Get free antivirus that comes with advanced privacy and security tools Avast Free Antivirus is more than just an antivirus — it also includes these specialist tools: 6 layers of security Effortlessly run smart scans on software, files, and apps to help find vulnerabilities, plus analyze suspicious files in the cloud, get threat alerts, and more. Easy to install and use It only takes a moment to install Avast Free Antivirus and once it’s done, it’ll run quietly in the background, helping to protect you against viruses and other malware in real time, 24/7. Wi-Fi network security Connect more safely to any Wi-Fi network, even unsecured public networks, plus see who’s using your home Wi-Fi and help block intruders with a click. Protection against ransomware attacks Help protect your information. Don't let your personal photos, files, and documents fall victim to hackers using ransomware. Free download Free download Download freefrom Google Play Download freefrom the App Store Learn more Avast Premium Security Comprehensive protection for all your devices Our most advanced protection is your toughest defense against viruses, ransomware, zero-day threats, Wi-Fi vulnerabilities, and more. 
viruses ransomware zero-day threats Get protected Avast SecureLine VPN Choose a VPN for more online privacy Help block ISPs from tracking your activity, help avoid geo-restrictions from content providers, and ensure public Wi-Fi is safer with Avast’s Virtual Private Network (VPN). Virtual Private Network (VPN) Discover VPN Avast Cleanup Premium Enjoy more storage space and a faster device Reclaim gigabytes of storage space and get your device working like new by removing junk like leftover files, bloatware, and unwanted programs. Avast Cleanup also updates your software automatically, hibernates resource-draining apps, and more. Discover Cleanup It’s so easy to install — switching to Avast takes seconds You can start using Avast’s award-winning antivirus immediately. It’s quick and easy to install, and gives you all the protection you need to live your online life securely. And it’s totally free — so give it a try right now. Free download Free download Download freefrom Google Play Download freefrom the App Store Avast has hundreds of millions of users worldwide I have used Avast for a few years. The protection is the best for the money. I also cover my phones with Avast and I haven’t had any problems yet. Ryan R. 4.5 I have used Avast for many years. The reason is very simple: you offer a great free version that actually works. This lets me afford the other amazing services you offer when needed, like Avast Cleanup. Eric S. I've used Avast Free Antivirus on my computers, tablets, and smart phones for many years. It updates frequently and automatically. It automatically scans and protects me from malicious web sites. It does what I need. What more could I ask? Daryl C. 4.5 “The well-known security specialist Avast received a total of three awards for its outstanding performance ... 
shows how reliably and securely the security software renders its service...” Awarded Top-Rated Product 2023by AV-Comparatives Get privacy and performance tips, straight from the experts Read more at Avast Academy What Is a Computer Virus and How Does It Work? A computer virus is designed to infect programs & files with malicious code, changing how a computer operates and spreading across systems. Learn more What Is a VPN & How Does It Work? A VPN is a secure, encrypted connection that protects your online privacy. Learn what VPNs are, how they work, and what they do to protect you. Learn more How to Increase Your Internet Speed Right Now Wondering why your internet speed is slow? Learn how to improve your internet connection right now, whether you're on Wi-Fi or Ethernet. Learn more What Is Hacking? We all have some concept of hacking - but do you really know what it is? Read our full hacking definition here. Learn more What Is the about:blank Page? How to Use or Remove It Ever seen the about:blank page while browsing? Discover what about:blank is, why it appears, how to remove it, and how to browse safely. Learn more How to Encrypt Email on Gmail, Outlook, iOS, Android, and Other Platforms Learn how to encrypt your emails using different email providers to keep your communications safe. Start using end-to-end encryption. Learn more How to Fix the Blue Screen of Death (BSOD) on Windows 10 and 11 Seeing the blue screen of death on your Windows computer can be devastating. Understand the causes and learn how to fix the BSOD. Learn more What Is a Swatting Incident and How Does Swatting Work? Swatting is an incident where a hoax call is made to the police. Find out how people get swatted and why gamers are targeted. Learn more Why Is My Battery Draining So Fast? Tips and Troubleshooting Guide Why is my phone dying so fast? Learn how to fix a draining battery and boost your phone's overall performance in this guide. 
Learn more Keeping people around the world safer & more secure Using real-time intelligence from hundreds of millions of Avast users, we prevent more than 66 million threats every day. Download Free Protection Download Free Protection Download freefrom Google Play Download freefrom the App Store Looks like you're using Windows Looks like you're using Mac Looks like you're using Android Looks like you're using iOS Would you like this app for Windows or Mac? Windows Mac Would you like this app for Windows or Android? Windows Android Would you like this app for Windows or iOS? Windows iOS Would you like this app for Mac or Windows? Mac Windows Would you like this app for Mac or Android? Mac Android Would you like this app for Mac or iOS? Mac iOS Would you like this app for Android or Windows? Android Windows Would you like this app for Android or Mac? Android Mac Would you like this app for Android or iOS? Android iOS Would you like this app for iOS or Windows? iOS Windows Would you like this app for iOS or Android? iOS Android Would you like this app for iOS or Mac? iOS Mac Windows Mac Download freefrom Google Play Download freefrom the App Store Windows Mac Download freefrom Google Play Download freefrom the App Store Back Newsletter India (English) For home Support Security Privacy Performance Blog Forum For business Business support Business products Business partners Business blog Affiliates For partners Mobile Carriers Company Contact Us Careers Press center Digital trust Technology Research Participation © 2024 Gen Digital Inc. All rights reserved. Privacy policy Products policy Legal Report vulnerability Contact security Modern Slavery Statement Do not sell my info Subscription details Close Almost done! Complete installation by clicking your downloaded file and following the instructions. Initiating download... Note: If your download did not start automatically, please click here. Note: click here Click this file to start installing Avast. 
Close",
      "accountId": "01a99d03-e306-4cef-8cf4-3836eed4b952",
      "source": "website",
      "title": "Avast | Download Free Antivirus & VPN | 100% Free & Easy",
      "scrapedAt": "2024-08-18T14:38:28.836Z"
    },
    "id": "db44aa2b-1cea-5975-9344-c4b966ab8723",
    "collection": "WebPage"
  },
  "errorMessage": "connection to: OpenAI API failed with status: 400 error: This model's maximum context length is 8192 tokens, however you requested 8225 tokens (8225 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.",
  "msg": "Maximum context length error - aborting as we do not want to keep retrying the same large text"
}
```

### Server Version

1.25.10

### Code of Conduct

- [X] I have read and agree to the Weaviate's [Contributor Guide](https://weaviate.io/developers/contributor-guide) and [Code of Conduct](https://weaviate.io/service/code-of-conduct)
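
The OpenAI error quoted in the issue is a per-request limit of 8192 tokens. One possible client-side precaution, not something proposed in the issue itself, is to split long page text into smaller pieces before inserting it. The sketch below uses a word budget as a crude proxy: words are not tokens, so `max_words` is an assumed, conservative value, and `split_by_word_budget` is a hypothetical helper. It would not address a token miscount on the server side.

```python
def split_by_word_budget(text: str, max_words: int = 500) -> list[str]:
    """Split text into chunks of at most max_words whitespace-separated words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    page_text = "word " * 1200
    chunks = split_by_word_budget(page_text, max_words=500)
    # Print the word count of each chunk to confirm none exceeds the budget.
    print([len(chunk.split()) for chunk in chunks])
```

Each chunk could then be inserted as its own object, with the source URL kept in a separate property so the pieces can be traced back to one page.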

Sumat_Mallick · October 4, 2024, 11:27am

Thank you @DudaNogueira

After further debugging, I found the solution.

For the last two months I had been using the free version of Weaviate, which I kept running by extending it for another two weeks.

When I moved to a new Weaviate layer, it started working fine.

New Code:

if not client.collections.exists(class_name):
    bot = client.collections.create(
        name=class_name,
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
            model="ada",
            model_version="002",
            vectorize_collection_name=False
        ),
        properties=[
            wvc.config.Property(
                name="text",
                data_type=wvc.config.DataType.TEXT,
                vectorize_property_name=True
            ),
            wvc.config.Property(
                name="metadata",
                data_type=wvc.config.DataType.TEXT,
                vectorize_property_name=True
            )
        ]
    )

Old Code

try:
    bot = client.collections.create(
        name=class_name,
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
            model="ada",
            model_version="002",
            vectorize_collection_name=False
        ),
        properties=[
            wvc.config.Property(
                name="text",
                data_type=wvc.config.DataType.TEXT,
                vectorize_property_name=True
            ),
            wvc.config.Property(
                name="metadata",
                data_type=wvc.config.DataType.TEXT,
                vectorize_property_name=False
            ),
        ]
    )

except Exception as e:
    print("Error:", e)
    print("--------*--------**")
    bot = client.collections.get(class_name)
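
Both snippets aim at the same thing: create the collection if it is missing, otherwise reuse it. A minimal sketch of that pattern follows, assuming a v4 client object that provides `collections.exists`, `collections.get`, and `collections.create` as used in the posts above. `get_or_create_collection` and the `create_kwargs` pass-through are illustrative names, not part of the client.

```python
from typing import Any


def get_or_create_collection(client: Any, name: str, **create_kwargs: Any) -> Any:
    """Return the existing collection, or create it with create_kwargs."""
    if client.collections.exists(name):
        # Reuse the collection; its stored configuration is left untouched.
        return client.collections.get(name)
    # create_kwargs would carry vectorizer_config, properties, and so on.
    return client.collections.create(name=name, **create_kwargs)
```

Compared with wrapping `create` in a broad `except Exception`, the explicit `exists` check avoids hiding unrelated failures such as authentication or network errors.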

DudaNogueira · October 7, 2024, 9:21am

hi @Sumat_Mallick !!

I didn’t understand.

Apart from the try/except the code seems similar to me.

Also, the only difference between the sandbox (“free version”) and a paid cluster is the resources.

The Weaviate you get in a sandbox is the same one you get from a paid account and from our public releases.

Thanks!

Dirk · October 7, 2024, 9:38am

I think it is a bug in Weaviate, but it only occurs under certain preconditions. Restarting wipes the state, but the error could reappear at any time.

The latest version contains additional information in the error, which can help me to pin down what the actual reason is.

Sumat_Mallick · October 9, 2024, 11:04am

Hello @Dirk and @DudaNogueira,

I hope this message finds you well.

We are currently facing a challenge with our setup on an AWS Ubuntu instance. While we are able to connect to the Weaviate cloud from our local Windows machine, we encounter an error when attempting to do so on the Ubuntu instance. The error message we receive is as follows:

[2024-10-09 10:56:37 +0000] [445826] [ERROR] Worker (pid:445830) was sent SIGKILL! Perhaps out of memory?

To replicate this issue, please create a file named vec.py and add the relevant code to it. You can then use the following command to check if it works:

gunicorn --workers 4 --worker-class gevent --bind 0.0.0.0:8000 --timeout 120 vec:app

Any insights or assistance you can provide on this matter would be greatly appreciated.

Thank you!

DudaNogueira · October 14, 2024, 9:53pm

Hi @Sumat_Mallick !

The SIGKILL seems to hit code that runs before the Weaviate client is involved, so I am not sure Weaviate has any effect on this.

Have you tried running with more verbose logs from gunicorn?
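
One way to get more verbose output is to move the command-line flags into a Gunicorn configuration file and raise the log level. The sketch below mirrors the flags from the command in the previous post. The `loglevel`, `errorlog`, `accesslog`, and `capture_output` settings are additions for troubleshooting, and the file name `gunicorn.conf.py` is an assumption.

```python
# gunicorn.conf.py (sketch): same settings as the command line in the
# previous post, plus debug logging.
bind = "0.0.0.0:8000"
workers = 4
worker_class = "gevent"
timeout = 120

# Additions for troubleshooting (not in the original command).
loglevel = "debug"       # most verbose Gunicorn log level
errorlog = "-"           # "-" sends the error log to stderr
accesslog = "-"          # "-" sends the access log to stdout
capture_output = True    # redirect the app's stdout/stderr into the error log
```

It could then be started with `gunicorn -c gunicorn.conf.py vec:app`.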

fellalli · December 17, 2024, 1:38pm

Hi @DudaNogueira,

Thanks for looking at this! I assume that this is the same issue as here. I have the same problem when trying to use Weaviate v4 with gevent or eventlet to perform a simple search query. The logs do not seem to give any more information, but the problem seems to be reliably reproducible.

Dirk · December 18, 2024, 5:45am

I answered in the linked thread!

