Text field must be smaller than 1024 characters

jrkoh · June 8, 2024, 4:00pm

Hi everyone

I’m attempting to use the multi2vec vectorizer to handle my objects, some of which contain “image_base64”. Below is my code. Most of my objects can be pushed to Weaviate successfully, but I get the following error for some of them. The failed objects all contain the “image_base64” attribute, but only 7 out of the 9 objects with that attribute failed.

Code:

from weaviate.classes.config import Property, DataType, Configure, Multi2VecField

client.collections.create(
    "dev",
    properties=[
        Property(name="text_as_html", data_type=DataType.TEXT),
        Property(name="text", data_type=DataType.TEXT),
        Property(name="image_base64", data_type=DataType.BLOB),
    ],
    vectorizer_config=[
        Configure.NamedVectors.multi2vec_palm(
            name="default",
            # Define the fields to be used for the vectorization - using image_fields, text_fields, video_fields
            image_fields=[
                Multi2VecField(name="image_base64")
            ],
            text_fields=[
                Multi2VecField(name="text_as_html"),
                Multi2VecField(name="text"),
            ],
        )
    ],
    # Additional parameters not shown
)

Error messages:
Failed to import object with error: connection to Google failed with status: 400 error: Multimodal embedding failed with the following error: Text field must be smaller than 1024 characters.

Here are the things I’ve tried:

Removing all text_fields, i.e. not vectorizing any text field, but it still throws the same error.
Removing the Property(name="image_base64", data_type=DataType.BLOB), but it still throws the error.

What am I missing?

DudaNogueira · June 10, 2024, 5:24pm

hi @jrkoh !!

This seems to be a limit set by Google PaLM.

I have found this link that seems to state this exact info (but in tokens, not characters).

Let me know if this helps.

Thanks!

jrkoh · June 11, 2024, 7:29am

This is odd. Even if the accepted input is limited to 1024 tokens (whether it is tokens or characters doesn’t matter here), how does one explain the fact that I still encounter the error even when I leave text_fields empty?

DudaNogueira · June 11, 2024, 7:46pm

Well… that’s not expected.

Here is some code I put together:

from weaviate.classes.config import Property, DataType, Configure, Multi2VecField

# Warning: this deletes every collection in the instance; only use it on a test instance
client.collections.delete_all()
collection = client.collections.create(
    "Test",
    properties=[
        Property(name="text_as_html", data_type=DataType.TEXT),
        Property(name="text", data_type=DataType.TEXT),
        Property(name="image_base64", data_type=DataType.BLOB),
    ],
    vectorizer_config=[
        Configure.NamedVectors.multi2vec_palm(
            project_id="duda-lab",
            location="us-central1",
            name="default",
            # Define the fields to be used for the vectorization - using image_fields, text_fields, video_fields
            image_fields=[
                Multi2VecField(name="image_base64")
            ],
            text_fields=[
                Multi2VecField(name="text_as_html"),
                Multi2VecField(name="text"),
            ],
        )
    ],
    # Additional parameters not shown
)

# and now some importing
import base64

with open("example.jpg", "rb") as f:
    # BLOB properties take a base64 string, so decode the encoded bytes to str
    encoded_image = base64.b64encode(f.read()).decode("utf-8")

print(encoded_image)
collection.data.insert({"text_as_html": "<b>Hello</b>", "text": "something", "image_base64": encoded_image})

Can you isolate the objects that are having this issue? Maybe they are too big?
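
One way to narrow this down before importing is to check the text properties of each object against the limit quoted in the error. The snippet below is only a rough sketch: it assumes the limit is per text field and counted in characters (as the error message says, though Google may actually enforce it in tokens), and find_oversized_text, MAX_TEXT_CHARS and sample_objects are hypothetical names made up for illustration, not part of the Weaviate client.

```python
# Assumption: the limit applies per text field and is counted in characters,
# as the error message suggests; it may actually be enforced in tokens.
MAX_TEXT_CHARS = 1024


def find_oversized_text(objects, text_fields=("text_as_html", "text"), limit=MAX_TEXT_CHARS):
    """Return (index, field, length) for every text field at or above the limit."""
    oversized = []
    for index, obj in enumerate(objects):
        for field in text_fields:
            value = obj.get(field)
            # "must be smaller than 1024" means a length of exactly 1024 also fails
            if isinstance(value, str) and len(value) >= limit:
                oversized.append((index, field, len(value)))
    return oversized


# Hypothetical sample data standing in for the real objects
sample_objects = [
    {"text_as_html": "<b>Hello</b>", "text": "something"},
    {"text_as_html": "<p>" + "x" * 2000 + "</p>", "text": "short"},
]

for index, field, length in find_oversized_text(sample_objects):
    print(f"object {index}: {field} has {length} characters")
```

If this flags none of the failing objects, text length is probably not the cause, and the next thing to compare would be the image data of the failing objects against the ones that import fine.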

Let me know if this helps.
