Text field must be smaller than 1024 characters

Hi everyone

I’m attempting to use the multi2vec vectorizer to handle my objects, some of which contain an “image_base64” property. Below is my code. Most of my objects are pushed to Weaviate successfully, but some fail with the error below. The failed objects all contain the “image_base64” attribute, but only 7 of the 9 objects with that attribute failed.

Code:

client.collections.create(
    "dev",
    properties=[
        Property(name="text_as_html", data_type=DataType.TEXT),
        Property(name="text", data_type=DataType.TEXT),
        Property(name="image_base64", data_type=DataType.BLOB),
    ],
    vectorizer_config=[
        Configure.NamedVectors.multi2vec_palm(
            name="default",
            # Define the fields to be used for the vectorization - using image_fields, text_fields, video_fields
            image_fields=[
                Multi2VecField(name="image_base64")
            ],
            text_fields=[
                Multi2VecField(name="text_as_html"),
                Multi2VecField(name="text"),
            ],
        )
    ],
    # Additional parameters not shown
)

Error messages:
Failed to import object with error: connection to Google failed with status: 400 error: Multimodal embedding failed with the following error: Text field must be smaller than 1024 characters.

Here are the things I’ve tried:

  1. Removing all text_fields, i.e. not vectorizing any text field. It still throws the same error.
  2. Removing Property(name="image_base64", data_type=DataType.BLOB). It still throws the same error.

What am I missing?

hi @jrkoh !!

This seems to be a limit set by Google PaLM.

I have found this link that seems to state this exact limit (but in tokens, not characters :thinking: )

Let me know if this helps.

Thanks!

This is odd. Even if the limit on accepted input is 1024 (whether it is counted in tokens or characters doesn’t matter), how does that explain the fact that I still get the error when I leave text_fields empty?

Well… that’s not expected. :thinking:

Here is some code I put together:

from weaviate.classes.config import Property, DataType, Configure, Multi2VecField

client.collections.delete_all()
collection = client.collections.create(
    "Test",
    properties=[
        Property(name="text_as_html", data_type=DataType.TEXT),
        Property(name="text", data_type=DataType.TEXT),
        Property(name="image_base64", data_type=DataType.BLOB),
    ],
    vectorizer_config=[
        Configure.NamedVectors.multi2vec_palm(
            project_id="duda-lab",
            location="us-central1",
            name="default",
            # Define the fields to be used for the vectorization - using image_fields, text_fields, video_fields
            image_fields=[
                Multi2VecField(name="image_base64")
            ],
            text_fields=[
                Multi2VecField(name="text_as_html"),
                Multi2VecField(name="text"),
            ],
        )
    ],
    # Additional parameters not shown
)

# And now some importing
import base64

with open("example.jpg", "rb") as f:
    # BLOB properties are sent as base64-encoded strings, so decode the bytes to str
    encoded_image = base64.b64encode(f.read()).decode("utf-8")
    print(encoded_image)
    collection.data.insert({"text_as_html": "<b>Hello</b>", "text": "something", "image_base64": encoded_image})

Can you isolate the objects that are having this issue? Maybe they are too big?
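One way to narrow this down before importing is to scan the objects locally. The snippet below is only a rough sketch: it assumes the objects are plain dicts using the property names from this thread, and that the limit is 1024 characters as the error message says (it may actually be counted in tokens). The helper name find_suspect_objects and the TEXT_LIMIT constant are made up for illustration and are not part of the Weaviate client.

```python
# Assumption: limit taken from the error message ("smaller than 1024 characters").
TEXT_LIMIT = 1024


def find_suspect_objects(objects, text_fields=("text", "text_as_html"),
                         image_field="image_base64"):
    """Report every object that has a text field at or over TEXT_LIMIT."""
    suspects = []
    for index, obj in enumerate(objects):
        # Treat missing or None properties as empty strings.
        lengths = {name: len(obj.get(name) or "") for name in text_fields}
        too_long = {name: n for name, n in lengths.items() if n >= TEXT_LIMIT}
        if too_long:
            image = obj.get(image_field) or ""
            suspects.append({
                "index": index,
                "too_long": too_long,
                # Rough decoded size: base64 encodes 3 bytes as 4 characters.
                "image_bytes": len(image) * 3 // 4,
            })
    return suspects
```

Calling find_suspect_objects(my_objects) on the list you are about to import returns one entry per suspicious object, with its position in the list and the length of each oversized text field. If the 7 failing objects match the reported indexes, the text length is the likely cause; if not, the image size is the next thing to compare.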

Let me know if this helps.