Error "text too long for vectorization"

Description

I followed the official “Quickstart”.
I encountered the error below when running the code in the “Partial recap” section.

text too long for vectorization

I use Azure OpenAI.
I made no edits to docker-compose.yaml.

Server Setup Information

  • Weaviate Server Version: 1.25.1

  • Deployment Method: docker

  • Multi Node? Number of Running Nodes: 1

  • Client Language and Version: Node.js v18.20.3

Any additional Information

Class object settings:

const classObj = {
  'class': 'Question',
  // If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
  'vectorizer': 'text2vec-openai',
  'moduleConfig': {
    'text2vec-openai': {
      model: 'ada',
      modelVersion: '002',
      type: 'text',
      resourceName: '<resourceName>',
      deploymentId: '<deploymentId>'
    },
    'generative-openai': {}  // Ensure the `generative-openai` module is used for generative queries
  },
};

For the Azure configuration, I followed the official docs.

I also tried “const batchSize = 1”, but the same error occurred.


Resolved by downgrading the Docker image from 1.25.1 to 1.23.1.

Is the latest Docker image not compatible with Azure?

Hey, that should work. Is your API key maybe “used up”, e.g. no further tokens are available?

Same error here. I'm using Azure OpenAI ada-002. The same ingestion works with Weaviate 1.24.x.

Server Setup Information

  • Weaviate Server Version: 1.25.1
  • Deployment Method: docker
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python client 4.6.2

Any additional Information

Docker config:

weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:1.25.1
    ports:
      - 8080:8080
      - 50051:50051
    volumes:
      - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    hostname: weaviate
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: 'text2vec-cohere,text2vec-huggingface,text2vec-palm,text2vec-openai,generative-openai,generative-cohere,generative-palm,ref2vec-centroid,reranker-cohere,qna-openai'
      CLUSTER_HOSTNAME: 'node1'
      LIMIT_RESOURCES: true
    deploy:
      resources:
        limits:
          memory: 2500M

The vectorizer is defined during collection creation with this code:

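import os
from weaviate.classes.config import Configure

# Passed to client.collections.create(...) as vectorizer_config=...
vectorizer_config = Configure.Vectorizer.text2vec_azure_openai(
    resource_name=os.getenv("AZURE_RESOURCE_NAME"),
    deployment_id=os.getenv("AZURE_DEPLOYMENT_ID"),
    base_url=os.getenv("AZURE_BASE"),
)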
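Since `os.getenv` returns `None` for an unset variable, a missing `AZURE_*` variable would silently produce an incomplete vectorizer config. One way to rule this out before creating the collection is a small fail-fast check. `require_env` below is a hypothetical helper, not part of the Weaviate client:

```python
import os


def require_env(*names: str) -> dict:
    """Return the requested environment variables, failing fast if any are unset or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values


if __name__ == "__main__":
    try:
        azure = require_env("AZURE_RESOURCE_NAME", "AZURE_DEPLOYMENT_ID", "AZURE_BASE")
        print("Azure settings found:", sorted(azure))
    except RuntimeError as err:
        print(err)
```

The returned values can then be passed as `resource_name`, `deployment_id` and `base_url`, so a misconfiguration surfaces as a clear error instead of a vectorization failure later.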

Example of the error:

ErrorObject(message='text too long for vectorization', object_=_BatchObject(collection='Local_demo_documents', vector=None, uuid='5eefd335-fe2f-4a65-a915-246758ec23d1', properties={'title': '19 techniques de vente pour vendre plus et mieux en entretien commercial', 'content': '�est d’ailleurs pour cela que l’on parle de technique de vente. Au delà des aspects psychologiques à comprendre, il y a bien sûr des compétences techniques à apprendre et à maîtriser pour mieux vendre au quotidien. Nous avons recensé pour vous les techniques de ventes qui font partie des fondamentaux commerciaux que vous devriez absolument connaître + d’autres techniques commerciales bonus pour muscler votre jeu de jambes. Que vous soyez commercial, manager commercial, entrepreneur ou chef d’entreprise… ces techniques de ventes sont vitales. Car, comme le dit la citation : « Celui qui a une bonne idée mais qui ne sait', 'source': 'https://google.com', 'source_type': 'mhtml', 'createdAt': datetime.datetime(2024, 5, 28, 8, 12, 40, 697473, tzinfo=datetime.timezone.utc)}, tenant=None, references=None, retry_count=0), original_uuid='5eefd335-fe2f-4a65-a915-246758ec23d1')
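The `content` in this rejected object is only a few hundred characters, far below the 8191-token input limit of `text-embedding-ada-002`, which suggests the error is not really about length. A rough local check is sketched below. It assumes about 4 characters per token, a crude heuristic rather than a real tokenizer (French text typically needs somewhat more tokens per character), and `estimated_tokens` / `looks_too_long` are hypothetical helpers:

```python
import math

# ASSUMPTION: ~4 characters per token is a crude heuristic, not a tokenizer.
CHARS_PER_TOKEN = 4
ADA_002_MAX_TOKENS = 8191  # input limit of text-embedding-ada-002


def estimated_tokens(text: str) -> int:
    """Very rough token estimate based only on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def looks_too_long(text: str, limit: int = ADA_002_MAX_TOKENS) -> bool:
    """True if the rough estimate exceeds the model's input limit."""
    return estimated_tokens(text) > limit


if __name__ == "__main__":
    sample = "x" * 700  # similar size to the rejected 'content' field
    print(estimated_tokens(sample), looks_too_long(sample))
```

If the estimate is nowhere near the limit, as here, the error more likely comes from the server or module than from the data.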

Ingestion code:

with collection.batch.rate_limit(requests_per_minute=1440) as batch:
    for data in contents:
        batch.add_object(properties=data)
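To see which objects failed and why, recent v4 Python clients expose the failed objects once the batch context exits (for example `collection.batch.failed_objects`; check that this attribute exists in your client version). The sketch below is a hypothetical helper that tallies them by error message, assuming each item has a `.message` attribute like the `ErrorObject` shown earlier:

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable


def summarize_failures(failed_objects: Iterable) -> Counter:
    """Count failed batch objects by error message.

    ASSUMPTION: each item exposes a `.message` attribute, like the
    ErrorObject in the error log; anything else is counted by str(item).
    """
    return Counter(getattr(obj, "message", str(obj)) for obj in failed_objects)


if __name__ == "__main__":
    # Stand-in objects; in practice pass e.g. collection.batch.failed_objects
    fake = [SimpleNamespace(message="text too long for vectorization")] * 2
    for message, count in summarize_failures(fake).most_common():
        print(f"{count} x {message}")
```

If every object fails with the same `text too long for vectorization` message regardless of its size, that points to a server-side issue rather than the input data.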

Could you please test this image and let me know if it works?
semitechnologies/weaviate:preview-fix-azure-openai-f9daa1e

Thank you, @Dirk ! It worked perfectly on the first try!

Thanks for confirming, the fix will be in 1.25.2.


I confirmed that I have not used up my tokens yet.
I will try 1.25.2.