WeaviateQueryError max_tokens is too large with generative search and gpt-4-1106-preview

Hi!

I have a problem which I can't seem to figure out.
I created a schema with the following config:

{
    "class": "article",
    "descripition": "Article collection",
    "properties": [
        {
            "indexFilterable": false,
            "indexSearchable": false,
            "name": "dirId",
            "dataType": [
                "string"
            ],
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": true
                },
                "qna-openai": {
                    "skip": true
                },
                "generative-openai": {
                    "skip": true
                }
            }
        },
        {
            "name": "title",
            "dataType": [
                "text"
            ],
            "description": "Article Title",
            "moduleConfig": {
                "text2vec-openai": {
                    "model": "text-embedding-3-large",
                    "dimensions": 1024
                },
                "qna-openai": {
                    "model": "gpt-3.5-turbo-instruct"
                }
            }
        },
        {
            "name": "content",
            "dataType": [
                "text"
            ],
            "description": "Article Content",
            "indexFilterable": true,
            "indexSearchable": true,
            "moduleConfig": {
                "text2vec-openai": {
                    "model": "text-embedding-3-large",
                    "dimensions": 1024
                },
                "qna-openai": {
                    "model": "gpt-3.5-turbo-instruct"
                }
            }
        },
        {
            "name": "url",
            "dataType": [
                "string"
            ],
            "description": "Article URL",
            "indexFilterable": true,
            "indexSearchable": false,
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": true
                },
                "qna-openai": {
                    "skip": true
                },
                "generative-openai": {
                    "skip": true
                }
            }
        },
        {
            "name": "language",
            "dataType": [
                "string"
            ],
            "description": "Article Language"
        }
    ],
    "moduleConfig": {
        "generative-openai": {
            "model": "gpt-4-1106-preview",
            "maxTokensProperty": 4096
        },
        "text2vec-openai": {
            "model": "text-embedding-3-large",
            "dimensions": 1024
        },
        "qna-openai": {
            "model": "gpt-3.5-turbo-instruct"
        }
    }
}

During the import of articles I made sure to chunk them so each chunk stays within the roughly 8,000-token limit for vectorization, as estimated by the tiktoken npm package with the WASM bindings.
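
For illustration, a token-counted chunker along those lines could look like this (not my exact import code; the helper name, the 8,000-token cutoff, and the cl100k_base encoding choice are just illustrative):

import { get_encoding } from 'tiktoken';

// Split a long article into chunks of at most `maxTokens` tokens.
// cl100k_base is the encoding used by the current OpenAI embedding models.
function chunkByTokens(text: string, maxTokens = 8000): string[] {
    const enc = get_encoding('cl100k_base');
    const tokens = enc.encode(text);
    const decoder = new TextDecoder();
    const chunks: string[] = [];

    for (let i = 0; i < tokens.length; i += maxTokens) {
        // Cutting at arbitrary token boundaries can split a multi-byte
        // character right at a chunk edge; good enough for a sketch.
        const slice = tokens.slice(i, i + maxTokens);
        chunks.push(decoder.decode(enc.decode(slice)));
    }

    enc.free(); // release the WASM-side encoder
    return chunks;
}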

I imported 100 articles as a test, which came out to 101 chunks in total, meaning only one article exceeded the 8k chunk boundary.

Now if I try to run a simple groupedTask test to generate some responses across multiple articles, I run into a token limit. This is the error I receive:

WeaviateQueryError: Query call with protocol gRPC failed with message: /weaviate.v1.Weaviate/Search UNKNOWN: connection to: OpenAI API failed with status: 400 error: max_tokens is too large: 10189. This model supports at most 4096 completion tokens, whereas you provided 10189.

This is the code I used to test the generative Search function:

const collection = client.collections.get('Article');

const result = await collection.generate.nearText(search, {
    groupedTask: prompt,
    groupedProperties: ['title', 'content'],
}, {
    autoLimit: 2,
});

I tried using limit vs. autoLimit, and tried setting it to 1 instead of 2 (although I should be able to use somewhere around 16 articles as input, right? gpt-4-turbo has a 128k input window).

The error suggests to me that I need to set max_tokens somewhere else, but I already tried to do that during the creation of my class.

Could anybody point me in the right direction?

Thanks!
Steve

Hi @steveHimself! Welcome to our community :hugs:

This error message comes directly from OpenAI.

One thing you could do is intercept the payload that goes to OpenAI by replacing the base URL at query time.

So you do a vector search (to avoid triggering the vectorization), and it will give you the exact payload sent to OpenAI.
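
The endpoint that receives the payload can be as simple as a local server that logs whatever arrives, for example this rough sketch (port 4000 and the empty JSON response are placeholders, not a full OpenAI-compatible mock):

import { createServer } from 'node:http';

// Rough sketch of a local "OpenAI" endpoint: it only logs whatever Weaviate
// sends to the replaced base URL so you can inspect the generative payload.
createServer((req, res) => {
    let body = '';
    req.on('data', (chunk) => (body += chunk));
    req.on('end', () => {
        console.log(req.method, req.url);
        console.log(body);
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end('{}'); // not a valid OpenAI response, just enough to see the request
    });
}).listen(4000);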

Let me know if this helps!

Thanks!

But isn’t this regarding the vectorization and not the generation?
The vectorization should work fine. My problem seems to be with the generative search.

And how would I set custom headers to replace the base URL with the TypeScript client?

I can’t really find anything about that.

Thanks!

Update:

I created a small Bun server on port 4000 on localhost.

I then initialized the old weaviate-ts-client like this:

import weaviateAlt from 'weaviate-ts-client';

....
const altClient = await weaviateAlt.client({
    scheme: 'http',
    host: 'weaviate:8080',
    headers: {
        "X-OpenAI-Api-Key": process.env.OPENAI_API_KEY || '',
        "X-OpenAI-BaseURL": "http://host.docker.internal:4000"
    }
});

const altResult = altClient.graphql
    .get()
    .withClassName('Article')
    .withNearText({
        concepts: [search]
    })
    .withGenerate({
        groupedTask: proompt,
    })
    .withLimit(LIMIT) // This is 50 at the moment to test
    .withFields('title content id')
    .do();

On my Bun server the situation looks like this:

Bun.serve({
    port: 4000,
    async fetch(req) {
        console.log({ receivedBodyOutside: { req: req, body: await req.json() } });
        const url = new URL(req.url);

        if (url.pathname === '/') {
            console.log({ receivedBody: req.body });
            return new Response('Hello Bun!');
        }

        return new Response('Not found');
    }
})

And finally this is the result that is being logged:

{
  receivedBodyOutside: {
    req: Request (0 KB) {
      method: "POST",
      url: "http://host.docker.internal:4000/v1/embeddings",
      headers: Headers {
        "host": "host.docker.internal:4000",
        "user-agent": "Go-http-client/1.1",
        "content-length": "78",
        "authorization": "****",
        "content-type": "application/json",
        "accept-encoding": "gzip",
      }
    },
    body: {
      input: [ "<my-search-term>" ],
      model: "text-embedding-3-large",
      dimensions: 1024,
    },
  },
}

Couple of questions:

  • Am I right in thinking that this is just the payload for the vectorization of the search term? Can I skip this? You mentioned I should “just” do a vector search to skip vectorization. Do you mean I should use nearVector() instead of nearText()?
  • Is this what you meant with “intercepting the OpenAI Calls”?

Thank you for your help @DudaNogueira

Hi!

That’s right.

Weaviate will first vectorize your query (if using nearText or hybrid without a vector).

So in order to avoid triggering vectorization for your generative search, you need to use a nearVector (provide your query vector) or a nearObject (provide the UUID of an object from the same collection).
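
With the graphql client you are using, that would be something along these lines (just a sketch; the UUID and the prompt variable are placeholders, the UUID being any object that already exists in your collection):

const result = await altClient.graphql
    .get()
    .withClassName('Article')
    // nearObject skips query vectorization entirely; alternatively use
    // withNearVector({ vector: [...] }) if you already have a query vector.
    .withNearObject({ id: '00000000-0000-0000-0000-000000000000' })
    .withGenerate({ groupedTask: prompt })
    .withFields('title content')
    .withLimit(2)
    .do();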

Let me know if this helps!

Thanks!

I finally came around to testing this more.
I still couldn't make any progress.

This is my setup at the moment:

This is how I call weaviate to generate results based on search + prompt:

const altClient = await weaviateAlt.client({
        scheme: 'http',
        host: 'weaviate:8080',
        headers: {
            "X-OpenAI-Api-Key": process.env.OPENAI_API_KEY || '',
            "X-OpenAI-BaseURL": "http://host.docker.internal:4000"
        }
    });

 const altResult = altClient.graphql
        .get()
        .withClassName('Article')
        // .withNearText({
        //     concepts: [search]
        // })
        .withNearVector({
            vector: [...],
        })
        .withFields('title content')
        .withGenerate({
            singlePrompt: proompt,
        })
        .withLimit(LIMIT) // moved this down to 2
        .do();

With this setup I still get the same response in my Bun console output:

{
  receivedBodyOutside: {
    req: Request (0 KB) {
      method: "POST",
      url: "http://host.docker.internal:4000/v1/embeddings",
      headers: Headers {
        "host": "host.docker.internal:4000",
        "user-agent": "Go-http-client/1.1",
        "content-length": "94",
        "authorization": "Bearer sk-proj-CUjrXQPfsox5hc2GM03oT3BlbkFJM3w1UfyT59c5lURlcWsK",
        "content-type": "application/json",
        "accept-encoding": "gzip",
      }
    },
    body: {
      input: [ "<my-search-query>" ],
      model: "text-embedding-3-large",
      dimensions: 1024,
    },
  },
}

I’ve now also made sure to split each of my articles at 8k tokens, because I got a similar error while batch importing.
The import now runs through, so the splitting must have solved the issue there.

Hence, I believe the content length should not be the problem?

And given the output in my Bun server, I still only seem to get a vectorization request, even though I'm using withNearVector() instead of withNearText().

Am I doing something critically wrong here? And could you point me in the right direction?

Thanks for your help so far!