Hybrid Queries on new OpenAI Embedding Models failing after server restart

@DudaNogueira I thought I would create a new thread for this issue rather than hijacking New OpenAI Embedding Models - #21 by SomebodySysop

The problem is that once a Weaviate server is configured with an OpenAI vectorizer using the new text-embedding-3-large model with dimensions set to 1024, hybrid queries fail after a server reboot with a "vector search: vector lengths don't match: 1024 vs 3072" error message.

I was able to replicate this issue on codesandbox. This is using Weaviate v1.23.10 and python client 4.4.4.

Steps to reproduce

  1. https://codesandbox.io/p/sandbox/interesting-morse-hgvggd
  2. Sign-in using SSO of choice
  3. Open up setup.py and query.py and update line 16 with an OpenAI API Key. As this is being done, codesandbox will “seamlessly fork” to your own private sandbox. If the URL does not change, you may have to go back to the CodeSandbox dashboard, go to My drafts, and open the newly created sandbox.
  4. Go to top left corner and select the “Restart Devbox” option. This should trigger sandbox initialization. Wait for container to be started and the pip -r requirements.txt job to complete.
  5. Open up a new terminal in the center bottom pane.
  6. Run the following in sequence:
  • docker compose down -v

  • docker compose up -d

  • python setup.py

  • python query.py

    Note the following:

    1. setup.py creates a collection and inserts a single object
    2. The single object we stored in weaviate has a vector length of 1024, indicating vectorizer is working properly
    3. We can fetch that object from weaviate, confirming that the inserted object is persisted
    4. We can hybrid query from weaviate
  7. Now run:
  • docker compose restart
  • python query.py

All we’ve done here is restart the weaviate container. Notice now that we can still fetch the inserted object (see output above the exception output), but now hybrid query fails with a vector length not matching error.

Hi @D3x !

Thanks for reporting.

I will try to reproduce this on my end and get back to you!

Hi @DudaNogueira were you able to reproduce this given the instructions?

Hi! Sorry, I couldn’t get to it yet.

Have you tried running this locally?

Those sandboxes usually have a lot of limitations that may affect it, so removing that component may give us a hint as to whether the issue is in the sandbox or in the server.

@DudaNogueira yes this is reproducible locally.

The same behavior I noted above persists. A simple server restart makes hybrid queries fail, which seems like a fairly serious problem. I would appreciate your team’s attention on this as soon as possible.

Hi D3x!

Sorry for the delay here.

I was not able to reproduce this:

❯ python3 setup.py
UUID for new object created: 117a7993-a2aa-4847-9bd2-f69cbdac1160
fetch_objects: 117a7993-a2aa-4847-9bd2-f69cbdac1160 (1024) | Properties: {'text': 'Some data'}
hybrid query: 117a7993-a2aa-4847-9bd2-f69cbdac1160 (1024) | Properties: {'text': 'Some data'}
❯ python3 query.py
fetch_objects: 117a7993-a2aa-4847-9bd2-f69cbdac1160 (1024) | Properties: {'text': 'Some data'}
hybrid query: 117a7993-a2aa-4847-9bd2-f69cbdac1160 (1024) | Properties: {'text': 'Some data'}

Could we connect in Slack so I can take a closer look?

Thanks!

Hi @DudaNogueira

I recently looked into upgrading our local setup to 1.28, but during validation the same hybrid query issue came up again. I recall it was resolved with your help on Slack, but I’m unable to view older messages to verify.

I’ve refreshed the demo repo to reproduce the issue: GitHub - d3xtemp/weaviate-issue. The local weaviate instance was initialized exactly as specified in the Weaviate docs Docker | Weaviate. Also, the issue now does not require a server restart to be demonstrated.

Again, a quick explanation of the issue is that I’ve used OpenAI’s text-embedding-3-large embedding model with a dimension of 1024 to create a collection. When I simply fetch objects from this collection, I can verify that these objects have a vector length of 1024 as expected. However, when I attempt hybrid queries against this collection, I receive the error message below.

Error: Query call with protocol GRPC search failed with message <AioRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "explorer: get class: vector search: object vector search at index mycollection: shard mycollection_66Yf5V7XYzHQ: vector search: knn search: distance between entrypoint and query node: 1024 vs 1536: vector lengths don't match"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"explorer: get class: vector search: object vector search at index mycollection: shard mycollection_66Yf5V7XYzHQ: vector search: knn search: distance between entrypoint and query node: 1024 vs 1536: vector lengths don\'t match", grpc_status:2, created_time:"2024-12-17T15:29:04.353870673-08:00"}"
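
When comparing this error across server versions, it can help to pull the two lengths out of the message programmatically. The following is only a sketch: the function name is hypothetical, and it assumes the message keeps the "N vs M: vector lengths don't match" shape shown above. In both errors quoted in this thread the first number equals the stored vector length (1024), but the order is an observation from these messages, not something documented here.

```python
import re
from typing import Optional, Tuple

# Matches the "<first> vs <second>: vector lengths don't match" fragment.
# The optional backslash covers the escaped form (don\'t) seen in
# debug_error_string.
_DIM_MISMATCH = re.compile(r"(\d+) vs (\d+): vector lengths don\\?'t match")


def parse_dimension_mismatch(message: str) -> Optional[Tuple[int, int]]:
    """Return the two vector lengths named in the error, or None if absent."""
    match = _DIM_MISMATCH.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# The "details" string from the error above.
sample = (
    "explorer: get class: vector search: object vector search at index "
    "mycollection: shard mycollection_66Yf5V7XYzHQ: vector search: knn search: "
    "distance between entrypoint and query node: 1024 vs 1536: "
    "vector lengths don't match"
)

print(parse_dimension_mismatch(sample))  # (1024, 1536)
```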

Your help to confirm this issue and orchestrate a fix is appreciated.


hi @D3x !!

Welcome back :slight_smile:

You probably have your vectors stored with one dimensionality, and have the vectorizer of your collection configured to use a different one.

You can get the collection configuration and check that:

collection.config.get().vectorizer_config

The solution here is to create a second collection (on the same or a different server), specifying the exact model and dimensions of your vectorized data, and migrate your data over.

There is a fairly easy migration guide here:

Let me know if this helps.

Thanks!

Hi @DudaNogueira , I’m unclear what you’re suggesting.

The repo I provided you demonstrates the problem in a fresh instance of weaviate, creates a collection from scratch, inserts a few records, and then attempts to hybrid query. No migration of data is needed to demonstrate the issue.

In weaviate-issue/setup.py at 7a9bfdd08791a33daad96c074c7fc2e90779c9a5 · d3xtemp/weaviate-issue · GitHub I’ve configured the vectorizer simply and yes, with one dimensionality only. In this simple case, shouldn’t I be able to run hybrid queries without issue whenever I use a reference to that same collection (i.e. client.collections.get(collection_name).query.hybrid())?

Oh Right! sorry! completely missed the repo

This is indeed a :bug: bug

:grimacing:

for some reason, in this scenario, it is using the default module configuration.

This is the payload it will send:

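import os

import weaviate

# Point the OpenAI base URL at a request-capturing endpoint so the embedding
# request that Weaviate sends at query time can be inspected.
client = weaviate.connect_to_local(
    headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY", "CHANGE_ME"),
        "X-OpenAI-BaseUrl": "https://webhook.site/beef60de-4d45-4c61-9928-b20fa619f91e",
    }
)
try:
    collection = client.collections.get("Test")

    response = collection.query.hybrid(
        query="hybrid query with 1024 dimensions",
        alpha=0.75,
        limit=5,
        include_vector=True,
    )
    for obj in response.objects:
        print(
            f"hybrid query: {obj.uuid} ({len(obj.vector['default'])}) | Properties: {obj.properties}"
        )
finally:
    # Always release the connection, even if the query raises.
    client.close()

# We get this payload at the capturing endpoint. Note the default model and
# dimensions instead of text-embedding-3-large with 1024 dimensions.
payload = {
  "input": [
    "hybrid query with 1024 dimensions"
  ],
  "model": "text-embedding-3-small",
  "dimensions": 1536
}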
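
To make the discrepancy explicit, the captured payload can be compared field by field with what the collection was meant to use. This is a minimal sketch in plain Python: the `config_mismatches` helper is hypothetical, and the expected values are the ones described earlier in this thread (text-embedding-3-large, 1024 dimensions).

```python
from typing import Any, Dict, List


def config_mismatches(expected: Dict[str, Any], payload: Dict[str, Any]) -> List[str]:
    """List every expected key whose value differs in the captured payload."""
    problems = []
    for key, want in expected.items():
        got = payload.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, got {got!r}")
    return problems


# What the collection was configured with, per this thread.
expected = {"model": "text-embedding-3-large", "dimensions": 1024}

# What was captured at query time (copied from the payload above).
captured = {
    "input": ["hybrid query with 1024 dimensions"],
    "model": "text-embedding-3-small",
    "dimensions": 1536,
}

for problem in config_mismatches(expected, captured):
    print(problem)
```

Both fields differ here, which is consistent with the query-time request falling back to the module defaults rather than the collection's own settings.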

I have raised it internally.

Thank you very much :love_you_gesture: for raising this here.

Thanks for confirming the issue!

Assuming that we have no visibility into the status of the internal issues, I would appreciate it if you can provide an update when it’s resolved and which upcoming versions would contain the fix. We are eager to stay on top of the releases.

:pray:

Sure. Our team is already looking into this.

As soon as they confirm, I’ll open a github issue so we can keep track of it.

I’ll update it here.

Thanks!

hi @D3x !!

The issue is this one:


Any updates on this? Issue seems to be on server version 1.28 and in weaviate cloud it’s not possible to select a prior version

Hello @DudaNogueira,

What is the status of the bug fix?
We are attempting to deploy our application using the paid serverless option, but the server version is locked to 1.28.2 (we can’t use the 1.27.0 we used to test locally) and are therefore unable to deploy anything!!

Please let me know if you have any suggestions for working around this issue, even if it’s temporary.

hi there @Jose-Coutinho_cmore !! Welcome to our community :hugs:

This seems like a popular issue :grimacing:

I have pinged our team again so we can prio this.

Thanks!
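
One possible stopgap, not confirmed anywhere in this thread, is to embed the query text on the client side and pass the resulting vector to the hybrid query, so the server does not need to vectorize the query itself. The sketch below assumes that the v4 Python client's `query.hybrid()` accepts a `vector` argument and that supplying it skips server-side vectorization of the query. The helper name and the `embed` callable are hypothetical, and whether this actually avoids the bug on 1.28 is untested.

```python
from typing import Any, Callable, Sequence


def hybrid_with_client_side_vector(
    collection: Any,
    embed: Callable[[str], Sequence[float]],
    query: str,
    expected_dims: int = 1024,
    alpha: float = 0.75,
    limit: int = 5,
) -> Any:
    """Embed the query locally, check its length, then run the hybrid query."""
    vector = list(embed(query))
    # Fail early on the client instead of with a server-side
    # "vector lengths don't match" error.
    if len(vector) != expected_dims:
        raise ValueError(
            f"query vector has {len(vector)} dimensions, expected {expected_dims}"
        )
    # Assumption: passing vector= means the server uses it as-is for the
    # vector part of the hybrid search.
    return collection.query.hybrid(
        query=query, vector=vector, alpha=alpha, limit=limit
    )
```

Here `collection` would be the object returned by client.collections.get(collection_name), and `embed` could wrap the OpenAI Python client, for example by calling embeddings.create(model="text-embedding-3-large", input=query, dimensions=1024) and returning data[0].embedding.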
