The problem statement is that once a Weaviate server is configured with an OpenAI vectorizer using the new text-embedding-3-large model and 1024 dimensions, hybrid queries fail after a server reboot with the error message "vector search: vector lengths don't match: 1024 vs 3072".
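For readers unfamiliar with the message: a distance function can only compare vectors of equal length, so the index rejects the query outright. The function below is a hypothetical stand-in that only mimics the shape of the error. It is not Weaviate's implementation. Note that 3072 is the default output size of text-embedding-3-large when no dimensions value is requested. That suggests, as an inference rather than a confirmed diagnosis, that the query vector was produced without the 1024 setting.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    # Hypothetical stand-in for an index's distance call:
    # vectors of different lengths cannot be compared at all.
    if len(a) != len(b):
        raise ValueError(f"vector lengths don't match: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


stored = [0.1] * 1024  # vector written at import time with dimensions=1024
query = [0.1] * 3072   # query vector at text-embedding-3-large's default size
try:
    cosine_distance(stored, query)
except ValueError as err:
    print(err)  # vector lengths don't match: 1024 vs 3072
```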
I was able to replicate this issue on CodeSandbox, using Weaviate v1.23.10 and Python client v4.4.4.
Open setup.py and query.py and update line 16 in each with an OpenAI API key. As you do this, CodeSandbox will “seamlessly fork” to your own private sandbox. If the URL does not change, you may have to go back to the CodeSandbox dashboard, go to My drafts, and open the newly created sandbox.
In the top left corner, select the “Restart Devbox” option. This should trigger sandbox initialization. Wait for the container to start and for the pip install -r requirements.txt job to complete.
Open up a new terminal in the center bottom pane.
Run the following in sequence:
docker compose down -v
docker compose up -d
python setup.py
python query.py
Note the following:
setup.py creates a collection and inserts a single object.
The single object stored in Weaviate has a vector length of 1024, indicating the vectorizer is working properly.
We can fetch that object from Weaviate, confirming that the inserted object is persisted.
Hybrid queries against Weaviate succeed.
Now run:
docker compose restart
python query.py
All we’ve done here is restart the Weaviate container. Notice that we can still fetch the inserted object (see the output above the exception), but the hybrid query now fails with a vector-length mismatch error.
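The two phases above can be scripted so each run is identical. This is a minimal sketch, assuming Weaviate listens on the default HTTP port 8080 and exposes its /v1/.well-known/ready readiness endpoint there. run_repro and wait_ready are hypothetical helpers. The wait matters because docker compose up -d and docker compose restart return once the container starts, not once Weaviate is ready to accept queries. With dry_run=True, nothing is executed and only the command lines are returned.

```python
import subprocess
import time
import urllib.request

# Commands taken from the steps above; setup.py and query.py are the repro scripts.
FIRST_RUN = [
    ["docker", "compose", "down", "-v"],
    ["docker", "compose", "up", "-d"],
    ["python", "setup.py"],
    ["python", "query.py"],
]
AFTER_RESTART = [
    ["docker", "compose", "restart"],
    ["python", "query.py"],
]
# Assumption: default HTTP port 8080 mapped to localhost.
READY_URL = "http://localhost:8080/v1/.well-known/ready"


def wait_ready(url: str = READY_URL, timeout: float = 60.0) -> None:
    """Poll the readiness endpoint until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return
        except OSError:
            pass  # container still starting; connection refused or reset
        time.sleep(1)
    raise TimeoutError(f"Weaviate not ready at {url}")


def run_repro(dry_run: bool = True) -> list[str]:
    """Run both phases in order; with dry_run=True only return the command lines."""
    executed = []
    for cmd in FIRST_RUN + AFTER_RESTART:
        executed.append(" ".join(cmd))
        if dry_run:
            continue
        # docker steps must succeed; query.py may fail, since that failure is the bug.
        subprocess.run(cmd, check=cmd[0] == "docker")
        if cmd[:2] == ["docker", "compose"] and cmd[2] in ("up", "restart"):
            wait_ready()
    return executed
```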
Those sandboxes usually have a lot of limitations that may affect the result, so removing that component may give us a hint as to whether the issue lies there or in the server.
The same behavior I noted above persists. A simple server restart makes hybrid queries fail, which seems like a fairly serious problem. I would appreciate your team’s attention on this ASAP.
I’ve recently looked into upgrading our local setup to 1.28, but during validation it hit the same hybrid query issue again. I recall it was resolved with your help on Slack, but I’m unable to view older messages to verify.
I’ve refreshed the demo repo to reproduce the issue: GitHub - d3xtemp/weaviate-issue. The local Weaviate instance was initialized exactly as specified in the Docker page of the Weaviate docs. Also, the issue can now be demonstrated without a server restart.
Again, to explain the issue briefly: I’ve used OpenAI’s text-embedding-3-large embedding model with a dimension of 1024 to create a collection. When I simply fetch objects from this collection, I can verify that these objects have a vector length of 1024 as expected. However, when I attempt hybrid queries against this collection, I receive the error message below.
Error: Query call with protocol GRPC search failed with message <AioRpcError of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "explorer: get class: vector search: object vector search at index mycollection: shard mycollection_66Yf5V7XYzHQ: vector search: knn search: distance between entrypoint and query node: 1024 vs 1536: vector lengths don't match"
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"explorer: get class: vector search: object vector search at index mycollection: shard mycollection_66Yf5V7XYzHQ: vector search: knn search: distance between entrypoint and query node: 1024 vs 1536: vector lengths don\'t match", grpc_status:2, created_time:"2024-12-17T15:29:04.353870673-08:00"}"
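The two numbers in the error are the useful part. This minimal sketch assumes the "N vs M" pair is the stored vector length followed by the query vector length, as the "entrypoint and query node" wording suggests. It extracts both values and lists which OpenAI models produce the query length by default. 1536 is the default size of text-embedding-3-small and text-embedding-ada-002, not of text-embedding-3-large. One possible reading is that query-time vectorization is not using the configured model and dimensions. That is an inference, not a confirmed diagnosis, and the helper names are hypothetical.

```python
import re

# Default output sizes documented by OpenAI when no `dimensions` value is sent.
DEFAULT_DIMS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}


def parse_mismatch(details: str) -> tuple[int, int]:
    """Extract (stored, query) lengths from an 'N vs M' vector-length error."""
    match = re.search(r"(\d+) vs (\d+)", details)
    if match is None:
        raise ValueError("no 'N vs M' pair found in error text")
    return int(match.group(1)), int(match.group(2))


def models_with_default(dims: int) -> list[str]:
    """List the models whose default embedding size equals dims."""
    return sorted(name for name, size in DEFAULT_DIMS.items() if size == dims)


details = (
    "explorer: get class: vector search: object vector search at index mycollection: "
    "shard mycollection_66Yf5V7XYzHQ: vector search: knn search: distance between "
    "entrypoint and query node: 1024 vs 1536: vector lengths don't match"
)
stored, query = parse_mismatch(details)
print(stored, query, models_with_default(query))
# 1024 1536 ['text-embedding-3-small', 'text-embedding-ada-002']
```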
I’d appreciate your help confirming this issue and coordinating a fix.
You probably have your vectors stored with one dimensionality and the vectorizer of your collection configured to use a different one.
You can get the collection configuration and check that:
collection.config.get().vectorizer_config
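That check can be made explicit. This minimal sketch assumes you have copied the model name and the optional dimensions value shown by vectorizer_config into a plain dict. The dict shape and the helper names are assumptions for illustration and are not part of the Weaviate client's API. When no dimensions value is set, the sketch falls back to OpenAI's documented default size for the model.

```python
# Default output sizes documented by OpenAI when no `dimensions` value is sent.
DEFAULT_DIMS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}


def expected_query_dims(settings: dict) -> int:
    """Dimensions the vectorizer should produce for these (hypothetical) settings."""
    if settings.get("dimensions"):
        return int(settings["dimensions"])
    return DEFAULT_DIMS[settings["model"]]


def check_collection(settings: dict, stored_vector_len: int) -> str:
    """Compare configured query-time dimensions with the length of a stored vector."""
    query_dims = expected_query_dims(settings)
    if query_dims == stored_vector_len:
        return f"ok: both {query_dims}"
    return f"mismatch: stored {stored_vector_len} vs configured {query_dims}"


print(check_collection({"model": "text-embedding-3-large", "dimensions": 1024}, 1024))
# ok: both 1024
print(check_collection({"model": "text-embedding-3-large"}, 1024))
# mismatch: stored 1024 vs configured 3072
```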
The solution here is to create a second collection (on the same or a different server) that specifies the exact model and dimensions of your vectorized data, and to migrate your data over.
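The migration step amounts to copying each object together with its existing vector into the new collection, so nothing has to be re-embedded. This minimal sketch uses a hypothetical StoredObject record, an iterable source, and an add_object callback as placeholders. With the Python client, the source would be an iterator over the old collection that includes vectors, and the callback would be a batch insert into the new one. Those wirings are assumptions and are not shown here. The length check refuses any object whose vector does not match the dimensions configured on the target.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class StoredObject:
    # Hypothetical minimal record: id, properties and the stored vector.
    uuid: str
    properties: dict
    vector: list[float]


def migrate(
    source: Iterable[StoredObject],
    add_object: Callable[[StoredObject], None],
    expected_dims: int,
) -> int:
    """Copy objects with their existing vectors, refusing any whose length differs."""
    copied = 0
    for obj in source:
        if len(obj.vector) != expected_dims:
            raise ValueError(
                f"{obj.uuid}: vector has {len(obj.vector)} dims, expected {expected_dims}"
            )
        add_object(obj)  # re-insert with the original vector, no re-embedding
        copied += 1
    return copied


old_objects = [StoredObject("obj-1", {"text": "hello"}, [0.01] * 1024)]
new_collection: list[StoredObject] = []  # stand-in for a batch insert target
print(migrate(old_objects, new_collection.append, expected_dims=1024))  # 1
```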
Hi @DudaNogueira, I’m unclear on what you’re suggesting.
The repo I provided you demonstrates the problem in a fresh instance of weaviate, creates a collection from scratch, inserts a few records, and then attempts to hybrid query. No migration of data is needed to demonstrate the issue.
Since we have no visibility into the status of internal issues, I would appreciate an update when this is resolved, along with which upcoming versions will contain the fix. We are eager to stay on top of the releases.
What is the status of the bug fix?
We are attempting to deploy our application using the paid serverless option, but the server version is locked to 1.28.2 (we can’t use the 1.27.0 we used to test locally), so we are unable to deploy anything!
Please let me know if you have any suggestions for working around this issue, even if it’s temporary.