Help Needed: Resolving WeaviateQueryError with Nil or Zero-Length Vector at docID 715

Description

I am experiencing a WeaviateQueryError related to a nil or zero-length vector at docID 715 during a vector search query in Weaviate. The issue is preventing any successful queries against my menuitemembeddings index. The error message points to the vector search at index menuitemembeddings and reports a “nil or zero-length vector at docID 715”.

Full Log Output

_InactiveRpcError Traceback (most recent call last)
File ~/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/weaviate/collections/grpc/query.py:609, in _QueryGRPC.__call(self, request)
608 res: search_get_pb2.SearchReply # According to PEP-0526
→ 609 res, _ = self._connection.grpc_stub.Search.with_call(
610 request,
611 metadata=self._connection.grpc_headers(),
612 timeout=self._connection.timeout_config.query,
613 )
615 return res

File ~/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/grpc/_channel.py:1193, in _UnaryUnaryMultiCallable.with_call(self, request, timeout, metadata, credentials, wait_for_ready, compression)
1187 (
1188 state,
1189 call,
1190 ) = self._blocking(
1191 request, timeout, metadata, credentials, wait_for_ready, compression
1192 )
→ 1193 return _end_unary_response_blocking(state, call, True, None)

File ~/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/grpc/_channel.py:1005, in _end_unary_response_blocking(state, call, with_call, deadline)
1004 else:
→ 1005 raise _InactiveRpcError(state)

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:

615 return res
617 except grpc.RpcError as e:
→ 618 raise WeaviateQueryError(e.details(), "GRPC search")

WeaviateQueryError: Query call with protocol GRPC search failed with message explorer: get class: vector search: object vector search at index menuitemembeddings: shard menuitemembeddings_fkq12e2IPcaZ: vector search: knn search: distance between entrypoint and query node: got a nil or zero-length vector at docID 715.

I’m seeking guidance on how to diagnose and resolve this issue, particularly how to investigate the problematic docID 715 and strategies for cleansing or recovering the database to avoid similar errors in the future.

Server Setup Information

Any additional Information

I’m working in a local Docker setup with a “bring your own vectors” configuration using multiple named vectors. The configuration involves the named vectors “menu_item_embedding”, “name_embedding”, and “description_embedding”. Here is a snippet of my configuration:

vectorizer_config=[
    wvcc.Configure.NamedVectors.none(name="menu_item_embedding"),
    wvcc.Configure.NamedVectors.none(name="name_embedding"),
    wvcc.Configure.NamedVectors.none(name="description_embedding"),
],
  • It also seems like this isn’t the only docID that suffers from this issue.
  • The dimensionality of these particular entries (and the surrounding entries) for all three vectors seems to be 1024.
  1. What does the error actually mean? Is it related to how the vector is stored or how the distance is calculated?
  2. How can I investigate further and cleanse my index of these problematic vectors?
  3. Hints on how to detect this prior to insertion would also be helpful for the future.

:pray::pray:Thank you if you read this far :pray::pray:

Hi! Do you see any outstanding logs in the server side?

Also, how big is this dataset?

Did you have any issues in ingestion? This could be a corrupted index.

Have you tried reindexing?

Hi @DudaNogueira
Thank you for taking the time to respond! Sorry if my answers are inaccurate or a bit long-winded. I’m still quite new to this tech.

Hi! Do you see any outstanding logs in the server side?

It’s hard for me to say for certain, but there was nothing at the time of querying.
Occasionally I have some WSL2 issues and need to kill Docker etc… Perhaps the logs below are a clue to some indexing error?

{"action":"lsm_recover_from_active_wal","class":"MenuItemEmbeddings","index":"menuitemembeddings","level":"warning","msg":"active write-ahead-log found. Did weaviate crash prior to this? Trying to recover...","path":"/var/lib/weaviate/menuitemembeddings/WSp0ViJvJhT2/lsm/property_menu_category_id_searchable/segment-1712350772127760700","shard":"WSp0ViJvJhT2","time":"2024-04-05T22:06:09Z"}
{"action":"lsm_recover_from_active_wal","class":"MenuItemEmbeddings","index":"menuitemembeddings","level":"warning","msg":"active write-ahead-log found. Did weaviate crash prior to this? Trying to recover...","path":"/var/lib/weaviate/menuitemembeddings/WSp0ViJvJhT2/lsm/property_menu_item_id/segment-1712350783095419100","shard":"WSp0ViJvJhT2","time":"2024-04-05T22:06:09Z"}
{"action":"lsm_recover_from_active_wal","class":"MenuItemEmbeddings","index":"menuitemembeddings","level":"warning","msg":"active write-ahead-log found. Did weaviate crash prior to this? Trying to recover...","path":"/var/lib/weaviate/menuitemembeddings/WSp0ViJvJhT2/lsm/property_ref_product_id_searchable/segment-1712350783062882400","shard":"WSp0ViJvJhT2","time":"2024-04-05T22:06:09Z"}
{"action":"lsm_recover_from_active_wal","class":"MenuItemEmbeddings","index":"menuitemembeddings","level":"warning","msg":"active write-ahead-log found. Did weaviate crash prior to this? Trying to recover...","path":"/var/lib/weaviate/menuitemembeddings/WSp0ViJvJhT2/lsm/property_data/segment-1712350793589253900","shard":"WSp0ViJvJhT2","time":"2024-04-05T22:06:09Z"}
{"action":"lsm_recover_from_active_wal","class":"MenuItemEmbeddings","index":"menuitemembeddings","level":"warning","msg":"active write-ahead-log found. Did weaviate crash prior to this? Trying to recover...","path":"/var/lib/weaviate/menuitemembeddings/WSp0ViJvJhT2/lsm/property_menu_product_id_searchable/segment-1712350783805691800","shard":"WSp0ViJvJhT2","time":"2024-04-05T22:06:09Z"}
{"action":"lsm_recover_from_active_wal","class":"MenuItemEmbeddings","index":"menuitemembeddings","level":"warning","msg":"active write-ahead-log found. Did weaviate crash prior to this? Trying to recover...","path":"/var/lib/weaviate/menuitemembeddings/WSp0ViJvJhT2/lsm/property_menu_item_id_searchable/segment-1712350783658625000","shard":"WSp0ViJvJhT2","time":"2024-04-05T22:06:09Z"}
{"action":"lsm_recover_from_active_wal","class":"MenuItemEmbeddings","index":"menuitemembeddings","level":"warning","msg":"active write-ahead-log found. Did weaviate crash prior to this? Trying to recover...","path":"/var/lib/weaviate/menuitemembeddings/WSp0ViJvJhT2/lsm/property_data_searchable/segment-1712350774067835900","shard":"WSp0ViJvJhT2","time":"2024-04-05T22:06:09Z"}

Also, how big is this dataset?

~320k entries
~383mb on disk

Did you have any issues in ingestion? This could be a corrupted index.

I’d like to know more about this. How could I potentially identify a corrupt index more specifically?

Have you tried reindexing?

Both times I’ve tried to index, the process has needed to be restarted. Here is my rough code for skipping forward to the latest index item.

What strikes me as odd is the particularly low docID 715. Instinctively, I would assume this to be some sort of internal sequence number, in which case a failure around the 715th document in the indexing process would make sense to me.

# Perform insertion in batches
skip_count = 0
for i in tqdm(range(0, len(import_data), BATCH_SIZE)):
    batch = import_data[i:i+BATCH_SIZE]
    menu_item_embeddings = []

    for data in batch:

        # Check if the menu item already exists in the collection
        exists = menu_item_embeddings_collection.query.fetch_objects(
            filters=wvc.query.Filter.by_property("menu_item_id").equal(data["id"])
        )

        if len(exists.objects) > 0:
            skip_count += 1
            continue

        # Transform the existing object into the Weaviate format
        weaviate_menu_item = wvc.data.DataObject(
            properties={
                "menu_item_id": data["id"],
                "menu_product_id": data["menu_product_id"],
                "menu_id": data["menu_id"],
                "menu_category_id": data["menu_category_id"],
                "ref_product_id": data["ref_product_id"],
                "data": data["data"],
            },
            vector={
                "name_embedding": from_bytes_to_list(data["name_embedding"]),
                "description_embedding": from_bytes_to_list(data["description_embedding"]),
                "menu_item_embedding": from_bytes_to_list(data["menu_item_embedding"])
            }
        )

        # Ensure the vectors are non-empty
        # TODO: Could become a sanity-check function in the future (see the sketch below)
        if len(weaviate_menu_item.vector["name_embedding"]) == 0:
            continue
        if len(weaviate_menu_item.vector["description_embedding"]) == 0:
            continue
        if len(weaviate_menu_item.vector["menu_item_embedding"]) == 0:
            continue

        # If everything is okay, stage this for insertion
        menu_item_embeddings.append(weaviate_menu_item)

    if len(menu_item_embeddings) == 0:
        continue

    print(f'Skipped {skip_count} existing items in this batch')
    skip_count = 0

    try:
        menu_item_embeddings_collection.data.insert_many(menu_item_embeddings)    # This uses batching under the hood
    except Exception as e:
        print(f'Error: {e}')
        print([x.vector['name_embedding'] for x in menu_item_embeddings] )
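
For the TODO above, here is a rough sketch of the sanity-check helper I have in mind (untested; the expected dimensionality of 1024 is just what my embeddings use):

EXPECTED_DIM = 1024  # dimensionality of my embeddings

def vectors_are_valid(vectors: dict) -> bool:
    """Return True only if every named vector is present, non-empty and has the expected dimensionality."""
    for name, vec in vectors.items():
        if vec is None or len(vec) != EXPECTED_DIM:
            print(f"Invalid vector '{name}': length {0 if vec is None else len(vec)}")
            return False
        if any(v != v for v in vec):  # NaN check (NaN != NaN)
            print(f"Invalid vector '{name}': contains NaN")
            return False
    return True

With that, the three separate length checks above would collapse into a single if not vectors_are_valid(weaviate_menu_item.vector): continue.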

Additional notes

I was attempting to verify the dimensionality of the vectors in the sequential range around the “corrupt” vector, but I couldn’t find any clues:

import json
from utils.menu_utils import MenuUtils

results = menu_item_embeddings_collection.query.fetch_objects(limit=722, include_vector=True)

# Inspect the objects around index 715
for i in range(712, 722):
    print(f'Index: {i}')

    if i == 716:
        # Note: this branch reuses json_obj/name_embedding from the previous iteration
        print(f'  Name: {MenuUtils.get_name(json_obj)}')
        print(f'  Vector: {len(name_embedding)}')
        continue

    json_obj = json.loads(results.objects[i].properties['data'])
    name_embedding = results.objects[i].vector['menu_item_embedding']
    print(f'  Name: {MenuUtils.get_name(json_obj)}')
    print(f'  Vector: {len(name_embedding)}')
This is the only code I could find that might cause this error, and I couldn’t find any zero-length vectors in my collection :person_shrugging:
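
For a more exhaustive check, I’m also considering scanning the entire collection with the cursor API and flagging any missing or empty named vectors. A rough sketch, assuming the v4 Python client (this only checks what the object store returns, so it may not reveal what the HNSW index itself holds for docID 715):

# Iterate over every object via the cursor API and report empty named vectors
bad_vectors = []
for obj in menu_item_embeddings_collection.iterator(include_vector=True):
    for name, vec in (obj.vector or {}).items():
        if vec is None or len(vec) == 0:
            bad_vectors.append((obj.uuid, name))
            print(f"Empty vector '{name}' on object {obj.uuid}")

print(f"Found {len(bad_vectors)} empty vectors in the collection")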

I got this error also, on the same version of Weaviate.
It happens every time I restart the docker container.
Maybe something about the shutdown corrupts the index in the docker volume.

Thanks for your addition @nicholasamiller :pray:

Update

I initially thought there was something wrong with how I was building the vectors before inserting them into a collection with a none vectorizer config (wvc.config.Configure.Vectorizer.none()).

However, my wvc.config.Configure.NamedVectors.text2vec_openai collection has now broken as well with a similar error.

Collection Setup

Here is the setup for that collection.

client.collections.create(
    name=COLLECTION_NAME,
    description="Collection of menu items with embeddings",
    properties=[
        wvc.config.Property(name="menu_item_id", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="menu_product_id", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="menu_id", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="menu_category_id", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="ref_product_id", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="data", data_type=wvc.config.DataType.TEXT),
    ],
    vectorizer_config=[
        wvc.config.Configure.NamedVectors.text2vec_openai(
            name="menu_item_embedding", source_properties=["menu_item_text"]
        ),
        wvc.config.Configure.NamedVectors.text2vec_openai(
            name="name_embedding", source_properties=["name"]
        ),
        wvc.config.Configure.NamedVectors.text2vec_openai(
            name="description_embedding", source_properties=["description"]
        ),
    ],
)

Error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/van/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/flask/app.py", line 1488, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/van/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/flask/app.py", line 1466, in wsgi_app
    response = self.handle_exception(e)
  File "/home/van/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/flask_cors/extension.py", line 176, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/van/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/flask/app.py", line 1463, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/van/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/flask/app.py", line 872, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/van/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/flask_cors/extension.py", line 176, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/van/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/flask/app.py", line 870, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/van/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/flask/app.py", line 855, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/home/van/Pgammin/Qopla/qMenuAnalysis/server.py", line 76, in search_menu_items
    response = menu_item_embeddings_collection.query.near_text(
  File "/home/van/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/weaviate/collections/queries/near_text/query.py", line 90, in near_text
    res = self._query.near_text(
  File "/home/van/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/weaviate/collections/grpc/query.py", line 418, in near_text
    return self.__call(request)
  File "/home/van/Pgammin/Qopla/qMenuAnalysis/.venv/lib/python3.9/site-packages/weaviate/collections/grpc/query.py", line 618, in __call
    raise WeaviateQueryError(e.details(), "GRPC search")  # pyright: ignore
weaviate.exceptions.WeaviateQueryError: Query call with protocol GRPC search failed with message explorer: get class: vector search: object vector search at index openai_menuitemembeddings: shard openai_menuitemembeddings_v7FoIMDvjzTp: vector search: knn search: distance between entrypoint and query node: got a nil or zero-length vector at docID 605.

Additional Notes

  1. The docID is consistent across docker container restarts
  2. No docker container logs are emitted at the time of triggering the query

Questions

  1. Is there any tooling for me to check the health of my index?
  2. Is it possible to directly inspect a vector based on the docId?
  3. Do you have any recommendations on the filesystem side for creating a restore point? (My DB doesn’t change very often; see the sketch after this list.)
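
Regarding question 3, what I currently have in mind is Weaviate’s built-in backup API rather than copying the volume by hand. A rough sketch, assuming the backup-filesystem module is enabled server-side (ENABLE_MODULES=backup-filesystem and BACKUP_FILESYSTEM_PATH set in docker-compose):

# Create a filesystem backup of the collection while the data is still healthy
result = client.backup.create(
    backup_id="menuitemembeddings-restore-point",  # hypothetical backup id
    backend="filesystem",
    include_collections=["MenuItemEmbeddings"],
    wait_for_completion=True,
)
print(result)

# Restoring into a fresh instance later would look like:
# client.backup.restore(
#     backup_id="menuitemembeddings-restore-point",
#     backend="filesystem",
#     wait_for_completion=True,
# )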

This has now happened with a 3rd collection (separate docker instance)

Traceback (most recent call last):
  File "/home/van/Pgammin/ReGPT/.venv/lib/python3.9/site-packages/flask/app.py", line 1488, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/van/Pgammin/ReGPT/.venv/lib/python3.9/site-packages/flask/app.py", line 1466, in wsgi_app
    response = self.handle_exception(e)
  File "/home/van/Pgammin/ReGPT/.venv/lib/python3.9/site-packages/flask_cors/extension.py", line 176, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/van/Pgammin/ReGPT/.venv/lib/python3.9/site-packages/flask/app.py", line 1463, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/van/Pgammin/ReGPT/.venv/lib/python3.9/site-packages/flask/app.py", line 872, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/van/Pgammin/ReGPT/.venv/lib/python3.9/site-packages/flask_cors/extension.py", line 176, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/van/Pgammin/ReGPT/.venv/lib/python3.9/site-packages/flask/app.py", line 870, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/van/Pgammin/ReGPT/.venv/lib/python3.9/site-packages/flask/app.py", line 855, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/home/van/Pgammin/ReGPT/backend/server.py", line 137, in search_conversations
    response = conversation_collection.query.near_text(
  File "/home/van/Pgammin/ReGPT/.venv/lib/python3.9/site-packages/weaviate/collections/queries/near_text/query.py", line 90, in near_text
    res = self._query.near_text(
  File "/home/van/Pgammin/ReGPT/.venv/lib/python3.9/site-packages/weaviate/collections/grpc/query.py", line 418, in near_text
    return self.__call(request)
  File "/home/van/Pgammin/ReGPT/.venv/lib/python3.9/site-packages/weaviate/collections/grpc/query.py", line 618, in __call
    raise WeaviateQueryError(e.details(), "GRPC search")  # pyright: ignore
weaviate.exceptions.WeaviateQueryError: Query call with protocol GRPC search failed with message explorer: get class: vector search: object vector search at index conversations: shard conversations_gMerlVoh5Zup: vector search: knn search: distance between entrypoint and query node: got a nil or zero-length vector at docID 252.

This is after restarting my PC without explicitly shutting down WSL2 or Docker.

Questions

  1. Can I diagnose/heal my index somehow without re-inserting all the data?
  2. Do I need to make periodic backups?

Right now, I’m wary of putting more data into my Weaviate instances. I’m also especially hesitant to modify data that would be lost if I need to “reindex” (recalculating the embeddings also costs money).

I tried to reproduce this using the jeopardy sample dataset, adding named vectors. Could not reproduce. I will try to reproduce the problem on a Linux machine: issue could be Docker desktop on Windows.

Hi @puj !

I was able to create the class you provided. I notice that you may want to use deterministic IDs, since you added some logic to avoid inserting the same object twice.
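
For example, something like this inside your import loop (a rough sketch, untested; generate_uuid5 comes from the client’s util module):

from weaviate.util import generate_uuid5

# Deterministic UUID derived from the menu item id, so re-running the
# import maps to the same object instead of creating a duplicate
obj_uuid = generate_uuid5(data["id"])

# Cheaper duplicate check than a filtered fetch_objects call
if menu_item_embeddings_collection.data.exists(obj_uuid):
    skip_count += 1
    continue

weaviate_menu_item = wvc.data.DataObject(
    uuid=obj_uuid,
    properties={
        "menu_item_id": data["id"],
        # ... remaining properties as in your original loop
    },
    vector={
        "menu_item_embedding": from_bytes_to_list(data["menu_item_embedding"]),
        # ... remaining named vectors as in your original loop
    },
)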

Also, I noticed you are leaving some properties for AUTO SCHEMA to create during import. My suggestion is to create all properties beforehand, especially the ones you are using for named vectors.

But I was not able to reproduce this.

Can you create a minimum reproducible example in a python notebook?

I have the same issue.

Here is the error I get:
weaviate.exceptions.WeaviateQueryError: Query call with protocol GRPC search failed with message explorer: get class: vector search: object vector search at index subclassconfiguration: shard subclassconfiguration_uOWO1ZhUXD9g: vector search: knn search: distance between entrypoint and query node: got a nil or zero-length vector at docID 1830.

This happens if docker-compose is restarted via docker-compose down > docker-compose up.

Here is my workflow from start to finish:

First I create the DB schema.

Then I import data into the database.

After the schema is created and before I restart, both methods work. I can search using both
collection.generate.near_text
and
collection.query.near_text

Both work and return results. In the case of vector search it returns records. In the case of generative search it returns a message from GPT, and additionally I can fetch references related to the found records.

Then I do docker-compose down => docker-compose up -d
and when I try to search using a vector I get an error like: distance between entrypoint and query node: got a nil or zero-length vector at docID 1830

But I can still search normally using collection.query.fetch_objects,
and there I can include the vector and it will be included in the response. So it does exist.

I have also set up a script to check for null or zero-length vectors; it found no invalid vectors.

But if I try to search I get the same error discussed above.

Also, I could not find any way in the documentation to reindex existing records in the DB.
Is there no straightforward way to manually trigger reindexing of existing records? It seems that the only way to use Weaviate is to never stop docker-compose, and if it does stop for some reason, the only way to get it working again is to fully recreate everything from the initial point: create the collections, then insert data into them, and then it is possible to search using vectors again. That is, until the server crashes for one reason or another, and the whole process has to be done over again. Is it supposed to be this way? What is the suggested workflow for shutting down docker-compose and then turning it back on? For instance, I cannot create collections locally and then transfer that data to the server; in that case my vectors get corrupted and I get the error above. So the only way would be to run docker-compose on the remote server and generate the vectors there so it works. Until the server restarts.

If anything else needs to be provided, please let me know.

The problem was “solved” when I changed my schema to index on one vector only: no multiple named vectors.
So without really having any idea, I would speculate, wildly, that there is a bug to do with some state, maybe IDs, that is created when adding data objects with multiple named vectors. This state may not be properly serialized to disk. So when the container restarts, it is not loaded into memory, and all the queries fail.
Multiple named vectors are a recent feature.
Anyway, I can adjust to make do with a single vector for the moment.
I can’t reproduce without sharing all my data.
My data has 14380 objects at the moment. About 400Mb.
No cross references.
Provided my own vectors.
Problem occurred on both Ubuntu and Windows, anytime the weaviate container restarted.

I will try to reproduce the problem on a Linux machine: issue could be Docker desktop on Windows.

I am facing this when using an embedded instance on MacOS as well.

Edit: version 1.24.11 fixes this bug!

Hi there dear friends!!!

Welcome to our community @smwitkowski !! :hugs:

Our team is already aware of this and was able to reproduce it. Here is the GH issue:

Thank you all for reporting and being such an amazing community! :people_hugging:

Awesome, thanks for the update. Eagerly awaiting this fix :partying_face:

Thanks everyone for contributing to this report :pray:

I’m also getting the same issue. I have a docker-compose-launched instance with a mounted volume.

  1. I copied the volume to a remote server and launched another instance there with the volume mounted. Then I got the error.
  2. I wanted to do some comparisons, so I went back to my local instance and restarted the container. Then I got the same error locally as well.

So I also suspect it is related to restarting.

Anyway, glad to see it’s tracked on GH. Is there any workaround before this gets fixed?

Linking to the workaround for now

@puj I was able to resolve this issue (PR). Today we will release v1.24.11, which will contain this fix. I will let you know once the release is ready.

@puj I was able to resolve this issue (PR). Today we will release v1.24.11, which will contain this fix. I will let you know once the release is ready.

How lovely! Well done :muscle:
Thanks for following up on this!

Was interesting to browse your solution :male_detective:

Cool :slight_smile: and thank you for your kind words!

I just wanted to let you know that v1.24.11 is out :slight_smile:

A LIFE SAVER … I was in the same hell. Multiple named vectors and no default modules and nothing worked. Going from 24.6 to 24.11 fixed this.

Thanks to all of you gals and guys !!!
