[Query .with_near_vector got error: "Query was not successful! Unexpected status code: 504"]

Description

I have a collection of 30M objects with 'vectorIndexType': 'flat' (the data was imported 4 hours ago).
When I query it with near vector, it returns the error "UnexpectedStatusCodeException: Query was not successful! Unexpected status code: 504, with response body: None."

    response = (
        weaviate_client.query.get(
            collection_name, 
            ["image_id"]
        )
        .with_near_vector({"vector": query_vector})
        .with_limit(limit)
        .with_additional(["distance"])
        .do()
    )

However, the same query on the collection without near vector returns results properly:

    response = (
        weaviate_client.query.get(
            collection_name, 
            ["image_id"]
        )
        .with_limit(limit)
        .with_additional(["distance"])
        .do()
    )

Does anyone know the reason? Was the 504 (gateway timeout) error because the search took too long?

When the same collection reached the 10M milestone earlier, this error also happened right after the data import completed, but it resolved automatically shortly after.

Server Setup Information

  • Weaviate Server Version: 1.23.7
  • Deployment Method: K8s
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: 3.21.0

Any additional Information

About 11 hours after the import completed, the with_near_vector search started working.
I'm wondering what happened behind the scenes. Did the 'vectorIndexType': 'flat' take excessive time to index?

Hi! I have not played with the flat index myself yet, but considering it is disk intensive, it probably timed out.

What is happening in the logs?

30M may be a lot for a flat index :thinking:

From Weaviate Verba:

Based on the provided context, a flat index in the realm of vector databases, such as Weaviate, is essentially a brute force index. It does not use any complex graph structures like Hierarchical Navigable Small World (HNSW) for indexing. Instead, it stores vector embeddings in a simple, straightforward manner, which can be searched through linearly.

In a multi-tenancy context, where each tenant has a relatively small number of vector embeddings (e.g., 10,000 to 100,000), a flat index can be quite efficient. This is because the search space for each tenant is limited to their own set of embeddings, and the brute force search through this small subset can be performed quickly. The flat index is particularly suitable for scenarios where the total number of embeddings is massive, but they are distributed across many tenants, each with a manageable number of embeddings.

The flat index is also designed to be disk-based, which means it can handle large datasets without requiring a large amount of memory. This is beneficial for systems that need to manage large-scale setups with minimal memory usage. The flat index can be combined with binary quantization to further reduce the size of the vector embeddings stored on disk, which can speed up search operations by reducing the amount of data that needs to be read.

In summary, a flat index is a simple, disk-based index that is suitable for use cases with a large number of tenants, each with a small number of vector embeddings. It allows for efficient brute force searches within each tenant’s dataset and can be optimized with techniques like binary quantization for better performance.
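
In other words, a flat search is conceptually just a linear scan. Here is a minimal in-memory sketch of the idea in numpy (illustrative only: Weaviate's actual flat index is disk-based and more involved, and the choice of cosine distance here is an assumption):

    import numpy as np

    def flat_search(stored: np.ndarray, query: np.ndarray, limit: int = 100):
        """Brute-force nearest neighbors: score every stored vector."""
        # cosine distance = 1 - cosine similarity (assumes non-zero vectors)
        norms = np.linalg.norm(stored, axis=1) * np.linalg.norm(query)
        distances = 1.0 - (stored @ query) / norms
        # O(n * d) work per query: a 30M-object collection means 30M distance
        # computations for every single search, hence the high latency
        top = np.argsort(distances)[:limit]
        return top, distances[top]

    def binary_quantize(vectors: np.ndarray) -> np.ndarray:
        # the binary quantization mentioned above: keep one sign bit per
        # dimension, so the scan reads roughly 32x less data from disk;
        # candidates are then typically re-scored against the full vectors
        return (vectors > 0).astype(np.uint8)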

Hi @DudaNogueira, thank you for the information.

I'm using the 'flat' index on this large collection just for benchmarking purposes, to calculate the recall of another collection with an 'hnsw' index.

Both collections contain the same data objects, and the one with the 'flat' index is used to get the ground truth (nearest neighbors).

In that case, is there any configuration I can set to increase the timeout threshold? @DudaNogueira

Also, just to double-check: a collection with a 'flat' index can serve as the ground truth for benchmarking recall, right?
I saw a benchmark report, but the ground truth is not explicitly explained.
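
Concretely, the recall computation I have in mind looks roughly like this (a sketch; the collection names and the commented-out parsing lines are assumptions based on the query shape above):

    def recall_at_k(true_ids, approx_ids, k: int = 100) -> float:
        """Fraction of the exact top-k (flat index) found by the HNSW query."""
        return len(set(true_ids[:k]) & set(approx_ids[:k])) / k

    # hypothetical usage, assuming both queries return image_id as above and
    # the v3 client's response shape {"data": {"Get": {<ClassName>: [...]}}}:
    # flat_ids = [o["image_id"] for o in flat_resp["data"]["Get"]["ImagesFlat"]]
    # hnsw_ids = [o["image_id"] for o in hnsw_resp["data"]["Get"]["ImagesHnsw"]]
    # print(recall_at_k(flat_ids, hnsw_ids))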

That is correct.

On your original issue. The flat index was definitely not designed for that scale. It is mainly a tool for many small indexes, such as in a multi-tenancy scenario. That is likely why you would see a long import time. The flat index has a super aggressive compaction strategy, which benefits small indexes (because they typically stay in a single segment). But it comes with a cost when you add a lot of data.

That said, at least in theory, and if you don't care about the high latency, it should still be possible to use it. We could consider making the compaction strategy configurable, or adding a max segment size above which we wouldn't try to compact anymore.

The 504 error you're seeing does not come from Weaviate directly, but most likely from a load balancer in your K8s setup. 504 means Gateway Timeout: the load balancer or ingress controller (= the gateway) timed out while waiting for the Weaviate backend to respond. Most likely this timeout will be a very round number such as 60s. You should see the request block for this time, then fail with 504. If you check the Weaviate logs, there will likely be an error message such as broken pipe or context canceled.

To fix this you can:

  • Increase the timeout on the load balancer (or any other proxy layer) of your K8s setup. How to do that depends on your specific setup; see the sketch after this list.
  • Make sure to also set the right Weaviate server timeouts. AFAIR, the official helm chart sets those as CLI arguments in the command section, something like --write-timeout <number>s. You can adjust that up as well.
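
For example, if the proxy in front of Weaviate is the widely used ingress-nginx controller, the fix would be annotations like these on the Ingress resource (illustrative only; the annotation names are specific to ingress-nginx, and other proxies or load balancers use different settings):

    metadata:
      annotations:
        # give the Weaviate backend up to 600s before the gateway returns 504
        nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
        nginx.ingress.kubernetes.io/proxy-send-timeout: "600"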

@etiennedi @DudaNogueira

So I added the args: section in the project values.yaml (everything else remains the same):

    weaviate:
      args:
        - "--host"
        - "0.0.0.0"
        - "--port"
        - "8080"
        - "--scheme"
        - "http"
        - "--config-file"
        - "/weaviate-config/conf.yaml"
        - "--read-timeout=600s"
        - "--write-timeout=60s"
      resources:

After the helm chart change was checked in, I inspected the updated pod state with:

    kubectl describe pod weaviate-0 -n weaviate

I can see the arg list is properly overwritten:

    Containers:
      weaviate:
        Container ID:   …
        Image:          …
        Image ID:       …
        Port:           8080/TCP
        Host Port:      0/TCP
        Command:
          /bin/weaviate
        Args:
          --host
          0.0.0.0
          --port
          8080
          --scheme
          http
          --config-file
          /weaviate-config/conf.yaml
          --read-timeout=600s
          --write-timeout=60s
        State:          Running
          Started:      Mon, 04 Mar 2024 17:47:46 -0800
        Ready:          True

I also configured the client-side read timeout to 600s:

    # timeout_config is (connect_timeout, read_timeout) in seconds
    weaviate_client = weaviate.Client(
        url=cfg.weaviate_client_cfg.weaviate_server_url,
        auth_client_secret=auth_client_secret,
        timeout_config=(30, 600),
    )

However, when I run the following query:

    import time

    start = time.perf_counter()
    response = (
        weaviate_client.query
        .get("PilotImage_flat", ["image_id"])
        .with_near_vector({"vector": v})
        .with_limit(100)
        .with_additional(["distance"])
        .do()
    )
    end = time.perf_counter()
    print(f"take {end - start} seconds")

interestingly, the latency is right on the boundary of 60 seconds (the default read-timeout value on both the client side and the server side): if latency < 60s, I get a proper response, but when latency > 60s, it throws the following exception:

UnexpectedStatusCodeException: Query was not successful! Unexpected status code: 502, with response body: None.

And from the server side, I found the following error message and hint:

{"buildVersion":"1.23.7","context":{"kubernetes":{"container_image_id":"…","container_name":"weaviate","pod_name":"weaviate-0","pod_namespace":"weaviate"},"log_group":"weaviate"},"description":"An I/O timeout occurs when the request takes longer than the specified server-side timeout.","error":"write tcp 10.89.39.134:8080->10.89.7.30:46968: i/o timeout","hint":"Either try increasing the server-side timeout using e.g. '--write-timeout=600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.","host":"ip-10-89-36-81.ec2.internal","level":"ERROR","message":"i/o timeout","method":"POST","path":{"ForceQuery":false,"Fragment":"","Host":"","OmitHost":false,"Opaque":"","Path":"/v1/graphql","RawFragment":"","RawPath":"","RawQuery":"","Scheme":"","User":null},"service":"weaviate-0","source_type":"kubernetes_logs","time":"2024-03-05T08:11:15Z"}

hint:
Either try increasing the server-side timeout using e.g. '--write-timeout=600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.

Since I have configured the read-timeout to 600s on both the client side and the server side, why do I still see this error and hint above?