Help me fix this 500ms latency for vector search!

Query Latency

Hi folks!

I’m using Weaviate Cloud, and I’m seeing query latencies of around 500 ms… does anyone have suggestions?

I mostly just stuck with the “bare minimum” (i.e., tutorial level) to see what would happen!

  • My collection is just a bunch of text files from the jfk_files/jfk_text folder of the amasad/jfk_files repo on GitHub
  • I generated summaries of each of the text items (~1000 tokens for each doc)
  • I just used the default embeddings (OpenAI’s 1536-dimensional text-embedding-3-small)
  • There are only ~1000 documents (~2–3k if I break them up into chunks)
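
For scale, a quick back-of-the-envelope check (assuming ~3,000 chunks of 1536-dimensional float32 embeddings) shows the whole index is tiny, so the raw vector search itself should take well under a millisecond on the server:

```python
# Rough memory footprint of the vector index:
# ~3,000 chunks x 1536 dims x 4 bytes per float32.
n_vectors = 3_000
dims = 1536
bytes_per_float = 4

index_mb = n_vectors * dims * bytes_per_float / 1_000_000
print(f"~{index_mb:.0f} MB of raw vectors")  # ~18 MB
```

At this size even a brute-force (flat) scan is trivial, which matches the observation below that switching index types doesn’t change the latency.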

Debugging details

Cluster size & region

  • Sandbox cluster (I tried US East and US West), and it didn’t really make a difference
  • I upgraded to “Serverless” but that didn’t seem to improve it either

Things I tried

  • Switched to “flat” indexing (vector_index_config=Configure.VectorIndex.flat()) – the latency is about the same, though

Collection Config

Collection found: <weaviate.Collection config={
  "name": "DocSummaries7_hnsw",
  "description": null,
  "generative_config": null,
  "inverted_index_config": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    },
    "cleanup_interval_seconds": 60,
    "index_null_state": false,
    "index_property_length": false,
    "index_timestamps": false,
    "stopwords": {
      "preset": "en",
      "additions": null,
      "removals": null
    }
  },
  "multi_tenancy_config": {
    "enabled": false,
    "auto_tenant_creation": false,
    "auto_tenant_activation": false
  },
  "properties": [
    {
      "name": "title",
      "description": null,
      "data_type": "text",
      "index_filterable": true,
      "index_range_filters": false,
      "index_searchable": true,
      "nested_properties": null,
      "tokenization": "word",
      "vectorizer_config": {
        "skip": false,
        "vectorize_property_name": true
      },
      "vectorizer": "text2vec-openai",
      "vectorizer_configs": null
    },
    {
      "name": "content",
      "description": null,
      "data_type": "text",
      "index_filterable": true,
      "index_range_filters": false,
      "index_searchable": true,
      "nested_properties": null,
      "tokenization": "word",
      "vectorizer_config": {
        "skip": false,
        "vectorize_property_name": true
      },
      "vectorizer": "text2vec-openai",
      "vectorizer_configs": null
    }
  ],
  "references": [],
  "replication_config": {
    "factor": 1,
    "async_enabled": false,
    "deletion_strategy": "NoAutomatedResolution"
  },
  "reranker_config": null,
  "sharding_config": {
    "virtual_per_physical": 128,
    "desired_count": 1,
    "actual_count": 1,
    "desired_virtual_count": 128,
    "actual_virtual_count": 128,
    "key": "_id",
    "strategy": "hash",
    "function": "murmur3"
  },
  "vector_index_config": {
    "multi_vector": null,
    "quantizer": null,
    "cleanup_interval_seconds": 300,
    "distance_metric": "cosine",
    "dynamic_ef_min": 100,
    "dynamic_ef_max": 500,
    "dynamic_ef_factor": 8,
    "ef": -1,
    "ef_construction": 128,
    "filter_strategy": "sweeping",
    "flat_search_cutoff": 40000,
    "max_connections": 32,
    "skip": false,
    "vector_cache_max_objects": 1000000000000
  },
  "vector_index_type": "hnsw",
  "vectorizer_config": {
    "vectorizer": "text2vec-openai",
    "model": {
      "baseURL": "https://api.openai.com",
      "isAzure": false,
      "model": "text-embedding-3-small"
    },
    "vectorize_collection_name": true
  },
  "vectorizer": "text2vec-openai",
  "vector_config": null
}>

I tried to keep it as simple as possible:

from weaviate.classes.config import Configure, DataType, Property

self.client.collections.create(
    name,
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    # vector_index_config=Configure.VectorIndex.flat(),
    properties=[  # properties configuration is optional
        Property(name="title", data_type=DataType.TEXT),
        Property(name="content", data_type=DataType.TEXT),
    ],
)


Hey Zen,

Happy to help here! There are definitely some unknowns I’d like to clarify:

  1. Could you provide a snippet of the query being used? (If it’s just tutorial-level stuff, we shouldn’t see latency like this.)

  2. How are you hosting your code: is it a cloud function, or are you running it locally? (If it’s a cloud function, what region is it hosted in? If it’s local, can you confirm where you’re connecting from?)
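
To help separate network latency from search latency, here’s a quick sketch that times a plain HTTP round-trip to the cluster’s readiness endpoint (`/v1/.well-known/ready`), which does no search work. If this alone takes hundreds of ms, the bottleneck is the distance between your client and the cluster, not Weaviate itself. (The endpoint path is standard; swap in your own cluster URL.)

```python
import time
import urllib.request

def measure_rtt(url, n=5):
    """Return the best-case round-trip time to `url` in milliseconds."""
    times = []
    for _ in range(n):
        start = time.perf_counter()
        urllib.request.urlopen(url, timeout=10).read()
        times.append((time.perf_counter() - start) * 1000)
    # min() filters out one-off slow requests (DNS, TLS handshake, etc.)
    return min(times)

if __name__ == "__main__":
    endpoint = "https://<your-cluster>.weaviate.cloud/v1/.well-known/ready"
    print(f"best RTT: {measure_rtt(endpoint):.0f} ms")
```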

You also mentioned that you tried this on both sandboxes and a Serverless cluster. I’d love to double-check that cluster, but I don’t want to share any sensitive info openly on our forum. Would you be able to create a ticket with our support team by emailing these details to Support@weaviate.io? We can then continue this conversation in that ticket, and I can post any public solution back here if needed!

Regards,

Joe

Hi Joe!

I’m just running it locally on my computer.

My query code looks like this!

@timer_decorator
def semantic_search(self, query, limit=10):
    # near_text vectorizes the query via the text2vec-openai module
    # (an OpenAI API call made by the server) before the vector search runs
    response = self.collection.query.near_text(
        query=query,
        limit=limit,
        # return_metadata=MetadataQuery(distance=True, score=True),
    )
    return response
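
For reference, the `timer_decorator` above isn’t shown; a typical sketch of one looks like this. Note that it measures total wall-clock time on the client, which includes the network round-trip and the server-side query vectorization, not just the vector search itself:

```python
import time
from functools import wraps

# Hypothetical sketch of a timer decorator like the one used above.
def timer_decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{func.__name__} took {elapsed_ms:.1f} ms")
        return result
    return wrapper
```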

Hi Joe!

Thanks for helping! Here are the details:

Endpoint: https://9p6vscwpqlgxuawurcupaq.c0.us-east1.gcp.weaviate.cloud
Collection name: DocSummaries7_flat/DocSummaries7_hnsw