Help me fix this 500ms latency for vector search!

Query Latency

Hi folks!

I’m using Weaviate Cloud, and I’m seeing query latencies of around 500 ms… does anyone have suggestions?

I mostly just stuck with the “bare minimum” (i.e., tutorial level) to see what would happen!

  • My collection is just a bunch of text files from the jfk_files/jfk_text folder of the amasad/jfk_files repo on GitHub
  • I generated summaries of each of the text items (~1000 tokens for each doc)
  • I just used the default embeddings (OpenAI’s 1536-dimensional text-embedding-3-small)
  • There are only ~1000 documents (~2–3k if I break them up into chunks)
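
For scale, a quick back-of-the-envelope check (assuming ~3,000 chunks of 1536-dimensional float32 embeddings) shows the whole index is tiny, so the raw vector search itself should take well under a millisecond on the server:

```python
# Rough memory footprint of the vector index:
# ~3,000 chunks x 1536 dims x 4 bytes per float32.
n_vectors = 3_000
dims = 1536
bytes_per_float = 4

index_mb = n_vectors * dims * bytes_per_float / 1_000_000
print(f"~{index_mb:.0f} MB of raw vectors")  # ~18 MB
```

At this size even a brute-force (flat) scan is trivial, which matches the observation below that switching index types doesn’t change the latency.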

Debugging details

Cluster size & region

  • Sandbox cluster (I tried US East and US West), and it didn’t really make a difference
  • I upgraded to “Serverless” but that didn’t seem to improve it either

Things I tried

  • Switched to “flat” indexing (vector_index_config=Configure.VectorIndex.flat()) – the latency is about the same, though

Collection Config

Collection found: <weaviate.Collection config={
  "name": "DocSummaries7_hnsw",
  "description": null,
  "generative_config": null,
  "inverted_index_config": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    },
    "cleanup_interval_seconds": 60,
    "index_null_state": false,
    "index_property_length": false,
    "index_timestamps": false,
    "stopwords": {
      "preset": "en",
      "additions": null,
      "removals": null
    }
  },
  "multi_tenancy_config": {
    "enabled": false,
    "auto_tenant_creation": false,
    "auto_tenant_activation": false
  },
  "properties": [
    {
      "name": "title",
      "description": null,
      "data_type": "text",
      "index_filterable": true,
      "index_range_filters": false,
      "index_searchable": true,
      "nested_properties": null,
      "tokenization": "word",
      "vectorizer_config": {
        "skip": false,
        "vectorize_property_name": true
      },
      "vectorizer": "text2vec-openai",
      "vectorizer_configs": null
    },
    {
      "name": "content",
      "description": null,
      "data_type": "text",
      "index_filterable": true,
      "index_range_filters": false,
      "index_searchable": true,
      "nested_properties": null,
      "tokenization": "word",
      "vectorizer_config": {
        "skip": false,
        "vectorize_property_name": true
      },
      "vectorizer": "text2vec-openai",
      "vectorizer_configs": null
    }
  ],
  "references": [],
  "replication_config": {
    "factor": 1,
    "async_enabled": false,
    "deletion_strategy": "NoAutomatedResolution"
  },
  "reranker_config": null,
  "sharding_config": {
    "virtual_per_physical": 128,
    "desired_count": 1,
    "actual_count": 1,
    "desired_virtual_count": 128,
    "actual_virtual_count": 128,
    "key": "_id",
    "strategy": "hash",
    "function": "murmur3"
  },
  "vector_index_config": {
    "multi_vector": null,
    "quantizer": null,
    "cleanup_interval_seconds": 300,
    "distance_metric": "cosine",
    "dynamic_ef_min": 100,
    "dynamic_ef_max": 500,
    "dynamic_ef_factor": 8,
    "ef": -1,
    "ef_construction": 128,
    "filter_strategy": "sweeping",
    "flat_search_cutoff": 40000,
    "max_connections": 32,
    "skip": false,
    "vector_cache_max_objects": 1000000000000
  },
  "vector_index_type": "hnsw",
  "vectorizer_config": {
    "vectorizer": "text2vec-openai",
    "model": {
      "baseURL": "https://api.openai.com",
      "isAzure": false,
      "model": "text-embedding-3-small"
    },
    "vectorize_collection_name": true
  },
  "vectorizer": "text2vec-openai",
  "vector_config": null
}>

I tried to keep it as simple as possible:

from weaviate.classes.config import Configure, DataType, Property

self.client.collections.create(
    name,
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    # vector_index_config=Configure.VectorIndex.flat(),
    properties=[  # properties configuration is optional
        Property(name="title", data_type=DataType.TEXT),
        Property(name="content", data_type=DataType.TEXT),
    ],
)


Hey Zen,

Happy to help here! There are definitely some unknowns I’d like to clarify:

  1. Could you provide a snippet of the query being used? (If it’s just tutorial-level stuff, we shouldn’t see latency like this.)

  2. How are you hosting your code: is it a cloud function, or are you running it locally? (If it’s a cloud function, what region is it hosted in? If it’s local, can you confirm where you’re connecting from?)
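
To help separate network latency from search latency, here’s a quick sketch that times a plain HTTP round-trip to the cluster’s readiness endpoint (`/v1/.well-known/ready`), which does no search work. If this alone takes hundreds of ms, the bottleneck is the distance between your client and the cluster, not Weaviate itself. (The endpoint path is standard; swap in your own cluster URL.)

```python
import time
import urllib.request

def measure_rtt(url, n=5):
    """Return the best-case round-trip time to `url` in milliseconds."""
    times = []
    for _ in range(n):
        start = time.perf_counter()
        urllib.request.urlopen(url, timeout=10).read()
        times.append((time.perf_counter() - start) * 1000)
    # min() filters out one-off slow requests (DNS, TLS handshake, etc.)
    return min(times)

if __name__ == "__main__":
    endpoint = "https://<your-cluster>.weaviate.cloud/v1/.well-known/ready"
    print(f"best RTT: {measure_rtt(endpoint):.0f} ms")
```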

You also mentioned that you tried this on both sandboxes and a Serverless cluster. I’d love to double-check that cluster, but I don’t want to share any sensitive info openly on our forum. Would you be able to create a ticket with our support team by emailing these details to Support@weaviate.io? We can then continue this conversation in that ticket, and I can post any public solution back here if needed!

Regards,

Joe

Hi Joe!

I’m just running it locally on my computer.

My query code looks like this!

@timer_decorator
def semantic_search(self, query, limit=10):
    # near_text vectorizes the query via the text2vec-openai module
    # (an OpenAI API call made by the server) before the vector search runs
    response = self.collection.query.near_text(
        query=query,
        limit=limit,
        # return_metadata=MetadataQuery(distance=True, score=True),
    )
    return response
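
For reference, the `timer_decorator` above isn’t shown; a typical sketch of one looks like this. Note that it measures total wall-clock time on the client, which includes the network round-trip and the server-side query vectorization, not just the vector search itself:

```python
import time
from functools import wraps

# Hypothetical sketch of a timer decorator like the one used above.
def timer_decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{func.__name__} took {elapsed_ms:.1f} ms")
        return result
    return wrapper
```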

Hi Joe!

Thanks for helping! Here are the details:

Endpoint: https://9p6vscwpqlgxuawurcupaq.c0.us-east1.gcp.weaviate.cloud
Collection name: DocSummaries7_flat/DocSummaries7_hnsw