Huge memory consumption for filtering hybrid search queries on AWS EKS deployment (Marketplace subscription)

Description

Weaviate on AWS EKS (Marketplace subscription) is consuming a large amount of memory (more than 128 GB across the two nodes) for a hybrid search query with a contains_any filter.

Server Setup Information

  • Weaviate Server Version: 1.24
  • Deployment Method: AWS EKS (Through AWS Marketplace)
  • Multi Node? Number of Running Nodes: Yes, 2
  • Node instance type: rdi.4xlarge
  • Client Language and Version: Python v4.10.2
  • Collection size: 3.2M objects
  • Vector dimensions: 1536 (OpenAI ada)
  • Multitenancy: No
  • Helm chart: v17.4.1
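For scale, the sketch below gives a rough lower bound on the memory the vectors alone occupy. It assumes uncompressed float32 vectors (4 bytes per dimension), one copy of each object (no replication), and no PQ/BQ/SQ compression. It ignores the HNSW graph, the inverted index, and the object store. The "about 2x the raw vector size" budget is a commonly quoted rule of thumb from Weaviate's resource-planning guidance, not a precise formula.

```python
# Rough lower bound on the memory needed just to hold the vectors in RAM.
# Assumptions: float32 vectors, one copy per object, no vector compression.

BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_objects: int, dimensions: int) -> int:
    """Return the bytes needed to store every vector once, uncompressed."""
    return num_objects * dimensions * BYTES_PER_FLOAT32


def main() -> None:
    objects = 3_200_000  # collection size from the setup above
    dims = 1536          # OpenAI ada embeddings

    raw = raw_vector_bytes(objects, dims)
    print(f"raw vectors: {raw / 1e9:.1f} GB ({raw / 2**30:.1f} GiB)")
    # Rule-of-thumb planning budget: roughly twice the raw vector footprint.
    print(f"rule-of-thumb budget: {2 * raw / 1e9:.1f} GB")


if __name__ == "__main__":
    main()
```

That comes to roughly 19.7 GB of raw vectors, or about 39 GB with the 2x rule of thumb. A spike well above 128 GB therefore suggests the extra memory comes from executing the query rather than from holding the data, although the HNSW settings and caches also affect the real footprint.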

Any additional Information

The query that I am running is as follows. The r_ids list contains around 30k entries, each of the form “REGXXXX”.

Please let me know if any additional info is required about the deployment. Thanks in advance!

Memory consumption during the queries:

This version is quite old, and we have made many improvements to filtering in recent releases. Could you check again with 1.28?

Hi @Dirk. Thanks for your response. Based on your suggestion, we have updated to 1.28.4. The memory usage spike seems to be worse than before:

We are running the same query as shared in the previous message. Would you recommend a way to troubleshoot this? Is there a specific log we can check to get more information about this abnormal memory usage?

Hey, OK, then let’s do the following:

Instead of a hybrid query, please run:

  • a bm25 query
  • a near_text query

with the same settings (return properties, limits, filters), and check which one shows the high memory usage. Then run that query both with and without the filter.
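To get one comparable number per run, one option is to poll a memory reading in a background thread while the query executes and keep the maximum. This is a minimal sketch, assuming you already have some way to read the current memory usage of the Weaviate pods; `run_with_peak_memory` and its parameters are hypothetical helpers, not part of the Weaviate client.

```python
import threading
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_peak_memory(
    run_query: Callable[[], T],
    read_memory_bytes: Callable[[], int],
    interval_s: float = 0.5,
) -> Tuple[T, int]:
    """Run run_query() while a background thread polls read_memory_bytes().

    Returns the query result and the highest memory reading observed.
    Spikes shorter than interval_s can still be missed.
    """
    peak = read_memory_bytes()  # baseline reading before the query starts
    stop = threading.Event()

    def poll() -> None:
        nonlocal peak
        # The poller is the only writer of `peak` until it is joined below.
        while not stop.is_set():
            peak = max(peak, read_memory_bytes())
            stop.wait(interval_s)  # sleeps, but wakes early once stop is set

    poller = threading.Thread(target=poll, daemon=True)
    poller.start()
    try:
        result = run_query()
    finally:
        stop.set()
        poller.join()
    # One last reading in case the peak landed right at the end of the query.
    peak = max(peak, read_memory_bytes())
    return result, peak
```

Here `run_query` would wrap `collection.query.bm25(...)` or `collection.query.near_text(...)` from the v4 Python client, with the same return properties, limit, and (optionally) filter as the hybrid query. The sketch deliberately does not assume a particular metrics source for `read_memory_bytes`.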

Things to try out afterwards:

  • If the near_text query with filters is responsible, please try ACORN: How we speed up filtered vector search with ACORN | Weaviate
  • If the bm25 query is the problem, 1.29 (which should be released today) contains a major speed and efficiency improvement. I'm not involved in the details, but documentation should be available in the next few days.
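The decision above can be written down as a small function. This sketch assumes you have collected one peak-memory number per (query type, filtered?) pair, for example from your monitoring. `diagnose` is a hypothetical helper, and the strings it returns only echo the advice in this thread.

```python
from typing import Dict, Tuple

# Key: (query type, filter applied?)  Value: peak memory in bytes.
Peaks = Dict[Tuple[str, bool], int]


def diagnose(peaks: Peaks) -> str:
    """Pick the query type with the highest filtered peak and map it to the
    follow-up suggested in this thread."""
    worst = max(("bm25", "near_text"), key=lambda kind: peaks[(kind, True)])
    # Extra memory the contains_any filter adds for that query type.
    filter_overhead = peaks[(worst, True)] - peaks[(worst, False)]
    if worst == "near_text" and filter_overhead > 0:
        return f"near_text, filter adds {filter_overhead} bytes: try ACORN"
    if worst == "bm25":
        return f"bm25, filter adds {filter_overhead} bytes: try 1.29"
    return "near_text, filter adds nothing: the filter is not the cause"
```

Comparing the filtered and unfiltered runs of the same query type is what separates "this query type is expensive" from "the filter makes it expensive".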

Having said that, 30k entries in a contains_any list is a lot.
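One possible workaround for a very large list, sketched here under assumptions rather than as a recommendation from the thread: split the ids into smaller contains_any filters, run one query per chunk with the same limit, and merge the hits. `chunked`, `search_in_chunks`, and the `search(chunk, limit)` callable are hypothetical; the callable would wrap your own filtered query. The merge assumes scores are comparable across queries, with higher being better. That should roughly hold for bm25 scores with the same query text. For near_text you would rank by negated distance. Hybrid fused scores are normalized per query, so for hybrid the merge is only approximate. Whether this lowers peak memory needs to be measured.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# Each hit is (object_id, score); higher score = better match.
Hit = Tuple[str, float]


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def search_in_chunks(
    r_ids: Sequence[str],
    search: Callable[[Sequence[str], int], List[Hit]],
    chunk_size: int = 1_000,
    limit: int = 10,
) -> List[Hit]:
    """Run one filtered search per chunk of ids and keep the global top `limit`.

    Each chunk query uses the same `limit`, so (with comparable scores) the
    global top `limit` is contained in the union of the per-chunk results.
    """
    best: Dict[str, float] = {}
    for chunk in chunked(r_ids, chunk_size):
        for obj_id, score in search(chunk, limit):
            # Keep the best score if an object shows up in several chunks.
            if obj_id not in best or score > best[obj_id]:
                best[obj_id] = score
    ranked = sorted(best.items(), key=lambda hit: hit[1], reverse=True)
    return ranked[:limit]
```

The trade-off is more round trips: 30k ids with `chunk_size=1_000` means 30 queries instead of one, each with a much smaller filter.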