Resource Usage

Hi team, I have a schema with the following configuration:
```python
class_schema = {
    "class": "class_",  # class name
    "vectorizer": "none",
    "properties": [{
        "name": "record_id",
        "dataType": ["int"]
    }],
    "vectorIndexType": "hnsw",
    "vectorIndexConfig": {
        "skip": False,
        "ef": 256,
        "efConstruction": 256,
        "maxConnections": 64,
        "vectorCacheMaxObjects": 300000,
        "distance": "l2-squared"
    },
    "shardingConfig": {
        "virtualPerPhysical": 128,
        "desiredCount": 2,
        "actualCount": 2,
        "key": "_id",
        "strategy": "hash",
        "function": "murmur3"
    },
    "replicationConfig": {
        "factor": 1
    }
}
```

I currently have 50K objects with 128-dimensional vectors. I've noticed that memory consumption is more than double the disk space occupied: disk = 53.2M, memory = 110M.
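As a rough cross-check of these numbers, here is a back-of-the-envelope estimate. It assumes float32 vectors (4 bytes per dimension), about 2× headroom for the garbage-collected heap, and roughly 10 bytes per HNSW connection with up to `maxConnections` connections per object. These are rule-of-thumb figures, not exact Weaviate accounting, and `estimate_memory_bytes` is a hypothetical helper.

```python
# Back-of-the-envelope memory estimate for an HNSW index.
# Assumptions (rules of thumb, not exact Weaviate accounting):
#   - vectors are float32, i.e. 4 bytes per dimension
#   - roughly 2x headroom for the garbage-collected heap
#   - about 10 bytes per graph connection, maxConnections per object


def estimate_memory_bytes(num_objects, dims, max_connections,
                          bytes_per_dim=4, gc_factor=2, bytes_per_conn=10):
    """Hypothetical helper: returns (vector_bytes, graph_bytes, total_bytes)."""
    vector_bytes = num_objects * dims * bytes_per_dim * gc_factor
    graph_bytes = num_objects * max_connections * bytes_per_conn
    return vector_bytes, graph_bytes, vector_bytes + graph_bytes


if __name__ == "__main__":
    # 50K objects, 128 dimensions, maxConnections = 64 (from the schema above)
    vec, graph, total = estimate_memory_bytes(50_000, 128, 64)
    print(f"vectors ~{vec / 1e6:.1f} MB, graph ~{graph / 1e6:.1f} MB, "
          f"total ~{total / 1e6:.1f} MB")
```

For this setup the sketch gives about 83 MB, the same order of magnitude as the observed 110M; object storage, caches and other runtime overhead are not modeled here.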
Questions:

  1. Is memory consumption always greater than the disk space occupied with the above vectorIndexConfig?
  2. Is there anything I can do to reduce memory usage without compromising search accuracy?
  3. If I set the replication factor to 2, does memory consumption also double?

Hi! Not sure I can answer all questions.

If you set the replication factor to 2, every object is stored twice, so memory consumption (summed across your nodes) should roughly double.
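A minimal sketch of that scaling, assuming each replica holds a full copy of its data and that data is spread evenly over the nodes; `replicated_memory` is a hypothetical helper, and real clusters will deviate from an even split.

```python
# Sketch: how a replication factor scales memory.
# Assumptions: every replica is a full copy of the data, and the copies
# are distributed evenly across the nodes in the cluster.


def replicated_memory(per_copy_bytes, replication_factor, num_nodes):
    """Hypothetical helper: returns (cluster_total_bytes, per_node_bytes)."""
    total = per_copy_bytes * replication_factor
    return total, total / num_nodes


if __name__ == "__main__":
    # ~110M for one copy, replication factor 2, spread over 2 nodes
    print(replicated_memory(110_000_000, 2, 2))
```

The cluster-wide total doubles with factor 2; how much each individual node carries depends on how many nodes share the copies.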

Regarding 2, there is some ongoing research into using DiskANN. Product quantization (PQ) and binary quantization (BQ) are other options, but they will lose some accuracy. You might also take a step back and reconsider the number of dimensions you are using: a lower dimension may still give you the results you want while consuming less memory.
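To make the trade-off concrete, here is a toy illustration of binary quantization and of how per-vector storage scales with dimensions and precision. It keeps one sign bit per dimension and compares codes by Hamming distance. This is a generic sketch of the technique, not how Weaviate implements BQ, and the function names are hypothetical. The sign-only code is also where the accuracy loss comes from: vectors that share signs but differ in magnitude become indistinguishable.

```python
# Illustration of binary quantization (BQ): keep only the sign bit of each
# dimension and compare codes by Hamming distance. This shows the general
# idea only; it is not Weaviate's internal implementation.


def bq_encode(vector):
    """Pack one bit per dimension (1 if the component is positive) into an int."""
    code = 0
    for i, x in enumerate(vector):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two BQ codes."""
    return bin(a ^ b).count("1")


def bytes_per_vector(dims, bits_per_dim):
    """Storage for one vector at a given precision, rounded up to whole bytes."""
    return (dims * bits_per_dim + 7) // 8


if __name__ == "__main__":
    print(bytes_per_vector(128, 32))  # float32, 128 dims: 512 bytes
    print(bytes_per_vector(128, 1))   # BQ, 128 dims: 16 bytes (32x smaller)
    print(bytes_per_vector(64, 32))   # float32, half the dims: 256 bytes
    q = bq_encode([0.3, -1.2, 0.8, -0.1])
    d = bq_encode([0.5, -0.7, -0.2, -0.4])
    print(hamming(q, d))              # codes differ only in dimension 2
```

Halving the dimensions halves vector storage without quantization error, while BQ shrinks it far more but only approximates distances.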

For 1, Weaviate reads the data from disk into memory, along with the HNSW graph connections. Usually, right after startup, Weaviate loads all of this and uses a lot of memory; after a while, garbage collection should kick in and free some of it.
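One setting in the schema that bears on this is `vectorCacheMaxObjects`, which caps how many vectors the HNSW index keeps cached in memory. With a cap of 300000 and 50K objects, every vector fits in the cache. The sketch below estimates the cache size assuming float32 vectors and ignoring per-entry overhead; `vector_cache_bytes` is a hypothetical helper. Lowering the cap below the object count would trade memory for extra disk reads during search.

```python
# Sketch: approximate memory taken by the HNSW vector cache, assuming float32
# vectors (4 bytes per dimension) and ignoring per-entry bookkeeping overhead.


def vector_cache_bytes(num_objects, dims, cache_max_objects, bytes_per_dim=4):
    """Hypothetical helper: the cache holds at most cache_max_objects vectors."""
    cached = min(num_objects, cache_max_objects)
    return cached * dims * bytes_per_dim


if __name__ == "__main__":
    # Values from the schema above: 50K objects, 128 dims, cap of 300000.
    print(vector_cache_bytes(50_000, 128, 300_000))  # all 50K vectors cached
    # A hypothetical lower cap keeps only part of the vectors in memory.
    print(vector_cache_bytes(50_000, 128, 20_000))
```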

Let me know if that helps :slight_smile: