In my local Weaviate instance, I have about 4500 objects. For each object, I generate the embeddings myself, as 512-dimensional floating point vectors. I have made sure to set vectorizer: none on the objects. So let’s say the embeddings are float64.
Each embedding is 512 units long, so 5000 embeddings should take up maybe 20 megabytes, but disk usage (du -h) shows
1.1M .weaviate_data/image_3pnsXZxaxx5G.hnsw.commitlog.d
1.9M .weaviate_data/image_3pnsXZxaxx5G_lsm/property_name
35M .weaviate_data/image_3pnsXZxaxx5G_lsm/objects
526M .weaviate_data/image_3pnsXZxaxx5G_lsm/property_embedding
1.4M .weaviate_data/image_3pnsXZxaxx5G_lsm/property_path_searchable
556K .weaviate_data/image_3pnsXZxaxx5G_lsm/property__id
2.2M .weaviate_data/image_3pnsXZxaxx5G_lsm/property_path
1.1M .weaviate_data/image_3pnsXZxaxx5G_lsm/property_name_searchable
567M .weaviate_data/image_3pnsXZxaxx5G_lsm
568M .weaviate_data/
I’d just like to know where the additional disk usage is coming from. The only idea I have is that Weaviate is allocating more space than necessary to store the vectors. Thanks in advance.
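For reference, the back-of-the-envelope estimate above can be checked with a minimal sketch. The helper name rawVectorBytes is hypothetical, and the float32 line is included only for comparison; the figures cover raw vector data only, not any index or storage overhead.

```go
package main

import "fmt"

// rawVectorBytes returns the number of bytes needed to hold n vectors of
// the given dimensionality when each component takes bytesPerFloat bytes.
// It counts raw vector data only, with no index or storage overhead.
func rawVectorBytes(n, dims, bytesPerFloat int) int {
	return n * dims * bytesPerFloat
}

func main() {
	const n, dims = 5000, 512 // figures taken from the question

	// 8 bytes per component for float64, 4 bytes for float32.
	fmt.Printf("float64: %.2f MB\n", float64(rawVectorBytes(n, dims, 8))/1e6)
	fmt.Printf("float32: %.2f MB\n", float64(rawVectorBytes(n, dims, 4))/1e6)
}
```

Either way the raw vectors come to roughly 10 to 20 MB, far below the 526M reported for property_embedding.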
Hi @frischoko. That’s a good question. I think it’s due to the way our data store works (it uses an LSM-tree).
But I’ve asked around internally to confirm. I’ll get back to you when I hear back.
Can you also share how you are sending the requests to create objects?
Are you using the embeddings property to send the vector, or the vector field?
I’m creating objects like this:
object := models.Object{
	Class: "Image",
	ID:    strfmt.UUID(uuid),
	Properties: map[string]interface{}{
		"name":      image.Name,
		"path":      image.Path,
		"embedding": vector,
	},
	Vector: vector,
}
and inserting them with the batcher:
_, err = client.Batch().
	ObjectsBatcher().
	WithObjects(objects...). // slice of objects
	Do(context.Background())
@frischoko the reason Weaviate is taking up so much space is that you are also storing the vector as a property, for which Weaviate builds an inverted index.
Weaviate builds an inverted index for properties so that users can run keyword searches. In your case you should not push the vector as a property (unless you want to run keyword searches on the embedding field, which I think you don’t). You should store your vector only in the Vector field.
object := models.Object{
	Class: "Image",
	ID:    strfmt.UUID(uuid),
	Properties: map[string]interface{}{
		"name": image.Name,
		"path": image.Path,
		// "embedding": vector, <- goes into inverted index for keyword search
	},
	Vector: vector, // <- goes into vector index for vector search
}
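If the property maps are already built elsewhere in the import code, one way to apply the advice above is to strip the vector key before constructing each object. This is only a sketch using the standard library: withoutKey is a hypothetical helper, not part of the Weaviate client, and the sample values are made up.

```go
package main

import "fmt"

// withoutKey returns a copy of props with the given key removed.
// The input map is left unmodified.
func withoutKey(props map[string]interface{}, key string) map[string]interface{} {
	out := make(map[string]interface{}, len(props))
	for k, v := range props {
		if k != key {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Illustrative values only; a real embedding would have 512 components.
	vector := []float32{0.1, 0.2, 0.3}
	props := map[string]interface{}{
		"name":      "cat.png",
		"path":      "/images/cat.png",
		"embedding": vector,
	}

	// Keep only the properties that should be indexed as properties;
	// the vector itself would be passed separately via the Vector field.
	clean := withoutKey(props, "embedding")
	_, stillThere := clean["embedding"]
	fmt.Println(len(clean), stillThere)
}
```

The cleaned map would then be assigned to Properties, with the vector passed only through Vector as shown above.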