Optimizing Weaviate for Image Embedding Search without Storing Images

Hey fellow developers,

I’m currently working on a project where I’m using Weaviate as a vector database to store and search for images based on their embeddings. The images themselves are stored in an S3 bucket. My goal is to leverage Weaviate’s capabilities solely for storing and searching image embeddings, while keeping the actual image files in the S3 bucket.

As of now, I’ve successfully configured Weaviate to store both the image embeddings and the images themselves, but I’m interested in optimizing this setup to conserve storage space and streamline the search process. I’ve been through the documentation, but I couldn’t find a way to disable the storage of image files in Weaviate.

Could anyone guide me on how to configure Weaviate to store only the embeddings and utilize it purely as a search engine for images without storing the actual image files? Your insights and suggestions would be greatly appreciated!

Thanks in advance for your help!

Hi @nomomon ! Welcome to our community :hugs:

As of now, I don’t think you can use Weaviate’s vectorization modules without also storing the image alongside the object.

However, you can supply the vectors yourself. When a new image comes in, you vectorize it yourself and ingest only the image metadata (filename, geolocation, etc.), its vector, and the S3 reference.
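To make that concrete, here is a minimal sketch of what the ingestion side could look like against Weaviate's REST API, using only the Python standard library. The class name `ImageRef`, the property names `filename` and `s3Uri`, the bucket path, and the local URL are all assumptions made for illustration. The vector is whatever your own embedding model produces. Setting `"vectorizer": "none"` on the class is what tells Weaviate not to vectorize anything itself.

```python
import json
import urllib.request

WEAVIATE_URL = "http://localhost:8080"  # assumption: a local Weaviate instance

# Class definition without a vectorizer module, so Weaviate only stores
# the properties and the vector that we send. No image blob property exists.
IMAGE_CLASS = {
    "class": "ImageRef",  # hypothetical class name
    "vectorizer": "none",
    "properties": [
        {"name": "filename", "dataType": ["text"]},
        {"name": "s3Uri", "dataType": ["text"]},  # hypothetical property name
    ],
}


def build_object(filename, s3_uri, vector):
    """Build the body for POST /v1/objects: metadata, S3 reference, vector."""
    return {
        "class": IMAGE_CLASS["class"],
        "properties": {"filename": filename, "s3Uri": s3_uri},
        "vector": [float(x) for x in vector],
    }


def post_json(path, payload):
    """POST a JSON payload to Weaviate and return the decoded response."""
    req = urllib.request.Request(
        WEAVIATE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage against a running instance (not executed here):
#   post_json("/v1/schema", IMAGE_CLASS)
#   post_json("/v1/objects",
#             build_object("cat.jpg", "s3://my-bucket/cat.jpg", embedding))
```

The image bytes never reach Weaviate in this setup. Only the S3 reference is stored, so a search hit gives you the key you need to fetch the file from the bucket.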

Note that whenever you run a query, you will also need to vectorize the query image yourself, with the same model, in order to search against your collection.
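The query side could then be sketched as a GraphQL `nearVector` search, again with only the standard library. The class name `ImageRef` and the properties `filename` and `s3Uri` are hypothetical and must match whatever you defined in your own schema. The query vector is assumed to come from the same embedding model used at ingestion time.

```python
import json


def build_near_vector_query(vector, limit=5, class_name="ImageRef"):
    """Build the body for POST /v1/graphql using a nearVector search."""
    # json.dumps renders the list as a valid GraphQL list literal.
    vec = json.dumps([float(x) for x in vector])
    query = (
        "{ Get { %s(nearVector: {vector: %s}, limit: %d) "
        "{ filename s3Uri _additional { distance } } } }"
        % (class_name, vec, limit)
    )
    return {"query": query}


# Usage (not executed here): send the returned dict as JSON to
# <your Weaviate URL>/v1/graphql, then read each hit's s3Uri
# to load the matching image from the bucket.
```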

Here is a tutorial on how to work with custom vectors that might help you:

Let me know if that helps.

Thanks!

Hi @DudaNogueira , thank you for your advice! I’ve successfully implemented it by using Weaviate’s native img2vec module myself to produce the vectors.

I’ve written a more detailed explanation under this Stack Overflow question:
