Hi!
I am working on a recommendation system using LLM embeddings and I’m looking for the right database for my use-case.
I have put together a list of requirements, along with what I found about how Weaviate could fulfill each of them, and I’m posting here in case someone with more experience can tell me whether this makes sense or whether I’m overlooking something.
I don’t expect to need to support more than 500 records and around 100 requests per day in the mid-term, so I don’t need heavy optimizations or scaling options, but of course the cheaper the better.
So far, these are my requirements and what I have found in the docs:
- I must be able to store n>=1 vector embeddings per ID OR I must be able to store 1 very large vector embedding per ID: YES
- I must be able to store and retrieve metadata: YES
- I must be able to do pre-filtering based on metadata: YES
- I must be able to do database migrations (i.e. add/remove columns in each table): I found that I can add properties but I can’t delete them. Some open questions to investigate: would this be a problem if I want to iterate on different features before committing to one schema? Would this commitment hurt my development speed in the future?
- (Highly desirable) I want a good TS (or JS) client: NO. Unfortunately, Weaviate focuses its development on the Python library, and the JS/TS client has lagged behind in some features.
- (Desirable) I want to do pagination after pre-filtered queries OR (Required) I must be able to retrieve every result: YES. Weaviate offers pagination with limit and offset, but since I don’t expect to have that many records, I am thinking of just storing the rank of every result in a separate collection (maybe in a different DB? This is a drawback) and querying that directly.
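To make the pre-filtering and pagination requirements concrete, here is a minimal TypeScript sketch of the behavior I’m after, done entirely in memory. It is not Weaviate’s API: the `Item` shape, the `search` helper, and the choice to score an item by its best-matching vector are my own assumptions, for illustration only.

```typescript
// Hypothetical record shape: one ID, one or more embeddings, free-form metadata.
interface Item {
  id: string;
  vectors: number[][]; // n >= 1 embeddings per ID
  metadata: Record<string, string | number>;
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Pre-filter on metadata, rank by similarity, then page with limit/offset.
// Assumption: an item's score is the best similarity across its vectors.
function search(
  items: Item[],
  query: number[],
  filter: (metadata: Item["metadata"]) => boolean,
  limit: number,
  offset = 0,
): { id: string; score: number }[] {
  return items
    .filter((item) => filter(item.metadata)) // filter BEFORE ranking
    .map((item) => ({
      id: item.id,
      score: Math.max(...item.vectors.map((v) => cosine(v, query))),
    }))
    .sort((a, b) => b.score - a.score) // highest similarity first
    .slice(offset, offset + limit);
}

// Made-up sample data.
const items: Item[] = [
  { id: "a", vectors: [[1, 0], [0, 1]], metadata: { category: "book" } },
  { id: "b", vectors: [[1, 1]], metadata: { category: "book" } },
  { id: "c", vectors: [[1, 0]], metadata: { category: "film" } },
  { id: "d", vectors: [[-1, 0]], metadata: { category: "book" } },
];

const isBook = (m: Item["metadata"]) => m.category === "book";
const page1 = search(items, [1, 0], isBook, 2, 0);
const page2 = search(items, [1, 0], isBook, 2, 2);
console.log(page1.map((r) => r.id), page2.map((r) => r.id));
```

With only a few hundred records, a brute-force scan like this is cheap, which is why retrieving every result is an acceptable fallback for me if pagination turns out to be awkward.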
As a plus, and to be fair, Weaviate already has several search algorithms implemented that would make my life easier, although I don’t want to be constrained to them. I also find Weaviate’s pricing very competitive compared to MongoDB, even though I agree with all the benefits of using MongoDB listed on their website, especially having all the capabilities of a full database in the same place where my vector embeddings are managed.