Storing original documents (PDFs) as properties in WCS

Hey everyone,

WCS pricing solely depends on vector dimensions and stored data objects.

In our workflow, we take some PDFs, convert them to text with a third-party service, chunk the results, compute vector embeddings, and store these in Weaviate.

Now I was wondering whether it would be a good idea to store the original PDF documents in Weaviate as well, alongside the extracted text and vectors, as a property with datatype blob. This would let us restore the original file later, and we would have everything in one place, i.e. in WCS.
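
To make the idea concrete: a blob property takes a base64-encoded string, so the stored payload ends up roughly a third larger than the file itself. Here is a minimal sketch of what I have in mind. The property names `text` and `original_pdf` are just placeholders, the PDF is a fake stand-in, and the actual insert call into Weaviate is left out:

```python
import base64


def build_object_with_blob(pdf_bytes: bytes, chunk_text: str) -> dict:
    """Build a property dict that embeds the whole PDF as a base64 string."""
    return {
        "text": chunk_text,  # placeholder property name
        # placeholder name for a property with datatype blob
        "original_pdf": base64.b64encode(pdf_bytes).decode("ascii"),
    }


# Stand-in for a real file of about 3 MB.
fake_pdf = b"%PDF-1.7\n" + b"\x00" * 3_000_000

props = build_object_with_blob(fake_pdf, "first chunk of extracted text")

# base64 turns every 3 input bytes into 4 output characters.
overhead = len(props["original_pdf"]) / len(fake_pdf)
print(f"base64 payload is {overhead:.2f}x the raw file size")

# The original file can be restored from the stored property.
restored = base64.b64decode(props["original_pdf"])
```

If the blob were attached to every chunk object rather than to one object per document, the file would be stored once per chunk, so I would probably keep it on a single document-level object.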

As I understand it, large property entries would not interfere with search performance. Is this correct?

Would it be considered good practice to store the original documents in such a way? Or would it violate some sort of “fair use” of the object properties in WCS as they do not cost extra?

Thanks in advance and best regards,
Steffen

Hi @AccessPointAI,

Storing blobs in Weaviate (or any database, for that matter) could impact performance, as the database has to handle those files on the way in (ingestion) and on the way out (retrieval).

So it is not considered a best practice. The better approach is to store only a reference to the file and retrieve the file itself from somewhere else, for example cloud object storage.
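
As a rough sketch of the reference approach (the property names `source_uri` and `source_sha256` and the bucket URI are made up for illustration, and the upload to the object store and the Weaviate insert are left out):

```python
import hashlib


def build_object_with_reference(
    pdf_bytes: bytes, chunk_text: str, storage_uri: str
) -> dict:
    """Build a property dict that points at the PDF instead of embedding it."""
    return {
        "text": chunk_text,  # hypothetical property name
        "source_uri": storage_uri,  # where the original file lives
        # The checksum lets you verify the file later when you fetch it.
        "source_sha256": hashlib.sha256(pdf_bytes).hexdigest(),
    }


pdf_bytes = b"%PDF-1.7\n" + b"\x00" * 1024  # stand-in for a real file
ref_props = build_object_with_reference(
    pdf_bytes,
    "first chunk of extracted text",
    "s3://example-bucket/docs/report.pdf",  # hypothetical location
)
print(ref_props["source_uri"])
```

Each object then carries only a short string and a checksum, no matter how large the original document is.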

There is no limitation regarding storing those blobs in your WCS cluster, if you decide to do so.

Let me know if this helps!

Thanks!

Hi @DudaNogueira,

Thanks, I understand. Of course there can be a performance impact while files are being written or read, but not while they are at rest. Still, I agree that there may be better options for file storage.

Best regards!
