I want to iterate over all the data in a Weaviate (text) collection and also get the raw data that was used to generate the embeddings. That leads to a couple of questions:
Does Weaviate store the raw data by default when you ask it to embed some data? Or is there some way to make sure the raw text data gets stored as well?
Does the “after” method as described here (Read all objects | Weaviate - vector database) give me access to that raw data?
Hi, @pramodbiligiri! Welcome to our community!
Weaviate will store both vectors and raw data.
By default, a query returns the raw data in the named properties you stored it under.
If you want the vectors as well, you can retrieve them using additional properties: GraphQL - Additional properties | Weaviate - vector database
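As a rough illustration of what such a request could look like, here is a minimal sketch that builds a GraphQL `Get` query asking for both the stored properties and the vector via `_additional`. The collection name `Article` and the property names `title` and `body` are hypothetical placeholders, and the helper `build_get_query` is made up for this example, not part of any Weaviate client.

```python
from typing import List


def build_get_query(collection: str, properties: List[str], include_vector: bool = True) -> str:
    """Build a GraphQL Get query string (hypothetical helper, not a Weaviate API)."""
    # _additional carries metadata such as the object id and, on request, the vector.
    extra = "id vector" if include_vector else "id"
    fields = " ".join(properties)
    return f"{{ Get {{ {collection} {{ {fields} _additional {{ {extra} }} }} }} }}"


# "Article", "title" and "body" are assumed names; substitute your own schema.
query = build_get_query("Article", ["title", "body"])
print(query)
```

The resulting string can be sent to the GraphQL endpoint with whichever client you already use.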
The Read All Objects you linked is more suited for backups.
You will need to query your data. Here is a nice doc on how to do it:
Let me know if that helps!
Ah, sorry! You do want to iterate over all of them, not query them.
I am pretty sure you can get both vectors and raw data with that.
Let me know if you are not able to.
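For reference, the "after" approach boils down to a cursor loop: fetch a page, remember the id of the last object, and ask for the next page after that id until nothing comes back. Below is a minimal sketch of that loop. To keep it runnable without a server, `fake_fetch_page` and `_FAKE_STORE` are in-memory stand-ins invented for this example; in practice you would replace `fake_fetch_page` with a call to Weaviate that passes `limit` and `after` and requests the vector. I believe the v4 Python client wraps this loop as `collection.iterator(include_vector=True)`, but please check that against the client docs for your version.

```python
from typing import Callable, Dict, Iterator, List, Optional

Page = List[Dict]


def iterate_all(fetch_page: Callable[[Optional[str], int], Page], page_size: int = 100) -> Iterator[Dict]:
    """Yield every object by repeatedly requesting the page after the last seen id."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, page_size)
        if not page:
            return  # an empty page means we have read everything
        yield from page
        cursor = page[-1]["id"]  # the last id becomes the next "after" value


# Hypothetical in-memory stand-in for a collection: raw text plus a vector per object.
_FAKE_STORE: Page = [
    {"id": f"id-{i:03d}", "properties": {"text": f"raw text {i}"}, "vector": [float(i), 0.0]}
    for i in range(7)
]


def fake_fetch_page(after: Optional[str], limit: int) -> Page:
    """Mimic a cursor request: return up to `limit` objects following `after`."""
    start = 0
    if after is not None:
        ids = [obj["id"] for obj in _FAKE_STORE]
        start = ids.index(after) + 1
    return _FAKE_STORE[start:start + limit]


objects = list(iterate_all(fake_fetch_page, page_size=3))
for obj in objects:
    print(obj["id"], obj["properties"]["text"], obj["vector"])
```

Because the loop only holds one page at a time, memory use on the client side stays bounded by the page size rather than by the size of the collection.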
Thanks for the response. I’m planning to store the raw data using this generic attributes feature - Data structure | Weaviate - vector database - and then retrieve it the same way as I iterate over the records. Do you foresee any major performance issues with this approach? For example, does Weaviate have limitations on the size of this data, or in-memory limitations?
AFAIK there should not be any limitations.
Depending on your scale, you will have to run multiple nodes.