I’m using Python. I have an API that returns huge amount of data as JSON objects
Can Weaviate read that data? Do I need to process it before feeding it into Weaviate?
If the data changes, either the structure of the API response changes or the existing values got updated, is it possible to update the data in Real time, on the fly, in production? Like you update a MySQL database for example
Is there a reliable vectorizer/embeddings I can use other than OpenAI? I plan to self host the entire project and not rely on any third party API.
Yes I know that this question was asked before but here I am asking it again, because the users were told to go checkout a python notebook on github that no longer exists.
I’m not going to search git history, because even if I find the notebook, maybe the code is obsolete. The fact that it was removed from Github could mean that the ability to read JSON data is deprecated in weaviate, who knows.
No, Weaviate will not read and store entire json data. You will need to parse that json, and store it accordingly. Note that while we do have the object datatype, it does have some limitations.
If you have a deterministic id for that object, you can always pass it as the object uuid during a batch operation. That way, Weaviate will insert or create that object for you. If the object has a new property that, Weaviate will create that property “on the fly”, if the AUTO_SCHEMA feature is active.
That is partially correct.
Underneath, Weaviate stores data in JSON format, and it stores the whole object.
The limitation is mostly with nested properties (i.e. an object that contains “address”, which is made of “street” and “city”) - nested properties don’t get vectorized, however nested data can still be stored in Weaviate.
What I meant before was that you cannot pass the .json file into Weaviate. You will need to read that file and work if it’s content. For example, if using python, it will be a dict. being aware of the limitations of the object datatype as mentioned above.
Here you can find a list of supported datatypes:
So for example, if you have a nested content in your json, and you need it to be vectorized and searchable, you will need to bring that nested content to the root structure, meaning it will now become a property by itself.