I need to move to a high-availability (HA) cluster. The instructions I received from support:
In order to take full advantage of multiple nodes, your class must be configured to have multiple shards or to replicate the data across multiple nodes.
So, simply adding a new node will not be enough to get a multi-node cluster. This will be possible in the future with the dynamic scaling feature.
With that said, the best solution is to create a new cluster (marking it as HA) and move your data there.
For that, you can create your class in your new cluster, specifying the sharding and replication config, and move your data over using this migration guide:
I'm not sure about most of this, but from looking at the migration documentation, I know for sure that I need instructions on how to do this using API commands (e.g. cURL), as I do not use Python at all.
So, what I get so far is that I need to create a new cluster, and then a new class object within that cluster (ideally, a duplicate of my existing class). But from there, I need some instructions on how to do the migration itself.
I also would like to take advantage of the new OpenAI embedding models.
Note that the Python v4 client uses a gRPC connection, which makes it much more performant than HTTP/REST. So if you have a lot of data, the Python client comes in handy.
Whether you use the Python client or cURL, you are basically just moving data around.
Since you want to use the new OpenAI embedding models, you will create the class first, with the proper configuration, and then copy the data over from the old cluster.
However, you will not provide the vector when inserting those objects. This will trigger Weaviate to vectorize your data, now using the new model.
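As a rough sketch of the class-creation step over plain REST, the request could look something like the following. The class name `Article`, the `content` property, the shard count, the replication factor, and the model name are all placeholders I am assuming for illustration; `WEAVIATE_URL` and `WEAVIATE_API_KEY` are hypothetical environment variables for your new cluster. If `WEAVIATE_URL` is not set, the script only prints the payload so you can review it first.

```shell
#!/usr/bin/env bash
# Sketch: create a class on the new cluster with sharding, replication,
# and the OpenAI vectorizer configured.
# ASSUMPTIONS (not from this thread): class "Article", property "content",
# 3 shards, replication factor 3, model "text-embedding-3-small".
set -euo pipefail

CLASS_PAYLOAD=$(cat <<'EOF'
{
  "class": "Article",
  "vectorizer": "text2vec-openai",
  "moduleConfig": {
    "text2vec-openai": { "model": "text-embedding-3-small" }
  },
  "shardingConfig": { "desiredCount": 3 },
  "replicationConfig": { "factor": 3 },
  "properties": [
    { "name": "content", "dataType": ["text"] }
  ]
}
EOF
)

if [ -n "${WEAVIATE_URL:-}" ]; then
  # Send the class definition to the schema endpoint of the new cluster.
  curl -sS -X POST "$WEAVIATE_URL/v1/schema" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${WEAVIATE_API_KEY:-}" \
    -d "$CLASS_PAYLOAD"
else
  # Dry run: no cluster URL configured, so just show the payload.
  echo "$CLASS_PAYLOAD"
fi
```

Adjust the property list so it matches your existing class; the sharding and replication values should follow whatever support recommends for your HA cluster.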
If you have this data elsewhere, for example in a pipeline that extracts it and loads it into Weaviate, you could use that to reindex your data on the new cluster with the new OpenAI model.
Let me know if this clarifies it for you.
Also, feel free to ping me in our Slack if you need more clarification or want to share more details.
I do not want to copy the data from the old cluster, as that requires using Python, and I don’t want to be forced to use Python.
What I am saying is that I want to create a new cluster, create a new object, then create a NEW embedding with my source content. It seems easier to me to simply proceed as if I were starting from the beginning.
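That from-scratch approach can be sketched with cURL alone: once the class exists on the new cluster, objects are posted without a `vector` field, so Weaviate vectorizes them with whatever model the class is configured for. This is only a minimal sketch under assumptions: the class name `Article`, the `content` property, and the sample texts are placeholders, and `WEAVIATE_URL`, `WEAVIATE_API_KEY`, and `OPENAI_API_KEY` are hypothetical environment variables. With `WEAVIATE_URL` unset, the script prints the payload instead of sending it.

```shell
#!/usr/bin/env bash
# Sketch: batch-import source content into the new cluster WITHOUT vectors,
# so the class's configured vectorizer creates the embeddings on insert.
# ASSUMPTIONS (not from this thread): class "Article", property "content",
# and the two sample documents below.
set -euo pipefail

BATCH_PAYLOAD=$(cat <<'EOF'
{
  "objects": [
    { "class": "Article", "properties": { "content": "First source document." } },
    { "class": "Article", "properties": { "content": "Second source document." } }
  ]
}
EOF
)

if [ -n "${WEAVIATE_URL:-}" ]; then
  # The OpenAI key is passed as a header so the vectorizer module can call OpenAI.
  curl -sS -X POST "$WEAVIATE_URL/v1/batch/objects" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${WEAVIATE_API_KEY:-}" \
    -H "X-OpenAI-Api-Key: ${OPENAI_API_KEY:-}" \
    -d "$BATCH_PAYLOAD"
else
  # Dry run: no cluster URL configured, so just show the payload.
  echo "$BATCH_PAYLOAD"
fi
```

For a large source corpus you would send many such batches; the batch size that works well depends on your data and on OpenAI rate limits.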