What is the process for changing the vectorizer model?

Description

What is the process for changing the vectorizer model in my schema?

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-3.5-turbo",
        ],
    ],
];

I’d like to change it to gpt-4o-mini.

Server Setup Information

  • Weaviate Server Version:
  • Deployment Method: WCS
  • Multi Node? Number of Running Nodes:
  • Client Language and Version:
  • Multitenancy?:

Any additional Information

I do everything using REST API.

hi @SomebodySysop !!

Changing the vectorization model of a collection would trigger a re-vectorization of the entire collection, which Weaviate does not support.

Neither this nor adding a named vector to an existing collection is possible as of now, because we have not implemented async vectorization.

The only way for now is to create a new collection with the new vectorizer configuration, and then copy the data from the old collection to the new one.

OK. Do you know if gpt-4o or gpt-4o-mini are supported in generative-openai?

Update: Never mind. Yes, they are: Generative AI | Weaviate


So, I know how to create a new collection in the cluster.

How do I copy the data from the old collection to the new one using REST API? That is not discussed in the documentation: Migrate data | Weaviate

Note that on 1.27+ you can now change the generative configuration without the need to recreate the collection.

The best approach for the migration is the one described here:

That approach leverages gRPC for reading and inserting the data.

You can also get all objects using the “cursor API”. You pass the ID of the last object you received as the after parameter, and it returns the next batch of objects:
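Since you are working with the REST API directly, here is a minimal Python sketch of that cursor loop. It assumes a local, unauthenticated instance at http://localhost:8080 (on WCS you would use your cluster URL and add an API key header), and the function names are made up for illustration. It reads pages from GET /v1/objects and uses the ID of the last object as the next after value.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, List, Optional

# Assumption: local instance without authentication.
BASE_URL = "http://localhost:8080"


def fetch_page(class_name: str, limit: int, after: Optional[str]) -> List[Dict]:
    """Fetch one page of objects from GET /v1/objects."""
    params = {"class": class_name, "limit": str(limit)}
    if after is not None:
        params["after"] = after  # ID of the last object already received
    url = f"{BASE_URL}/v1/objects?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("objects", [])


def iterate_objects(
    class_name: str,
    limit: int = 100,
    fetch: Callable[[str, int, Optional[str]], List[Dict]] = fetch_page,
) -> Iterator[Dict]:
    """Yield every object in a collection, one page at a time."""
    after: Optional[str] = None
    while True:
        page = fetch(class_name, limit, after)
        if not page:
            return  # an empty page means the cursor reached the end
        yield from page
        after = page[-1]["id"]
```

Calling iterate_objects("SolrCopy01") would then walk the whole collection; the fetch argument is only there so the loop can be tried out without a running server.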

Let me know if this helps!

Thanks!

I am confused by this, because your next message says:

But I still have to migrate the data.

So, I’m utterly confused.

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-3.5-turbo",
        ],
    ],
];

I want to change the model to gpt-4o-mini

What I thought I understood was that I needed to create a new schema, which would create a new collection, and then migrate the data from the old collection to the new.

So, in my case, I would create a new schema SolrCopy02 which would then create the SolrCopy02 collection. Then I would migrate the SolrCopy01 collection data to the SolrCopy02 collection.

What you wrote above doesn’t sound like that process. Or, I’m just not understanding the terminology. But, as gpt-3.5-turbo is a legacy model, I probably need to update soon.

OK, let me clarify.

When you configure the vectorizer, it means that your data is going to be vectorized with that model.

So for example, let’s say you select a vectorizer that produces embeddings with 300 dimensions.

Now you want to change to a different model with, let’s say, 1536 dimensions.

You will need to vectorize all your content again, because the vectors came from different models. Even if they had the same dimensionality, they came from different models.

So in order to change the vectorizer, you need to both define the model accordingly and to vectorize all your content again.

So the vectorizer configuration of a collection is not mutable.
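To make this concrete, here is a small illustrative Python sketch (not Weaviate code) of a cosine similarity check. The function name is made up for this example. Vectors of different dimensionality cannot be compared at all, and for vectors of the same dimensionality from different models the calculation runs but the score is not meaningful.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    if len(a) != len(b):
        # e.g. a 300-dimension vector against a 1536-dimension vector
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

A stored 300-dimension vector compared against a 1536-dimension query vector raises the error, which is why all content has to be vectorized again with the new model.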

Since Weaviate version 1.27, the generative configuration is mutable. This means that if you configured your collection to use, for example, Cohere as the generative module, you can change it to OpenAI.

gpt-3.5, gpt-4 and so on are generative models, so they belong to the generative configuration.

Let me know if this helps!

Thanks!

I really appreciate the explanation, but it’s a bird’s eye view of the process whereas I need the steps. I want to change the schema from:

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-3.5-turbo",
        ],
    ],
];

to

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-4o-mini",
        ],
    ],
];

From the explanation above, it sounds like all I need to do is update the schema and I’m done. But, you also state that since I’m changing the vectorizer, I need to re-vectorize the content – which makes sense. And if that’s the case, I’m really talking about migrating content from SolrCopy01 to SolrCopy02, as opposed to just modifying the schema of SolrCopy01.

So, I still do not know how to proceed.

From what you wrote, you do not actually want to change the vectorizer: in both of your schemas it is text2vec-openai. So the embedding model stays the same, and you can keep your collection with the embeddings it has already calculated.

The generative part is different from the vectorizer. It is used in RAG at runtime to produce a suitable response to a user query, while embeddings are calculated during data ingestion. So it makes sense that Weaviate has no problem with the user changing that setting later on. Hope that clarifies things. If not, maybe check out the documentation here to read more on the topic: Generative AI | Weaviate

This seems to indicate that the opposite is true: Slack

However, if I can update the schema without updating the collection, that sounds great! So, using just curl, how do I get from

Can I do it from the query screen in WCS? If not, how can I do it using the REST API (curl)?

It would also be nice if I could edit the collection in WCS, as this screen seems to indicate, but it doesn’t allow me to change anything:

Hi!

You can change the generative model as stated here:

It doesn’t have a curl example. For that, you need to use this endpoint:

First, get the collection JSON, change it, and PUT the payload back, just as is needed for activating ACORN.
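As a rough Python sketch of that get, change, PUT flow (the same requests can be made with curl): it assumes a local, unauthenticated instance at http://localhost:8080 and the GET and PUT /v1/schema/{className} endpoints, and the helper names are made up for illustration. Only the generative-openai model is touched; everything else, including the vectorizer, is sent back unchanged.

```python
import copy
import json
import urllib.request

# Assumption: local instance without authentication; on WCS use your
# cluster URL and add an Authorization header.
BASE_URL = "http://localhost:8080"


def with_generative_model(schema: dict, model: str) -> dict:
    """Return a copy of the collection definition with a new generative model."""
    updated = copy.deepcopy(schema)
    module_config = updated.setdefault("moduleConfig", {})
    module_config.setdefault("generative-openai", {})["model"] = model
    return updated


def get_schema(class_name: str) -> dict:
    """GET the current collection definition."""
    with urllib.request.urlopen(f"{BASE_URL}/v1/schema/{class_name}") as resp:
        return json.load(resp)


def put_schema(class_name: str, schema: dict) -> dict:
    """PUT the modified collection definition back."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/schema/{class_name}",
        data=json.dumps(schema).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def change_generative_model(class_name: str, model: str) -> dict:
    """Fetch the definition, swap the generative model, and send it back."""
    return put_schema(class_name, with_generative_model(get_schema(class_name), model))
```

With a server running, change_generative_model("SolrCopy01", "gpt-4o-mini") would perform all three steps.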

Let me know if this helps!

Thanks!

OK, I think I’m getting it through my thick skull. Since I’m using curl, I just have three steps:

  1. Get the existing schema:

curl --request GET \
  -H "Content-Type: application/json" \
  --url http://localhost:8080/v1/schema/Test

  2. Make whatever change I want.

  3. Put it back:

curl --request PUT \
  -H "Content-Type: application/json" \
  --url http://localhost:8080/v1/schema/Test \
  --data '{
"class": "Test", etc…

Right?

Final, last question:

If all I am changing in the schema is the model of the generative-openai:

Do I need to rebuild / migrate the collection?

Thanks for hanging in there with me. Once I get it, I’ve got it. But it sometimes takes me a minute.


No :) This can be changed without rebuilding since 1.27.x (I don’t remember the exact version).

Hi @SomebodySysop !

No need to reindex / rebuild / migrate the collection for changing the generative configuration.

Here we have a list of which collection settings are mutable:

All configurations in that list can be changed using the client or calling the REST endpoint as you mentioned.

Changing anything else requires creating a new collection with that setting already changed and copying the data over, which triggers a reindex.
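For the copy step over REST, a minimal Python sketch could turn objects read from the old collection into a body for POST /v1/batch/objects. The function name and the example collection names are only illustrative. It keeps each object's ID and properties but leaves out the old vector, on the assumption that the new collection's vectorizer should compute fresh embeddings on import.

```python
from typing import Dict, Iterable, List


def to_batch_payload(objects: Iterable[Dict], target_class: str) -> Dict[str, List[Dict]]:
    """Build a POST /v1/batch/objects body that re-homes objects in target_class."""
    return {
        "objects": [
            {
                "class": target_class,  # e.g. the new collection, SolrCopy02
                "id": obj["id"],  # keep the original UUID
                "properties": obj.get("properties", {}),
                # no "vector" key: the new collection vectorizes on import
            }
            for obj in objects
        ]
    }
```

Each page of objects read from the old collection can be passed through this function and posted as JSON to the batch endpoint.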

Hope this helps!

Thanks!