What is the process for changing vectorizer model

Description

What is the process for changing the vectorizer model in my schema?

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-3.5-turbo",
        ],
    ],
];

I’d like to change it to gpt-4o-mini.

Server Setup Information

  • Weaviate Server Version:
  • Deployment Method: WCS
  • Multi Node? Number of Running Nodes:
  • Client Language and Version:
  • Multitenancy?:

Any additional Information

I do everything using REST API.

hi @SomebodySysop !!

Changing the vectorization model of a collection would trigger a re-vectorization of the entire collection, which Weaviate does not support.

This is not possible as of now, and neither is adding a named vector to an existing collection, because we have not implemented async vectorization.

The only way for now is to create a new collection with the new vectorizer configuration, and then copy the data from the old collection to the new one.

OK. Do you know if gpt-4o or gpt-4o-mini are supported in generative-openai?

Update: Never mind. Yes, they are: Generative AI | Weaviate


So, I know how to create a new collection in the cluster.

How do I copy the data from the old collection to the new one using REST API? That is not discussed in the documentation: Migrate data | Weaviate

Note that on 1.27+ you can now change the generative configuration without the need to recreate the collection.

The best approach for the migration is the one described here:

That approach leverages gRPC for reading and inserting the data.

You can retrieve all objects with the “cursor API”. It provides the after parameter, which lets you fetch the next batch of objects:
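For anyone doing this over plain REST, here is a minimal Python sketch of the cursor loop. It assumes the `GET /v1/objects` endpoint with `class`, `limit`, and `after` query parameters, a local instance at `http://localhost:8080`, and an optional bearer API key. `fetch_page` and `iter_objects` are hypothetical helper names, not part of any Weaviate client.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, class_name, limit, after=None, api_key=None):
    """Fetch one page of objects via GET /v1/objects (performs a network call)."""
    params = {"class": class_name, "limit": limit}
    if after:
        params["after"] = after  # ID of the last object from the previous page
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    url = f"{base_url}/v1/objects?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("objects", [])


def iter_objects(class_name, limit=100, base_url="http://localhost:8080", fetch=fetch_page):
    """Yield every object in the collection, one cursor page at a time."""
    after = None
    while True:
        page = fetch(base_url, class_name, limit, after)
        if not page:
            return  # an empty page means the cursor reached the end
        yield from page
        after = page[-1]["id"]
```

Something like `for obj in iter_objects("SolrCopy01"): ...` would then walk the whole collection. On a cluster that needs authentication, `functools.partial(fetch_page, api_key=...)` can be passed as `fetch`.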

Let me know if this helps!

Thanks!

I am confused by this, because your next message says:

But I still have to migrate the data.

So, I’m utterly confused.

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-3.5-turbo",
        ],
    ],
];

I want to change the model to gpt-4o-mini

What I thought I understood was that I needed to create a new schema, which would create a new collection, and then migrate the data from the old collection to the new.

So, in my case, I would create a new schema SolrCopy02 which would then create the SolrCopy02 collection. Then I would migrate the SolrCopy01 collection data to the SolrCopy02 collection.

What you wrote above doesn’t sound like that process. Or, I’m just not understanding the terminology. But, as gpt-3.5-turbo is a legacy model, I probably need to update soon.

OK, let me clarify.

When you configure the vectorizer, it means that your data is going to be vectorized with that model.

So for example, let’s say you select a vectorizer that produces vectors with 300 dimensions.

Now you want to change to a different model with, let’s say, 1536 dimensions.

You will need to vectorize all your content again, because the vectors came from different models. Even if they had the same dimensionality, they came from different models.

So in order to change the vectorizer, you need to both define the model accordingly and to vectorize all your content again.

So the vectorizer configuration of a collection is not mutable.

Since Weaviate 1.27, the generative configuration is mutable. This means that if you configured your collection to use, for example, Cohere as the generative module, you can change it to OpenAI.

gpt-3.5, gpt-4, and so on are generative models, so they are part of the generative configuration.

Let me know if this helps!

Thanks!

I really appreciate the explanation, but it’s a bird’s eye view of the process whereas I need the steps. I want to change the schema from:

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-3.5-turbo",
        ],
    ],
];

to

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-4o-mini",
        ],
    ],
];

From the explanation above, it sounds like all I need to do is update the schema and I’m done. But, you also state that since I’m changing the vectorizer, I need to re-vectorize the content – which makes sense. And if that’s the case, I’m really talking about migrating content from SolrCopy01 to SolrCopy02, as opposed to just modifying the schema of SolrCopy01.

So, I still do not know how to proceed.

From what you wrote, you do not actually want to change the vectorizer: both of your schemas specify text2vec-openai. So the embedding model stays the same, and you can keep your collection with its precalculated embeddings.

The generative part is separate from the vectorizer. It is used in RAG to produce a suitable response to a user query at runtime, whereas embeddings are calculated during data ingestion. So it makes sense that Weaviate has no problem with the user changing that setting later on. Hope that clarifies things. If not, check out the documentation to read more on the topic: Generative AI | Weaviate

This seems to indicate that the opposite is true: Slack

However, if I can update the schema without updating the collection, that sounds great! So, using just curl, how do I get from

Can I do it from the query screen in WCS? If not, how can I do it using REST API (curl)?

It would also be nice if I could edit the collection in WCS, as this screen seems to indicate, but it doesn’t allow me to change anything:

Hi!

You can change the generative model as stated here:

It doesn’t have a curl example. For that, you need to use this endpoint:

First, get the collection JSON, change it, and PUT the payload back, just as is needed for activating ACORN.
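As a rough sketch of that get, change, PUT sequence in Python: it assumes `GET` and `PUT` on `/v1/schema/{ClassName}`, that the PUT response body is JSON, and an optional bearer API key. The helper names are hypothetical, and only the `model` key under `generative-openai` is touched.

```python
import copy
import json
import urllib.request


def with_generative_model(class_def, model, module="generative-openai"):
    """Return a copy of the class definition with the generative model swapped."""
    updated = copy.deepcopy(class_def)
    updated.setdefault("moduleConfig", {}).setdefault(module, {})["model"] = model
    return updated


def request_json(method, url, payload=None, api_key=None):
    """Send a JSON request and decode the JSON response (performs a network call)."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def update_generative_model(base_url, class_name, model, api_key=None):
    """GET the class, change the generative model, and PUT the whole definition back."""
    url = f"{base_url}/v1/schema/{class_name}"
    current = request_json("GET", url, api_key=api_key)
    return request_json("PUT", url, with_generative_model(current, model), api_key=api_key)
```

A call such as `update_generative_model("http://localhost:8080", "SolrCopy01", "gpt-4o-mini")` would then cover all three steps. The vectorizer entry is sent back unchanged.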

Let me know if this helps!

Thanks!

OK, I think I’m getting it through my thick skull. Since I’m using curl, I just have three steps:

  1. Get the existing schema:

curl --request GET \
  -H "Content-Type: application/json" \
  --url http://localhost:8080/v1/schema/Test

  2. Make whatever change I want.

  3. Put it back:

curl --request PUT \
  -H "Content-Type: application/json" \
  --url http://localhost:8080/v1/schema/Test \
  --data '{
"class":"Test", etc…

Right?

Final, last question:

If all I am changing in the schema is the model of the generative-openai:

Do I need to rebuild / migrate the collection?

Thanks for hanging in there with me. Once I get it, I’ve got it. But it sometimes takes me a minute.


No :slight_smile: This can be changed without rebuilding since 1.27.X (don’t remember the exact version)


Hi @SomebodySysop !

No need to reindex / rebuild / migrate the collection for changing the generative configuration.

Here we have a list of the mutable collection settings:

All configurations in that list can be changed using the client or calling the REST endpoint as you mentioned.

Everything else requires creating a new collection with that setting already changed and copying the data over, which triggers a reindex.
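For the copy step, a minimal Python sketch of building the insert payload could look like this. It assumes the source objects were read from `GET /v1/objects` (each with `id` and `properties`), that the target is the `POST /v1/batch/objects` endpoint, and that leaving out the `vector` field lets the new collection's vectorizer embed each object again. The function names are hypothetical, and depending on the setup the vectorizer may need its own API key header as well.

```python
import json
import urllib.request


def to_batch_payload(objects, target_class):
    """Build a /v1/batch/objects body that re-targets objects to a new collection."""
    items = []
    for obj in objects:
        items.append({
            "class": target_class,
            "id": obj["id"],  # keep the original UUID
            "properties": obj.get("properties", {}),
            # no "vector" key: the new collection's vectorizer embeds the object again
        })
    return {"objects": items}


def post_batch(base_url, payload, api_key=None):
    """POST the payload to /v1/batch/objects (performs a network call)."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(
        f"{base_url}/v1/batch/objects",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Sending the objects in modest batches (for example 100 at a time) keeps each request small. The old collection can be deleted once the new one has been checked.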

Hope this helps!

Thanks!


I followed the steps to change the generative-openai model, and it worked.

HOWEVER, I am now getting this error on all hybrid searches:
object vector search at index solrcopy01: shard solrcopy01_rsFKdoPa7JaL: vector search: 3072 vs 1536: vector lengths don’t match

Need Help!

Message Query (solrai_getContext) : { Get { SolrCopy01 ( limit: 10 hybrid: { query: "Rules for reuse of photography in television agreements between episodes. Be sure to always reference any relevant memorandums of agreement and MOAs in your response." alpha: 0.8 } where: { operator: And, operands: [ { path: ["site"], operator: Equal, valueText:"https://labor.booksai.org/"},{ operator: Or, operands: [ { path: ["groups"], operator: Equal, valueText: "SAG-AFTRA" } ] },{ operator: Or, operands: [ { path: ["taxonomy"], operator: Equal, valueText: "Current" }, { path: ["taxonomy"], operator: Equal, valueText: "Archived" } ] } ] } ){ _additional { distance score } docId site title nid type public url content taxonomy groups date summary questions sourceUrl solrId } } }

Message Response (solrai_getContext) : Array ( [data] => Array ( [Get] => Array ( [SolrCopy01] => ) ) [errors] => Array ( [0] => Array ( [locations] => Array ( [0] => Array ( [column] => 4 [line] => 4 ) ) [message] => explorer: get class: vector search: object vector search at index solrcopy01: shard solrcopy01_rsFKdoPa7JaL: vector search: 3072 vs 1536: vector lengths don’t match [path] => Array ( [0] => Get [1] => SolrCopy01 ) ) ) [_elapsed_time] => 0.73112106323242 )

Hi @SomebodySysop, if you are not on the latest 1.28.3, you may have encountered this bug: Mismatch in Vector Dimensions During Hybrid Search vs Vector Search · Issue #6873 · weaviate/weaviate · GitHub

Could you please upgrade and retry? If this is not your case, please open a new topic with all information so we can investigate.


These queries are on the scbbs01 cluster, which I upgraded last night to 1.28.3.

The first hybrid query I executed went through. But the second returned this error, which I’ve never seen before:

Question: where are the special cases discussed, in which document(s)?
concept: Core concept: “Location of special cases in specific documents.” Be sure to always reference any relevant memorandums of agreement and MOAs in your response.

Array
(
    [errors] => Array
        (
            [0] => Array
                (
                    [locations] => Array
                        (
                            [0] => Array
                                (
                                    [column] => 37
                                    [line] => 7
                                )

                        )

                    [message] => Syntax Error GraphQL request (7:37) Expected :, found Name "of"

6: \u0009\u0009\u0009 hybrid: {
7: \u0009\u0009\u0009\u0009query: "Core concept: "Location of special cases in specific documents." Be sure to always reference any relevant memorandums of agreement and MOAs in your response."
                                                           ^
8: \u0009\u0009\u0009\u0009alpha: 0.7

                    [path] => 
                )

        )

    [_elapsed_time] => 0.10042881965637
)

This is the exact query sent:

Query (solrai_getContext) : { Get { SolrCopy01 ( limit: 30 hybrid: { query: "Core concept: "Location of special cases in specific documents." Be sure to always reference any relevant memorandums of agreement and MOAs in your response." alpha: 0.7 } where: { operator: And, operands: [ { path: ["site"], operator: Equal, valueText:"https://labor.booksai.org/"},{ operator: Or, operands: [ { path: ["groups"], operator: Equal, valueText: "SAG-AFTRA" } ] },{ operator: Or, operands: [ { path: ["taxonomy"], operator: Equal, valueText: "Current" } ] } ] } ){ _additional { distance score } docId site title nid type public url content taxonomy groups date summary questions sourceUrl solrId } } }

When I tried it again, using different concept, it worked. Can you see what the problem might be?

hi @SomebodySysop !!

Your query seems to contain an unescaped " inside the query string :grimacing:

Can you try escaping each inner " with a backslash, as \"?
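If the query text is assembled in code, one way to avoid this is to let a JSON encoder quote the string, since JSON escapes quotes, backslashes, and control characters in a form GraphQL string literals also accept. A minimal Python sketch, with hypothetical helper names and a trimmed-down query (no where filter, only the `title` property):

```python
import json


def graphql_string(text):
    """Return text as a quoted GraphQL string literal with inner quotes escaped."""
    # ensure_ascii=False keeps non-ASCII characters as-is instead of \uXXXX escapes
    return json.dumps(text, ensure_ascii=False)


def build_hybrid_query(class_name, concept, alpha=0.7, limit=30):
    """Build a hybrid search query with the concept safely embedded."""
    return (
        "{ Get { %s ( limit: %d hybrid: { query: %s alpha: %s } ) "
        "{ _additional { distance score } title } } }"
        % (class_name, limit, graphql_string(concept), alpha)
    )
```

The class name is inserted as-is, so it should come from trusted configuration rather than user input. Only the search text goes through `graphql_string`.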

Let me know if that helps.

Thanks!


Yep. That was it.

query solrai_getContext { Get { SolrCopy01 ( limit: 30 hybrid: { query: "Core concept: \"Location of special cases in specific documents.\" Be sure to always reference any relevant memorandums of agreement and MOAs in your response." alpha: 0.7 } where: { operator: And, operands: [ { path: ["site"], operator: Equal, valueText:"https://labor.booksai.org/"},{ operator: Or, operands: [ { path: ["groups"], operator: Equal, valueText: "SAG-AFTRA" } ] },{ operator: Or, operands: [ { path: ["taxonomy"], operator: Equal, valueText: "Current" } ] } ] } ){ _additional { distance score } docId site title nid type public url content taxonomy groups date summary questions sourceUrl solrId } } }
