What is the process for changing vectorizer model

SomebodySysop · November 27, 2024, 6:10pm

Description

What is the process for changing the vectorizer model in my schema?

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-3.5-turbo",
        ],
    ],
    // remaining schema keys omitted in the original post
];

I’d like to change it to gpt-4o-mini.

Server Setup Information

Weaviate Server Version:
Deployment Method: WCS
Multi Node? Number of Running Nodes:
Client Language and Version:
Multitenancy?:

Any additional Information

I do everything using REST API.

DudaNogueira · November 28, 2024, 11:47am

hi @SomebodySysop !!

Changing the vectorization model of a collection would trigger a re-vectorization of the entire collection, which Weaviate does not support.

Neither this nor adding a named vector to an existing collection is possible as of now, because we have not implemented async vectorization.

The only way for now is to create a new collection with the new vectorizer configuration, and then copy the data from the old collection to the new one.

SomebodySysop · November 28, 2024, 10:01pm

OK. Do you know if gpt-4o or gpt-4o-mini are supported in generative-openai?

Update: Never mind. Yes, they are: Generative AI | Weaviate

SomebodySysop · November 28, 2024, 10:09pm

So, I know how to create a new collection in the cluster.

How do I copy the data from the old collection to the new one using REST API? That is not discussed in the documentation: Migrate data | Weaviate

DudaNogueira · November 29, 2024, 10:42am

Note that on 1.27+ you can now change the generative configuration without the need to recreate the collection.

The best approach for the migration is the one described here:

This approach leverages gRPC for reading and inserting the data.

You can also get all objects using the “cursor API”: pass the ID of the last object you received as the after parameter to fetch the next batch of objects:
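
A minimal sketch of that cursor loop over REST, in Python with only the standard library. It assumes the GET /v1/objects endpoint accepts class, limit, and after query parameters, that the first page can be requested without after, and that the response has the form {"objects": [...]} with an id on each object. fetch_page and iter_all_objects are hypothetical helper names, the base URL is a placeholder, and authentication headers are omitted.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional


def fetch_page(base_url: str, class_name: str, limit: int,
               after: Optional[str]) -> list:
    """Fetch one page of objects over REST (assumed endpoint: GET /v1/objects)."""
    params = {"class": class_name, "limit": str(limit)}
    if after is not None:
        params["after"] = after  # ID of the last object from the previous page
    url = f"{base_url}/v1/objects?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("objects", [])


def iter_all_objects(class_name: str, limit: int = 100,
                     get_page: Callable[..., list] = fetch_page,
                     base_url: str = "http://localhost:8080") -> Iterator[dict]:
    """Yield every object in a collection by following the `after` cursor."""
    after = None
    while True:
        page = get_page(base_url, class_name, limit, after)
        if not page:
            return  # an empty page means the cursor reached the end
        yield from page
        after = page[-1]["id"]  # the next request starts after this object
```

Each yielded object can then be written to the new collection with whatever insert call you already use. The get_page parameter exists only so the loop can be exercised without a running server.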

Let me know if this helps!

Thanks!

SomebodySysop · November 29, 2024, 12:16pm

I am confused by this, because your next message says:

But I still have to migrate the data.

So, I’m utterly confused.

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-3.5-turbo",
        ],
    ],
    // remaining schema keys omitted in the original post
];

I want to change the model to gpt-4o-mini

What I thought I understood was that I needed to create a new schema, which would create a new collection, and then migrate the data from the old collection to the new.

So, in my case, I would create a new schema SolrCopy02 which would then create the SolrCopy02 collection. Then I would migrate the SolrCopy01 collection data to the SolrCopy02 collection.

What you wrote above doesn’t sound like that process. Or, I’m just not understanding the terminology. But, as gpt-3.5-turbo is a legacy model, I probably need to update soon.

DudaNogueira · November 29, 2024, 7:30pm

OK, let me clarify.

When you configure the vectorizer, it means that your data is going to be vectorized with that model.

So, for example, let’s say you select a vectorizer that produces vectors with 300 dimensions.

Now you want to change to a different model with, let’s say, 1536 dimensions.

You will need to vectorize all your content again, because the vectors came from different models. Even if they had the same dimensionality, they came from different models.

So in order to change the vectorizer, you need to both define the model accordingly and to vectorize all your content again.

So the vectorizer configuration of a collection is not mutable.
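
To illustrate why stored vectors cannot be reused, here is a toy sketch in plain Python (not Weaviate code). The cosine_similarity helper is hypothetical; it refuses to compare vectors of different lengths, which is the kind of check a vector index has to make before computing a distance. The dimensions are the ones from the example above.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; only defined for vectors of the same length."""
    if len(a) != len(b):
        raise ValueError(f"{len(a)} vs {len(b)}: vector lengths don't match")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


stored = [0.1] * 300    # embedding produced by the old 300-dimension model
query = [0.1] * 1536    # query embedded with the new 1536-dimension model

try:
    cosine_similarity(stored, query)
except ValueError as err:
    print(err)  # the comparison is rejected because the lengths differ
```

Even when two models happen to share a dimensionality, the comparison would run but the score would be meaningless, because each model places text in its own vector space.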

Since Weaviate 1.27, the generative configuration is mutable. This means that if you configured your collection to use, for example, Cohere as the generative provider, you can change it to OpenAI.

gpt-3.5, gpt-4, and so on are generative models, so they belong to the generative configuration.

Let me know if this helps!

Thanks!

SomebodySysop · November 30, 2024, 3:46am

I really appreciate the explanation, but it’s a bird’s eye view of the process whereas I need the steps. I want to change the schema from:

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-3.5-turbo",
        ],
    ],
    // remaining schema keys omitted in the original post
];

to

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-4o-mini",
        ],
    ],
    // remaining schema keys omitted in the original post
];

From the explanation above, it sounds like all I need to do is update the schema and I’m done. But, you also state that since I’m changing the vectorizer, I need to re-vectorize the content – which makes sense. And if that’s the case, I’m really talking about migrating content from SolrCopy01 to SolrCopy02, as opposed to just modifying the schema of SolrCopy01.

So, I still do not know how to proceed.

andrewisplinghoff · December 2, 2024, 10:47am

From what you wrote, you do not actually want to change the vectorizer: both of your schemas specify text2vec-openai. So the embedding model stays the same, and you can keep your collection with its precalculated embeddings.

The generative part is separate from the vectorizer. It is used in RAG to generate a response to a user query at runtime, whereas embeddings are calculated during data ingestion. So it makes sense that Weaviate has no problem with the user changing that setting later on. Hope that clarifies things. If not, maybe check out the documentation here to read more on the topic: Generative AI | Weaviate

SomebodySysop · December 4, 2024, 8:18am

This seems to indicate that the opposite is true: Slack

However, if I can update the schema without updating the collection, that sounds great! So, using just curl, how do I get from

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-3.5-turbo",
        ],
    ],
    // remaining schema keys omitted in the original post
];

to

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-4o-mini",
        ],
    ],
    // remaining schema keys omitted in the original post
];

Can I do it from the query screen in WCS? If not, how can I do it using the REST API (curl)?

SomebodySysop · December 4, 2024, 8:45am

It would also be nice if I could edit the collection in WCS, as this screen seems to indicate, but it doesn’t allow me to change anything:

DudaNogueira · December 6, 2024, 6:59pm

Hi!

You can change the generative model as stated here:

It doesn’t have a curl example. For that, you need to use this endpoint:

First, GET the collection JSON, change it, and PUT the payload back, just as is needed for activating ACORN.

Let me know if this helps!

Thanks!

SomebodySysop · December 8, 2024, 11:39pm

OK, I think I’m getting it through my thick skull. Since I’m using curl, I just have three steps:

1. Get the existing schema:

curl --request GET \
  -H "Content-Type: application/json" \
  --url http://localhost:8080/v1/schema/Test

2. Make whatever change I want.
3. Put it back:

curl \
  --request PUT \
  -H "Content-Type: application/json" \
  --url http://localhost:8080/v1/schema/Test \
  --data '{
"class": "Test", etc…

Right?
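
The same three steps can also be scripted. Below is a minimal Python sketch using only the standard library. It assumes the same /v1/schema/{class} endpoint as the curl commands above and that the model sits under moduleConfig, generative-openai, model, as in the schema posted earlier in this thread. set_generative_model and update_collection are hypothetical helper names, and authentication headers are omitted.

```python
import copy
import json
import urllib.request


def set_generative_model(schema: dict, model: str) -> dict:
    """Return a copy of the collection definition with the generative model replaced."""
    updated = copy.deepcopy(schema)  # leave the caller's dict untouched
    module_config = updated.setdefault("moduleConfig", {})
    module_config.setdefault("generative-openai", {})["model"] = model
    return updated


def update_collection(base_url: str, class_name: str, model: str) -> int:
    """GET the collection definition, change the model, PUT it back; return the HTTP status."""
    url = f"{base_url}/v1/schema/{class_name}"
    with urllib.request.urlopen(url) as resp:           # step 1: GET
        schema = json.load(resp)
    updated = set_generative_model(schema, model)       # step 2: modify
    request = urllib.request.Request(                   # step 3: PUT
        url,
        data=json.dumps(updated).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as resp:
        return resp.status
```

With a local instance this would be called as update_collection("http://localhost:8080", "Test", "gpt-4o-mini"). Sending back the full definition returned by the GET, rather than a hand-written fragment, keeps every other setting exactly as the server reported it.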

Final, last question:

If all I am changing in the schema is the model of the generative-openai:

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-3.5-turbo",
        ],
    ],
    // remaining schema keys omitted in the original post
];

to

$schema = [
    "class" => "SolrCopy01",
    "description" => "Class representing the SolrAI index",
    "vectorizer" => "text2vec-openai",
    "moduleConfig" => [
        "generative-openai" => [
            "model" => "gpt-4o-mini",
        ],
    ],
    // remaining schema keys omitted in the original post
];

Do I need to rebuild / migrate the collection?

Thanks for hanging in there with me. Once I get it, I’ve got it. But it sometimes takes me a minute.

Dirk · December 9, 2024, 6:05am

No. This can be changed without rebuilding since 1.27.x (I don’t remember the exact version).

DudaNogueira · December 9, 2024, 12:52pm

Hi @SomebodySysop !

No need to reindex / rebuild / migrate the collection for changing the generative configuration.

Here we have a list of the mutable collection settings:

All configurations in that list can be changed using the client or by calling the REST endpoint as you mentioned.

Everything else requires creating a new collection with that setting already changed and copying the data over, which triggers a reindex.

Hope this helps!

Thanks!

SomebodySysop · January 22, 2025, 1:01am

I followed the steps to change the generative-openai model, and it worked.

HOWEVER, I am now getting this error on all hybrid searches:
object vector search at index solrcopy01: shard solrcopy01_rsFKdoPa7JaL: vector search: 3072 vs 1536: vector lengths don’t match

Need Help!

Message Query (solrai_getContext) : { Get { SolrCopy01 ( limit: 10 hybrid: { query: "Rules for reuse of photography in television agreements between episodes. Be sure to always reference any relevant memorandums of agreement and MOAs in your response." alpha: 0.8 } where: { operator: And, operands: [ { path: ["site"], operator: Equal, valueText:"https://labor.booksai.org/"},{ operator: Or, operands: [ { path: ["groups"], operator: Equal, valueText: "SAG-AFTRA" } ] },{ operator: Or, operands: [ { path: ["taxonomy"], operator: Equal, valueText: "Current" }, { path: ["taxonomy"], operator: Equal, valueText: "Archived" } ] } ] } ){ _additional { distance score } docId site title nid type public url content taxonomy groups date summary questions sourceUrl solrId } } }

Message Response (solrai_getContext) : Array ( [data] => Array ( [Get] => Array ( [SolrCopy01] => ) ) [errors] => Array ( [0] => Array ( [locations] => Array ( [0] => Array ( [column] => 4 [line] => 4 ) ) [message] => explorer: get class: vector search: object vector search at index solrcopy01: shard solrcopy01_rsFKdoPa7JaL: vector search: 3072 vs 1536: vector lengths don’t match [path] => Array ( [0] => Get [1] => SolrCopy01 ) ) ) [_elapsed_time] => 0.73112106323242 )

Mohamed_Shahin · January 22, 2025, 12:13pm

Hi @SomebodySysop, if you are not on the latest 1.28.3, you may have encountered this bug: Mismatch in Vector Dimensions During Hybrid Search vs Vector Search · Issue #6873 · weaviate/weaviate · GitHub

Could you please upgrade and retry? If this is not your case, please open a new topic with all information so we can investigate.

SomebodySysop · January 22, 2025, 10:53pm

These queries are on the scbbs01 cluster, which I upgraded last night to 1.28.3.

The first hybrid query I executed went through. But the second returned this error, which I’ve never seen before:

Question: where are the special cases discussed, in which document(s)?
concept: Core concept: “Location of special cases in specific documents.” Be sure to always reference any relevant memorandums of agreement and MOAs in your response.

Array
(
    [errors] => Array
        (
            [0] => Array
                (
                    [locations] => Array
                        (
                            [0] => Array
                                (
                                    [column] => 37
                                    [line] => 7
                                )

                        )

                    [message] => Syntax Error GraphQL request (7:37) Expected :, found Name "of"

6: \u0009\u0009\u0009 hybrid: {
7: \u0009\u0009\u0009\u0009query: "Core concept: "Location of special cases in specific documents." Be sure to always reference any relevant memorandums of agreement and MOAs in your response."
^
8: \u0009\u0009\u0009\u0009alpha: 0.7

                    [path] => 
                )

        )

    [_elapsed_time] => 0.10042881965637
)

This is the exact query sent:

Query (solrai_getContext) : { Get { SolrCopy01 ( limit: 30 hybrid: { query: "Core concept: "Location of special cases in specific documents." Be sure to always reference any relevant memorandums of agreement and MOAs in your response." alpha: 0.7 } where: { operator: And, operands: [ { path: ["site"], operator: Equal, valueText:"https://labor.booksai.org/"},{ operator: Or, operands: [ { path: ["groups"], operator: Equal, valueText: "SAG-AFTRA" } ] },{ operator: Or, operands: [ { path: ["taxonomy"], operator: Equal, valueText: "Current" } ] } ] } ){ _additional { distance score } docId site title nid type public url content taxonomy groups date summary questions sourceUrl solrId } } }

When I tried it again, using a different concept, it worked. Can you see what the problem might be?

DudaNogueira · January 23, 2025, 7:44pm

hi @SomebodySysop !!

Your query seems to contain an unescaped " inside the query string.

Can you try escaping each " inside the string with a backslash, as \"?
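
One way to avoid hand-escaping is to let a JSON encoder produce the string literal, since GraphQL uses the same backslash escapes as JSON for quotes and backslashes. A minimal Python sketch; graphql_string and hybrid_clause are hypothetical helpers, not part of any Weaviate client.

```python
import json


def graphql_string(text: str) -> str:
    """Return text as a double-quoted string literal with quotes and backslashes escaped."""
    # ensure_ascii=False keeps non-ASCII characters as-is instead of \uXXXX escapes
    return json.dumps(text, ensure_ascii=False)


def hybrid_clause(query: str, alpha: float) -> str:
    """Build the hybrid argument of a Get query around a safely quoted query string."""
    return f"hybrid: {{ query: {graphql_string(query)} alpha: {alpha} }}"


concept = 'Core concept: "Location of special cases in specific documents."'
print(hybrid_clause(concept, 0.7))
```

Building the clause this way means a concept that happens to contain quotes no longer changes the structure of the query.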

Let me know if that helps.

Thanks!

SomebodySysop · January 23, 2025, 8:57pm

Yep. That was it.

query solrai_getContext { Get { SolrCopy01 ( limit: 30 hybrid: { query: "Core concept: \"Location of special cases in specific documents.\" Be sure to always reference any relevant memorandums of agreement and MOAs in your response." alpha: 0.7 } where: { operator: And, operands: [ { path: ["site"], operator: Equal, valueText:"https://labor.booksai.org/"},{ operator: Or, operands: [ { path: ["groups"], operator: Equal, valueText: "SAG-AFTRA" } ] },{ operator: Or, operands: [ { path: ["taxonomy"], operator: Equal, valueText: "Current" } ] } ] } ){ _additional { distance score } docId site title nid type public url content taxonomy groups date summary questions sourceUrl solrId } } }
