Bug in object replace

I believe there is a bug in object replace.

I have a collection that contains both vectorized and non-vectorized fields (the latter set with skip_vectorization=True). When I call collection.data.replace, the object is only updated if I change at least one of the vectorized fields. If the replace changes only non-vectorized fields, no update happens.
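To pin down which replace calls trigger this, it can help to classify the fields a replace actually changes into vectorized and non-vectorized groups. The sketch below uses a hypothetical helper, `classify_changes` (not part of the Weaviate client), and assumes you know which property names were created with `skip_vectorization=True`. It works on plain dicts and does not talk to Weaviate.

```python
def classify_changes(old: dict, new: dict, skip_vectorization: set[str]) -> dict[str, list[str]]:
    """Split the keys whose values differ between old and new into
    vectorized and non-vectorized groups.

    Hypothetical helper: `skip_vectorization` is the set of property
    names configured with skip_vectorization=True.
    """
    # Keys present in either dict whose values differ (missing counts as None).
    changed = sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
    return {
        "vectorized": [k for k in changed if k not in skip_vectorization],
        "non_vectorized": [k for k in changed if k in skip_vectorization],
    }


old = {"non_vectorized": "Original Text", "vectorized_text": "Original Text", "vectorized_array": []}
new = {**old, "non_vectorized": "I Changed"}
print(classify_changes(old, new, {"non_vectorized"}))
# {'vectorized': [], 'non_vectorized': ['non_vectorized']}
```

In the behaviour described above, the problematic replaces are the ones where the `vectorized` group is empty and the `non_vectorized` group is not.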

Server Setup Information

  • Weaviate Server Version: 1.25.6 (local) & 1.25.7 (WCS)
  • Deployment Method: local docker & WCS
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python 4.6.5
  • Multitenancy?: Yes

Hi @asido!

Welcome to our community :hugs:

I was not able to reproduce this.

Here is the code I crafted:

import weaviate
from weaviate.util import generate_uuid5
from weaviate import classes as wvc

client = weaviate.connect_to_local()

client.collections.delete("Test")
collection = client.collections.create(
    "Test",
    properties=[
        wvc.config.Property(name="vectorized", data_type=wvc.config.DataType.TEXT, skip_vectorization=False),
        wvc.config.Property(name="non_vectorized", data_type=wvc.config.DataType.TEXT, skip_vectorization=True)
    ]
)

# now we insert an object
collection.data.insert({"vectorized": "this should be vectorized", "non_vectorized": "this should not be vectorized"}, uuid=generate_uuid5("example1"))

# now we replace the object
collection.data.replace(properties={"non_vectorized": "changing here"}, uuid=generate_uuid5("example1"))

# now we get the object
print(collection.query.fetch_objects().objects[0].properties)

# outputs
# {'vectorized': None, 'non_vectorized': 'changing here'}

client.close()

Please let me know whether this code is close to what you have, or how I can reproduce the issue.

Thanks!
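The `'vectorized': None` in the output above is expected rather than part of the bug: `collection.data.replace` overwrites the whole object with the properties you pass, whereas `collection.data.update` merges them into the stored object. The pure-Python sketch below only models those two semantics on plain dicts (the function names `model_replace` and `model_update` are illustrative, not Weaviate code), with omitted schema properties reported as `None`, as in the output above.

```python
def model_replace(schema: list[str], stored: dict, payload: dict) -> dict:
    # Replace semantics: `stored` is deliberately ignored; only the payload
    # survives, so any schema property you omit comes back as None.
    return {name: payload.get(name) for name in schema}


def model_update(schema: list[str], stored: dict, payload: dict) -> dict:
    # Update (merge) semantics: the payload is layered over the stored
    # properties, so omitted properties keep their previous values.
    return {name: payload.get(name, stored.get(name)) for name in schema}


schema = ["vectorized", "non_vectorized"]
stored = {"vectorized": "this should be vectorized", "non_vectorized": "this should not be vectorized"}
payload = {"non_vectorized": "changing here"}

print(model_replace(schema, stored, payload))
# {'vectorized': None, 'non_vectorized': 'changing here'}
print(model_update(schema, stored, payload))
# {'vectorized': 'this should be vectorized', 'non_vectorized': 'changing here'}
```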

Hi Duga,

Thanks for the prompt reply. I’m out of office hours now, so I’ll have to put together a reproducible sample tomorrow. However, the main difference I can see between our implementations is that I replace the entire object, including fields that didn’t change. In your example, that would be:

collection.data.replace(properties={"non_vectorized": "changing here", "vectorized": "this should be vectorized"}, uuid=generate_uuid5("example1"))

I can’t confirm right now that this reproduces it, but could you give it a try? Otherwise I will try tomorrow.

Many thanks
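The full-object replace pattern described above can be sketched as "fetch the current properties, merge in the changes, replace". This is a minimal sketch assuming the v4 Python client's `collection.query.fetch_object_by_id` and `collection.data.replace`; `replace_with_changes` is a hypothetical helper name. Since the collection in this thread is multi-tenant, `collection` would need to be the tenant-scoped handle returned by `collection.with_tenant(...)`.

```python
def replace_with_changes(collection, uuid, changes: dict) -> dict:
    """Replace an object with its current properties plus `changes`.

    Hypothetical helper; `collection` is assumed to be a Weaviate v4
    collection (tenant-scoped if multi-tenancy is enabled).
    Returns the payload that was sent.
    """
    current = collection.query.fetch_object_by_id(uuid)
    if current is None:
        raise KeyError(f"object {uuid} not found")
    # Drop properties that came back as None (i.e. unset) so they are not
    # sent explicitly; this filtering is an assumption, not client behaviour.
    existing = {k: v for k, v in current.properties.items() if v is not None}
    payload = {**existing, **changes}
    collection.data.replace(properties=payload, uuid=uuid)
    return payload
```

If merge semantics are what you actually want, `collection.data.update(properties=changes, uuid=uuid)` avoids the extra fetch.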

OK, this turned out to be more awkward to narrow down than I expected. It seems to require some rather odd specifics, which may point to a different underlying issue.

Here is a reproducible example:

import weaviate
from weaviate.util import generate_uuid5
from weaviate.classes import config as wvc

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": "<key>"}
)

# Create the collection and explicitly set a vectorizer
client.collections.delete("Test")
collection = client.collections.create(
    "Test",
    vectorizer_config=(
        wvc.Configure.Vectorizer.text2vec_openai(
            model="ada",
            model_version="002",
        )
    ),
    properties=[
        wvc.Property(name="non_vectorized", data_type=wvc.DataType.TEXT, skip_vectorization=True),
        wvc.Property(name="vectorized_text", data_type=wvc.DataType.TEXT),
        wvc.Property(name="vectorized_array", data_type=wvc.DataType.TEXT_ARRAY)
    ]
)


uuid = generate_uuid5("example1")


# Insert the new object
data = {"non_vectorized": "Original Text", "vectorized_text": "Original Text", "vectorized_array": []}
collection.data.insert(properties={**data}, uuid=uuid)

# Replacing a non-vectorized property on its own does not work
replace_data = {**data, "non_vectorized": "I Changed"}
collection.data.replace(properties=replace_data, uuid=uuid)
print(collection.query.fetch_objects().objects[0].properties)

# Replacing either vectorized property at the same time works
replace_data = {**data, "vectorized_text": "I Changed", "non_vectorized": "I Changed"}
collection.data.replace(properties=replace_data, uuid=uuid)
print(collection.query.fetch_objects().objects[0].properties)

replace_data = {**data, "vectorized_array": ["I Changed"], "non_vectorized": "I Changed"}
collection.data.replace(properties=replace_data, uuid=uuid)
print(collection.query.fetch_objects().objects[0].properties)

client.close()
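To check the symptom programmatically instead of reading printed dicts, each replace can be followed by a fetch and a diff against the payload that was sent. A minimal sketch assuming the v4 client's `collection.data.replace` and `collection.query.fetch_object_by_id`; `replace_and_diff` and `_normalize` are hypothetical names. A non-empty result means the stored object does not match what was sent.

```python
def _normalize(value):
    # Assumption: treat an empty list and None as equivalent, since an empty
    # TEXT_ARRAY such as vectorized_array may be returned as either.
    return None if value == [] else value


def replace_and_diff(collection, uuid, payload: dict) -> dict:
    """Replace the object, re-fetch it, and return {key: (sent, stored)}
    for every property whose stored value differs from what was sent.

    Hypothetical helper; `collection` is assumed to be a Weaviate v4
    collection (tenant-scoped if multi-tenancy is enabled).
    """
    collection.data.replace(properties=payload, uuid=uuid)
    fetched = collection.query.fetch_object_by_id(uuid)
    stored = fetched.properties if fetched is not None else {}
    return {
        key: (sent, stored.get(key))
        for key, sent in payload.items()
        if _normalize(stored.get(key)) != _normalize(sent)
    }
```

Used with the script above, `replace_and_diff(collection, uuid, {**data, "non_vectorized": "I Changed"})` should return an empty dict on a correct server; on an affected setup the report suggests `non_vectorized` would show up as a difference.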

Here is what makes this strange:

  1. If I don’t explicitly set a vectorizer_config, this issue does not occur.
  2. If I don’t have the text array in my schema, this issue does not occur.
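Since the failure appears to depend on two schema choices (an explicit `vectorizer_config` and the presence of the TEXT_ARRAY property) plus which fields the replace touches, enumerating the combinations makes it easier to report exactly which ones fail. The sketch below only builds that test matrix; `repro_cases` and `CHANGE_SETS` are hypothetical names, and the `reported_failing` flag encodes nothing beyond the observations in this thread. Each case would still need to be run against a fresh collection.

```python
from itertools import product

# The replace payloads from the thread: the non-vectorized field alone, or
# together with one of the vectorized fields.
CHANGE_SETS = (
    ("non_vectorized",),
    ("non_vectorized", "vectorized_text"),
    ("non_vectorized", "vectorized_array"),
)


def repro_cases():
    """Yield each combination of the factors observed in this thread."""
    for explicit_vectorizer, has_text_array, changed in product(
        (True, False), (True, False), CHANGE_SETS
    ):
        # Changing vectorized_array only makes sense if the property exists.
        if "vectorized_array" in changed and not has_text_array:
            continue
        yield {
            "explicit_vectorizer": explicit_vectorizer,
            "has_text_array": has_text_array,
            "changed": changed,
            # Based only on the reports above: the update was lost when both
            # schema conditions held and only non_vectorized changed.
            "reported_failing": explicit_vectorizer
            and has_text_array
            and changed == ("non_vectorized",),
        }


for case in repro_cases():
    print(case)
```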