Does every property with skip_vectorization=False get a vector?

Do all properties with skip_vectorization=False get their own vector (which is a lot of data)? Or does the text data in those properties get concatenated, such that there is only one vector that reflects data from all of the skip_vectorization=False properties?

Good morning @moruga123,

When a vectorizer is configured for the entire collection (without a named vectors configuration), it processes all non-skipped properties together and produces a single vector that represents the object in the vector space.

Meaning of an object in a vector space:

However, if you choose to use named vectors, you can explicitly configure a separate vectorizer for each named vector and choose which properties feed it. Each named vector lives in its own vector space, so a property can get its own vector if you define a named vector for it. This configuration allows for more control and separate vector representations for different properties within an object.

Named Vectors

How to configure Named Vectors

I hope this helps! Have a great weekend ahead! :star2:

Best regards,
Mohamed Shahin,
Weaviate Support Engineer

Suppose my collection has 4 TEXT properties called A, B, C, D. Suppose skip_vectorization=False for A and B. I would like to understand how just one vector is created for those. Are the text values from A and B concatenated together with “\n” (for example) and then that string is passed to the vectorizer?

hi @moruga123 !!

The end result: Weaviate will concatenate your property values and the collection name (and optionally the property names too) into one big string and generate a vector out of it.

Before Named Vectors, you only had one vector per object. And this is how you can define a collection, considering your example:

# Assumes `client` is an already-connected Weaviate Python client (v4).
import weaviate.classes as wvc

client.collections.delete("MyCollection")
collection = client.collections.create(
    "MyCollection",
    properties=[
        # propertya and propertyb are excluded from the vector
        wvc.config.Property(name="propertya", data_type=wvc.config.DataType.TEXT, skip_vectorization=True),
        wvc.config.Property(name="propertyb", data_type=wvc.config.DataType.TEXT, skip_vectorization=True),
        # propertyc contributes its name and its value; propertyd only its value
        wvc.config.Property(name="propertyc", data_type=wvc.config.DataType.TEXT, vectorize_property_name=True, skip_vectorization=False),
        wvc.config.Property(name="propertyd", data_type=wvc.config.DataType.TEXT, vectorize_property_name=False, skip_vectorization=False),
    ],
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
        vectorize_collection_name=True,
        # webhook.site URL, used here to capture the outgoing request
        base_url="https://webhook.site/93943f1e-3a96-47cb-af60-4b981b1be009",
    ),
)
collection.data.insert({"propertya": "value A", "propertyb": "value B", "propertyc": "value C", "propertyd": "value D"})

This will be the exact payload Weaviate sends to OpenAI in order to vectorize that object:

{
  "input": [
    "My Collection propertyc value C value D"
  ],
  "model": "text-embedding-3-small",
  "dimensions": 1536
}

Note in this payload:

  • We have the collection name.
  • We have the property name propertyc and its value, value C.
  • We don’t have propertya or propertyb, because they are set to skip_vectorization=True.
  • We only have the value of propertyd (value D), because vectorize_property_name=False for that property.
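To make the points above concrete, here is a rough Python sketch that rebuilds the same string from the same settings. It is not Weaviate's actual implementation: the function names, the dict keys, the camel-case splitting of the collection name, and the property ordering are all assumptions chosen only to match the payload shown above.

```python
import re


def split_camel_case(name: str) -> str:
    """Insert a space at each lower-to-upper boundary, e.g. 'MyCollection' -> 'My Collection'."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)


def build_vectorizer_input(collection_name, properties, obj, vectorize_collection_name=True):
    """Illustrative only: join the pieces that appear in the payload above.

    `properties` is a list of dicts with hypothetical keys
    'name', 'skip_vectorization' and 'vectorize_property_name'.
    """
    parts = []
    if vectorize_collection_name:
        parts.append(split_camel_case(collection_name))
    for prop in properties:
        if prop["skip_vectorization"]:
            continue  # skipped properties contribute nothing to the string
        if prop["vectorize_property_name"]:
            parts.append(prop["name"])  # property name goes in front of its value
        parts.append(obj[prop["name"]])
    return " ".join(parts)


properties = [
    {"name": "propertya", "skip_vectorization": True, "vectorize_property_name": False},
    {"name": "propertyb", "skip_vectorization": True, "vectorize_property_name": False},
    {"name": "propertyc", "skip_vectorization": False, "vectorize_property_name": True},
    {"name": "propertyd", "skip_vectorization": False, "vectorize_property_name": False},
]
obj = {"propertya": "value A", "propertyb": "value B", "propertyc": "value C", "propertyd": "value D"}

text = build_vectorizer_input("MyCollection", properties, obj)
print(text)  # My Collection propertyc value C value D
```

The real concatenation rules may differ in details such as ordering or casing, so capturing the actual request remains the reliable way to check.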

Now, with the advent of named vectors, you can have multiple vectors per object, generated using different properties and different models.

This example from above, with an additional named vector, can be declared as

client.collections.delete("MyCollection")
collection = client.collections.create(
    "MyCollection",
    properties=[
        wvc.config.Property(name="propertya", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="propertyb", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="propertyc", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="propertyd", data_type=wvc.config.DataType.TEXT),
    ],
    vectorizer_config=[
        wvc.config.Configure.NamedVectors.text2vec_openai(
            base_url="https://webhook.site/93943f1e-3a96-47cb-af60-4b981b1be009",
            name="default",
            source_properties=["propertya", "propertyb", "propertyc"],
            vectorize_collection_name=True,
            model="text-embedding-3-large"
        ),
        wvc.config.Configure.NamedVectors.text2vec_openai(
            name="property_a_vector",
            source_properties=["propertya"],
            vectorize_collection_name=True,
            model="text-embedding-3-small",
        )
    ]
)
collection.data.insert({"propertya": "value A", "propertyb": "value B", "propertyc": "value C", "propertyd": "value D"})
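As a quick way to read that configuration, the sketch below shows only which property values each named vector draws from, based on its source_properties. The helper name and the plain dict layout are hypothetical, and it deliberately does not try to reproduce how Weaviate formats the final string for each named vector.

```python
def values_for_named_vectors(named_vectors, obj):
    """Map each named vector to the values of the properties in its source_properties."""
    return {
        name: [obj[prop] for prop in source_properties]
        for name, source_properties in named_vectors.items()
    }


# Mirrors the source_properties of the two named vectors configured above.
named_vectors = {
    "default": ["propertya", "propertyb", "propertyc"],
    "property_a_vector": ["propertya"],
}
obj = {"propertya": "value A", "propertyb": "value B", "propertyc": "value C", "propertyd": "value D"}

selected = values_for_named_vectors(named_vectors, obj)
for name, values in selected.items():
    print(name, values)
```

Note that propertyd is not listed in either source_properties, so in this configuration it does not feed any vector.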

If you want to inspect the generated payload closely, one tip I can give is to point the vectorizer's base URL at a free request-inspection service, e.g. http://webhook.site, like so:

...
        wvc.config.Configure.NamedVectors.text2vec_openai(
            base_url="https://webhook.site/93943f1e-3a96-47cb-af60-4b981b1be222",
            ...
        ),
...

Let me know if this helps :slight_smile:
