Samuel
September 13, 2024, 1:44pm
1
Creating the following collection fails with the error: invalid combination of properties
client.collections.create(
    "Blogs",
    description="this is testing db to understand weaviate",
    vectorizer_config=Configure.Vectorizer.text2vec_transformers(
        inference_url="http://t2v-transformers:8080",
        vectorize_collection_name=False,
    ),
    properties=[
        Property(name="idCSV", data_type=DataType.TEXT, skip_vectorization=True),
        Property(name="gender", data_type=DataType.TEXT, skip_vectorization=True),
        Property(name="age", data_type=DataType.INT, skip_vectorization=True),
        Property(name="topic", data_type=DataType.TEXT, skip_vectorization=True),
        Property(name="sign", data_type=DataType.TEXT, skip_vectorization=True),
        Property(name="date", data_type=DataType.DATE, skip_vectorization=True),
        Property(name="text", data_type=DataType.TEXT_ARRAY, skip_vectorization=False),
    ],
)
Note: This is the v4 Python API.
I have seen another issue that was similar. However, the above collection is vectorizing on the "text" attribute.
How do I vectorize on a single attribute? How important is the vectorization of the collection name?
Thank you very much in advance.
Samuel
September 13, 2024, 2:05pm
2
Found the answer. Configuring text2vec_transformers is a little different.
Here is a working updated version.
client.collections.create(
    "Blogs",
    description="this is testing db to understand weaviate",
    vectorizer_config=[
        Configure.NamedVectors.text2vec_transformers(
            name="text_vector",
            source_properties=["text"],
            vectorize_collection_name=False,
        )
    ],
    properties=[
        Property(name="idCSV", data_type=DataType.TEXT, skip_vectorization=True),
        Property(name="gender", data_type=DataType.TEXT, skip_vectorization=True),
        Property(name="age", data_type=DataType.INT, skip_vectorization=True),
        Property(name="topic", data_type=DataType.TEXT, skip_vectorization=True),
        Property(name="sign", data_type=DataType.TEXT, skip_vectorization=True),
        Property(name="date", data_type=DataType.DATE, skip_vectorization=True),
        Property(name="text", data_type=DataType.TEXT_ARRAY, skip_vectorization=False),
    ],
)
Hi @Samuel! Welcome to our community!
What is the server version you are running?
Both code snippets gave me the same expected error.
The problem here is that the collection must have at least one vectorizable text property other than the array.
Because you have chosen not to vectorize the collection name, you must provide such a property.
If you change the module to openai, it will raise a more detailed error:
UnexpectedStatusCodeError: Collection may not have been created properly.! Unexpected status code: 422, with response body: {"error": [{"message": "module 'text2vec-openai': invalid properties: didn't find a single property which is of type string or text and is not excluded from indexing. In addition the class name is excluded from vectorization as well, meaning that it cannot be used to determine the vector position. To fix this, set 'vectorizeClassName' to true if the class name is contextionary-valid. Alternatively add at least contextionary-valid text/string property which is not excluded from indexing"}]}.
Let me know if this helps!
Thanks!
Samuel
September 16, 2024, 9:41am
4
I see. Removing the following line worked.
Question: Why must the DB require one element other than the array to work? I'm quite new to vector DBs.
Hi!
While checking the code, it looks like this is a guardrail to make sure there is something vectorizable to pass to the embeddings service.
However, it only seems to check for a text data type, but a text_array should pass that validation as well.
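To make the behaviour described above a bit more concrete, here is a rough sketch of what such a guardrail could look like. This is not Weaviate's actual code: the `Prop` class and the `has_vectorizable_input` function are hypothetical names made up for illustration, and the logic only mirrors what this thread describes (scalar text/string properties count, text arrays do not).

```python
from dataclasses import dataclass


@dataclass
class Prop:
    """Hypothetical stand-in for a collection property (not a Weaviate class)."""
    name: str
    data_type: str  # e.g. "text", "text[]", "int", "date"
    skip_vectorization: bool


def has_vectorizable_input(props, vectorize_collection_name):
    """Return True if there would be some text to send to the embeddings service.

    Assumption taken from the thread: only scalar "text"/"string" properties
    are counted, so a lone "text[]" property does not satisfy the check.
    """
    if vectorize_collection_name:
        return True
    return any(
        p.data_type in ("text", "string") and not p.skip_vectorization
        for p in props
    )


# The properties of the "Blogs" collection from the first post.
blogs = [
    Prop("idCSV", "text", True),
    Prop("gender", "text", True),
    Prop("age", "int", True),
    Prop("topic", "text", True),
    Prop("sign", "text", True),
    Prop("date", "date", True),
    Prop("text", "text[]", False),
]

# Collection name excluded and only the array is vectorizable: check fails.
print(has_vectorizable_input(blogs, vectorize_collection_name=False))
# Collection name included: check passes even with the array alone.
print(has_vectorizable_input(blogs, vectorize_collection_name=True))
```

Under these assumptions the first call reports that nothing is vectorizable and the second passes, which lines up with the error seen when the collection name is excluded from vectorization.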