Weaviate mutates the case of document meta properties

Description

If I create a collection like so:

# Aliases assumed from the snippet below: wvc / wvcc are the usual shorthands in the Weaviate docs
import weaviate.classes as wvc
import weaviate.classes.config as wvcc

# `client` is a weaviate.WeaviateClient created earlier
client.connect()
try:
    collection = client.collections.create(
        name='lowercase',
        properties=[
            wvcc.Property(name="CONTENT", data_type=wvcc.DataType.TEXT, skip_vectorization=False, index_searchable=True),
            wvcc.Property(name="URL", data_type=wvcc.DataType.TEXT, skip_vectorization=True, index_searchable=True),
        ],
        vector_index_config=wvc.config.Configure.VectorIndex.hnsw(), # https://weaviate.io/developers/weaviate/manage-data/collections
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_transformers(
            passage_inference_url="http://t2v-transformers-passage:8080",
            query_inference_url='http://t2v-transformers-query:8080'
        )
    )
finally:
    # Close the connection even if collection creation fails
    client.close()

then Weaviate will mutate the property names to 'cONTENT' and 'uRL' when I later retrieve documents from the collection. Why is this?

The collection name itself also gets mutated to ‘Lowercase’. However, the following command will still work:

collection = client.collections.get('lowercase')

So, are collections treated as case insensitive?
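
The renaming described above looks like a first-letter normalization. Here is a minimal sketch of that observed behaviour in plain Python, with no Weaviate calls. The helper names `as_stored_property` and `as_stored_collection` are hypothetical, and the code mirrors only what is reported in this thread, not the server's actual implementation:

```python
def as_stored_property(name: str) -> str:
    """Lowercase only the first character, as observed for property names."""
    return name[:1].lower() + name[1:]


def as_stored_collection(name: str) -> str:
    """Uppercase only the first character, as observed for collection names."""
    return name[:1].upper() + name[1:]


print(as_stored_property("CONTENT"))      # cONTENT
print(as_stored_property("URL"))          # uRL
print(as_stored_collection("lowercase"))  # Lowercase
```

Only the first character changes in each case, which would explain why 'CONTENT' comes back as 'cONTENT' rather than 'content'.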

Server Setup Information

  • Weaviate Server Version: 1.24.2
  • Deployment Method: docker
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: python, 4.5.3

If I pass property names through Weaviate, e.g. collection.query.hybrid(…, query_properties=['CONTENT']), it finds the CONTENT property without a problem, even though the property prints as 'cONTENT'.

However, it would be better if the properties were left alone, so that when I export document metadata, I can access the properties using whatever case they originally had before the documents were indexed in Weaviate.
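
Until the names are preserved, one possible workaround on the export side is to rename the keys back to their original casing. This is a rough sketch only: `restore_property_case` is a hypothetical helper, not part of the Weaviate client, and it assumes the only change is the lowercased first character observed in this thread.

```python
from typing import Any, Dict, Iterable


def restore_property_case(
    properties: Dict[str, Any], original_names: Iterable[str]
) -> Dict[str, Any]:
    """Rename keys back to their original casing.

    Assumes the stored name differs from the original only by a
    lowercased first character. Unknown keys are passed through as-is.
    """
    # Map the stored form (first letter lowercased) to the original name.
    stored_to_original = {n[:1].lower() + n[1:]: n for n in original_names}
    return {stored_to_original.get(k, k): v for k, v in properties.items()}


exported = {"cONTENT": "content", "uRL": "http://weaviate.io"}
restored = restore_property_case(exported, ["CONTENT", "URL"])
print(restored)  # {'CONTENT': 'content', 'URL': 'http://weaviate.io'}
```

The `exported` dict here stands in for the `properties` of a retrieved object. The list of original names has to be kept on the application side, since the stored schema only has the mutated names.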

Hi @moruga123 !

Those are really interesting findings. While we suggest, as a convention, PascalCase for the collection name and lowercase for the property names, enforcing those conventions can bring unnecessary friction for the DX.

I have consolidated your code here for better reproducibility (and removed unnecessary parts):

import weaviate
from weaviate import classes as wvc
client = weaviate.connect_to_local()
client.collections.delete("lowercase")
lowercase = client.collections.create(
    name='lowercase',
    properties=[
        wvc.config.Property(name="CONTENT", data_type=wvc.config.DataType.TEXT, skip_vectorization=False, index_searchable=True),
        wvc.config.Property(name="URL", data_type=wvc.config.DataType.TEXT, skip_vectorization=True, index_searchable=True),
        wvc.config.Property(name="URL_2", data_type=wvc.config.DataType.TEXT, skip_vectorization=True, index_searchable=True),
        # this doesn't work:
        #wvc.config.Property(name="URL:2", data_type=wvc.config.DataType.TEXT, skip_vectorization=True, index_searchable=True),
    ],
    vectorizer_config=None
)
print(lowercase.config.get().properties)
# insert some document
# MAKE SURE YOU HAVE AUTOSCHEMA_ENABLED: 'false' in your docker compose
lowercase.data.insert({
    "CONTENT": "content", "URL": "http://weaviate.io"
})
# let's fetch our objects
query = lowercase.query.fetch_objects()
# NOTHING HERE
print("NOTHING HERE: ", query.objects[0].properties.get("CONTENT"))
# CONTENT IS IN FACT HERE
print("CONTENT IS IN FACT HERE: ", query.objects[0].properties.get("cONTENT"))

Also, I have found some discussions about this here:

On top of that, I exported the created collection and imported using the v3 client, so this all happens on the server side:

lowercase_schema = lowercase.config.get().to_dict()
lowercase_schema["class"] = "v3class"
clientv3 = weaviate.Client("http://localhost:8080")
clientv3.schema.create_class(lowercase_schema)
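# v3class was not defined above; fetch the new class through the v4 client to inspect it
v3class = client.collections.get("v3class")
print(v3class.config.get().properties)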

and got the same results. I also agree here:

However, it would be better if the properties were left alone…

While this friction is easy to avoid for the collection name, it is not for property names, as you pointed out.

I believe the best course of action here is to raise an issue in GH: