Weaviate requires a string of UUID format when adding json Schema

Description

Hi, I am trying to add the following Swagger schema for an API endpoint using the Python client v4:

{'tags': ['Account'],
 'summary': 'Update Account',
 'security': [{'directLogin': [], 'gatewayLogin': []}],
 'description': '\n\nUpdate the account.\n\nAuthentication is Mandatory\n\n**URL Parameters:**\n\n[ACCOUNT\\_ID](/glossary#Account.account_id): 8ca8a7e4-6d02-40e3-a129-0b2bf89de9f0\n\n[BANK\\_ID](/glossary#Bank.bank_id): gh.29.uk\n\n**JSON response body fields:**\n\n[**account\\_id**](/glossary#): 8ca8a7e4-6d02-40e3-a129-0b2bf89de9f0\n\n[**account\\_routings**](/glossary#account_routings):\n\n[**address**](/glossary#address):\n\n[**bank\\_id**](/glossary#): gh.29.uk\n\n[**branch\\_id**](/glossary#): DERBY6\n\n[**label**](/glossary#): My Account\n\n[**scheme**](/glossary#scheme): OBP\n\n[**type**](/glossary#type):\n\n',
 'operationId': 'OBPv3.1.0-updateAccount',
 'parameters': [{'in': 'body',
   'name': 'body',
   'description': 'JObject object that needs to be added.',
   'required': True,
   'schema': {'type': 'object',
    'properties': {'label': {'type': 'string', 'example': 'Label'},
     'type': {'type': 'string', 'example': 'CURRENT'},
     'branch_id': {'type': 'string', 'example': '1234'},
     'account_routings': {'type': 'array',
      'items': {'type': 'object',
       'properties': {'scheme': {'type': 'string', 'example': 'OBP'},
        'address': {'type': 'string',
         'example': '8ca8a7e4-6d02-40e3-a129-0b2bf89de9f0'}},
       'required': ['scheme', 'address']}}},
    'required': ['label', 'type', 'branch_id', 'account_routings']}},
  {'in': 'path',
   'name': 'ACCOUNT_ID',
   'description': 'The account id',
   'required': True,
   'type': 'string'},
  {'in': 'path',
   'name': 'BANK_ID',
   'description': 'The bank id',
   'required': True,
   'type': 'string'}],
 'responses': {'ok_200': {'description': 'Success',
   'schema': {'type': 'object',
    'properties': {'bank_id': {'type': 'string', 'example': 'gh.29.uk'},
     'account_id': {'type': 'string',
      'example': '8ca8a7e4-6d02-40e3-a129-0b2bf89de9f0'},
     'label': {'type': 'string', 'example': 'Label'},
     'type': {'type': 'string', 'example': 'CURRENT'},
     'branch_id': {'type': 'string', 'example': '1234'},
     'account_routings': {'type': 'array',
      'items': {'type': 'object',
       'properties': {'scheme': {'type': 'string', 'example': 'IBAN'},
        'address': {'type': 'string',
         'example': 'DE91 1000 0000 0123 4567 89'}},
       'required': ['scheme', 'address']}}},
    'required': ['bank_id',
     'account_id',
     'label',
     'type',
     'branch_id',
     'account_routings']}},
  'badRequest_400': {'description': 'Error',
   'schema': {'properties': {'message': {'type': 'string',
      'example': 'OBP-10001: Incorrect json format.'}}}}}}

using

test_collection.data.insert(
    properties=endpoint_object,
    uuid=endpoint_uuid
)

and get the following error from the client:

UnexpectedStatusCodeError: Object was not added! Unexpected status code: 422, with response body: {'error': [{'message': "invalid object: invalid object property 'responses' on class 'Test': property 'responses.ok_200': invalid object property 'responses.ok_200' on class 'Test': property 'responses.ok_200.schema': invalid object property 'responses.ok_200.schema' on class 'Test': property 'responses.ok_200.schema.properties': invalid object property 'responses.ok_200.schema.properties' on class 'Test': property 'responses.ok_200.schema.properties.account_routings': invalid object property 'responses.ok_200.schema.properties.account_routings' on class 'Test': property 'responses.ok_200.schema.properties.account_routings.items': invalid object property 'responses.ok_200.schema.properties.account_routings.items' on class 'Test': property 'responses.ok_200.schema.properties.account_routings.items.properties': invalid object property 'responses.ok_200.schema.properties.account_routings.items.properties' on class 'Test': property 'responses.ok_200.schema.properties.account_routings.items.properties.address': invalid object property 'responses.ok_200.schema.properties.account_routings.items.properties.address' on class 'Test': property 'responses.ok_200.schema.properties.account_routings.items.properties.address.example': invalid uuid property 'responses.ok_200.schema.properties.account_routings.items.properties.address.example' on class 'Test': requires a string of UUID format, but the given value is 'DE91 1000 0000 0123 4567 89'"}]}.

The key part is: invalid uuid property 'responses.ok_200.schema.properties.account_routings.items.properties.address.example' on class 'Test': requires a string of UUID format, but the given value is 'DE91 1000 0000 0123 4567 89'.

For some reason it does not like the example I’ve given for address, i.e.

 'properties': {'scheme': {'type': 'string', 'example': 'IBAN'},
        'address': {'type': 'string',
         'example': 'DE91 1000 0000 0123 4567 89'}},
       'required': ['scheme', 'address']}}},

I can’t change this to a UUID, as it encodes an IBAN bank account address (not a real one, don’t get any ideas).

Any clue why this is happening and how to override this UUID checking? I thought maybe address might be a reserved name or something, but I can’t find that anywhere.
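
One way to narrow this down is to list every string in the payload that parses as a UUID, together with its dotted path. This is a minimal sketch that rests on an assumption this thread does not confirm: that Weaviate's auto-schema may type a property as uuid when the first value it sees at that path is UUID-formatted. The helper names looks_like_uuid and uuid_like_paths are hypothetical, and Python's uuid parser is somewhat more lenient than a strict format check.

```python
import uuid
from typing import Any, Iterator, Tuple


def looks_like_uuid(value: str) -> bool:
    """Return True if Python's uuid module can parse the string."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def uuid_like_paths(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_path, value) for every UUID-shaped string in a nested structure.

    List indices are left out of the path, mirroring the property paths
    shown in the error message above.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from uuid_like_paths(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for child in node:
            yield from uuid_like_paths(child, path)
    elif isinstance(node, str) and looks_like_uuid(node):
        yield path, node


# Trimmed-down stand-in for the endpoint object above
sample = {
    "parameters": [
        {"schema": {"address": {"example": "8ca8a7e4-6d02-40e3-a129-0b2bf89de9f0"}}}
    ],
    "responses": {
        "ok_200": {"address": {"example": "DE91 1000 0000 0123 4567 89"}}
    },
}

for found_path, found_value in uuid_like_paths(sample):
    print(found_path, found_value)
```

Running this over every endpoint object before inserting would show whether some other object carries a UUID-shaped value at the same path as the IBAN.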

Server Setup Information

  • Weaviate Server Version: 1.28.2
  • Deployment Method: docker
  • Client Language and Version: python3

Any additional Information

I’ll include my docker-compose.yaml file here for completeness:

---
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.28.2
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_API_BASED_MODULES: 'true'
      ENABLE_MODULES: 'text2vec-ollama,generative-ollama'
      CLUSTER_HOSTNAME: 'node1'
volumes:
  weaviate_data:
...

hi @nemozak1 !!

Welcome to our community :hugs:

Indeed, you can only pass a UUID as the uuid parameter.

You can always generate a UUID from that value:

from weaviate.util import generate_uuid5

# Derive a deterministic UUID from the original identifier
generated_uuid = generate_uuid5(endpoint_uuid)

test_collection.data.insert(
    properties=endpoint_object,
    uuid=generated_uuid
)

Let me know if this works for you.

Thanks!

I’m already doing that actually:

props = properties.copy()

# Summarize the description of the endpoint in markdown
# summary_chain_response = endpoint_summary_chain.invoke({"raw_description": properties['description']})
# props['description'] = summary_chain_response.content

# endpoint_object = {"path": path, "method": method, "tags": props["tags"], "schema": str(props)}
endpoint_object = props

# Change the description from HTML to markdown format
props['description'] = md(props['description'])
# Generate deterministic UUID from OperationID
endpoint_uuid = generate_uuid5(props['operationId'])

# Weaviate does not like straight response codes as object keys i.e. 200 or 404 so we need to change that
responses_with_new_keys = {}
for response_code, response_schema in endpoint_object['responses'].copy().items():
    stringified_response_code = response_code_to_string(int(response_code))
    responses_with_new_keys[stringified_response_code] = response_schema

# Replace all keys
endpoint_object['responses'] = responses_with_new_keys

documents.append(endpoint_object)

if not test_collection.data.exists(endpoint_uuid):
    test_collection.data.insert(
        properties=endpoint_object,
        uuid=endpoint_uuid
    )

The problem is with a specific property that has an example value. Weaviate thinks this value should be in UUID format, but it shouldn’t be, as it’s an IBAN.
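
For anyone following along, the response_code_to_string helper is not shown in the snippet above. The following is only a hypothetical reconstruction that produces keys such as ok_200 and badRequest_400, matching the ones visible in the error message. It builds the name from the standard HTTP reason phrase so that the key starts with a letter, which Weaviate property names require.

```python
import re
from http import HTTPStatus


def response_code_to_string(code: int) -> str:
    """Turn an HTTP status code into a key such as 'ok_200' (hypothetical helper)."""
    try:
        phrase = HTTPStatus(code).phrase  # e.g. "Bad Request"
    except ValueError:
        phrase = "status"  # fallback for non-standard codes
    words = re.findall(r"[A-Za-z0-9]+", phrase)
    head = words[0].lower()
    tail = "".join(word.capitalize() for word in words[1:])
    return f"{head}{tail}_{code}"


print(response_code_to_string(200))  # ok_200
print(response_code_to_string(400))  # badRequest_400
```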

Hi!

Can you give an MRE (minimal reproducible example)?

Here is what I got:

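import weaviate
import weaviate.classes as wvc
from weaviate.util import generate_uuid5

# Assumes a local Weaviate instance, as in the docker-compose file above
client = weaviate.connect_to_local()

endpoint_uuid = generate_uuid5("some unique text")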

schema = {'tags': ['Account'],
 'summary': 'Update Account',
 'security': [{'directLogin': [], 'gatewayLogin': []}],
 'description': '\n\nUpdate the account.\n\nAuthentication is Mandatory\n\n**URL Parameters:**\n\n[ACCOUNT\\_ID](/glossary#Account.account_id): 8ca8a7e4-6d02-40e3-a129-0b2bf89de9f0\n\n[BANK\\_ID](/glossary#Bank.bank_id): gh.29.uk\n\n**JSON response body fields:**\n\n[**account\\_id**](/glossary#): 8ca8a7e4-6d02-40e3-a129-0b2bf89de9f0\n\n[**account\\_routings**](/glossary#account_routings):\n\n[**address**](/glossary#address):\n\n[**bank\\_id**](/glossary#): gh.29.uk\n\n[**branch\\_id**](/glossary#): DERBY6\n\n[**label**](/glossary#): My Account\n\n[**scheme**](/glossary#scheme): OBP\n\n[**type**](/glossary#type):\n\n',
 'operationId': 'OBPv3.1.0-updateAccount',
 'parameters': [{'in': 'body',
   'name': 'body',
   'description': 'JObject object that needs to be added.',
   'required': True,
   'schema': {'type': 'object',
    'properties': {'label': {'type': 'string', 'example': 'Label'},
     'type': {'type': 'string', 'example': 'CURRENT'},
     'branch_id': {'type': 'string', 'example': '1234'},
     'account_routings': {'type': 'array',
      'items': {'type': 'object',
       'properties': {'scheme': {'type': 'string', 'example': 'OBP'},
        'address': {'type': 'string',
         'example': '8ca8a7e4-6d02-40e3-a129-0b2bf89de9f0'}},
       'required': ['scheme', 'address']}}},
    'required': ['label', 'type', 'branch_id', 'account_routings']}},
  {'in': 'path',
   'name': 'ACCOUNT_ID',
   'description': 'The account id',
   'required': True,
   'type': 'string'},
  {'in': 'path',
   'name': 'BANK_ID',
   'description': 'The bank id',
   'required': True,
   'type': 'string'}],
 'responses': {'ok_200': {'description': 'Success',
   'schema': {'type': 'object',
    'properties': {'bank_id': {'type': 'string', 'example': 'gh.29.uk'},
     'account_id': {'type': 'string',
      'example': '8ca8a7e4-6d02-40e3-a129-0b2bf89de9f0'},
     'label': {'type': 'string', 'example': 'Label'},
     'type': {'type': 'string', 'example': 'CURRENT'},
     'branch_id': {'type': 'string', 'example': '1234'},
     'account_routings': {'type': 'array',
      'items': {'type': 'object',
       'properties': {'scheme': {'type': 'string', 'example': 'IBAN'},
        'address': {'type': 'string',
         'example': 'DE91 1000 0000 0123 4567 89'}},
       'required': ['scheme', 'address']}}},
    'required': ['bank_id',
     'account_id',
     'label',
     'type',
     'branch_id',
     'account_routings']}},
  'badRequest_400': {'description': 'Error',
   'schema': {'properties': {'message': {'type': 'string',
      'example': 'OBP-10001: Incorrect json format.'}}}}}}

endpoint_object = {"path": "some_path", "method": "POST", "tags": ["tag1", "tag2"], "schema": schema}

# Start from a clean collection so auto-schema infers types from this object only
client.collections.delete("Test")
collection = client.collections.create(
    "Test",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
)
collection.data.insert(
    properties=endpoint_object,
    uuid=endpoint_uuid,
)

Note that when the entire JSON is passed as the schema property, it gets mapped to object:

for p in collection.config.get().properties:
    if p.name == "schema":
        print(p.data_type)
# DataType.OBJECT

And that comes with some limitations:

Currently, object and object[] datatype properties are not indexed and not vectorized.

Future plans include the ability to index nested properties, for example to allow for filtering on nested properties and vectorization options.

So, for example, this is the payload that gets vectorized in this scenario:

{
  "input": [
    "Test POST some_path tag1 tag2"
  ],
  "model": "text-embedding-3-small",
  "dimensions": 1536
}
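
Since nested object properties are neither indexed nor vectorized, one alternative worth considering is to store the full spec as a JSON string in a text property, and to keep only a few flat fields for search. This is a rough sketch, not a recommendation from the Weaviate docs: the function name flatten_endpoint and the property name schema_json are hypothetical, and it assumes you do not need to filter on individual nested fields. A string property carries no nested types, so nothing inside it can be inferred as uuid.

```python
import json
from typing import Any, Dict


def flatten_endpoint(path: str, method: str, spec: Dict[str, Any]) -> Dict[str, Any]:
    """Build a flat object whose only nested data lives inside a JSON string."""
    return {
        "path": path,
        "method": method,
        "tags": list(spec.get("tags", [])),
        "summary": spec.get("summary", ""),
        # The whole spec, serialized; decode with json.loads after retrieval
        "schema_json": json.dumps(spec, sort_keys=True),
    }


spec = {
    "tags": ["Account"],
    "summary": "Update Account",
    "responses": {
        "ok_200": {"address": {"example": "DE91 1000 0000 0123 4567 89"}}
    },
}

flat = flatten_endpoint("some_path", "POST", spec)
restored = json.loads(flat["schema_json"])
print(restored["responses"]["ok_200"]["address"]["example"])
```

The resulting dictionary can be passed as properties to collection.data.insert in the same way as endpoint_object above.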

Let me know if this helps.

Thanks!

Can you please post your collection definition?

    config = collection.config.get()
    print(config.properties)

My config is extremely long, with lots of nested properties. I guess this is part of the problem with inserting a JSON schema like this:

_CollectionConfig(name='Test', description=None, generative_config=_GenerativeConfig(generative=<GenerativeSearches.OPENAI: 'generative-openai'>, model={}), inverted_index_config=_InvertedIndexConfig(bm25=_BM25Config(b=0.75, k1=1.2), cleanup_interval_seconds=60, index_null_state=False, index_property_length=False, index_timestamps=False, stopwords=_StopwordsConfig(preset=<StopwordsPreset.EN: 'en'>, additions=None, removals=None)), multi_tenancy_config=_MultiTenancyConfig(enabled=False, auto_tenant_creation=False, auto_tenant_activation=False), properties=[_Property(name='description', description="This property was generated by Weaviate's auto-schema feature on Mon Jan  6 20:23:55 2025", data_type=<DataType.TEXT: 'text'>, index_filterable=True, index_range_filters=False, index_searchable=True, nested_properties=None, tokenization=<Tokenization.WORD: 'word'>, vectorizer_config=_PropertyVectorizerConfig(skip=False, vectorize_property_name=False), vectorizer='text2vec-openai'), _Property(name='operationId', description="This property was generated by Weaviate's auto-schema feature on Mon Jan  6 20:23:55 2025", data_type=<DataType.TEXT: 'text'>, index_filterable=True, index_range_filters=False, index_searchable=True, nested_properties=None, tokenization=<Tokenization.WORD: 'word'>, vectorizer_config=_PropertyVectorizerConfig(skip=False, vectorize_property_name=False), vectorizer='text2vec-openai'), _Property(name='parameters', description="This property was generated by Weaviate's auto-schema feature on Mon Jan  6 20:23:55 2025", data_type=<DataType.OBJECT_ARRAY: 'object[]'>, index_filterable=True, index_range_filters=False, index_searchable=False, nested_properties=[_NestedProperty(data_type=<DataType.TEXT: 'text'>, description="This nested property was generated by Weaviate's auto-schema feature on Mon Jan  6 20:23:55 2025", index_filterable=True, index_searchable=True, name='name', nested_properties=None, tokenization=<Tokenization.WORD: 'word'>), 
_NestedProperty(data_type=<DataType.TEXT: 'text'>, description="This nested property was generated by Weaviate's auto-schema feature on Mon Jan  6 20:23:55 2025", index_filterable=True, index_searchable=True, name='description', nested_properties=None, tokenization=<Tokenization.WORD: 'word'>), 

… etc. (with more nested properties)

I’d only need the problematic property, e.g. properties.account_routings.items.properties.address.example. The rest doesn’t matter.
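
A minimal sketch for pulling out just that entry, assuming the objects in config.properties expose name, data_type and nested_properties as in the dump above. The helper name find_nested_property is hypothetical, and the stub objects below are stand-ins for illustration, not real output from this collection.

```python
from types import SimpleNamespace
from typing import Any, Optional, Sequence


def find_nested_property(properties: Optional[Sequence[Any]], dotted_path: str) -> Optional[Any]:
    """Follow a dotted path through .nested_properties and return the match, or None."""
    current = None
    candidates = properties or []
    for name in dotted_path.split("."):
        current = next((p for p in candidates if p.name == name), None)
        if current is None:
            return None
        candidates = current.nested_properties or []
    return current


# Stand-in objects shaped like the client's property objects (illustrative values only)
example = SimpleNamespace(name="example", data_type="uuid", nested_properties=None)
address = SimpleNamespace(name="address", data_type="object", nested_properties=[example])
demo_properties = [
    SimpleNamespace(name="responses", data_type="object", nested_properties=[address])
]

found = find_nested_property(demo_properties, "responses.address.example")
print(found.name, found.data_type)

# Against a real collection, the call would look like this:
# prop = find_nested_property(
#     collection.config.get().properties,
#     "responses.ok_200.schema.properties.account_routings.items.properties.address.example",
# )
# print(prop.data_type if prop else "not found")
```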