Plain GQL query with "containsAny" operator not working

Description

I have a KnowledgeBase-collection with the following keywords text-array property:

{
    name: 'keywords',
    dataType: 'text[]',
    indexSearchable: true,
    tokenization: 'word',
}

and I am trying to query this collection with the following gql-query:

query Get($tenant: String!, $where: GetObjectsKnowledgeBaseWhereInpObj) {
    Get {
        KnowledgeBase(limit: 10, tenant: $tenant, where: $where) {
            content
            keywords
        }
    }
}

and these variables:

{
    "tenant": "xxx",
    "where": {
        "operator": "ContainsAny",
        "path": [
            "keywords"
        ],
        "valueText": [
            "keyword1",
            "keyword2"
        ]
    }
}

The response shows the following error:

"explorer: list class: search: object search at index knowledgebase_xxx: local shard object search knowledgebase_xxx: value type should be []string but is string",

Here is the collection config:

{
    "class": "KnowledgeBase",
    "invertedIndexConfig": {
        "bm25": {
            "b": 0.75,
            "k1": 1.2
        },
        "cleanupIntervalSeconds": 60,
        "stopwords": {
            "additions": null,
            "preset": "en",
            "removals": null
        }
    },
    "multiTenancyConfig": {
        "autoTenantActivation": false,
        "autoTenantCreation": true,
        "enabled": true
    },
    "properties": [
        {
            "dataType": [
                "text"
            ],
            "indexFilterable": true,
            "indexRangeFilters": false,
            "indexSearchable": true,
            "moduleConfig": {
                "text2vec-aws": {
                    "skip": false,
                    "vectorizePropertyName": false
                }
            },
            "name": "content",
            "tokenization": "word"
        },
        {
            "dataType": [
                "text[]"
            ],
            "indexFilterable": true,
            "indexRangeFilters": false,
            "indexSearchable": true,
            "moduleConfig": {
                "text2vec-aws": {
                    "skip": false,
                    "vectorizePropertyName": false
                }
            },
            "name": "keywords",
            "tokenization": "word"
        }
    ],
    "replicationConfig": {
        "asyncEnabled": false,
        "deletionStrategy": "NoAutomatedResolution",
        "factor": 1
    },
    "shardingConfig": {
        "actualCount": 0,
        "actualVirtualCount": 0,
        "desiredCount": 0,
        "desiredVirtualCount": 0,
        "function": "",
        "key": "",
        "strategy": "",
        "virtualPerPhysical": 0
    },
    "vectorConfig": {
        "content_vector": {
            "vectorIndexConfig": {
                "distance": "cosine",
                "flat": {
                    "bq": {
                        "cache": false,
                        "enabled": false,
                        "rescoreLimit": -1
                    },
                    "distance": "cosine",
                    "pq": {
                        "cache": false,
                        "enabled": false,
                        "rescoreLimit": -1
                    },
                    "sq": {
                        "cache": false,
                        "enabled": false,
                        "rescoreLimit": -1
                    },
                    "vectorCacheMaxObjects": 1000000000000
                },
                "hnsw": {
                    "bq": {
                        "enabled": false
                    },
                    "cleanupIntervalSeconds": 300,
                    "distance": "cosine",
                    "dynamicEfFactor": 8,
                    "dynamicEfMax": 500,
                    "dynamicEfMin": 100,
                    "ef": -1,
                    "efConstruction": 128,
                    "filterStrategy": "sweeping",
                    "flatSearchCutoff": 40000,
                    "maxConnections": 32,
                    "multivector": {
                        "aggregation": "maxSim",
                        "enabled": false
                    },
                    "pq": {
                        "bitCompression": false,
                        "centroids": 256,
                        "enabled": false,
                        "encoder": {
                            "distribution": "log-normal",
                            "type": "kmeans"
                        },
                        "segments": 0,
                        "trainingLimit": 100000
                    },
                    "skip": false,
                    "sq": {
                        "enabled": false,
                        "rescoreLimit": 20,
                        "trainingLimit": 100000
                    },
                    "vectorCacheMaxObjects": 1000000000000
                },
                "threshold": 10000
            },
            "vectorIndexType": "dynamic",
            "vectorizer": {
                "text2vec-aws": {
                    "model": "cohere.embed-multilingual-v3",
                    "region": "eu-central-1",
                    "service": "bedrock",
                    "sourceProperties": [
                        "content"
                    ],
                    "vectorizeClassName": false
                }
            }
        }
    }
}

Server Setup Information

  • Weaviate Server Version: 1.29.0
  • Deployment Method: cloud
  • Client Language and Version: plain GraphQL in the Weaviate Cloud query console
  • Multitenancy: true

Any additional Information

I also tried the same request via Postman with the same results.

Addition:

The same filter works for delete operations, where I use the valueTextArray filter property.

Hey @Daniel_Engelhardt,

I see you mentioned that your deployment is cloud.

Could you please open a ticket in our ticketing system for cloud users by emailing support@weaviate.io?

Generally, this message indicates a data type mismatch for the valueText field in your GraphQL query. The ContainsAny operator expects an array of strings ([]string), but the value is being interpreted as a single string.
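As a quick client-side sanity check, the variables can be inspected before they are sent to confirm that the value really is an array of strings. This is a minimal sketch; the helper name check_contains_filter is hypothetical and not part of any Weaviate client:

```python
def check_contains_filter(where):
    """Return a list of problems found in a ContainsAny/ContainsAll filter dict."""
    problems = []
    if where.get("operator") not in ("ContainsAny", "ContainsAll"):
        return problems  # other operators are out of scope for this check
    value = where.get("valueText")
    if not isinstance(value, list):
        problems.append(f"valueText should be a list, got {type(value).__name__}")
    elif not all(isinstance(v, str) for v in value):
        problems.append("valueText should contain only strings")
    return problems


# Variables taken from the original question
where = {
    "operator": "ContainsAny",
    "path": ["keywords"],
    "valueText": ["keyword1", "keyword2"],
}
print(check_contains_filter(where))  # []
```

For the variables from the original question this prints an empty list, so the payload itself already carries an array.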

Have you tried switching from valueText to valueTextArray?

Try modifying your query like this:

{
    "tenant": "xxx",
    "where": {
        "operator": "ContainsAny",
        "path": [
            "keywords"
        ],
        "valueTextArray": [
            "keyword1",
            "keyword2"
        ]
    }
}

Hi @Mohamed_Shahin,

Yes, I already tried that, along with every other conceivable combination of array and string values with the valueText and valueTextArray properties.

The valueTextArray-option is not allowed for the where-input (also mentioned in the documentation: Conditional filters | Weaviate) and results in this error:

"message": "Variable \"$where\" got invalid value {\"operator\":\"ContainsAny\",\"path\":[\"keywords\"],\"valueTextArray\":[\"xxx\"]}.\nIn field \"valueTextArray\": Unknown field.",

The cloud console also complains about it:

Hi Daniel!

I have reproduced this.

Here it is in Python first:

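import weaviate
from weaviate import classes as wvc

# Assumption: a local instance; adjust the connection for your own cluster.
client = weaviate.connect_to_local()

# Start from a clean, multi-tenant collection
client.collections.delete("Test")
col = client.collections.create(
    "Test",
    multi_tenancy_config=wvc.config.Configure.multi_tenancy(
        enabled=True, auto_tenant_creation=True, auto_tenant_activation=True
    ),
    properties=[
        wvc.config.Property(name="content", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="keywords", data_type=wvc.config.DataType.TEXT_ARRAY),
    ]
)

# Tenant "t1" is created on first insert because auto_tenant_creation is enabled
col.with_tenant("t1").data.insert({"content": "Hello 1", "keywords": ["tag1", "tag2"]})
col.with_tenant("t1").data.insert({"content": "Hello 2", "keywords": ["tag2", "tag3"]})

# ContainsAny on the text[] property; both objects carry "tag2"
result = col.with_tenant("t1").query.fetch_objects(
    filters=wvc.query.Filter.by_property("keywords").contains_any(["tag2", "tag10"])
)
for obj in result.objects:
    print(obj.properties)

client.close()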
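To compare the client result with a raw GraphQL call outside the console, the same filter can be sent to the /v1/graphql endpoint over plain HTTP. The following is a minimal sketch, not a verified reproduction. It assumes an instance reachable at http://localhost:8080 (a placeholder URL) and reuses the Test collection and tenant t1 from above. The helper names build_payload and post_graphql are made up for illustration, and the input type name GetObjectsTestWhereInpObj is only inferred from the naming pattern in the original query, so it may differ on your schema:

```python
import json
import urllib.request

# Assumption: the where-input type follows the pattern GetObjects<Class>WhereInpObj
QUERY = """
query Get($tenant: String!, $where: GetObjectsTestWhereInpObj) {
    Get {
        Test(limit: 10, tenant: $tenant, where: $where) {
            content
            keywords
        }
    }
}
"""


def build_payload(tenant, keywords):
    """Build the GraphQL request body with the filter passed as a variable."""
    return {
        "query": QUERY,
        "variables": {
            "tenant": tenant,
            "where": {
                "operator": "ContainsAny",
                "path": ["keywords"],
                "valueText": list(keywords),
            },
        },
    }


def post_graphql(payload, base_url="http://localhost:8080", api_key=None):
    """POST the payload to /v1/graphql and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the cluster accepts an API key as a bearer token
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(
        f"{base_url}/v1/graphql",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_payload("t1", ["tag2", "tag10"])
print(json.dumps(payload["variables"], indent=2))
# Uncomment once base_url (and api_key, if needed) point at a real instance:
# print(post_graphql(payload))
```

Printing the variables before sending makes it easy to confirm that valueText is serialized as a JSON array, which helps separate a client-side encoding problem from a server-side one.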

Now, while experimenting by connecting directly to the cluster using Insomnia:

This will work:

This will not:

I was able to nail it down to TextGetObjectsTest:

I will poke internally to see if this makes sense.

Thanks!