Possible bug in Equal operator?

Steve · August 21, 2025, 9:52am

Description

Hi,

When making a GraphQL search request against our database, we filter on the article and provider:

Example query:

{
  query:
    Get {
      test_collection(
        limit: 5,
        nearVector: {
          vector: [vector]
        },
        where: {
          operator: And,
          operands: [
            {
              path: ["article"],
              operator: Equal,
              valueString: "Article 3.6"
            },
            {
              path: ["provider"],
              operator: Equal,
              valueString: "provider 1"
            }
          ]
        }
      ) {
        title
        content
        provider
        article
        _additional {
          distance
          id
        }
      }
    }
  }

The problem is that it returns results for both Article 3.6 and Article 6.3. How does the Equal operator work?

Does it match the whole string?

Thank you for your time.

Mohamed_Shahin · August 21, 2025, 9:56am

Hey @Steve,

That’s very odd! Yes, Equal does match on keywords, but this might be a tokenization issue.

Could you please share the following details:

WeaviateDB version
Schema config in full

Best regards,

Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)

Steve · August 21, 2025, 12:22pm

Hi,

Thank you for the quick reply:

Our schema:

{
  'class': 'test_collection',
  'invertedIndexConfig': {
    'bm25': {
      'b': 0.75,
      'k1': 1.2
    },
    'cleanupIntervalSeconds': 60,
    'stopwords': {
      'additions': None,
      'preset': 'en',
      'removals': None
    }
  },
  'moduleConfig': {
    'text2vec-openai': {
      'baseURL': 'baseurl',
      'deploymentId': 'text-embedding-ada-002',
      'model': 'ada',
      'modelVersion': '002',
      'resourceName': 'text-embedding-ada-002',
      'skip': True,
      'vectorizeClassName': True
    }
  },
  'multiTenancyConfig': {
    'autoTenantActivation': False,
    'autoTenantCreation': False,
    'enabled': False
  },
  'properties': [
    {
      'dataType': [
        'text'
      ],
      'description': 'title of the chunk',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'title',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text'
      ],
      'description': 'law to which the chunk belongs',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'law',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text'
      ],
      'description': 'article of the law of the chunk',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'article',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text'
      ],
      'description': 'section of the law to wich the chunk belongs',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'section',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text'
      ],
      'description': 'jci of the chunk',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'jci',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text'
      ],
      'description': 'uri of the chunk',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'uri',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text'
      ],
      'description': 'provider of the chunk',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'provider',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text'
      ],
      'description': 'content of the chunk',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'content',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text[]'
      ],
      'description': 'list of accessible ids',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'access',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text'
      ],
      'description': 'location of the found chunk (Kluwer commentaar)',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'location',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'uuid'
      ],
      'description': 'id of the parent document',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': False,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'parent_id'
    },
    {
      'dataType': [
        'text'
      ],
      'description': 'valid_from',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'valid_from',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text'
      ],
      'description': 'valid_until',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'valid_until',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text'
      ],
      'description': 'hash of the content',
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'baseURL': 'baseurl',
          'deploymentId': 'text-embedding-ada-002',
          'model': 'ada',
          'modelVersion': '002',
          'resourceName': 'text-embedding-ada-002',
          'skip': True,
          'vectorizePropertyName': False
        }
      },
      'name': 'hash',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text'
      ],
      'description': "This property was generated by Weaviate's auto-schema feature on Mon Mar  3 13:38:22 2025",
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'skip': False,
          'vectorizePropertyName': False
        }
      },
      'name': 'document_type',
      'tokenization': 'word'
    },
    {
      'dataType': [
        'text'
      ],
      'description': "This property was generated by Weaviate's auto-schema feature on Mon Mar  3 13:38:22 2025",
      'indexFilterable': True,
      'indexRangeFilters': False,
      'indexSearchable': True,
      'moduleConfig': {
        'text2vec-openai': {
          'skip': False,
          'vectorizePropertyName': False
        }
      },
      'name': 'type',
      'tokenization': 'word'
    }
  ],
  'replicationConfig': {
    'asyncEnabled': False,
    'factor': 1
  },
  'shardingConfig': {
    'actualCount': 1,
    'actualVirtualCount': 128,
    'desiredCount': 1,
    'desiredVirtualCount': 128,
    'function': 'murmur3',
    'key': '_id',
    'strategy': 'hash',
    'virtualPerPhysical': 128
  },
  'vectorIndexConfig': {
    'bq': {
      'enabled': False
    },
    'cleanupIntervalSeconds': 300,
    'distance': 'l2-squared',
    'dynamicEfFactor': 8,
    'dynamicEfMax': 500,
    'dynamicEfMin': 100,
    'ef': -1,
    'efConstruction': 128,
    'filterStrategy': 'sweeping',
    'flatSearchCutoff': 40000,
    'maxConnections': 64,
    'pq': {
      'bitCompression': False,
      'centroids': 256,
      'enabled': False,
      'encoder': {
        'distribution': 'log-normal',
        'type': 'kmeans'
      },
      'segments': 0,
      'trainingLimit': 100000
    },
    'skip': False,
    'sq': {
      'enabled': False,
      'rescoreLimit': 20,
      'trainingLimit': 100000
    },
    'vectorCacheMaxObjects': 1000000000000
  },
  'vectorIndexType': 'hnsw',
  'vectorizer': 'text2vec-openai'
}

Weaviate version: 1.28

DudaNogueira · August 21, 2025, 12:31pm

Hi @Steve !!

Since your article property uses word tokenization, the string Article 3.6 becomes 3 tokens: article, 3 and 6. Article 6.3 produces the same three tokens, which would explain why both values match.

If you want to filter results by article, you will need to set its tokenization to field, so that you end up with a single token with the value Article 3.6.

In that scenario, your Equal comparison will work as you expect.
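To make the difference concrete, here is a small pure-Python sketch. It is only a simplified model of the word and field tokenization described above, not Weaviate's actual implementation, and it assumes Equal matches when every token of the query appears among the stored tokens. The function names are made up for illustration.

```python
import re


def tokenize(value: str, method: str) -> list[str]:
    """Simplified model of two tokenization options (not Weaviate's code)."""
    if method == "word":
        # Split on anything that is not a letter or digit, then lowercase.
        return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]
    if method == "field":
        # The whole value is a single token; only outer whitespace is trimmed.
        return [value.strip()]
    raise ValueError(f"unsupported tokenization: {method}")


def equal_matches(stored: str, query: str, method: str) -> bool:
    """Assumed Equal semantics: every query token occurs in the stored value."""
    stored_tokens = set(tokenize(stored, method))
    return all(token in stored_tokens for token in tokenize(query, method))


print(tokenize("Article 3.6", "word"))                        # ['article', '3', '6']
print(equal_matches("Article 6.3", "Article 3.6", "word"))    # True
print(equal_matches("Article 6.3", "Article 3.6", "field"))   # False
print(equal_matches("Article 3.6", "Article 3.6", "field"))   # True
```

Under this model, token order is ignored with word tokenization, which is why 6.3 comes back for a 3.6 filter.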

For more information on tokenization, see "Overview of tokenization" in the Weaviate Documentation.

Let us know if this helps

Steve · August 21, 2025, 12:35pm

Ah, alright. Is there any way to change the tokenization of a live collection? We already have some data and want to avoid re-vectorizing the database.

DudaNogueira · August 21, 2025, 1:26pm

You can add a new property with field tokenization and set its value.

By the way, a nice trick is to duplicate a field you want to both search and filter on: one copy for searching and one for filtering, with word and field tokenization respectively.
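As a rough illustration of that trick, the two property definitions could look like the fragment below, written in the same dict style as the schema posted earlier. The name article_exact is a hypothetical choice, and the last dict is only a sketch of how one filter operand might then target the new property.

```python
# Existing property: word tokenization, kept for keyword search.
article_search = {
    'name': 'article',
    'dataType': ['text'],
    'tokenization': 'word',
    'indexFilterable': True,
    'indexSearchable': True,
}

# Hypothetical duplicate: field tokenization, used only for exact filtering.
article_filter = {
    'name': 'article_exact',
    'dataType': ['text'],
    'tokenization': 'field',
    'indexFilterable': True,
    'indexSearchable': False,
}

# Sketch of a where-filter operand that targets the duplicate property.
exact_operand = {
    'path': ['article_exact'],
    'operator': 'Equal',
    'valueText': 'Article 3.6',
}
```

Each object would then carry the same article string in both properties.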

Also, you don’t need to re-vectorize your data. You can migrate your dataset from one collection to a new one in the same cluster, with the new configuration, and carry over the vectors.

Check below. Note that it explicitly sets the vectors on the target collection.
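A minimal sketch of such a migration, assuming the v4 Python client: source and target are taken to be collection handles (for example from client.collections.get), the target collection is assumed to already exist with the new configuration, and the stored vector is assumed to live under the "default" name. The article_exact property and both function names are hypothetical.

```python
def with_exact_article(properties: dict) -> dict:
    """Copy the properties and duplicate `article` into `article_exact`."""
    migrated = dict(properties)
    if properties.get("article") is not None:
        # `article_exact` is a hypothetical field-tokenized property.
        migrated["article_exact"] = properties["article"]
    return migrated


def migrate(source, target, batch_size: int = 100) -> int:
    """Copy all objects from `source` to `target`, reusing stored vectors.

    Both arguments are assumed to be v4 client collection handles.
    Returns the number of objects handed to the batch.
    """
    copied = 0
    with target.batch.fixed_size(batch_size=batch_size) as batch:
        for obj in source.iterator(include_vector=True):
            batch.add_object(
                properties=with_exact_article(obj.properties),
                vector=obj.vector["default"],  # carried over, not recomputed
                uuid=obj.uuid,                 # keep the original object id
            )
            copied += 1
    return copied


# Possible usage (collection names are placeholders):
#   import weaviate
#   client = weaviate.connect_to_local()
#   migrate(client.collections.get("test_collection"),
#           client.collections.get("test_collection_v2"))
#   client.close()
```

Because the vector is passed in explicitly for every object, the vectorizer should not need to be called again during the copy.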

Let me know if this helps!

Happy coding!

Steve · August 21, 2025, 2:19pm

Thank you for all the help!
