Using a NotEqual where clause with a near_vector search

I have a class called Work with a collection attribute. I’m trying to get Weaviate to give me a list of works close in vector space to a given work, but from different collections.

First, I load the work and get its vector:

# Fetch one work along with its stored vector.
work = (
    CLIENT.query
        .get(class_name="Work", properties=["identifier", "title", "collection"])
        .with_limit(1)
        .with_additional(["vector"])
        .do()
)["data"]["Get"]["Work"][0]

vector = work["_additional"]["vector"]

So far, so good.

Then I execute the following query:

results = (
    CLIENT.query
        .get(class_name="Work", properties=["collection", "identifier", "title"])
        .with_limit(100)
        .with_where({
            "path": ["collection"],
            "operator": "NotEqual",
            "valueText": work["collection"]
        })
        .with_near_vector({"vector": vector})
        .do()
)

This does return the 100 closest items to the given work, but all of the results are from the same collection. If I remove the with_near_vector clause, I do get 100 results from different collections. If I keep with_near_vector but change the with_where filter to { "path": ["collection"], "operator": "Equal", "valueText": "Other Collection" }, I get 100 near-vector results specifically from Other Collection. So I know there are results that satisfy the query I want, but for some reason the NotEqual filter and with_near_vector don’t seem to be playing nice together, at least not the way I’m using them.

Am I missing something?

The notebooks and dataset are in this GitHub repo. The specific queries above are taken from this notebook.

NotEqual filter and with_near_vector don’t seem to be playing nice together

The way this combination should work is that where limits the search space, then nearVector performs the vector search.
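
As a rough sketch of that order of operations, here is the idea in plain Python. This is only a conceptual illustration, not how Weaviate implements it internally; the sample works, their vectors, and the helper names are made up for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def filtered_near_vector(objects, query_vector, excluded_collection, limit):
    """Hypothetical helper: apply the NotEqual filter first, then rank by distance."""
    # Step 1: the "where" filter shrinks the search space.
    allowed = [o for o in objects if o["collection"] != excluded_collection]
    # Step 2: the vector search only ranks what survived the filter.
    allowed.sort(key=lambda o: cosine_distance(o["vector"], query_vector))
    return allowed[:limit]


# Made-up sample data, for illustration only.
works = [
    {"identifier": "a", "collection": "Maps", "vector": [1.0, 0.0]},
    {"identifier": "b", "collection": "Maps", "vector": [0.9, 0.1]},
    {"identifier": "c", "collection": "Posters", "vector": [0.8, 0.2]},
    {"identifier": "d", "collection": "Letters", "vector": [0.0, 1.0]},
]

nearest = filtered_near_vector(works, [1.0, 0.0], "Maps", limit=2)
```

If the filter behaves as intended, nothing from the excluded collection can appear in the results, no matter how close its vectors are.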

Are the results from the same collection as that of the work? Or from a different collection, but all from the same different collection? You may want to tweak the tokenization setting for the collection property from word (default) to field.
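
For reference, a minimal sketch of what that could look like with the v3 Python client's dictionary-style schema. The data types and the other two properties are assumptions, since the actual Work schema isn't shown here, and any vectorizer settings are left out. As far as I know, tokenization can't be changed on an existing property, so this would likely mean recreating the class and re-importing the data.

```python
def work_class_definition():
    """Hypothetical class definition; only the tokenization setting is the point."""
    return {
        "class": "Work",
        "properties": [
            # Data types below are assumed, not taken from the real schema.
            {"name": "identifier", "dataType": ["text"]},
            {"name": "title", "dataType": ["text"]},
            {
                "name": "collection",
                "dataType": ["text"],
                # "field" indexes the whole (trimmed) value as a single token,
                # instead of splitting it into lowercased words like "word" does.
                "tokenization": "field",
            },
        ],
    }


def not_in_collection_filter(collection_name):
    """Build the same NotEqual filter used in the question."""
    return {
        "path": ["collection"],
        "operator": "NotEqual",
        "valueText": collection_name,
    }


class_definition = work_class_definition()
where_filter = not_in_collection_filter("Other Collection")

# With a connected client, the class would be created with something like:
# CLIENT.schema.create_class(class_definition)
```

With field tokenization, a multi-word collection name is compared as one exact value, which is usually what you want for a filter like this.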

Yes, I meant all the results are from the same collection as the work – the collection that was supposed to be filtered out. The tokenization setting does look like the problem. We’ll try field. Thanks!