Stopwords in filters cause errors, even when added to stopwords_removals

We have a Weaviate collection that contains embeddings in multiple languages, with a ‘language’ metadata property that we use to filter the results.

The languages include “en”, “it”, “de”, etc.

When creating the collection, we did not set anything special for inverted_index_config, so the default “en” stopwords preset is in place.

The problem is with stopwords. When I try to create a language filter for Italian:
filters = wq.Filter.by_property("language").equal("it")
I always get an error in query.near_text:

Exception has occurred: WeaviateQueryError
Query call with protocol GRPC search failed with message explorer: get class: concurrentTargetVectorSearch): explorer: get class: vector search: object vector search at index knowledgebase_customer_dev: shard knowledgebase_customer_dev_acj1FiCmxIOF: build inverted filter allow list: invalid search term, only stopwords provided. Stopwords can be configured in class.invertedIndexConfig.stopwords.
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNKNOWN
details = “explorer: get class: concurrentTargetVectorSearch): explorer: get class: vector search: object vector search at index knowledgebase_customer_dev: shard knowledgebase_customer_dev_acj1FiCmxIOF: build inverted filter allow list: invalid search term, only stopwords provided. Stopwords can be configured in class.invertedIndexConfig.stopwords”
debug_error_string = “UNKNOWN:Error received from peer {created_time:“2025-06-19T14:24:02.414391+02:00”, grpc_status:2, grpc_message:“explorer: get class: concurrentTargetVectorSearch): explorer: get class: vector search: object vector search at index knowledgebase_customer_dev: shard knowledgebase_customer_dev_acj1FiCmxIOF: build inverted filter allow list: invalid search term, only stopwords provided. Stopwords can be configured in class.invertedIndexConfig.stopwords”}”

I checked some forums and added “it” to the list of stopwords removals:

from typing import List, Optional

import weaviate
from weaviate.classes.config import Reconfigure


def update_stopwords_in_collection(
    collection: weaviate.collections.Collection,
    stopwords_additions: Optional[List[str]] = None,
    stopwords_removals: Optional[List[str]] = None,
) -> None:
    """
    Update stopwords in a Weaviate collection.

    Args:
        collection: The Weaviate collection object to update
        stopwords_additions: List of stopwords to add (remove from indexing)
        stopwords_removals: List of stopwords to remove (allow indexing)
    """
    print(f"Updating stopwords for collection: {collection.name}")

    if not stopwords_removals:
        stopwords_removals = []

    if not stopwords_additions:
        stopwords_additions = []

    try:
        config = collection.config.get()
        print(f"Collection config pre-update: {config.inverted_index_config.stopwords}")

        collection.config.update(
            inverted_index_config=Reconfigure.inverted_index(
                stopwords_additions=stopwords_additions,
                stopwords_removals=stopwords_removals,
            )
        )

        config = collection.config.get()
        print(
            f"Collection config post-update: {config.inverted_index_config.stopwords}"
        )

    except Exception as e:
        print(f"Error updating stopwords: {e}")
        raise

Even though the update appears to be successful (after the update, the collection config shows the correct stopwords), the “it” language filter still causes the same error. This is the post-update output printed by the function:

Collection config post-update: _StopwordsConfig(preset=<StopwordsPreset.EN: 'en'>, additions=, removals=['it'])

How can I work around this? And why are stopwords applied at all when filtering on metadata?

Server Setup Information

Weaviate Database version 1.27.27 (hosted on Weaviate Cloud)

Code we use for querying:

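from typing import Any, Dict, List, Optional

import weaviate
import weaviate.classes.query as wq  # assumed meaning of the wq alias
from weaviate.exceptions import UnexpectedStatusCodeException


def query_weaviate_chunks(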
    collection: weaviate.collections.Collection,
    query: str,
    filters: wq.Filter,
    limit: int = 5,
) -> List[Dict[str, Any]]:
    """
    Query the Weaviate collection using a vector and return matching chunks.

    Parameters:
    - collection: The Weaviate collection object.
    - query: A string to query against
    - filters: A valid Weaviate Filter object, that can prefilter data before embedding search
    - limit: The maximum number of results to return (default is 5).

    Returns:
    - List of matching objects with their properties and metadata distance.
    """
    try:
        response = collection.query.near_text(
            query=query,
            limit=limit,
            return_metadata=wq.MetadataQuery(distance=True),
            filters=filters,
            target_vector=["content_vector"],
        )

        results = []
        for o in response.objects:
            results.append(
                {
                    "properties": o.properties,
                    "distance": o.metadata.distance,
                    "uuid": o.uuid,
                }
            )
    except UnexpectedStatusCodeException as err:
        print(err)
        return []

    return results


def get_data_from_knowledge_base_w_language_filter(
    collection: weaviate.collections.Collection,
    user_query: str,
    n: int = 5,
    language: Optional[str] = None,
) -> List[str]:
    """
    Returns n most related chunks for the user query

    Args:
        collection: The Weaviate collection object
        user_query: The search query
        n: Number of results to return
        language: Language filter for the search

    Returns:
        List of content chunks
    """
    filters = wq.Filter.by_property("language").equal(language)

    weaviate_result = query_weaviate_chunks(collection, user_query, filters, limit=n)
    chunks = [result["properties"]["content"] for result in weaviate_result]
    print(f"Found these text chunks: {chunks}")

    return chunks

Hello @Sebastjan_Skrbinsek1,

First of all, welcome to Weaviate! It’s lovely to have you here :partying_face:, and I’m looking forward to helping you. :hugs:

The behavior you’re seeing is due to Weaviate’s default stopword handling. You can find the relevant list of stopwords in the source code here:

Common English words such as “the”, “a”, “an”, “and”, etc., are considered stopwords. When they appear alone in a search query, no results are returned. But when used in combination with other meaningful terms (e.g., “the document”), the search works as expected.
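To make the behavior described above concrete, here is a small toy model in plain Python. It is not Weaviate's implementation: the stopword set is a tiny illustrative subset, and `tokenize` and `filter_terms` are hypothetical helpers. It assumes that stopwords are dropped only under "word" tokenization, and that a filter value with no terms left is rejected with the same message as in the error above.

```python
import re
from typing import List

# Tiny illustrative subset of an English stopword list (not the full preset).
EN_STOPWORDS = {"a", "an", "and", "it", "the"}


def tokenize(value: str, tokenization: str) -> List[str]:
    """Toy stand-in for 'word' and 'field' tokenization."""
    if tokenization == "word":
        # Split on anything that is not alphanumeric and lowercase the tokens.
        return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]
    if tokenization == "field":
        # Keep the whole (trimmed) value as a single token.
        return [value.strip()]
    raise ValueError(f"unsupported tokenization: {tokenization}")


def filter_terms(value: str, tokenization: str) -> List[str]:
    """Return the terms a filter would match on, or raise if none are left."""
    terms = tokenize(value, tokenization)
    if tokenization == "word":
        # Assumption of this model: stopwords are dropped only for 'word'.
        terms = [t for t in terms if t not in EN_STOPWORDS]
    if not terms:
        raise ValueError("invalid search term, only stopwords provided")
    return terms
```

In this model, `filter_terms("en", "word")` and `filter_terms("the document", "word")` both leave a usable term, while `filter_terms("it", "word")` raises because the only token is a stopword. With `"field"`, the value `"it"` is kept as a single token.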

It is possible to make Weaviate not treat words like “it” as stopwords during filtering. The recommended way is to use “field” tokenization instead of the default “word” tokenization for the property you’re filtering on. However, you may still need “word” tokenization for your search.
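As a rough sketch of what the "field" tokenization suggestion could look like for the `language` property: the dict shows the property as it would appear in the collection schema, and `language_property` (a hypothetical helper name) builds the same definition with the v4 Python client. Please treat this as an illustration under those assumptions rather than a verified fix for this cluster.

```python
# Schema-level view of the property (REST-style field names).
language_property_schema = {
    "name": "language",
    "dataType": ["text"],
    "tokenization": "field",
}


def language_property():
    """Build the same property definition with the v4 Python client."""
    # Imported inside the function so the dict above can be inspected
    # even where the client package is not installed.
    from weaviate.classes.config import DataType, Property, Tokenization

    return Property(
        name="language",
        data_type=DataType.TEXT,
        tokenization=Tokenization.FIELD,
    )
```

The result would be passed in the `properties` list of `client.collections.create(...)`, where `client` is your connected client. As far as I know, the tokenization of an existing property cannot be changed in place, so this would most likely mean creating a new property or collection and re-importing the data.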

Since you’re using our Cloud service, I’d recommend opening a ticket with us at support@weaviate.io, our official support channel for cloud customers. While community channels like the forum, Slack, or GitHub are great, a ticket helps us investigate faster since we can access your cluster configuration directly.

I’ll also try to replicate the config and run a test on my end.

Best regards,

Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)
